PDF RAG System

Query large PDF files from Claude. This MCP server pairs PDF text processing with vector storage so that Claude can retrieve relevant passages instead of reading whole documents.

Installation

Installing for Claude Desktop

Manual Configuration Required

This MCP server requires manual configuration. Run the command below to open your Claude Desktop configuration file:

npx mcpbar@latest edit -c claude

Then add the PDF RAG System MCP server entry to that file by hand.


PDF RAG System with MCP Server

This project implements a Retrieval-Augmented Generation (RAG) system with an MCP server that allows Claude to access and query information from large PDF files. It uses Chroma as the vector database.
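
To illustrate the retrieval step of a RAG system, here is a minimal sketch of ranking stored chunks against a query embedding by cosine similarity. It is a brute-force illustration only: in this project Chroma performs the search, and the names `StoredChunk`, `ScoredChunk`, `cosineSimilarity`, and `topKByCosine` are hypothetical, not taken from the repository.

```typescript
// Illustrative only: brute-force top-K search over in-memory vectors.
// Chroma does this work in the real system; all names here are hypothetical.

interface StoredChunk {
  id: string;
  text: string;
  embedding: number[];
}

interface ScoredChunk {
  id: string;
  text: string;
  score: number;
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // Treat zero vectors as unrelated to everything.
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding, best first.
function topKByCosine(
  queryEmbedding: number[],
  chunks: StoredChunk[],
  k: number
): ScoredChunk[] {
  return chunks
    .map((c) => ({
      id: c.id,
      text: c.text,
      score: cosineSimilarity(queryEmbedding, c.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, Math.max(0, k));
}
```

The top-ranked chunk texts are what a RAG system would hand to the model as context for answering the question.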

Prerequisites

Node.js (v14 or higher)
npm (v6 or higher)
Python 3.9+ with ChromaDB installed
OpenAI API key (for embeddings)

Setup

Clone the repository
Install dependencies:
```
npm install
```
Install Python dependencies:
```
python3 -m pip install chromadb
```

Configure environment variables by editing the .env file:

OPENAI_API_KEY=your_openai_api_key
PORT=3000  # Port for the MCP server
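
As a sketch of how these two variables might be validated at startup, the helper below parses them from a plain key/value map. The function name `loadConfig`, the `ServerConfig` type, and the fallback to port 3000 are assumptions for illustration; the project's actual src/utils/env.ts may work differently.

```typescript
// Hypothetical helper: not the project's actual env utilities.

interface ServerConfig {
  openAiApiKey: string;
  port: number;
}

// Parse configuration from a key/value map such as process.env.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const openAiApiKey = (env["OPENAI_API_KEY"] ?? "").trim();
  if (openAiApiKey === "") {
    throw new Error("OPENAI_API_KEY is required (used for embeddings)");
  }

  // Assumption: fall back to 3000, the port shown in the .env example.
  const rawPort = (env["PORT"] ?? "3000").trim();
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${rawPort}"`);
  }

  return { openAiApiKey, port };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)` after the .env file has been loaded. Failing fast on a missing key avoids discovering the problem halfway through ingestion.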

Usage

1. Add PDF Files

Place your PDF files in the data/pdfs directory:

data/
  pdfs/
    your-file1.pdf
    your-file2.pdf

2. Start the Chroma Server

Start the Chroma database server:

./start-chroma.sh

Or manually:

python3 -m chromadb.cli.cli run --path ./data/chroma_db

This will start a Chroma server at http://localhost:8000.

3. Ingest PDFs

In a new terminal, process the PDFs and create the vector store:

npm run ingest

This will:

Extract text from the PDFs
Split the text into chunks
Create embeddings
Store the vectors in a Chroma database
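
The chunking step above can be sketched as a fixed-size character splitter with overlap. This is a simplified illustration, not the project's implementation: the function name `chunkText` and its parameters are hypothetical, and a real splitter may work on tokens or prefer sentence and paragraph boundaries.

```typescript
// Hypothetical sketch: split text into fixed-size character windows.
// Consecutive chunks share `overlap` characters so that a sentence cut at a
// boundary still appears whole in at least one chunk (if it fits).
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error("chunkSize must be a positive integer");
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be an integer in [0, chunkSize)");
  }

  const chunks: string[] = [];
  const step = chunkSize - overlap; // How far the window advances each time.

  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) {
      break; // This chunk already reached the end of the text.
    }
  }
  return chunks;
}
```

Each returned chunk would then be embedded and stored alongside its source file name, so that query results can point back to the original PDF.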

4. Start the MCP Server

In another terminal, start the server:

npm run dev

The MCP server will be available at: http://localhost:3000/api/mcp/query

5. Query the MCP Server

You can query the system using Claude or a REST client:

POST http://localhost:3000/api/mcp/query
Content-Type: application/json

{
  "query": "What does the document say about...",
  "topK": 5
}

topK is optional and sets the number of results to return.
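
For scripted access, the request above can be assembled with a small helper like the one below. This is a sketch under assumptions: `buildQueryRequest`, `McpQueryBody`, and `HttpRequestSpec` are hypothetical names, the base URL defaults to the local address used in this README, and the response format is not covered because the README does not document it.

```typescript
// Hypothetical client helper: builds the POST request shown above.

interface McpQueryBody {
  query: string;
  topK?: number;
}

interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildQueryRequest(
  query: string,
  topK?: number,
  baseUrl: string = "http://localhost:3000"
): HttpRequestSpec {
  const trimmed = query.trim();
  if (trimmed === "") {
    throw new Error("query must not be empty");
  }

  const payload: McpQueryBody = { query: trimmed };
  if (topK !== undefined) {
    if (!Number.isInteger(topK) || topK < 1) {
      throw new Error("topK must be a positive integer");
    }
    payload.topK = topK; // Omitted entirely when the caller does not set it.
  }

  return {
    url: `${baseUrl}/api/mcp/query`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}
```

The returned object can be passed to any HTTP client, for example `fetch(spec.url, { method: spec.method, headers: spec.headers, body: spec.body })` in an environment that provides `fetch`.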

Claude Integration

To use this with Claude via MCP:

Configure Claude to use the MCP endpoint (http://localhost:3000/api/mcp/query by default)
Ensure Claude can reach this server over the network

Claude can then query the content of your PDFs through the RAG system.

Project Structure

src/
- index.ts - Main server file
- ingest.ts - Script for processing PDFs
- services/
  - documentProcessor.ts - PDF processing and Chroma database operations
  - mcpService.ts - MCP service for Claude
- routes/
  - mcpRoutes.ts - API routes for MCP
- utils/
  - env.ts - Environment variable utilities
data/
- pdfs/ - Directory for PDF files
- chroma_db/ - Directory for Chroma vector database
start-chroma.sh - Script to start the Chroma server

License

MIT
