by renl
mcp-rag-local is a Memory Server designed to store and retrieve text passages based on their semantic meaning rather than just keywords. It leverages Ollama for generating text embeddings and ChromaDB for efficient vector storage and similarity search. This allows users to "memorize" any text and later retrieve the most relevant stored information for a given query.
To use mcp-rag-local, you first need to set up the environment. This involves cloning the repository, installing uv (a fast Python package manager), and starting the ChromaDB and Ollama services using Docker Compose. After the services are running, you need to pull the all-minilm:l6-v2 embedding model for Ollama. Finally, configure your MCP server to include mcp-rag-local with the specified command, arguments, and environment variables for CHROMADB_PORT and OLLAMA_PORT.
Once set up, you can interact with the server through an LLM (Large Language Model) to:
- Memorize texts (individually or in batches) and later retrieve the most relevant ones by semantic similarity.
- Use the memorize_pdf_file tool to have the MCP read, chunk, and store the contents of a PDF. This process handles large PDFs by reading them in 20-page increments.
- Inspect and manage the stored memory through a web-based ChromaDB Admin GUI at http://localhost:8322.
Q: What is the purpose of mcp-rag-local?
A: It's a Memory Server that allows you to store and retrieve text passages based on their semantic meaning, enhancing the capabilities of LLMs by providing a persistent and searchable memory.
Q: How does it handle large texts or PDF files?
A: For large texts, the LLM can conversationally chunk and store them. For PDF files, the memorize_pdf_file tool reads the PDF in 20-page increments, allowing the LLM to chunk and store the content progressively.
Q: What technologies does mcp-rag-local use for embeddings and storage?
A: It uses Ollama for generating text embeddings and ChromaDB for vector storage and similarity search.
Q: Is there a way to view and manage the stored memory?
A: Yes, a web-based ChromaDB Admin GUI is available at http://localhost:8322 for easy inspection and management of the vector database contents.
Q: What is uv and why is it used?
A: uv is a fast Python package manager used for installing dependencies and running the project.
This MCP server provides a simple API for storing and retrieving text passages based on their semantic meaning, not just keywords. It uses Ollama for generating text embeddings and ChromaDB for vector storage and similarity search. You can "memorize" any text and later retrieve the most relevant stored texts for a given query.
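Under the hood, the flow is simple: each memorized text is converted to an embedding by Ollama and stored in a ChromaDB collection, and retrieval is a nearest-neighbour search over those embeddings. The following Python sketch only illustrates that flow; it is not the server's actual code. The collection name "memories" and the memorize/recall helper names are assumptions, while the model tag and ports match the setup described below.

import requests
import chromadb

OLLAMA_URL = "http://localhost:11434/api/embeddings"        # Ollama's embeddings endpoint
chroma = chromadb.HttpClient(host="localhost", port=8321)   # ChromaDB started by docker-compose
collection = chroma.get_or_create_collection("memories")    # collection name is an assumption

def embed(text: str) -> list[float]:
    # Ask Ollama for an embedding from the all-minilm:l6-v2 model
    resp = requests.post(OLLAMA_URL, json={"model": "all-minilm:l6-v2", "prompt": text})
    return resp.json()["embedding"]

def memorize(text: str, doc_id: str) -> None:
    # Store the raw text together with its embedding for later semantic retrieval
    collection.add(ids=[doc_id], documents=[text], embeddings=[embed(text)])

def recall(query: str, n: int = 3) -> list[str]:
    # Return the stored texts whose embeddings are closest to the query's embedding
    results = collection.query(query_embeddings=[embed(query)], n_results=n)
    return results["documents"][0]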
You can simply ask the LLM to memorize a text for you in natural language:
User: Memorize this text: "Singapore is an island country in Southeast Asia."
LLM: Text memorized successfully.
You can also ask the LLM to memorize several texts at once:
User: Memorize these texts:
LLM: All texts memorized successfully.
This will store all provided texts for later semantic retrieval.
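Conceptually, batch memorization is a single batched insert into the vector store. Continuing the sketch above (this is an illustration, not the actual memorize_multiple_texts implementation), it might look like:

import uuid

def memorize_multiple_texts(texts: list[str]) -> None:
    # embed() and collection come from the sketch above; one add() call stores every text
    collection.add(
        ids=[str(uuid.uuid4()) for _ in texts],
        documents=texts,
        embeddings=[embed(t) for t in texts],
    )

memorize_multiple_texts([
    "Singapore is an island country in Southeast Asia.",
    "Ollama can run embedding models locally.",
])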
You can also ask the LLM to memorize the contents of a PDF file via memorize_pdf_file. The MCP tool will read up to 20 pages at a time from the PDF, return the extracted text, and have the LLM chunk it into meaningful segments. The LLM then uses the memorize_multiple_texts tool to store these chunks.
This process is repeated: the MCP tool continues to read the next 20 pages, the LLM chunks and memorizes them, and so on, until the entire PDF is processed and memorized.
User:
Memorize this PDF file: C:\path\to\document.pdf
LLM: Reads the first 20 pages, chunks the text, stores the chunks, and continues with the next 20 pages until the whole document is memorized.
You can also specify a starting page if you want to begin from a specific page:
User:
Memorize this PDF file starting from page 40: C:\path\to\document.pdf
LLM: Reads pages 40–59, chunks and stores the text, then continues with the next set of pages until the end of the document.
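The paging behaviour described above can be sketched as a simple loop: the tool extracts text from up to 20 pages per call and hands it back to the LLM, which chunks and stores it before requesting the next batch. A rough illustration using pypdf follows; the function name, return shape, and 1-based page numbering are assumptions, not the tool's actual signature.

from pypdf import PdfReader

PAGE_BATCH = 20  # the tool reads the PDF in 20-page increments

def read_pdf_pages(path: str, start_page: int = 1) -> tuple[str, int | None]:
    # Extract text from up to PAGE_BATCH pages starting at start_page (1-based);
    # return the text and the next start page, or None once the PDF is exhausted.
    reader = PdfReader(path)
    end = min(start_page - 1 + PAGE_BATCH, len(reader.pages))
    text = "\n".join(reader.pages[i].extract_text() or "" for i in range(start_page - 1, end))
    next_page = end + 1 if end < len(reader.pages) else None
    return text, next_page

# The LLM drives the loop: chunk each batch, store the chunks, then ask for the next batch.
text, next_page = read_pdf_pages(r"C:\path\to\document.pdf", start_page=40)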
If you have a long text, you can ask the LLM to help you split it into short, meaningful chunks and store them. For example:
User: Please chunk the following long text and memorize all the chunks.
{large body of text}
LLM:
Splits the text into short, relevant segments and calls memorize_multiple_texts to store them. If the text is too long to store in one go, the LLM will continue chunking and storing until the entire text is memorized.
User: Are all the text chunks stored?
LLM: Checks and, if not all are stored, continues until the process is complete.
This conversational approach ensures that even very large texts are fully chunked and memorized, with the LLM handling the process interactively.
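The chunking itself is done conversationally by the LLM rather than by fixed code, but the effect is roughly equivalent to splitting on paragraph boundaries with a size cap and then storing the pieces in batches. A hypothetical illustration (the 500-character cap, the batch size, and the helper names are assumptions; memorize_multiple_texts is the sketch shown earlier):

def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    # Split on blank lines, then pack paragraphs into chunks no longer than max_chars
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

chunks = chunk_text(large_body_of_text)           # large_body_of_text stands in for the user's long text
for start in range(0, len(chunks), 10):           # store in batches until everything is memorized
    memorize_multiple_texts(chunks[start:start + 10])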
To recall information, just ask the LLM a question:
User: What is Singapore?
LLM: Returns the most relevant stored texts along with a human-readable description of their relevance.
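In terms of the earlier sketch, such a question becomes a semantic query against the stored embeddings, for example:

# Using the illustrative recall() helper defined above
for passage in recall("What is Singapore?"):
    print(passage)   # e.g. "Singapore is an island country in Southeast Asia."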
First, clone this git repository and change into the cloned directory:
git clone <repository-url>
cd mcp-rag-local
Install uv (a fast Python package manager):
curl -LsSf https://astral.sh/uv/install.sh | sh
If you are on Windows, install uv using PowerShell:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Run the following command to start ChromaDB and Ollama using Docker Compose:
docker-compose up
After the containers are running, pull the embedding model for Ollama:
docker exec -it ollama ollama pull all-minilm:l6-v2
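If you want to confirm that both services are reachable before adding the MCP configuration below, a quick check against their HTTP APIs can help. This sketch assumes the default ports used in the configuration (8321 for ChromaDB, 11434 for Ollama):

import requests
import chromadb

# ChromaDB: heartbeat() fails if the server on port 8321 is not reachable
chromadb.HttpClient(host="localhost", port=8321).heartbeat()

# Ollama: request a test embedding to confirm the all-minilm:l6-v2 model has been pulled
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "all-minilm:l6-v2", "prompt": "hello"},
)
print(len(resp.json()["embedding"]))   # all-MiniLM-L6-v2 produces 384-dimensional vectors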
Add the following to your MCP server configuration:
"mcp-rag-local": {
"command": "uv",
"args": [
"--directory",
"path\\to\\mcp-rag-local",
"run",
"main.py"
],
"env": {
"CHROMADB_PORT": "8321",
"OLLAMA_PORT": "11434"
}
}
A web-based GUI for ChromaDB (the Memory Server's database) is included at http://localhost:8322 for easy inspection and management of stored memory.
Discover more MCP servers with similar functionality and use cases
by topoteretes
Enables AI agents to store, retrieve, and reason over past conversations, documents, images, and audio transcriptions by loading data into graph and vector databases with minimal code.
by basicmachines-co
Basic Memory is a local-first knowledge management system that allows users to build a persistent semantic graph from conversations with AI assistants. It addresses the ephemeral nature of most LLM interactions by providing a structured, bi-directional knowledge base that both humans and LLMs can read and write to.
by smithery-ai
mcp-obsidian is a connector that allows Claude Desktop to read and search an Obsidian vault or any directory containing Markdown notes.
by qdrant
Provides a semantic memory layer on top of the Qdrant vector search engine, enabling storage and retrieval of information via the Model Context Protocol.
by GreatScottyMac
A database‑backed MCP server that stores project decisions, progress, architecture, custom data, and vector embeddings, allowing AI assistants in IDEs to retrieve precise, up‑to‑date context for generation tasks.
by StevenStavrakis
Enables AI assistants to read, create, edit, move, delete, and organize notes and tags within an Obsidian vault.
by mem0ai
Provides tools to store, retrieve, and semantically search coding preferences via an SSE endpoint for integration with MCP clients.
by graphlit
Enables integration between MCP clients and the Graphlit platform, providing ingestion, retrieval, RAG, and publishing capabilities across a wide range of data sources and tools.
by chroma-core
Provides vector, full‑text, and metadata‑based retrieval powered by Chroma for LLM applications, supporting in‑memory, persistent, HTTP, and cloud clients as well as multiple embedding functions.