Qdrant MCP Server
Official MCP server from Qdrant that turns the vector database into a semantic memory layer for storing and retrieving information by meaning.
The Qdrant MCP server is the official Model Context Protocol implementation maintained by the Qdrant team. It acts as a semantic memory layer on top of the Qdrant vector search engine, letting LLM agents store snippets of information along with metadata and retrieve them later through natural language queries rather than exact keyword matches.
The server exposes two simple tools, qdrant-store and qdrant-find, that wrap upserts and similarity search against a Qdrant collection. Embeddings are generated locally via FastEmbed (default model sentence-transformers/all-MiniLM-L6-v2), so the server works without an external embedding API. It can run against Qdrant Cloud, a self-hosted Qdrant instance, or a fully local on-disk database via QDRANT_LOCAL_PATH.
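The store-then-find flow described above can be sketched in plain Python. This is a toy illustration, not the server's implementation: `ToyMemory`, `toy_embed`, and `cosine` are hypothetical names, the "embedding" is a sparse bag-of-words count rather than a FastEmbed model, and points live in a dict instead of a Qdrant collection.

```python
import hashlib
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse bag-of-words vector.
    It only captures word overlap, not meaning."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyMemory:
    """In-memory stand-in for a collection (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.points = {}  # point id -> (vector, payload)

    def store(self, information: str, metadata=None) -> str:
        # Upsert: the same id overwrites the existing point.
        point_id = hashlib.md5(information.encode()).hexdigest()
        payload = {"document": information, "metadata": metadata or {}}
        self.points[point_id] = (toy_embed(information), payload)
        return point_id

    def find(self, query: str, limit: int = 10) -> list:
        # Score every stored point against the query and keep the best ones.
        q = toy_embed(query)
        scored = [(cosine(q, vec), payload) for vec, payload in self.points.values()]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [payload for _, payload in scored[:limit]]


memory = ToyMemory()
memory.store("Rate limiting for the public API uses a token bucket", {"topic": "api"})
memory.store("Production database uses Postgres 15 on AWS RDS")
print(memory.find("rate limiting strategies for the API", limit=1))
```

A real embedding model would also match queries that share no words with the stored text, which is the point of searching by meaning; the toy version only shows the upsert and ranked-retrieval mechanics.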
It supports multiple transports (stdio, SSE, and streamable HTTP), automatic collection creation, a read-only mode for safe retrieval-only deployments, and customizable tool descriptions so the same server can be repurposed as, for example, a code snippet memory or a personal note store.
Tools
| Tool | Description |
|---|---|
| `qdrant-store` | Store a piece of information (and optional JSON metadata) into a Qdrant collection. Embeds the text and upserts it as a point. |
| `qdrant-find` | Retrieve relevant information from a Qdrant collection using semantic similarity search over embeddings. |
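As a rough sketch of what an MCP client sends when an agent invokes these tools, the snippet below builds JSON-RPC 2.0 `tools/call` requests. The `tool_call` helper is hypothetical, and the argument names (`information`, `metadata`, `query`) are assumptions; check them against the schema the server returns from `tools/list` before relying on them.

```python
import json
from itertools import count

_request_ids = count(1)  # simple incrementing JSON-RPC request ids


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names below are assumptions, not confirmed by this page.
store_request = tool_call(
    "qdrant-store",
    {
        "information": "Retry helper with exponential backoff",
        "metadata": {"language": "python", "topic": "retry"},
    },
)
find_request = tool_call("qdrant-find", {"query": "how do we retry failed requests?"})

print(json.dumps(store_request, indent=2))
print(json.dumps(find_request, indent=2))
```

In practice the MCP client (Claude Desktop, Cursor, and so on) builds these messages itself; the sketch only shows the shape of the exchange.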
Prerequisites
- A running Qdrant instance (Qdrant Cloud, self-hosted, or local on-disk mode)
- `uv`/`uvx` installed (or Docker)
- For remote Qdrant: a `QDRANT_API_KEY`
Run with uvx
QDRANT_URL="http://localhost:6333" \
COLLECTION_NAME="my-collection" \
uvx mcp-server-qdrant
Claude Desktop config (remote Qdrant)
{
  "mcpServers": {
    "qdrant": {
      "command": "uvx",
      "args": ["mcp-server-qdrant"],
      "env": {
        "QDRANT_URL": "https://xyz-example.eu-central.aws.cloud.qdrant.io:6333",
        "QDRANT_API_KEY": "your_api_key",
        "COLLECTION_NAME": "your-collection-name",
        "EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2"
      }
    }
  }
}
Claude Desktop config (local on-disk mode)
{
  "mcpServers": {
    "qdrant": {
      "command": "uvx",
      "args": ["mcp-server-qdrant"],
      "env": {
        "QDRANT_LOCAL_PATH": "/path/to/qdrant/database",
        "COLLECTION_NAME": "your-collection-name",
        "EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2"
      }
    }
  }
}
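The two Claude Desktop configs differ only in their `env` block. A small script can generate either variant, which may help when several machines need the same setup. The `claude_desktop_config` function is a hypothetical helper, not part of the server, and it assumes the remote and local modes should never be combined.

```python
import json
from typing import Optional

DEFAULT_MODEL = "sentence-transformers/all-MiniLM-L6-v2"


def claude_desktop_config(
    collection: str,
    url: Optional[str] = None,
    api_key: Optional[str] = None,
    local_path: Optional[str] = None,
    embedding_model: str = DEFAULT_MODEL,
) -> dict:
    """Return an `mcpServers` config for either remote or local on-disk Qdrant."""
    if bool(url) == bool(local_path):
        raise ValueError("Set exactly one of url or local_path")

    env = {}
    if url:
        env["QDRANT_URL"] = url
        if api_key:  # only meaningful for a remote instance
            env["QDRANT_API_KEY"] = api_key
    else:
        env["QDRANT_LOCAL_PATH"] = local_path
    env["COLLECTION_NAME"] = collection
    env["EMBEDDING_MODEL"] = embedding_model

    return {
        "mcpServers": {
            "qdrant": {
                "command": "uvx",
                "args": ["mcp-server-qdrant"],
                "env": env,
            }
        }
    }


config = claude_desktop_config(
    "your-collection-name", local_path="/path/to/qdrant/database"
)
print(json.dumps(config, indent=2))
```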
Cursor / VS Code (SSE transport)
QDRANT_URL="http://localhost:6333" \
COLLECTION_NAME="code-snippets" \
uvx mcp-server-qdrant --transport sse
Then point your MCP client at http://localhost:8000/sse.
Docker
docker build -t mcp-server-qdrant .
docker run -p 8000:8000 \
-e QDRANT_URL="http://your-qdrant-server:6333" \
-e COLLECTION_NAME="your-collection" \
mcp-server-qdrant
Key environment variables
- `QDRANT_URL`: remote Qdrant endpoint
- `QDRANT_LOCAL_PATH`: path for local on-disk Qdrant (mutually exclusive with `QDRANT_URL`)
- `QDRANT_API_KEY`: API key for authenticated Qdrant instances
- `COLLECTION_NAME`: default collection used by the tools
- `EMBEDDING_MODEL`: FastEmbed model, defaults to `sentence-transformers/all-MiniLM-L6-v2`
- `EMBEDDING_PROVIDER`: defaults to `fastembed`
- `QDRANT_SEARCH_LIMIT`: max results returned, default `10`
- `QDRANT_READ_ONLY`: set to `true` to disable `qdrant-store`
- `FASTMCP_SERVER_PORT`: HTTP/SSE port, default `8000`
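To make the defaults and constraints in this list concrete, here is a minimal sketch of how the variables could be resolved into settings. It is not the server's own configuration code: `Settings`, `load_settings`, and `enabled_tools` are hypothetical names, and the rule that only the literal string `true` (case-insensitive) enables read-only mode is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    url: Optional[str]
    local_path: Optional[str]
    api_key: Optional[str]
    collection: Optional[str]
    embedding_provider: str
    embedding_model: str
    search_limit: int
    read_only: bool
    port: int


def load_settings(env: Mapping[str, str]) -> Settings:
    """Resolve settings from an environment-like mapping, applying the listed defaults."""
    url = env.get("QDRANT_URL")
    local_path = env.get("QDRANT_LOCAL_PATH")
    if url and local_path:
        raise ValueError("QDRANT_URL and QDRANT_LOCAL_PATH are mutually exclusive")
    return Settings(
        url=url,
        local_path=local_path,
        api_key=env.get("QDRANT_API_KEY"),
        collection=env.get("COLLECTION_NAME"),
        embedding_provider=env.get("EMBEDDING_PROVIDER", "fastembed"),
        embedding_model=env.get(
            "EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"
        ),
        search_limit=int(env.get("QDRANT_SEARCH_LIMIT", "10")),
        # Assumption: only "true" (any casing) switches read-only mode on.
        read_only=env.get("QDRANT_READ_ONLY", "false").strip().lower() == "true",
        port=int(env.get("FASTMCP_SERVER_PORT", "8000")),
    )


def enabled_tools(settings: Settings) -> list:
    """Read-only mode drops the write tool and keeps retrieval."""
    if settings.read_only:
        return ["qdrant-find"]
    return ["qdrant-store", "qdrant-find"]


settings = load_settings(
    {
        "QDRANT_URL": "http://localhost:6333",
        "COLLECTION_NAME": "my-collection",
        "QDRANT_READ_ONLY": "true",
    }
)
print(settings.search_limit, settings.port, enabled_tools(settings))
```

Passing `os.environ` as the mapping would apply the same logic to the real process environment.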
- Give an agent a long-term semantic memory: store user preferences, prior decisions, and conversation summaries, then recall them by meaning across sessions.
- Build a code snippet librarian: index reusable code blocks with language and project metadata, then fetch the right snippet by intent rather than filename.
- Internal knowledge retrieval: index runbooks, ADRs, or meeting notes and let an LLM answer questions by pulling the most relevant chunks.
- Personal note search across Markdown or Obsidian vaults using semantic similarity instead of keyword search.
- Deploy a read-only retrieval endpoint (`QDRANT_READ_ONLY=true`) that lets multiple agents query an existing curated collection without writing to it.
- "Remember that our production database uses Postgres 15 on AWS RDS with point-in-time recovery enabled."
- "Find anything we previously stored about rate limiting strategies for the public API."
- "Store this snippet under collection `python-utils` with metadata `{"language": "python", "topic": "retry"}`."
- "Search the `meeting-notes` collection for decisions about the Q3 roadmap."
- "What do we know about the customer Acme Corp from past notes?"
- Official server maintained by the Qdrant team, kept in sync with the database
- Works without external services: embeddings via FastEmbed and an optional local on-disk Qdrant mode mean no external API calls are required once the embedding model has been downloaded
- Supports stdio, SSE, and streamable HTTP transports, so it plugs into Claude Desktop, Cursor, VS Code, and remote agent setups
- Customizable tool descriptions and a read-only mode let you repurpose the same server for very different memory use cases
- Only two tools (`qdrant-store` and `qdrant-find`); no built-in operations for deleting points, listing collections, or managing payload indexes
- Embedding provider is effectively limited to FastEmbed models, so using OpenAI or Cohere embeddings requires custom work
- Requires a Qdrant instance (cloud, self-hosted, or local path) and some understanding of collections and embedding models to use well
- Chroma MCP server for a similar embedding-backed memory layer on Chroma
- Pinecone MCP server for managed vector storage on Pinecone
- Weaviate MCP server for semantic memory backed by Weaviate