Qdrant MCP Server

Official MCP server from Qdrant that turns the vector database into a semantic memory layer for storing and retrieving information by meaning.

Database | by Qdrant | API key | active

Overview

The Qdrant MCP server is the official Model Context Protocol implementation maintained by the Qdrant team. It acts as a semantic memory layer on top of the Qdrant vector search engine, letting LLM agents store snippets of information along with metadata and retrieve them later through natural language queries rather than exact keyword matches.

The server exposes two simple tools, qdrant-store and qdrant-find, that wrap upserts and similarity search against a Qdrant collection. Embeddings are generated locally via FastEmbed (default model sentence-transformers/all-MiniLM-L6-v2), so the server works without an external embedding API. It can run against Qdrant Cloud, a self-hosted Qdrant instance, or a fully local on-disk database via QDRANT_LOCAL_PATH.
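
The store-then-find flow described above can be illustrated with a dependency-free toy. This is only a sketch: `toy_embed` is a bag-of-words stand-in for the dense FastEmbed model, `ToyMemory` is a hypothetical in-memory substitute for a Qdrant collection, and the payload layout is illustrative rather than the server's actual schema.

```python
import math
import re
from collections import Counter
from typing import Optional


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyMemory:
    """Hypothetical in-memory analogue of a collection of (vector, payload) points."""

    def __init__(self) -> None:
        self.points = []

    def store(self, information: str, metadata: Optional[dict] = None) -> None:
        # Analogue of qdrant-store: embed the text and keep it with its metadata.
        payload = {"document": information, "metadata": metadata or {}}
        self.points.append((toy_embed(information), payload))

    def find(self, query: str, limit: int = 10) -> list:
        # Analogue of qdrant-find: rank stored points by similarity to the query.
        q = toy_embed(query)
        ranked = sorted(self.points, key=lambda p: cosine(q, p[0]), reverse=True)
        return [payload for _, payload in ranked[:limit]]


memory = ToyMemory()
memory.store("Production database runs Postgres 15 on AWS RDS", {"topic": "infra"})
memory.store("Public API rate limiting uses a token bucket", {"topic": "api"})
top = memory.find("which database do we run in production", limit=1)
print(top[0]["document"])
```

A word-count vector only matches shared tokens. A dense sentence-embedding model is what lets the real server match paraphrases that share no words with the stored text.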

It supports multiple transports (stdio, SSE, and streamable HTTP), automatic collection creation, a read-only mode for safe retrieval-only deployments, and customizable tool descriptions so the same server can be repurposed as, for example, a code snippet memory or a personal note store.

Tools

Tool	Description
`qdrant-store`	Store a piece of information (and optional JSON metadata) into a Qdrant collection. Embeds the text and upserts it as a point.
`qdrant-find`	Retrieve relevant information from a Qdrant collection using semantic similarity search over embeddings.
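
MCP clients invoke tools through JSON-RPC `tools/call` requests. The sketch below builds such requests for the two tools. The argument names `information`, `metadata`, and `query` are assumptions made for illustration; the authoritative input schema is whatever the server reports via `tools/list`.

```python
import json
from typing import Optional


def tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def store_request(request_id: int, information: str, metadata: Optional[dict] = None) -> dict:
    # Assumed argument names: `information` and `metadata`.
    arguments = {"information": information}
    if metadata is not None:
        arguments["metadata"] = metadata
    return tool_call(request_id, "qdrant-store", arguments)


def find_request(request_id: int, query: str) -> dict:
    # Assumed argument name: `query`.
    return tool_call(request_id, "qdrant-find", {"query": query})


store = store_request(1, "Retry with exponential backoff", {"language": "python", "topic": "retry"})
find = find_request(2, "how do we retry failed requests")
print(json.dumps(store, indent=2))
print(json.dumps(find, indent=2))
```

In practice an MCP client library assembles these messages; the raw form is mainly useful when debugging a transport.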

Setup Guide

Prerequisites

A running Qdrant instance (Qdrant Cloud, self-hosted, or local on-disk mode)
uv / uvx installed (or Docker)
For Qdrant Cloud or any other authenticated instance: a QDRANT_API_KEY

Run with uvx

QDRANT_URL="http://localhost:6333" \
COLLECTION_NAME="my-collection" \
uvx mcp-server-qdrant

Claude Desktop config (remote Qdrant)

{
  "mcpServers": {
    "qdrant": {
      "command": "uvx",
      "args": ["mcp-server-qdrant"],
      "env": {
        "QDRANT_URL": "https://xyz-example.eu-central.aws.cloud.qdrant.io:6333",
        "QDRANT_API_KEY": "your_api_key",
        "COLLECTION_NAME": "your-collection-name",
        "EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2"
      }
    }
  }
}

Claude Desktop config (local on-disk mode)

{
  "mcpServers": {
    "qdrant": {
      "command": "uvx",
      "args": ["mcp-server-qdrant"],
      "env": {
        "QDRANT_LOCAL_PATH": "/path/to/qdrant/database",
        "COLLECTION_NAME": "your-collection-name",
        "EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2"
      }
    }
  }
}
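
The two Claude Desktop configs differ only in their `env` block. As a rough sketch, a small helper (hypothetical, not part of the server) can generate either variant and reject the case where both a URL and a local path are given:

```python
import json
from typing import Optional


def qdrant_server_entry(
    collection: str,
    url: Optional[str] = None,
    api_key: Optional[str] = None,
    local_path: Optional[str] = None,
) -> dict:
    """Build the `mcpServers` entry for either remote or local on-disk mode."""
    if (url is None) == (local_path is None):
        raise ValueError("set exactly one of url or local_path")
    env = {"COLLECTION_NAME": collection}
    if url is not None:
        env["QDRANT_URL"] = url
        if api_key is not None:
            env["QDRANT_API_KEY"] = api_key  # only used in remote mode
    else:
        env["QDRANT_LOCAL_PATH"] = local_path
    return {"command": "uvx", "args": ["mcp-server-qdrant"], "env": env}


config = {
    "mcpServers": {
        "qdrant": qdrant_server_entry(
            "your-collection-name", local_path="/path/to/qdrant/database"
        )
    }
}
print(json.dumps(config, indent=2))
```

EMBEDDING_MODEL is left out here because the configs above set it to its default value.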

Cursor / VS Code (SSE transport)

QDRANT_URL="http://localhost:6333" \
COLLECTION_NAME="code-snippets" \
uvx mcp-server-qdrant --transport sse

Then point your MCP client at http://localhost:8000/sse.
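
When scripting client setup, the endpoint can be derived from the transport. A minimal sketch: the `/sse` path comes from the URL above, while the `streamable-http` transport name and its `/mcp` path are assumptions that should be checked against the server's startup log.

```python
def endpoint_url(transport: str, host: str = "localhost", port: int = 8000) -> str:
    """Return the URL an MCP client should connect to for an HTTP-based transport."""
    # "/sse" matches the documented SSE endpoint; "/mcp" is an assumed default.
    paths = {"sse": "/sse", "streamable-http": "/mcp"}
    if transport == "stdio":
        raise ValueError("stdio has no URL; the client launches the server as a subprocess")
    if transport not in paths:
        raise ValueError(f"unknown transport: {transport!r}")
    return f"http://{host}:{port}{paths[transport]}"


print(endpoint_url("sse"))
```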

Docker

docker build -t mcp-server-qdrant .
docker run -p 8000:8000 \
  -e QDRANT_URL="http://your-qdrant-server:6333" \
  -e COLLECTION_NAME="your-collection" \
  mcp-server-qdrant

Key environment variables

QDRANT_URL: remote Qdrant endpoint
QDRANT_LOCAL_PATH: path for local on-disk Qdrant (mutually exclusive with QDRANT_URL)
QDRANT_API_KEY: API key for authenticated Qdrant instances
COLLECTION_NAME: default collection used by the tools
EMBEDDING_MODEL: FastEmbed model, defaults to sentence-transformers/all-MiniLM-L6-v2
EMBEDDING_PROVIDER: defaults to fastembed
QDRANT_SEARCH_LIMIT: max results returned, default 10
QDRANT_READ_ONLY: set to true to disable qdrant-store
FASTMCP_SERVER_PORT: HTTP/SSE port, default 8000
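
To make the defaults and the mutual-exclusion rule concrete, here is a sketch of how a launcher script might validate these variables before starting the server. `QdrantEnv` and `load_env` are hypothetical helpers, not the server's own settings code, and the truthy values accepted for QDRANT_READ_ONLY are an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class QdrantEnv:
    url: Optional[str]
    local_path: Optional[str]
    api_key: Optional[str]
    collection: Optional[str]
    embedding_model: str
    search_limit: int
    read_only: bool


def load_env(env: Mapping[str, str] = os.environ) -> QdrantEnv:
    """Read the variables listed above and apply the documented defaults."""
    url, local_path = env.get("QDRANT_URL"), env.get("QDRANT_LOCAL_PATH")
    if url and local_path:
        raise ValueError("QDRANT_URL and QDRANT_LOCAL_PATH are mutually exclusive")
    return QdrantEnv(
        url=url,
        local_path=local_path,
        api_key=env.get("QDRANT_API_KEY"),
        collection=env.get("COLLECTION_NAME"),
        embedding_model=env.get("EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"),
        search_limit=int(env.get("QDRANT_SEARCH_LIMIT", "10")),
        # Treating only "true" and "1" as truthy is an assumption of this sketch.
        read_only=env.get("QDRANT_READ_ONLY", "false").strip().lower() in {"true", "1"},
    )


settings = load_env({"QDRANT_URL": "http://localhost:6333", "COLLECTION_NAME": "my-collection"})
print(settings)
```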

Use Cases

Give an agent a long-term semantic memory: store user preferences, prior decisions, and conversation summaries, then recall them by meaning across sessions.
Build a code snippet librarian: index reusable code blocks with language and project metadata, then fetch the right snippet by intent rather than filename.
Internal knowledge retrieval: index runbooks, ADRs, or meeting notes and let an LLM answer questions by pulling the most relevant chunks.
Personal note search across Markdown or Obsidian vaults using semantic similarity instead of keyword search.
Deploy a read-only retrieval endpoint (QDRANT_READ_ONLY=true) that lets multiple agents query an existing curated collection without writing to it.

Example Prompts

"Remember that our production database uses Postgres 15 on AWS RDS with point-in-time recovery enabled."
"Find anything we previously stored about rate limiting strategies for the public API."
"Store this snippet under collection python-utils with metadata {"language": "python", "topic": "retry"}."
"Search the meeting-notes collection for decisions about the Q3 roadmap."
"What do we know about the customer Acme Corp from past notes?"

Pros

Official server maintained by the Qdrant team, kept in sync with the database
Needs no external APIs: embeddings are computed locally via FastEmbed (the model is downloaded on first use), and the optional local on-disk Qdrant mode removes the need for a separate database server
Supports stdio, SSE, and streamable HTTP transports, so it plugs into Claude Desktop, Cursor, VS Code, and remote agent setups
Customizable tool descriptions and a read-only mode let you repurpose the same server for very different memory use cases

Limitations

Only two tools (qdrant-store and qdrant-find); no built-in operations for deleting points, listing collections, or managing payload indexes
Embedding provider is effectively limited to FastEmbed models, so using OpenAI or Cohere embeddings requires custom work
Requires a Qdrant instance (cloud, self-hosted, or local path) and some understanding of collections and embedding models to use well
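
Operations the tools do not cover can still be performed against a remote or self-hosted instance through Qdrant's REST API. The sketch below only builds the requests for listing collections and deleting points by ID; the point ID is a placeholder, and actually sending the request is left to the caller.

```python
import json
import urllib.request
from typing import Optional


def _request(method: str, url: str, api_key: Optional[str], body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request against the Qdrant REST API."""
    headers = {}
    if api_key:
        headers["api-key"] = api_key  # header Qdrant uses for API-key auth
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def list_collections(base_url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    # GET /collections
    return _request("GET", f"{base_url.rstrip('/')}/collections", api_key)


def delete_points(base_url: str, collection: str, point_ids: list, api_key: Optional[str] = None) -> urllib.request.Request:
    # POST /collections/{collection}/points/delete with a list of point IDs
    url = f"{base_url.rstrip('/')}/collections/{collection}/points/delete"
    return _request("POST", url, api_key, {"points": point_ids})


# Placeholder ID for illustration; real IDs come from the collection itself.
req = delete_points("http://localhost:6333", "my-collection", ["00000000-0000-0000-0000-000000000001"])
print(req.get_method(), req.full_url)
# To execute: urllib.request.urlopen(req)
```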

Alternatives

Chroma MCP server for a similar embedding-backed memory layer on Chroma
Pinecone MCP server for managed vector storage on Pinecone
Weaviate MCP server for semantic memory backed by Weaviate