Replicate MCP Server
Official Replicate MCP server. Discover, compare, and run thousands of hosted AI models through natural language, with full access to the Replicate HTTP API.
The Replicate MCP server is the official Model Context Protocol integration maintained by Replicate. It exposes Replicate's full HTTP API to MCP-compatible clients like Claude Desktop, Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, OpenAI Codex CLI, and Google Gemini CLI. With it, an AI agent can search Replicate's model catalog, fetch model schemas, run predictions on thousands of hosted models (image, video, audio, text, embeddings), check prediction status, and manage account resources.
The server is available in two forms. The recommended option is the remote hosted server at https://mcp.replicate.com/sse, which handles authentication through a web-based, OAuth-style flow in which users paste a Replicate API token. The second option is the npm package replicate-mcp, a local stdio server that you run with npx -y replicate-mcp and authenticate using the REPLICATE_API_TOKEN environment variable. The package is automatically updated whenever Replicate adds new API operations, so tool coverage stays in sync with the HTTP API.
Notable features include a --tools=dynamic mode that replaces the per-endpoint tool list with three meta-tools (list_api_endpoints, get_api_endpoint_schema, invoke_api_endpoint) to save context window for large APIs, and an experimental --tools=code mode that gives the agent an SDK documentation search tool plus a TypeScript code execution sandbox running on Deno. The server also publishes a /.well-known/mcp/server.json endpoint, making it auto-discoverable through the official MCP Registry.
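To make the dynamic mode flow concrete, here is a minimal sketch of the three requests an MCP client might send, one per meta-tool. The `tools/call` envelope follows the MCP JSON-RPC convention; the argument names (`search_query`, `endpoint`, `args`) and the helper names are assumptions for illustration, not the server's documented schema. In practice the agent would read the real argument names from each tool's input schema.

```typescript
// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Wrap a tool name and its arguments in a tools/call envelope.
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Discover, inspect, invoke: one request per dynamic-mode meta-tool.
// ASSUMPTION: `search_query`, `endpoint`, and `args` are hypothetical
// argument names; check the tool schemas the server actually returns.
function dynamicModeFlow(
  query: string,
  endpoint: string,
  args: Record<string, unknown>,
): ToolCallRequest[] {
  return [
    toolCall("list_api_endpoints", { search_query: query }),
    toolCall("get_api_endpoint_schema", { endpoint }),
    toolCall("invoke_api_endpoint", { endpoint, args }),
  ];
}

const requests = dynamicModeFlow("predictions", "create_predictions", {
  input: { prompt: "a neon koi swimming through clouds" },
});
console.log(JSON.stringify(requests, null, 2));
```

A real agent would issue these one at a time, using each response to decide the next call. The point of the design is that only three tool definitions occupy the context window, however many endpoints the API has.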
Tools
| Tool | Description |
|---|---|
| `search_models` | Search Replicate's model catalog by query string to find suitable AI models. |
| `models.list` | List available models, useful for comparing options. |
| `models.get` | Fetch detailed metadata and schema for a specific model. |
| `list_models_examples` | List example predictions saved by the model author as illustrative usage examples. |
| `create_predictions` | Run a model by creating a prediction with the given input. |
| `get_prediction` | Fetch the status and output of a running or completed prediction. |
| `list_hardware` | List available hardware options for running models. |
| `list_api_endpoints` | Dynamic mode only. Discovers available HTTP API endpoints with optional search filtering. |
| `get_api_endpoint_schema` | Dynamic mode only. Returns the schema for a specific API endpoint. |
| `invoke_api_endpoint` | Dynamic mode only. Executes any Replicate API endpoint with the appropriate parameters. |
Prerequisites
- A Replicate account and API token from replicate.com/account/api-tokens
- For the local server: Node.js installed (for `npx`)
- For `--tools=code` mode: Deno installed (used as the TypeScript sandbox)
Option 1: Remote server (recommended)
Point your MCP client at the hosted URL and complete the web-based auth flow when prompted.
Claude Desktop / Claude.ai: Go to Settings > Connectors > Add custom connector, paste https://mcp.replicate.com/sse, then authenticate with your Replicate API token.
Claude Code:
claude mcp add replicate https://mcp.replicate.com/sse --transport sse --scope user
Then run /mcp and authenticate in the browser window.
Cursor: Use the one-click install link from the docs, or edit ~/.cursor/mcp.json:
{
  "mcpServers": {
    "replicate": {
      "url": "https://mcp.replicate.com/sse"
    }
  }
}
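If the config file already lists other MCP servers, the Replicate entry should be merged in rather than pasted over them. The sketch below shows one way to do that as a pure function. The type and function names are hypothetical, and `some-other-server` is a placeholder; only the `mcpServers` shape and the URL come from the config above.

```typescript
// Minimal model of an MCP client config file such as ~/.cursor/mcp.json.
interface McpServerEntry {
  url?: string;
  command?: string;
  args?: string[];
  env?: Record<string, string>;
}

interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

const REPLICATE_REMOTE_URL = "https://mcp.replicate.com/sse";

// Return a copy of the config with the remote Replicate server added.
// Existing servers and unrelated top-level keys are preserved.
function withReplicateRemote(config: McpConfig): McpConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      replicate: { url: REPLICATE_REMOTE_URL },
    },
  };
}

// Example: a config that already contains another (placeholder) server.
const existing: McpConfig = {
  mcpServers: {
    other: { command: "npx", args: ["-y", "some-other-server"] },
  },
};
console.log(JSON.stringify(withReplicateRemote(existing), null, 2));
```

Because the function returns a new object instead of mutating its input, it is easy to diff the result against the file on disk before writing it back.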
VS Code (GitHub Copilot): Add the server to .vscode/mcp.json or your user settings, using the same URL.
Google Gemini CLI: Add the server to ~/.gemini/settings.json, then run /mcp auth replicate.
Option 2: Local server via npm
Use the published replicate-mcp package and pass your API token in an environment variable:
{
  "mcpServers": {
    "replicate": {
      "command": "npx",
      "args": ["-y", "replicate-mcp"],
      "env": {
        "REPLICATE_API_TOKEN": "r8_your_token_here"
      }
    }
  }
}
Optional flags
- `--tools=all` (default): expose one tool per API endpoint
- `--tools=dynamic`: expose only `list_api_endpoints`, `get_api_endpoint_schema`, and `invoke_api_endpoint` to conserve context
- `--tools=code` (experimental): expose an SDK docs search tool and a TypeScript Deno sandbox
Example for code mode:
npx -y replicate-mcp@alpha --tools=code
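The flags combine with the local server config from Option 2. As a rough sketch, the helper below builds the `replicate` entry for a chosen tool mode. The function and type names are hypothetical. Using the `@alpha` tag for code mode simply mirrors the example command above; whether that tag is required is an assumption.

```typescript
type ToolsMode = "all" | "dynamic" | "code";

interface LocalServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Build the "replicate" entry for an MCP client config.
function localReplicateEntry(token: string, mode: ToolsMode = "all"): LocalServerEntry {
  // ASSUMPTION: code mode uses the @alpha tag, as in the example command.
  const pkg = mode === "code" ? "replicate-mcp@alpha" : "replicate-mcp";
  const args = ["-y", pkg];
  // --tools=all is the default, so the flag is only added for other modes.
  if (mode !== "all") {
    args.push(`--tools=${mode}`);
  }
  return { command: "npx", args, env: { REPLICATE_API_TOKEN: token } };
}

const config = {
  mcpServers: { replicate: localReplicateEntry("r8_your_token_here", "dynamic") },
};
console.log(JSON.stringify(config, null, 2));
```

Printing the result gives a JSON object you can paste into the client config; replace the placeholder token with your own before use.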
Use cases
- Generate images, video, or audio on demand by asking an agent to find an appropriate model on Replicate and run it with your prompt and parameters.
- Compare multiple models for the same task (for example several text-to-image models) by listing candidates, fetching schemas, and running predictions side by side.
- Build a model evaluation loop where the agent searches for new models, runs example inputs, and reports outputs back without leaving the chat.
- Automate batch inference workflows: kick off predictions, poll `get_prediction` for completion, and chain outputs into downstream steps.
- Inspect a model's input schema and example predictions before integrating it into application code.
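The batch workflow above depends on polling a prediction until it reaches a terminal state. Here is a minimal sketch of that loop. It assumes the status values from Replicate's HTTP API (`starting`, `processing`, `succeeded`, `failed`, `canceled`) and takes the fetch step as an injected function, so `getPrediction` stands in for however your client calls the `get_prediction` tool. The function names, the stub, and the example output URL are all hypothetical.

```typescript
type PredictionStatus = "starting" | "processing" | "succeeded" | "failed" | "canceled";

interface Prediction {
  id: string;
  status: PredictionStatus;
  output?: unknown;
}

// Statuses after which a prediction will not change again.
const TERMINAL = new Set<PredictionStatus>(["succeeded", "failed", "canceled"]);

// Poll until the prediction is terminal or the attempt budget runs out.
async function waitForPrediction(
  id: string,
  getPrediction: (id: string) => Promise<Prediction>,
  opts: { intervalMs?: number; maxAttempts?: number } = {},
): Promise<Prediction> {
  const { intervalMs = 1000, maxAttempts = 60 } = opts;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const prediction = await getPrediction(id);
    if (TERMINAL.has(prediction.status)) {
      return prediction;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Prediction ${id} did not finish after ${maxAttempts} polls`);
}

// Stub that walks through a typical status sequence (illustration only).
function makeStub(): (id: string) => Promise<Prediction> {
  const sequence: PredictionStatus[] = ["starting", "processing", "succeeded"];
  let calls = 0;
  return async (id) => {
    const status = sequence[Math.min(calls, sequence.length - 1)] ?? "succeeded";
    calls++;
    return { id, status, output: "https://example.com/output.png" };
  };
}

waitForPrediction("abc123", makeStub(), { intervalMs: 10 }).then((p) => {
  console.log(`${p.id}: ${p.status}`);
});
```

Capping the number of attempts keeps a stuck prediction from blocking the rest of a batch; callers can catch the timeout error and decide whether to retry or cancel.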
Example prompts
- "Search Replicate for the best open source text-to-video models and show me their pricing."
- "Run black-forest-labs/flux-schnell with the prompt 'a neon koi swimming through clouds' and return the image URL."
- "Find a speech-to-text model on Replicate, run it on this audio file URL, and give me the transcript."
- "List the example predictions for stability-ai/sdxl so I can see what kinds of prompts work well."
- "Check the status of prediction abc123 and once it finishes, summarize the output."
Strengths
- Official, maintained by Replicate, and auto-updated whenever new HTTP API operations ship.
- Two deployment modes: hosted remote server with browser auth, or local npm package with an env var token.
- Dynamic and code tool modes help manage context window usage for large APIs and complex workflows.
- Auto-discoverable through the MCP Registry via `/.well-known/mcp/server.json`.
Limitations
- Inference on Replicate is billed, so agents running predictions consume billable compute against your account.
- The `--tools=code` mode is experimental and requires Deno installed locally.
- The full list of exposed tools is not enumerated in one place in the public docs; it tracks the HTTP API surface, which can shift over time.
Alternatives
- Hugging Face MCP server for browsing and running models hosted on the Hugging Face Hub.
- deepfates/mcp-replicate, an earlier community implementation that is no longer in active development now that an official server exists.
- Direct use of the Replicate JavaScript or Python SDK inside a custom MCP server if you need bespoke tools.