ElevenLabs Player MCP Server
Official ElevenLabs MCP server for text-to-speech, voice cloning, audio transcription, sound effects, and conversational AI agents.
The ElevenLabs MCP server is the official Model Context Protocol integration for ElevenLabs' audio AI platform. It exposes the full suite of ElevenLabs APIs to MCP clients like Claude Desktop, Cursor, Windsurf, and OpenAI Agents, letting an LLM generate speech from text, clone voices from audio samples, transcribe audio with speaker diarization, isolate voices from noisy recordings, generate sound effects, and design new voices from text descriptions.
Beyond core audio generation, the server also covers ElevenLabs' Conversational AI stack: creating voice agents, attaching knowledge bases, listing conversations, retrieving transcripts, and even initiating outbound phone calls through Twilio or SIP trunk integrations. Output can be saved to disk, returned as base64 resources, or both, controlled via environment variables.
The server is maintained directly by ElevenLabs (the elevenlabs GitHub org) and published to PyPI as elevenlabs-mcp. It uses an API key for authentication and works with the free tier, which includes 10,000 credits per month.
Tools
| Tool | Description |
|---|---|
| text_to_speech | Convert text to speech using a specified voice and model. |
| speech_to_text | Transcribe an audio file to text, with optional speaker diarization. |
| text_to_sound_effects | Generate a sound effect from a text description. |
| text_to_voice | Create voice previews from a text prompt, returning three variations. |
| voice_clone | Create an instant voice clone from one or more audio files. |
| speech_to_speech | Transform audio from one voice to another while preserving delivery. |
| isolate_audio | Isolate voice from background noise in an audio file. |
| search_voices | Search voices in the user's library. |
| get_voice | Retrieve details about a specific voice by ID. |
| list_models | List all available ElevenLabs synthesis models. |
| play_audio | Play an audio file locally on the host machine. |
| create_agent | Create a Conversational AI agent with custom voice, prompt, and LLM config. |
| list_agents | List all Conversational AI agents in the account. |
| get_agent | Get details about a specific Conversational AI agent. |
| add_knowledge_base_to_agent | Attach a knowledge base (URL, file, or text) to an agent. |
| list_conversations | List conversations from agents with filtering options. |
| get_conversation | Retrieve full conversation details including transcript. |
| make_outbound_call | Initiate an outbound phone call using an ElevenLabs agent over Twilio or SIP. |
| list_phone_numbers | List phone numbers available in the ElevenLabs account. |
| check_subscription | Check current ElevenLabs subscription status and credit usage. |
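MCP clients invoke tools like these by sending a JSON-RPC 2.0 request with the method tools/call. The following is a minimal sketch of what such a message looks like on the wire, not code from the server itself. The argument name "text" is an assumption for illustration; the actual parameter names come from the tool schema the server reports via tools/list.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# "text" is a hypothetical argument name; check the server's tool schema
# for the real parameters before relying on it.
request = build_tool_call(1, "text_to_speech", {"text": "Hello from MCP."})
print(request)
```

In practice the MCP client builds and sends this message for you; the sketch only shows how a tool name from the table maps onto the request.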
Prerequisites
- ElevenLabs account and API key (free tier includes 10,000 credits per month)
- uv Python package manager installed
- An MCP-compatible client (Claude Desktop, Cursor, Windsurf, etc.)
Installation
For Claude Desktop, add the following to claude_desktop_config.json:
{
  "mcpServers": {
    "ElevenLabs": {
      "command": "uvx",
      "args": ["elevenlabs-mcp"],
      "env": {
        "ELEVENLABS_API_KEY": "<insert-your-api-key-here>"
      }
    }
  }
}
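If the config file already lists other MCP servers, the ElevenLabs entry has to be merged in rather than pasted over them. A minimal sketch of that merge follows, assuming the file is plain JSON with a top-level mcpServers object as shown above. The helper name add_elevenlabs_server is hypothetical, and the location of the config file depends on your client and operating system.

```python
import json
from pathlib import Path


def add_elevenlabs_server(config_path: Path, api_key: str) -> dict:
    """Merge the ElevenLabs entry into an MCP client config, keeping other servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        # Treat an empty file the same as a missing one.
        config = json.loads(text) if text.strip() else {}

    # Create the mcpServers object if absent; leave existing entries untouched.
    servers = config.setdefault("mcpServers", {})
    servers["ElevenLabs"] = {
        "command": "uvx",
        "args": ["elevenlabs-mcp"],
        "env": {"ELEVENLABS_API_KEY": api_key},
    }

    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Example (the path is a placeholder; point it at your client's config file):
# add_elevenlabs_server(Path("claude_desktop_config.json"), "your-api-key")
```

Restart the client after editing the file so it picks up the new server entry.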
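For Cursor, Windsurf, or other clients, install the package with pip, then run the module with --print to output the configuration to paste into your client's MCP settings: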
pip install elevenlabs-mcp
python -m elevenlabs_mcp --api-key=YOUR_KEY --print
Environment variables
- ELEVENLABS_API_KEY (required): your ElevenLabs API key
- ELEVENLABS_MCP_BASE_PATH (optional): default directory for saved audio output (defaults to ~/Desktop)
- ELEVENLABS_MCP_OUTPUT_MODE (optional): files, resources, or both (default files)
- ELEVENLABS_API_RESIDENCY (optional): data residency region for enterprise accounts (default us)
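To make the defaults concrete, here is a minimal sketch of how these variables could be read and validated. It mirrors the documented defaults only; it is not the server's actual implementation, and the function name resolve_settings and the returned dictionary keys are hypothetical.

```python
from pathlib import Path

VALID_OUTPUT_MODES = {"files", "resources", "both"}


def resolve_settings(env: dict) -> dict:
    """Apply the documented defaults to a mapping of environment variables."""
    api_key = env.get("ELEVENLABS_API_KEY")
    if not api_key:
        raise ValueError("ELEVENLABS_API_KEY is required")

    output_mode = env.get("ELEVENLABS_MCP_OUTPUT_MODE", "files")
    if output_mode not in VALID_OUTPUT_MODES:
        raise ValueError(f"unsupported output mode: {output_mode!r}")

    return {
        "api_key": api_key,
        "output_mode": output_mode,
        # Expand "~" so the default resolves to the user's Desktop folder.
        "base_path": Path(env.get("ELEVENLABS_MCP_BASE_PATH", "~/Desktop")).expanduser(),
        "residency": env.get("ELEVENLABS_API_RESIDENCY", "us"),
    }


# Only the API key is set here, so every other value falls back to its default.
settings = resolve_settings({"ELEVENLABS_API_KEY": "your-api-key"})
print(settings["output_mode"], settings["residency"])
```

Passing os.environ instead of a literal dictionary would read the real process environment.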
Windows notes
Enable Developer Mode in Claude Desktop. If uvx fails to resolve, use the absolute path to the executable in the command field.
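One way to find the absolute path for the command field is to look the executable up on PATH. The sketch below uses Python's standard shutil.which for that; the helper name resolve_command is hypothetical, and it falls back to the bare name when nothing is found.

```python
import os
import shutil


def resolve_command(name: str = "uvx") -> str:
    """Return the absolute path to an executable on PATH, or the bare name if not found."""
    found = shutil.which(name)
    return os.path.abspath(found) if found else name


# Use the printed value as "command" in the client config if plain "uvx" fails.
print(resolve_command("uvx"))
```

If the output is just uvx, the executable is not on the PATH visible to Python, which suggests uv is not installed or its install directory has not been added to PATH.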
Use cases
- Generate narrated audio for videos, podcasts, or documentation directly from a script in the editor
- Clone a voice from a short sample, then synthesize new lines in that voice for prototyping voiceovers
- Transcribe meeting or interview recordings to text with speaker diarization
- Spin up and configure a Conversational AI agent with a knowledge base, then trigger outbound phone calls
- Pull conversation transcripts and analytics from existing ElevenLabs voice agents for review
Example prompts
- "Generate a 30 second narration of this paragraph using the voice 'Rachel' and save it to my Desktop."
- "Transcribe ~/recordings/interview.mp3 with speaker diarization and save the transcript to a file."
- "Clone my voice from these three wav files and call the new voice 'Demo Clone'."
- "Create a Conversational AI agent named 'Support Bot' with this system prompt, attach our docs site as a knowledge base, and list its phone numbers."
- "Show me the last 5 conversations from agent abc123 and pull the full transcript for the most recent one."
Strengths
- Officially maintained by ElevenLabs under the elevenlabs GitHub org
- Broad coverage spanning TTS, STT, voice cloning, sound effects, and Conversational AI
- Flexible output modes (files, base64 resources, or both) with configurable save path
- Works with all major MCP clients and supports outbound phone calls via Twilio or SIP
Limitations
- All operations consume ElevenLabs credits; heavy usage requires a paid plan
- Voice design and audio isolation can time out in MCP Inspector even though the underlying job succeeds
- Windows users may hit uvx path resolution issues and need to hardcode the executable path
Alternatives
- OpenAI MCP integrations for TTS via the OpenAI Audio API
- Cartesia and community MCP wrappers for low-latency speech synthesis
- Deepgram MCP community servers for transcription-focused workflows