ElevenLabs MCP Server

Official ElevenLabs MCP server for text-to-speech, voice cloning, transcription, sound effects, music generation, and conversational AI agents.

Category: Communication | Publisher: ElevenLabs | Authentication: API key | Status: active


Overview

The ElevenLabs MCP server is the official Model Context Protocol integration for ElevenLabs. It exposes the company's audio and voice AI APIs to MCP clients such as Claude Desktop, Cursor, Windsurf, and OpenAI Agents. Through a single server, agents can generate speech, clone voices, transcribe recordings, design new voices, isolate speech from noisy audio, compose music, and manage conversational AI agents, including agents that place outbound phone calls via Twilio or SIP.

The server is implemented in Python and distributed via PyPI as elevenlabs-mcp. It is typically launched with uvx and authenticated with an ElevenLabs API key. Generated audio can be saved to disk, returned as base64-encoded MCP resources, or both; the behavior is controlled through environment variables. By default, files are written to the user's Desktop.

Notable capabilities include conversational agent management (create agents, attach knowledge bases, list conversations), outbound calling, instant voice cloning from sample audio, and music composition with multi-step planning. The free ElevenLabs tier provides 10,000 credits per month, sufficient for evaluation.

Tools

Tool	Description
`text_to_speech`	Convert text to speech using a chosen voice and model with parameters such as stability and speed.
`speech_to_text`	Transcribe an audio file to text with optional speaker diarization.
`text_to_sound_effects`	Generate a sound effect from a text description, with a duration of 0.5 to 5 seconds.
`search_voices`	Search the user's voice library by name, description, labels, and category.
`list_models`	List available speech synthesis models and their supported languages.
`get_voice`	Retrieve detailed information about a specific voice by ID.
`voice_clone`	Create an instant voice clone from provided audio sample files.
`isolate_audio`	Extract and isolate speech from mixed audio containing background noise.
`check_subscription`	Return current ElevenLabs subscription status and API usage.
`create_agent`	Create a conversational AI agent with system prompt, voice, and language settings.
`add_knowledge_base_to_agent`	Attach documents, URLs, or text to an agent's knowledge base.
`list_agents`	List all conversational AI agents on the account.
`get_agent`	Retrieve configuration and metadata for a specific agent.
`get_conversation`	Fetch a conversation record including transcript and analysis.
`list_conversations`	List agent conversations, filtered by date or agent, with pagination.
`speech_to_speech`	Convert audio from one voice to another while preserving content.
`text_to_voice`	Generate three voice preview variations from a text description.
`create_voice_from_preview`	Save a generated preview voice to the permanent voice library.
`make_outbound_call`	Place an outbound phone call using an agent via Twilio or SIP trunk.
`search_voice_library`	Search ElevenLabs' shared voice library by gender, accent, and language.
`list_phone_numbers`	List phone numbers connected to the account and their assigned agents.
`play_audio`	Play a WAV or MP3 audio file locally.
`compose_music`	Generate instrumental music from a text prompt or composition plan.
`create_composition_plan`	Build a structured music generation plan without consuming credits.
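
MCP clients invoke the tools above by sending JSON-RPC 2.0 `tools/call` requests, which the client application normally builds for you. The following is a minimal sketch of what such a request looks like on the wire. The helper name `build_tool_call` is hypothetical, and the argument names (`text`, `stability`, `speed`) are illustrative assumptions rather than the server's confirmed input schema; a real client would read the schema from the server's tool listing.

```python
import json
from itertools import count

# JSON-RPC request ids only need to be unique within a session.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Argument names are assumptions for illustration, not the confirmed schema.
message = build_tool_call(
    "text_to_speech",
    {"text": "Hello from MCP.", "stability": 0.5, "speed": 1.0},
)
decoded = json.loads(message)
print(decoded["method"], decoded["params"]["name"])
```

In day-to-day use you would not write this by hand; the sketch only shows how a tool name from the table maps onto a protocol message.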

Setup Guide

Prerequisites

An ElevenLabs API key from elevenlabs.io/app/settings/api-keys. The free tier includes 10,000 credits per month.
uv installed: curl -LsSf https://astral.sh/uv/install.sh | sh

Claude Desktop

Add the following to your claude_desktop_config.json:

{
  "mcpServers": {
    "ElevenLabs": {
      "command": "uvx",
      "args": ["elevenlabs-mcp"],
      "env": {
        "ELEVENLABS_API_KEY": "<your-api-key>"
      }
    }
  }
}

Windows users may need to enable Developer Mode in Claude Desktop via the Help menu so the server can write audio files.
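
If claude_desktop_config.json already lists other servers, the ElevenLabs entry has to be merged in rather than pasted over the file. Below is a minimal sketch of that merge in Python. The function names are hypothetical, the "other" server in the example is made up, and the location of the config file varies by platform, so the path is left as a parameter.

```python
import json
from pathlib import Path


def add_elevenlabs_server(config: dict, api_key: str) -> dict:
    """Return a copy of a Claude Desktop config with the ElevenLabs entry added."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))  # keep existing servers
    servers["ElevenLabs"] = {
        "command": "uvx",
        "args": ["elevenlabs-mcp"],
        "env": {"ELEVENLABS_API_KEY": api_key},
    }
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path, api_key: str) -> None:
    """Read the config at `path` if it exists, merge the entry, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    merged = add_elevenlabs_server(existing, api_key)
    path.write_text(json.dumps(merged, indent=2))


# In-memory example with a hypothetical pre-existing server entry.
example = add_elevenlabs_server(
    {"mcpServers": {"other": {"command": "other-server"}}},
    "<your-api-key>",
)
print(json.dumps(example, indent=2))
```

The merge copies the existing dictionaries instead of mutating them, so a failed write leaves the in-memory original untouched.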

Other MCP clients (Cursor, Windsurf, OpenAI Agents)

pip install elevenlabs-mcp
python -m elevenlabs_mcp --api-key=YOUR_KEY --print

The --print flag outputs the JSON snippet you can paste into your client's MCP config.

Optional environment variables

ELEVENLABS_MCP_BASE_PATH: directory where audio files are written (default: ~/Desktop)
ELEVENLABS_MCP_OUTPUT_MODE: how generated audio is returned (files, resources, or both)
ELEVENLABS_API_RESIDENCY: data region (enterprise plans only)
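
To make the meaning of these variables concrete, here is a rough sketch of how a program might read and validate them. It is not the server's actual implementation: the `OutputSettings` class and `load_output_settings` function are hypothetical, and falling back to `files` when ELEVENLABS_MCP_OUTPUT_MODE is unset is an assumption.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

VALID_OUTPUT_MODES = {"files", "resources", "both"}


@dataclass(frozen=True)
class OutputSettings:
    base_path: Path
    output_mode: str
    residency: Optional[str]


def load_output_settings(env: Mapping[str, str] = os.environ) -> OutputSettings:
    """Read the optional variables from `env`, applying defaults and validation."""
    base = env.get("ELEVENLABS_MCP_BASE_PATH", "~/Desktop")
    # Assumption: "files" is used when the variable is not set.
    mode = env.get("ELEVENLABS_MCP_OUTPUT_MODE", "files").strip().lower()
    if mode not in VALID_OUTPUT_MODES:
        raise ValueError(f"unsupported output mode: {mode!r}")
    return OutputSettings(
        base_path=Path(base).expanduser(),
        output_mode=mode,
        residency=env.get("ELEVENLABS_API_RESIDENCY"),
    )


# Example with an explicit mapping instead of the real process environment.
settings = load_output_settings(
    {"ELEVENLABS_MCP_BASE_PATH": "/tmp/audio", "ELEVENLABS_MCP_OUTPUT_MODE": "both"}
)
print(settings.base_path, settings.output_mode, settings.residency)
```

Passing the environment as a mapping keeps the function easy to test without changing real process variables.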

Local development

git clone https://github.com/elevenlabs/elevenlabs-mcp
cd elevenlabs-mcp
uv venv && source .venv/bin/activate
uv pip install -e ".[dev]"
cp .env.example .env
./scripts/test.sh

Use Cases

Generate narrated audio (voiceovers, podcast intros, audiobook samples) from drafted scripts directly inside an editor like Cursor.
Clone a stakeholder's or actor's voice from sample recordings, then produce multilingual voiced content with the cloned voice.
Transcribe meeting recordings or interviews to text with speaker diarization for downstream summarization.
Build and manage ElevenLabs conversational AI agents, attach knowledge bases, and place outbound phone calls via Twilio.
Compose background music and sound effects on demand for video edits, game prototypes, or ad creatives.

Example Prompts

"Read this blog post aloud in Rachel's voice and save the MP3 to my Desktop."
"Clone a voice from the three WAV files in ~/samples and name it 'CEO Voice'."
"Transcribe meeting.mp3 with speaker diarization and give me a summary."
"Create a conversational agent named 'Support Bot' with a friendly system prompt and attach our docs site as a knowledge base."
"Compose a 30-second upbeat electronic track for a product launch video."

Pros

Official server maintained by ElevenLabs covering the full product surface: TTS, STT, voice cloning, voice design, sound effects, music, and conversational agents.
Broad client support: Claude Desktop, Cursor, Windsurf, and OpenAI Agents are explicitly documented.
Flexible output handling (files, base64 resources, or both) and configurable output directory.
Free tier with 10,000 credits per month is enough to evaluate without payment.

Limitations

Long-running operations such as voice design and audio isolation can time out, especially in development mode.
Generation consumes ElevenLabs credits, so heavy usage requires a paid subscription.
Outbound calling features require additional setup (Twilio or SIP trunk plus a configured phone number).
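
For the timeout limitation, one client-side approach is to wrap slow calls in an explicit, generous timeout and report a timeout instead of retrying automatically, since a repeated generation call would consume credits again. The sketch below illustrates the pattern with a stand-in coroutine. `call_with_timeout` and `fake_slow_tool` are hypothetical names, and no real ElevenLabs or MCP call is made.

```python
import asyncio


async def call_with_timeout(make_call, timeout_seconds: float) -> dict:
    """Await one call; on timeout, report it rather than retrying."""
    try:
        result = await asyncio.wait_for(make_call(), timeout_seconds)
        return {"status": "ok", "result": result}
    except asyncio.TimeoutError:
        # No automatic retry: a second generation call would spend credits again.
        return {"status": "timeout", "result": None}


async def fake_slow_tool() -> str:
    """Stand-in for a long-running tool call (hypothetical)."""
    await asyncio.sleep(0.05)
    return "audio.mp3"


fast = asyncio.run(call_with_timeout(fake_slow_tool, timeout_seconds=1.0))
slow = asyncio.run(call_with_timeout(fake_slow_tool, timeout_seconds=0.01))
print(fast["status"], slow["status"])
```

The caller can then decide whether to check for the output file or re-run the request.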

Alternatives

OpenAI MCP integrations for TTS via the OpenAI audio API.
Cartesia MCP server for low-latency voice synthesis.
Community Whisper-based MCP servers for self-hosted speech-to-text.