ElevenLabs MCP Server

Official ElevenLabs MCP server for text-to-speech, voice cloning, audio transcription, sound effects, and conversational AI agents.

Category: AI/ML. Publisher: ElevenLabs. Authentication: API key. Status: active.

Overview

The ElevenLabs MCP server is the official Model Context Protocol integration for ElevenLabs' audio AI platform. It exposes ElevenLabs APIs to MCP clients such as Claude Desktop, Cursor, Windsurf, and OpenAI Agents. Through it, an LLM can generate speech from text, clone voices from audio samples, transcribe audio with speaker diarization, isolate voices from noisy recordings, generate sound effects, and design new voices from text descriptions.

Beyond core audio generation, the server also covers ElevenLabs' Conversational AI stack: creating voice agents, attaching knowledge bases, listing conversations, retrieving transcripts, and initiating outbound phone calls through Twilio or SIP trunk integrations. Output can be saved to disk, returned as base64 resources, or both, depending on the ELEVENLABS_MCP_OUTPUT_MODE environment variable.

The server is maintained directly by ElevenLabs (the elevenlabs GitHub org) and published to PyPI as elevenlabs-mcp. It uses an API key for authentication and works with the free tier, which includes 10,000 credits per month.

Tools

Tool	Description
`text_to_speech`	Convert text to speech using a specified voice and model.
`speech_to_text`	Transcribe an audio file to text, with optional speaker diarization.
`text_to_sound_effects`	Generate a sound effect from a text description.
`text_to_voice`	Create voice previews from a text prompt, returning three variations.
`voice_clone`	Create an instant voice clone from one or more audio files.
`speech_to_speech`	Transform audio from one voice to another while preserving delivery.
`isolate_audio`	Isolate voice from background noise in an audio file.
`search_voices`	Search voices in the user's library.
`get_voice`	Retrieve details about a specific voice by ID.
`list_models`	List all available ElevenLabs synthesis models.
`play_audio`	Play an audio file locally on the host machine.
`create_agent`	Create a Conversational AI agent with custom voice, prompt, and LLM config.
`list_agents`	List all conversational AI agents in the account.
`get_agent`	Get details about a specific conversational AI agent.
`add_knowledge_base_to_agent`	Attach a knowledge base (URL, file, or text) to an agent.
`list_conversations`	List conversations from agents with filtering options.
`get_conversation`	Retrieve full conversation details including transcript.
`make_outbound_call`	Initiate an outbound phone call using an ElevenLabs agent over Twilio or SIP.
`list_phone_numbers`	List phone numbers available in the ElevenLabs account.
`check_subscription`	Check current ElevenLabs subscription status and credit usage.
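
MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below shows roughly what such a request looks like for `text_to_speech`. The helper `build_tool_call` is hypothetical, and the `text` argument name is an assumption rather than the server's documented schema; a real client would read the argument schema from the server's `tools/list` response.

```python
import json
from itertools import count

# JSON-RPC request ids should be unique within a session.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Assumption: "text" is the argument name; check the schema from tools/list.
message = build_tool_call("text_to_speech", {"text": "Hello from MCP."})
print(message)
```

In practice the MCP client builds and sends these messages itself, so this is only useful for understanding or debugging what travels over the wire.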

Setup Guide

Prerequisites

ElevenLabs account and API key (free tier includes 10,000 credits per month)
uv Python package manager installed
An MCP-compatible client (Claude Desktop, Cursor, Windsurf, etc.)

Installation

For Claude Desktop, add the following to claude_desktop_config.json:

{
  "mcpServers": {
    "ElevenLabs": {
      "command": "uvx",
      "args": ["elevenlabs-mcp"],
      "env": {
        "ELEVENLABS_API_KEY": "<insert-your-api-key-here>"
      }
    }
  }
}
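
If the config file already lists other MCP servers, the entry can be merged in rather than pasted by hand. The following is a minimal sketch, not an official installer: `add_elevenlabs_server` is a hypothetical helper, and the location of `claude_desktop_config.json` is passed in by the caller because it varies by platform.

```python
import json
from pathlib import Path


def add_elevenlabs_server(config_path: Path, api_key: str) -> dict:
    """Add the ElevenLabs entry to an MCP client config, keeping other servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Create the "mcpServers" section if it is missing, then set our entry.
    servers = config.setdefault("mcpServers", {})
    servers["ElevenLabs"] = {
        "command": "uvx",
        "args": ["elevenlabs-mcp"],
        "env": {"ELEVENLABS_API_KEY": api_key},
    }

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Restart the client after editing the file so it picks up the new server entry.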

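For Cursor, Windsurf, or other clients, install the package with pip, then run it with --print to output a configuration snippet that you paste into the client's MCP configuration: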

pip install elevenlabs-mcp
python -m elevenlabs_mcp --api-key=YOUR_KEY --print

Environment variables

ELEVENLABS_API_KEY (required): your ElevenLabs API key
ELEVENLABS_MCP_BASE_PATH: default directory for saved audio output (defaults to ~/Desktop)
ELEVENLABS_MCP_OUTPUT_MODE: files, resources, or both (default files)
ELEVENLABS_API_RESIDENCY: data residency region for enterprise accounts (default us)
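
As a rough illustration of how these variables fit together, the sketch below assembles the "env" block for a client config and rejects an output mode outside the three values listed above. `build_env` is a hypothetical helper, not part of elevenlabs-mcp, and the server may validate these values differently.

```python
from typing import Optional

# The three output modes listed above.
VALID_OUTPUT_MODES = {"files", "resources", "both"}


def build_env(
    api_key: str,
    base_path: Optional[str] = None,
    output_mode: str = "files",
    residency: Optional[str] = None,
) -> dict:
    """Build the env mapping for the server entry in an MCP client config."""
    if not api_key:
        raise ValueError("ELEVENLABS_API_KEY is required")
    if output_mode not in VALID_OUTPUT_MODES:
        raise ValueError(f"output_mode must be one of {sorted(VALID_OUTPUT_MODES)}")

    env = {
        "ELEVENLABS_API_KEY": api_key,
        "ELEVENLABS_MCP_OUTPUT_MODE": output_mode,
    }
    # Optional variables are omitted so the server falls back to its defaults.
    if base_path is not None:
        env["ELEVENLABS_MCP_BASE_PATH"] = base_path
    if residency is not None:
        env["ELEVENLABS_API_RESIDENCY"] = residency
    return env
```

For example, `build_env("<your-api-key>", base_path="~/audio", output_mode="both")` yields a mapping that can be placed under "env" in the Claude Desktop configuration.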

Windows notes

Enable Developer Mode in Claude Desktop before using the server. If the client cannot find uvx, set the command field to the absolute path of the uvx executable.
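
One way to find that absolute path is to ask Python where uvx lives on the current PATH. This is a small sketch using the standard library's `shutil.which`; `resolve_command` is a hypothetical helper, and it falls back to the bare command name when nothing is found.

```python
import json
import shutil


def resolve_command(name: str = "uvx") -> str:
    """Return the absolute path of `name` if it is on PATH, else the bare name."""
    found = shutil.which(name)
    return found if found is not None else name


# json.dumps escapes Windows backslashes so the value is safe to paste into JSON.
print(json.dumps({"command": resolve_command("uvx")}))
```

If the printed value is still just "uvx", the executable is not on the PATH seen by Python, and the path has to be located manually.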

Use Cases

Generate narrated audio for videos, podcasts, or documentation directly from a script in the editor
Clone a voice from a short sample, then synthesize new lines in that voice for prototyping voiceovers
Transcribe meeting or interview recordings to text with speaker diarization
Spin up and configure a Conversational AI agent with a knowledge base, then trigger outbound phone calls
Pull conversation transcripts and analytics from existing ElevenLabs voice agents for review

Example Prompts

"Generate a 30 second narration of this paragraph using the voice 'Rachel' and save it to my Desktop."
"Transcribe ~/recordings/interview.mp3 with speaker diarization and save the transcript to a file."
"Clone my voice from these three wav files and call the new voice 'Demo Clone'."
"Create a Conversational AI agent named 'Support Bot' with this system prompt, attach our docs site as a knowledge base, and list its phone numbers."
"Show me the last 5 conversations from agent abc123 and pull the full transcript for the most recent one."

Pros

Officially maintained by ElevenLabs under the elevenlabs GitHub org
Broad coverage spanning TTS, STT, voice cloning, sound effects, and Conversational AI
Flexible output modes (files, base64 resources, or both) with configurable save path
Works with MCP clients such as Claude Desktop, Cursor, and Windsurf, and supports outbound phone calls via Twilio or SIP

Limitations

All operations consume ElevenLabs credits; heavy usage requires a paid plan
Voice design and audio isolation can time out in MCP Inspector even though the underlying job succeeds
Windows users may hit uvx path resolution issues and need to hardcode the executable path

Alternatives

OpenAI MCP integrations for TTS via the OpenAI Audio API
Cartesia and community MCP wrappers for low-latency speech synthesis
Community-maintained Deepgram MCP servers for transcription-focused workflows