Cloudglue MCP Server
Official MCP server for the Cloudglue API. Turns videos into structured, LLM-ready data with transcripts, entities, chapters, and semantic search.
Cloudglue MCP Server is the official Model Context Protocol server for the Cloudglue API, which converts videos into structured data ready for LLMs. It lets AI assistants like Cursor and Claude Desktop browse video collections, transcribe and describe video content, extract structured entities from custom prompts, and perform semantic search across video moments and summaries.
The server supports multiple video sources including Cloudglue platform files (cloudglue://files/file-id), YouTube URLs, public HTTP video files, and data connector integrations for Dropbox, Google Drive, and Zoom. Tools are paginated (typically 25 items per page or 5-minute video segments) and include automatic cost-optimization checks so the agent reuses precomputed results when available rather than reprocessing video data.
The repository is maintained by Aviary Inc. under the aviaryhq GitHub org and is published to npm as @cloudglue/cloudglue-mcp-server. It is the canonical, vendor-maintained MCP integration for Cloudglue.
Tools
| Tool | Description |
|---|---|
list_collections |
List available video collections with their IDs, types, and video counts. |
list_videos |
Search and filter video metadata across collections. |
describe_video |
Extract transcripts and rich descriptions for a video, paginated in 5-minute segments. Works with Cloudglue files, YouTube, and public HTTP videos. |
extract_video_entities |
Extract structured data from a video using a custom prompt or schema. |
get_video_metadata |
Retrieve technical specifications such as resolution, duration, and codec for a video. |
segment_video_camera_shots |
Detect camera shot boundaries within a video. |
segment_video_chapters |
Identify chapter divisions within a video. |
retrieve_summaries |
Bulk retrieve video summaries from a collection. |
search_video_moments |
Semantic search to find specific moments inside video segments. |
search_video_summaries |
Find relevant videos in a collection by content similarity to a query. |
Prerequisites
- A Cloudglue account and API key from cloudglue.dev
- Node.js installed locally (the server runs via
npx) - An MCP-compatible client (Claude Desktop, Cursor, etc.)
Option A: JSON configuration (Cursor, Claude Desktop, generic MCP clients)
Add the following to your MCP client config (for Claude Desktop: claude_desktop_config.json):
{
"mcpServers": {
"cloudglue": {
"command": "npx",
"args": [
"-y",
"@cloudglue/cloudglue-mcp-server@latest",
"--api-key",
"<CLOUDGLUE-YOUR-API-KEY>"
]
}
}
}
Replace <CLOUDGLUE-YOUR-API-KEY> with your real Cloudglue API key.
Option B: Claude Desktop extension (.dxt)
Download the prebuilt extension from the project's GitHub Releases page, double-click the .dxt file to install in Claude Desktop, and paste your Cloudglue API key when prompted.
Option C: Local development build
git clone https://github.com/aviaryhq/cloudglue-mcp-server.git
cd cloudglue-mcp-server
npm install
npm run build
Then point your MCP client to the absolute path of the compiled build/index.js with --api-key passed as an argument.
- Build a video knowledge base agent that browses Cloudglue collections and answers natural-language questions like "find every clip where the CEO mentions pricing."
- Auto-generate chapter markers, shot lists, and transcripts for a YouTube backlog so editors can repurpose footage faster.
- Extract structured entities (products mentioned, speakers, locations, timestamps) from podcast or webinar recordings into a JSON schema for downstream analytics.
- Power a "what video should I watch?" assistant that semantically searches summaries across a media library stored in Dropbox, Google Drive, or Zoom recordings.
- Pull metadata and technical specs (duration, resolution, codec) across a video collection for QA or ingestion pipelines.
- "List my Cloudglue collections and tell me which one has the most videos."
- "Transcribe this YouTube URL and pull out every product name mentioned with its timestamp."
- "Search my
webinarscollection for moments where someone discusses pricing objections." - "Generate chapter markers and a shot list for
cloudglue://files/abc123." - "Find the top 5 videos in my
customer-interviewscollection most relevant to onboarding friction."
- Official, vendor-maintained server published by Aviary (the company behind Cloudglue).
- Broad tool coverage: discovery, transcription, entity extraction, segmentation, and semantic search in one server.
- Supports multiple input sources out of the box: Cloudglue files, YouTube, public HTTP video, plus Dropbox, Google Drive, and Zoom connectors.
- Built-in pagination and cost-optimization checks so agents reuse cached descriptions instead of reprocessing video.
- Requires a paid Cloudglue account and API key; video processing usage is billed separately.
- Functionality is gated to whatever the Cloudglue API supports; no local/offline video analysis fallback.
- API key is passed as a CLI argument in the documented config, which can leak into shell history or shared config files if not handled carefully.
- TwelveLabs video understanding API (no official MCP, but community wrappers exist).
- VideoDB MCP server for video indexing and retrieval.
- Roll-your-own MCP wrapping OpenAI/Gemini multimodal video APIs for transcription and analysis.