
Cloudglue MCP Server

Official MCP server for the Cloudglue API. Turns videos into structured, LLM-ready data with transcripts, entities, chapters, and semantic search.

AI/ML | by Aviary (Cloudglue) | API Key | active

Overview

Cloudglue MCP Server is the official Model Context Protocol (MCP) server for the Cloudglue API, which converts videos into structured data ready for LLMs. It lets AI assistants such as Cursor and Claude Desktop browse video collections, transcribe and describe video content, extract structured entities using custom prompts, and run semantic search across video moments and summaries.

The server accepts several video sources: Cloudglue platform files (cloudglue://files/file-id), YouTube URLs, public HTTP video files, and data connector integrations for Dropbox, Google Drive, and Zoom. Tool results are paginated, typically 25 items per page for listings and 5-minute segments for video descriptions. The tools also include automatic cost-optimization checks, so the agent reuses precomputed results when they are available rather than reprocessing the video.

The repository is maintained by Aviary Inc. under the aviaryhq GitHub org and is published to npm as @cloudglue/cloudglue-mcp-server. It is the canonical, vendor-maintained MCP integration for Cloudglue.

Tools

Tool                       | Description
list_collections           | List available video collections with their IDs, types, and video counts.
list_videos                | Search and filter video metadata across collections.
describe_video             | Extract transcripts and rich descriptions for a video, paginated in 5-minute segments. Works with Cloudglue files, YouTube, and public HTTP videos.
extract_video_entities     | Extract structured data from a video using a custom prompt or schema.
get_video_metadata         | Retrieve technical specifications such as resolution, duration, and codec for a video.
segment_video_camera_shots | Detect camera shot boundaries within a video.
segment_video_chapters     | Identify chapter divisions within a video.
retrieve_summaries         | Bulk retrieve video summaries from a collection.
search_video_moments       | Semantic search to find specific moments inside video segments.
search_video_summaries     | Find relevant videos in a collection by content similarity to a query.
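
For orientation, MCP clients invoke tools like these with a JSON-RPC 2.0 tools/call request, as defined by the Model Context Protocol. The sketch below targets list_collections. The request id and the empty arguments object are illustrative assumptions, since this page does not document each tool's parameters.

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "list_collections",
    "arguments": {}
  }
}
```

In practice the client (Cursor, Claude Desktop, and so on) builds and sends this message for you; it is shown only to make the tool names above concrete.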

Setup Guide

Prerequisites

  • A Cloudglue account and API key from cloudglue.dev
  • Node.js installed locally (the server runs via npx)
  • An MCP-compatible client (Claude Desktop, Cursor, etc.)

Option A: JSON configuration (Cursor, Claude Desktop, generic MCP clients)

Add the following to your MCP client config (for Claude Desktop: claude_desktop_config.json):

{
  "mcpServers": {
    "cloudglue": {
      "command": "npx",
      "args": [
        "-y",
        "@cloudglue/cloudglue-mcp-server@latest",
        "--api-key",
        "<CLOUDGLUE-YOUR-API-KEY>"
      ]
    }
  }
}

Replace <CLOUDGLUE-YOUR-API-KEY> with your real Cloudglue API key.

Option B: Claude Desktop extension (.dxt)

Download the prebuilt extension from the project's GitHub Releases page, double-click the .dxt file to install it in Claude Desktop, and paste your Cloudglue API key when prompted.

Option C: Local development build

git clone https://github.com/aviaryhq/cloudglue-mcp-server.git
cd cloudglue-mcp-server
npm install
npm run build

Then point your MCP client to the absolute path of the compiled build/index.js with --api-key passed as an argument.
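
As a rough sketch, the resulting client entry might look like the following. The node command and the placeholder path are assumptions, not taken from the project's documentation; substitute the absolute path to your own clone. The --api-key argument mirrors Option A.

```json
{
  "mcpServers": {
    "cloudglue": {
      "command": "node",
      "args": [
        "/ABSOLUTE/PATH/TO/cloudglue-mcp-server/build/index.js",
        "--api-key",
        "<CLOUDGLUE-YOUR-API-KEY>"
      ]
    }
  }
}
```

Restart the MCP client after editing its config so it picks up the new server entry.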

Use Cases
  • Build a video knowledge base agent that browses Cloudglue collections and answers natural-language questions like "find every clip where the CEO mentions pricing."
  • Auto-generate chapter markers, shot lists, and transcripts for a YouTube backlog so editors can repurpose footage faster.
  • Extract structured entities (products mentioned, speakers, locations, timestamps) from podcast or webinar recordings into a JSON schema for downstream analytics.
  • Power a "what video should I watch?" assistant that semantically searches summaries across a media library sourced from Dropbox, Google Drive, or Zoom.
  • Pull metadata and technical specs (duration, resolution, codec) across a video collection for QA or ingestion pipelines.

Example Prompts
  • "List my Cloudglue collections and tell me which one has the most videos."
  • "Transcribe this YouTube URL and pull out every product name mentioned with its timestamp."
  • "Search my webinars collection for moments where someone discusses pricing objections."
  • "Generate chapter markers and a shot list for cloudglue://files/abc123."
  • "Find the top 5 videos in my customer-interviews collection most relevant to onboarding friction."

Pros
  • Official, vendor-maintained server published by Aviary (the company behind Cloudglue).
  • Broad tool coverage: discovery, transcription, entity extraction, segmentation, and semantic search in one server.
  • Supports multiple input sources out of the box: Cloudglue files, YouTube, public HTTP video, plus Dropbox, Google Drive, and Zoom connectors.
  • Built-in pagination and cost-optimization checks so agents reuse cached descriptions instead of reprocessing video.

Limitations
  • Requires a paid Cloudglue account and API key; video processing usage is billed separately.
  • Functionality is limited to what the Cloudglue API supports; there is no local or offline video analysis fallback.
  • The documented config passes the API key as a CLI argument, which can expose it in shell history, process listings, or shared config files if not handled carefully.

Alternatives
  • TwelveLabs video understanding API (no official MCP, but community wrappers exist).
  • VideoDB MCP server for video indexing and retrieval.
  • Roll-your-own MCP wrapping OpenAI/Gemini multimodal video APIs for transcription and analysis.