ToolUniverse MCP Server

Open-source MCP server exposing 1,000+ scientific tools, ML models, datasets, and APIs for drug discovery, literature search, and biomedical research.

Category: AI/ML. By Harvard Medical School (Zitnik Lab / mims-harvard).

Overview

ToolUniverse is an open-source ecosystem from Harvard Medical School's Zitnik Lab for building AI scientist systems on top of any large language model. It integrates more than 1,000 machine learning models, scientific datasets, public APIs, and research packages behind a single AI-Tool Interaction Protocol, then exposes them to AI agents through a native MCP server. Tools span drug discovery, precision oncology, rare disease diagnosis, pharmacovigilance, protein docking, molecular simulation, and literature search across PubMed, Semantic Scholar, arXiv, and bioRxiv.

The MCP layer is implemented as a "Scientific MCP" (SMCP) and ships two entry points: tooluniverse-smcp (configurable HTTP, SSE, or stdio transport) and tooluniverse-smcp-stdio (a stdio-only variant optimized for desktop AI clients like Claude Desktop, Claude Code, Gemini CLI, and Codex). It supports MCP Tasks for non-blocking long-running operations (for example, ProteinsPlus binding site prediction or SwissDock docking, which can take from 5 to 60 minutes), tool composition for parallel and sequential workflows, and a "Compact Mode" that collapses 1,000+ tools into 4 to 5 discovery tools to reduce context usage by roughly 99%.
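The non-blocking pattern behind MCP Tasks can be sketched as follows. This is a minimal, self-contained illustration and not the API of ToolUniverse or of any MCP SDK: `FakeTaskServer`, `start_task`, `get_status`, `poll_until_done`, and the `ligand` argument are hypothetical names, and the long-running job is simulated by a counter rather than a real docking run.

```python
import asyncio
import itertools


class FakeTaskServer:
    """Hypothetical stand-in for a server that runs long jobs as background tasks."""

    def __init__(self, polls_until_done: int = 3):
        self._ids = itertools.count(1)
        self._remaining = {}
        self._polls_until_done = polls_until_done

    async def start_task(self, tool_name: str, arguments: dict) -> str:
        # Return a task id immediately instead of blocking on the result.
        task_id = f"task-{next(self._ids)}"
        self._remaining[task_id] = self._polls_until_done
        return task_id

    async def get_status(self, task_id: str) -> dict:
        # Each poll moves the simulated job one step closer to completion.
        self._remaining[task_id] -= 1
        if self._remaining[task_id] > 0:
            return {"status": "working"}
        return {"status": "completed", "result": {"task_id": task_id}}


async def poll_until_done(server, task_id, interval_s=0.01, max_polls=100):
    """Poll a task until it leaves the 'working' state or the poll budget runs out."""
    for _ in range(max_polls):
        status = await server.get_status(task_id)
        if status["status"] != "working":
            return status
        # While sleeping here, the event loop is free to run other work.
        await asyncio.sleep(interval_s)
    raise TimeoutError(f"{task_id} did not finish within {max_polls} polls")


async def demo():
    server = FakeTaskServer()
    # The argument name "ligand" is illustrative only.
    task_id = await server.start_task("SwissDock_dock_ligand", {"ligand": "CCO"})
    return await poll_until_done(server, task_id)
```

Calling `asyncio.run(demo())` starts the simulated job, polls it three times, and returns the final status dictionary. In a real client the poll interval would be seconds or minutes rather than milliseconds.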

The project is actively maintained, Apache-2.0 licensed, and published to PyPI as tooluniverse. It is registered in the official MCP Registry and is intended as research infrastructure rather than a hosted SaaS, so users run it locally via uvx or uv pip install.

Tools

Tool	Description
`tooluniverse-smcp`	Full-featured Scientific MCP server with configurable transport (HTTP, SSE, or stdio). Exposes the full 1,000+ tool catalog with optional category-, name-, or type-based filtering.
`tooluniverse-smcp-stdio`	Stdio-only variant of the SMCP server, optimized for desktop AI applications such as Claude Desktop and Claude Code.
Compact Mode discovery tools	Set of 4 to 5 meta-tools that let an agent search, inspect, and invoke any of the 1,000+ underlying scientific tools without loading them all into context.
Literature search tools	Tools for querying PubMed, Semantic Scholar, arXiv, bioRxiv, and other corpora for biomedical and scientific literature.
`ProteinsPlus_predict_binding_sites`	Long-running protein structure analysis tool that predicts ligand binding sites (5 to 15 minutes). Uses MCP Tasks for non-blocking execution.
`SwissDock_dock_ligand`	Molecular docking simulation between a protein and a ligand (10 to 30 minutes). Uses MCP Tasks for background execution with progress polling.
Tool composition / workflows	Chain multiple ToolUniverse tools into sequential or parallel pipelines and reuse them as agent skills (68 pre-built research workflows ship with the package).
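The difference between parallel and sequential composition can be illustrated with plain asyncio. This is a sketch under stated assumptions, not ToolUniverse's composition API: `search_literature`, `lookup_targets`, `summarize`, and `run_workflow` are hypothetical placeholders that return canned values where real tool calls would go.

```python
import asyncio


async def search_literature(query: str) -> list[str]:
    # Placeholder for a literature search tool call.
    return [f"paper about {query}"]


async def lookup_targets(query: str) -> list[str]:
    # Placeholder for a target or dataset lookup tool call.
    return [f"target for {query}"]


async def summarize(papers: list[str], targets: list[str]) -> dict:
    # Placeholder for a downstream step that consumes both results.
    return {"n_papers": len(papers), "n_targets": len(targets)}


async def run_workflow(query: str) -> dict:
    # Parallel stage: the two lookups are independent, so run them concurrently.
    papers, targets = await asyncio.gather(
        search_literature(query), lookup_targets(query)
    )
    # Sequential stage: the summary needs both results from the stage above.
    return await summarize(papers, targets)
```

Independent lookups go into one `asyncio.gather` call, while any step that depends on their output is awaited afterwards. The same shape applies whatever the underlying tools are.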

Setup Guide

Prerequisites

Python 3.10 or newer
uv / uvx installed (recommended) or pip
An MCP-compatible client (Claude Desktop, Claude Code, Cursor, Gemini CLI, Codex, etc.)

Install

Install the Python package:

uv pip install tooluniverse

Optional extras for specialized domains:

uv pip install "tooluniverse[bioinformatics,ml,graph,singlecell,visualization]"

MCP client configuration

Add this to your claude_desktop_config.json (or equivalent MCP config):

{
  "mcpServers": {
    "tooluniverse": {
      "command": "uvx",
      "args": ["--refresh", "tooluniverse"],
      "env": {"PYTHONIOENCODING": "utf-8"}
    }
  }
}
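If the config file already lists other MCP servers, the entry can be merged in rather than pasted over them. The helper below is a minimal sketch: `add_tooluniverse_server` is a hypothetical name, the path to the config file is left to the caller because it differs by client and operating system, and the entry itself is copied from the JSON above.

```python
import json
from pathlib import Path

# Same entry as in the JSON snippet above.
TOOLUNIVERSE_ENTRY = {
    "command": "uvx",
    "args": ["--refresh", "tooluniverse"],
    "env": {"PYTHONIOENCODING": "utf-8"},
}


def add_tooluniverse_server(config_path: Path) -> dict:
    """Add the ToolUniverse entry to an MCP config file, keeping other servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        # Treat an empty file the same as a missing one.
        config = json.loads(text) if text.strip() else {}
    # Create "mcpServers" if absent, then add or overwrite only our own key.
    config.setdefault("mcpServers", {})["tooluniverse"] = TOOLUNIVERSE_ENTRY
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `add_tooluniverse_server(Path("claude_desktop_config.json"))` updates a config file in the current directory. Most clients need a restart before they pick up the change.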

Running the server directly

For stdio (desktop clients):

tooluniverse-smcp-stdio

For HTTP transport (web apps, remote agents); SSE is selected with the same --transport option:

tooluniverse-smcp --transport http --host 0.0.0.0 --port 8000
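When the server is launched from a script, the command line can be assembled programmatically. The sketch below only builds the argument list from the flags shown above. `build_smcp_command` is a hypothetical helper, and the literal values "sse" and "stdio" are assumptions, since only `http` appears in the command above. It defaults to 127.0.0.1 so the server is reachable only from the local machine, whereas 0.0.0.0 listens on every network interface.

```python
def build_smcp_command(
    transport: str = "http", host: str = "127.0.0.1", port: int = 8000
) -> list[str]:
    """Build an argument list for the server entry points shown above."""
    if transport == "stdio":
        # The stdio-only entry point is used without host or port here.
        return ["tooluniverse-smcp-stdio"]
    if transport not in {"http", "sse"}:  # "sse" as a literal value is an assumption
        raise ValueError(f"unsupported transport: {transport}")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return [
        "tooluniverse-smcp",
        "--transport", transport,
        "--host", host,
        "--port", str(port),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Passing `host="0.0.0.0"` reproduces the command above and should be reserved for hosts where remote access is intended.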

Optional: install agent skills

npx skills add mims-harvard/ToolUniverse

This adds the 68 pre-built research skills (drug discovery, rare disease diagnosis, etc.).

Auth

The core server requires no API key. Some integrated external services (e.g. certain literature or biomedical databases) may need their own credentials, configured via environment variables on the host running the server.
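A quick check before starting the server can catch credentials that were never exported. This is a minimal sketch: `missing_credentials` is a hypothetical helper, and the variable name in the usage note is a placeholder, not a name that ToolUniverse is known to read. The documentation of each external service gives the real variable names.

```python
import os


def missing_credentials(required: list[str], env=None) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]
```

For example, `missing_credentials(["EXAMPLE_SERVICE_API_KEY"])` returns an empty list when that placeholder variable is set in the current environment and a one-element list otherwise.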

Use Cases

Run literature reviews across PubMed, Semantic Scholar, arXiv, and bioRxiv from a single agent prompt and synthesize findings
Predict protein-ligand binding sites and run docking simulations (ProteinsPlus, SwissDock) as background MCP Tasks while the agent continues other work
Build drug-discovery or precision-oncology agent workflows that chain ML models, knowledge graphs, and dataset queries
Triage rare disease cases by composing phenotype lookup, gene panel, and literature tools into a reusable skill
Give a desktop coding agent (Claude Code, Codex, Gemini CLI) access to 1,000+ scientific tools without ballooning context by using Compact Mode
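The idea behind Compact Mode, where a handful of meta-tools sit in front of a large catalog, can be sketched with an in-memory registry. Everything here is illustrative: `REGISTRY`, `find_tools`, `describe_tool`, `call_tool`, and `listing_reduction` are hypothetical names, the two registered tools are toy functions, and the arithmetic assumes every tool definition costs roughly the same amount of context.

```python
# Toy catalog standing in for the full tool list; never shown to the agent up front.
REGISTRY = {
    "dock_ligand": {
        "description": "Dock a ligand against a protein structure.",
        "run": lambda args: {"docked": args["ligand"]},
    },
    "search_papers": {
        "description": "Search literature for a query string.",
        "run": lambda args: {"hits": [f"paper on {args['query']}"]},
    },
}


def find_tools(keyword: str) -> list[str]:
    """Meta-tool: return names of tools whose description mentions the keyword."""
    kw = keyword.lower()
    return sorted(n for n, t in REGISTRY.items() if kw in t["description"].lower())


def describe_tool(name: str) -> str:
    """Meta-tool: load a single tool's description on demand."""
    return REGISTRY[name]["description"]


def call_tool(name: str, arguments: dict) -> dict:
    """Meta-tool: invoke a tool by name without listing it in advance."""
    return REGISTRY[name]["run"](arguments)


def listing_reduction(total_tools: int, exposed_tools: int) -> float:
    """Fraction of tool definitions kept out of the initial context."""
    return 1 - exposed_tools / total_tools
```

Under the equal-cost assumption, exposing 5 meta-tools in place of 1,000 definitions gives `listing_reduction(1000, 5)`, which is about 0.995 and matches the "roughly 99%" figure quoted for Compact Mode.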

Example Prompts

"Find recent papers on KRAS G12C inhibitors in PubMed and bioRxiv, then summarize mechanism-of-action differences."
"Predict binding sites for PDB 1ATP using ProteinsPlus and report progress as a background task."
"Dock this SMILES ligand against EGFR with SwissDock and notify me when results are ready."
"Using Compact Mode, discover which ToolUniverse tools can score drug-target interactions, then run the top one on imatinib and BCR-ABL."
"Build a workflow that searches Semantic Scholar for rare disease phenotypes, maps them to candidate genes, and outputs a ranked diagnostic shortlist."

Pros

Officially maintained by Harvard Medical School's Zitnik Lab, Apache-2.0 licensed, and listed in the MCP Registry
Unusually broad scientific coverage: 1,000+ tools spanning ML models, datasets, APIs, and simulation packages
First-class support for MCP Tasks, so long simulations (5 to 60 minutes) run non-blocking with progress polling
Compact Mode and tool filtering keep context costs manageable even with the huge catalog

Limitations

Targeted at biomedical and life-science research; limited value for general-purpose or business use cases
Requires Python 3.10+ and a local installation via uv/uvx or pip (no provider-hosted remote server)
Some integrated external services have their own rate limits or credentials that must be configured separately
Documentation is spread across the README, the bioagent guide site, and the MCP Tasks guide, which can make discovery of specific tool names harder

Alternatives

BioMCP for clinical trials, variants, and biomedical literature search
PubMed MCP servers for lighter, literature-only access
Salesforce MCP-Universe as a general tool-use benchmarking and training framework (different scope, not biomedical-specific)