Stats Compass MCP Server

MCP server that turns LLMs into data analysts with tools for loading, cleaning, transforming, visualizing, and modeling data via pandas and scikit-learn.

Data & Enrichment by Olatunji Ogunbiyi (community)


Overview

Stats Compass is an open-source MCP server that exposes the stats-compass-core data science toolkit to LLM clients like Claude Desktop, Claude Code, Cursor, and VS Code Copilot. It wraps pandas, scikit-learn, and a curated tool registry so AI agents can perform reproducible, deterministic data analysis without writing or executing arbitrary code.

The server organizes capabilities into six domains: data loading (CSV, Excel, JSON, Parquet, sample datasets), cleaning (null handling, deduplication, outlier management), transformations (filter, groupby, pivot, encoding), exploratory data analysis (descriptive stats, correlations, hypothesis testing), visualization (histograms, scatter plots, ROC curves, confusion matrices, heatmaps), and machine learning (classification, regression, time series forecasting). It uses a "parent tool" pattern where each domain exposes a describe_*_tools and execute_*_tool pair, plus higher-level workflow tools that chain steps into reports.
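The parent tool pattern can be pictured as a small registry with one discovery function and one dispatch function per domain. The following is a minimal sketch under stated assumptions, not the server's actual implementation: the `EDA_TOOLS` registry, its single `describe` sub-tool, and the parameter names are all hypothetical.

```python
from typing import Any

# Hypothetical registry for one domain ("eda"). The real server's sub-tool
# names, schemas, and internals may differ.
EDA_TOOLS: dict[str, dict[str, Any]] = {
    "describe": {
        "description": "Summary statistics for a list of numbers.",
        "params": {"values": "list[float]"},
        "fn": lambda values: {
            "count": len(values),
            "mean": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        },
    },
}


def describe_eda_tools() -> list[dict[str, Any]]:
    """Discovery call: list sub-tools and their parameters without running anything."""
    return [
        {"name": name, "description": spec["description"], "params": spec["params"]}
        for name, spec in EDA_TOOLS.items()
    ]


def execute_eda_tool(tool_name: str, params: dict[str, Any]) -> Any:
    """Execution call: dispatch to a registered sub-tool by name."""
    if tool_name not in EDA_TOOLS:
        raise ValueError(f"Unknown EDA sub-tool: {tool_name!r}")
    return EDA_TOOLS[tool_name]["fn"](**params)


# An agent first discovers what is available, then executes one sub-tool.
catalog = describe_eda_tools()
result = execute_eda_tool("describe", {"values": [1.0, 2.0, 3.0, 6.0]})
```

The client sees only two tools per domain, and the model learns the sub-tool parameters from the discovery call before executing.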

The package is maintained by Olatunji Ogunbiyi (single-author, Alpha status, MIT licensed) and distributed via PyPI as stats-compass-mcp. It supports both local mode (direct file access on the user's machine) and a remote HTTP server mode with browser-based file upload, deployable via Docker.

Tools

Tool	Description
`load_csv`	Load a CSV file from a local path into a session DataFrame.
`load_excel`	Load an Excel workbook from a local path.
`load_dataset`	Load a built-in sample dataset.
`list_dataframes`	List DataFrames in the current session.
`get_sample`	Return sample rows from a DataFrame (head, tail, or random).
`get_schema`	Return schema metadata and column statistics for a DataFrame.
`save_csv`	Persist a DataFrame to a CSV file on disk.
`save_model`	Save a trained model to disk by model id.
`describe_eda_tools` / `execute_eda_tool`	Discover and run EDA sub-tools such as describe, correlations, and hypothesis_test (t_test, z_test).
`describe_cleaning_tools` / `execute_cleaning_tool`	Discover and run cleaning sub-tools such as drop_na, impute, and dedupe.
`describe_transform_tools` / `execute_transform_tool`	Discover and run transformations such as filter, groupby, and pivot.
`describe_ml_tools` / `execute_ml_tool`	Discover and run ML sub-tools such as train_model and predict.
`describe_plot_tools` / `execute_plot_tool`	Discover and run plotting sub-tools (histogram, scatter, heatmap) and export PNG/SVG.
`run_eda_report_workflow`	End-to-end EDA report: descriptive stats, correlations, missing data analysis, auto-generated visualizations.
`run_preprocessing_workflow`	Run a cleaning pipeline covering missing values, imputation, outliers, and deduplication.
`run_classification_workflow`	Train a classification model with confusion matrix, ROC curves, and feature importance.
`run_regression_workflow`	Train a regression model with RMSE, MAE, R², and feature importance.
`run_timeseries_workflow`	ARIMA time series forecasting with optional stationarity testing and parameter optimization.
`list_files`	List files in a directory for discovery.
`get_upload_url` / `register_uploaded_file`	Remote-only: generate a browser upload URL and register the uploaded file as a DataFrame.
`session_info` / `delete_session` / `server_stats` / `ping`	Session and server management utilities.
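MCP clients invoke every tool in the table through the protocol's standard JSON-RPC `tools/call` method. The sketch below builds such requests by hand to show the shape of the message. The tool names come from the table; the `path` argument name, and the assumption that `describe_eda_tools` takes no arguments, are hypothetical and should be checked against the schemas the server advertises.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC requires each request to carry an id.
_ids = count(1)


def tool_call_request(name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request in the shape MCP defines."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


# "path" is an assumed argument name, used here for illustration only.
load = tool_call_request("load_csv", {"path": "sales.csv"})
discover = tool_call_request("describe_eda_tools", {})
```

In normal use the MCP client builds these messages itself, so the snippet is only useful for understanding or debugging the traffic.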

Setup Guide

Prerequisites

Python 3.11+
uvx (to run the package without installing it) or pip (to install it)
An MCP-compatible client (Claude Desktop, Claude Code, Cursor, VS Code Copilot)

Install

pip install stats-compass-mcp

Auto-configure your client

The CLI can write the MCP config for Claude Desktop and VS Code. For Claude Code, register the server with claude mcp add:

# Claude Desktop
stats-compass-mcp install --client claude

# VS Code (GitHub Copilot)
stats-compass-mcp install --client vscode

# Claude Code CLI
claude mcp add stats-compass -- uvx stats-compass-mcp run

Manual config (Claude Desktop)

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "stats-compass": {
      "command": "uvx",
      "args": ["stats-compass-mcp", "run"]
    }
  }
}
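If the config file already lists other MCP servers, the entry needs to be merged in rather than pasted over the whole file. A minimal sketch of that merge follows. The function name `add_stats_compass` is hypothetical, and the config file location varies by operating system, so the path is passed in rather than guessed.

```python
import json
from pathlib import Path


def add_stats_compass(config_path: Path) -> dict:
    """Add the stats-compass entry to an MCP client config, keeping other servers."""
    # Treat a missing or empty file as an empty config.
    text = config_path.read_text().strip() if config_path.exists() else ""
    config = json.loads(text) if text else {}

    # Create the "mcpServers" section if needed, then add or replace one entry.
    servers = config.setdefault("mcpServers", {})
    servers["stats-compass"] = {
        "command": "uvx",
        "args": ["stats-compass-mcp", "run"],
    }

    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Call it with the path to your own claude_desktop_config.json, then restart the client so it reloads the file.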

Remote / Docker mode

Start the server over HTTP, then add a proxy entry to your client config that forwards to it:

stats-compass-mcp serve --port 8000

{
  "mcpServers": {
    "stats-compass": {
      "command": "uvx",
      "args": ["mcp-proxy", "--transport", "streamablehttp", "http://localhost:8000/mcp"]
    }
  }
}

Useful environment variables:

STATS_COMPASS_PORT (default 8000)
STATS_COMPASS_SERVER_URL (default http://localhost:8000)
STATS_COMPASS_MAX_UPLOAD_MB (default 50)

Usage tip

Prefix your prompts with "Use stats compass to..." so the model picks these tools instead of generating ad-hoc Python.

Use Cases

Load a local CSV, profile the schema, and generate an automated EDA report with descriptive stats, correlations, and missing data analysis.
Run a preprocessing pipeline (impute nulls, handle outliers, dedupe) and save a cleaned CSV for downstream use.
Train a classification or regression model on a tabular dataset and return confusion matrix, ROC curve, and feature importance plots.
Build ARIMA time series forecasts from a date-indexed dataset, with optional stationarity testing.
Produce reproducible chart exports (PNG/SVG) for histograms, scatter plots, and heatmaps from agent-driven analysis sessions.

Example Prompts

"Use stats compass to load sales.csv and run an EDA report."
"Use stats compass to impute missing values in the customers DataFrame and save the cleaned version to customers_clean.csv."
"Use stats compass to train a classification model on the churn dataset with churned as the target, and show me the ROC curve and feature importances."
"Use stats compass to forecast the next 30 days of daily_revenue with ARIMA, and plot the result."
"Use stats compass to compute correlations between numeric columns in the active DataFrame and render a heatmap."

Pros

Broad coverage across the data science lifecycle: loading, cleaning, EDA, visualization, and ML in one server.
Deterministic tool calls backed by pandas and scikit-learn, with no arbitrary code execution by the model.
Ships a CLI that writes the MCP config for Claude Desktop and VS Code; Claude Code needs only a single claude mcp add command.
Supports both local file mode and remote HTTP/Docker deployment with browser uploads.

Limitations

Community project from a single maintainer, marked Alpha on PyPI, so APIs may change.
Tool surface uses a describe_* / execute_* parent pattern that adds an extra call per operation and can be less discoverable than flat tool lists.
Support for non-Claude clients (GPT/Gemini) is only partial.
The time series workflow caps datasets at 500 rows for performance.

Alternatives

Jupyter MCP Server: execute notebooks and Python code from an MCP client.
mcp-server-data-exploration: lightweight pandas-based exploration tools over MCP.
DuckDB MCP server: SQL-first analytics over local files and MotherDuck.