Stats Compass MCP Server

MCP server that turns LLMs into data analysts with tools for loading, cleaning, transforming, visualizing, and modeling data via pandas and scikit-learn.

Data & Enrichment by Olatunji Ogunbiyi (community)

Overview

Stats Compass is an open-source MCP server that exposes the stats-compass-core data science toolkit to LLM clients like Claude Desktop, Claude Code, Cursor, and VS Code Copilot. It wraps pandas, scikit-learn, and a curated tool registry so AI agents can perform reproducible, deterministic data analysis without writing or executing arbitrary code.

The server organizes capabilities into six domains: data loading (CSV, Excel, JSON, Parquet, sample datasets), cleaning (null handling, deduplication, outlier management), transformations (filter, groupby, pivot, encoding), exploratory data analysis (descriptive stats, correlations, hypothesis testing), visualization (histograms, scatter plots, ROC curves, confusion matrices, heatmaps), and machine learning (classification, regression, time series forecasting). It uses a "parent tool" pattern: rather than registering every operation as its own MCP tool, a domain exposes a describe_*_tools and execute_*_tool pair (the Tools table lists pairs for EDA, cleaning, transforms, ML, and plotting, while loading and saving use flat tools such as load_csv). Higher-level workflow tools chain several steps into a single report.
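The parent tool pattern can be illustrated with a small registry. This is a minimal sketch, not the stats-compass-core implementation: the CLEANING_TOOLS registry, the register decorator, and the list-of-dicts row format are all assumptions made for illustration. Only the describe_cleaning_tools / execute_cleaning_tool names and the drop_na and dedupe sub-tool names come from the listing.

```python
from typing import Any, Callable

Row = dict[str, Any]

# Hypothetical registry for one domain. Illustrative only; the real
# stats-compass-core registry may be structured differently.
CLEANING_TOOLS: dict[str, dict[str, Any]] = {}


def register(name: str, description: str) -> Callable:
    """Add a function to the domain registry under a sub-tool name."""
    def decorator(fn: Callable) -> Callable:
        CLEANING_TOOLS[name] = {"description": description, "fn": fn}
        return fn
    return decorator


@register("drop_na", "Drop rows that contain a missing value.")
def drop_na(rows: list[Row]) -> list[Row]:
    return [r for r in rows if all(v is not None for v in r.values())]


@register("dedupe", "Remove exact duplicate rows, keeping the first occurrence.")
def dedupe(rows: list[Row]) -> list[Row]:
    seen, out = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))  # column names are unique, so this sorts by name
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def describe_cleaning_tools() -> list[dict[str, str]]:
    """First call: let the model discover which sub-tools exist."""
    return [{"name": n, "description": t["description"]}
            for n, t in CLEANING_TOOLS.items()]


def execute_cleaning_tool(name: str, rows: list[Row]) -> list[Row]:
    """Second call: run one sub-tool by name; unknown names are rejected."""
    if name not in CLEANING_TOOLS:
        raise ValueError(f"unknown cleaning tool: {name!r}")
    return CLEANING_TOOLS[name]["fn"](rows)


rows = [{"a": 1, "b": 2}, {"a": 1, "b": 2}, {"a": None, "b": 3}]
print(describe_cleaning_tools())
print(execute_cleaning_tool("dedupe", rows))
```

The trade-off is visible here: the client sees two tools per domain instead of one per operation, which keeps the tool list short, but the model has to make a describe call before it knows what it can execute.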

The package is maintained by Olatunji Ogunbiyi (single-author, Alpha status, MIT licensed) and distributed via PyPI as stats-compass-mcp. It supports both local mode (direct file access on the user's machine) and a remote HTTP server mode with browser-based file upload, deployable via Docker.

Tools

Tool | Description
load_csv | Load a CSV file from a local path into a session DataFrame.
load_excel | Load an Excel workbook from a local path.
load_dataset | Load a built-in sample dataset.
list_dataframes | List DataFrames in the current session.
get_sample | Return sample rows from a DataFrame (head, tail, or random).
get_schema | Return schema metadata and column statistics for a DataFrame.
save_csv | Persist a DataFrame to a CSV file on disk.
save_model | Save a trained model to disk by model id.
describe_eda_tools / execute_eda_tool | Discover and run EDA sub-tools such as describe, correlations, and hypothesis_test (t_test, z_test).
describe_cleaning_tools / execute_cleaning_tool | Discover and run cleaning sub-tools such as drop_na, impute, and dedupe.
describe_transform_tools / execute_transform_tool | Discover and run transformations such as filter, groupby, and pivot.
describe_ml_tools / execute_ml_tool | Discover and run ML sub-tools such as train_model and predict.
describe_plot_tools / execute_plot_tool | Discover and run plotting sub-tools (histogram, scatter, heatmap) and export PNG/SVG.
run_eda_report_workflow | End-to-end EDA report: descriptive stats, correlations, missing data analysis, auto-generated visualizations.
run_preprocessing_workflow | Run a cleaning pipeline covering missing values, imputation, outliers, and deduplication.
run_classification_workflow | Train a classification model with confusion matrix, ROC curves, and feature importance.
run_regression_workflow | Train a regression model with RMSE, MAE, R², and feature importance.
run_timeseries_workflow | ARIMA time series forecasting with optional stationarity testing and parameter optimization.
list_files | List files in a directory for discovery.
get_upload_url / register_uploaded_file | Remote-only: generate a browser upload URL and register the uploaded file as a DataFrame.
session_info / delete_session / server_stats / ping | Session and server management utilities.
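To make the table concrete, the sketch below builds the JSON-RPC requests an MCP client would send for a load, discover, execute sequence. The tools/call method and the name/arguments parameters are part of the MCP protocol; the argument keys used here (path, tool, params) are assumptions, since the listing does not document each tool's input schema. In practice an MCP client library sends these messages for you.

```python
import itertools
import json

_request_ids = itertools.count(1)


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument keys below are hypothetical; check each tool's published schema.
requests = [
    tool_call("load_csv", {"path": "sales.csv"}),
    tool_call("describe_eda_tools", {}),
    tool_call("execute_eda_tool", {"tool": "correlations", "params": {}}),
]

for request in requests:
    print(json.dumps(request))
```

The second request is the extra discovery round trip that the parent tool pattern introduces before a sub-tool can be executed.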
Setup Guide

Prerequisites

  • Python 3.11+
  • uvx (to run the server without a separate install step) or pip (to install the package)
  • An MCP-compatible client (Claude Desktop, Claude Code, Cursor, VS Code Copilot)

Install

pip install stats-compass-mcp

Auto-configure your client

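The stats-compass-mcp CLI can write the MCP config for Claude Desktop and VS Code. For Claude Code, register the server with the claude CLI instead: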

# Claude Desktop
stats-compass-mcp install --client claude

# VS Code (GitHub Copilot)
stats-compass-mcp install --client vscode

# Claude Code CLI
claude mcp add stats-compass -- uvx stats-compass-mcp run

Manual config (Claude Desktop)

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "stats-compass": {
      "command": "uvx",
      "args": ["stats-compass-mcp", "run"]
    }
  }
}
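If claude_desktop_config.json already lists other servers, the entry has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge. The add_stats_compass helper is hypothetical, and the location of the config file varies by platform, so the path is passed in rather than guessed. The demo runs against a temporary file, not a real config.

```python
import json
import tempfile
from pathlib import Path


def add_stats_compass(config_path: Path) -> dict:
    """Merge the stats-compass entry into an MCP config, keeping other servers."""
    text = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["stats-compass"] = {
        "command": "uvx",
        "args": ["stats-compass-mcp", "run"],
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo on a throwaway file that already contains an unrelated, made-up server.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "claude_desktop_config.json"
    existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
    path.write_text(json.dumps(existing), encoding="utf-8")
    merged = add_stats_compass(path)
    on_disk = json.loads(path.read_text(encoding="utf-8"))

print(sorted(merged["mcpServers"]))
```

Back up the real file before running anything like this against it, and restart the client afterwards so it re-reads the config.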

Remote / Docker mode

Run the server over HTTP and proxy it from your client:

stats-compass-mcp serve --port 8000

Then point the client at the running server through mcp-proxy:

{
  "mcpServers": {
    "stats-compass": {
      "command": "uvx",
      "args": ["mcp-proxy", "--transport", "streamablehttp", "http://localhost:8000/mcp"]
    }
  }
}

Useful environment variables:

  • STATS_COMPASS_PORT (default 8000)
  • STATS_COMPASS_SERVER_URL (default http://localhost:8000)
  • STATS_COMPASS_MAX_UPLOAD_MB (default 50)
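A deployment script can resolve these variables with the documented defaults before starting the server or checking a file size. This is a sketch under stated assumptions: the Settings class and both helpers are hypothetical, they are not how the server itself parses its environment, and treating 1 MB as 1024 * 1024 bytes is a guess.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    port: int
    server_url: str
    max_upload_mb: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the three variables, falling back to the defaults listed above."""
    env = os.environ if env is None else env
    return Settings(
        port=int(env.get("STATS_COMPASS_PORT", "8000")),
        server_url=env.get("STATS_COMPASS_SERVER_URL", "http://localhost:8000"),
        max_upload_mb=int(env.get("STATS_COMPASS_MAX_UPLOAD_MB", "50")),
    )


def upload_allowed(size_bytes: int, settings: Settings) -> bool:
    """Client-side pre-check; assumes 1 MB = 1024 * 1024 bytes."""
    return size_bytes <= settings.max_upload_mb * 1024 * 1024


defaults = load_settings({})
custom = load_settings({"STATS_COMPASS_PORT": "9000", "STATS_COMPASS_MAX_UPLOAD_MB": "10"})
print(defaults)
print(custom)
```

Passing a plain dict instead of os.environ keeps the helper easy to test and avoids depending on whatever happens to be set in the current shell.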

Usage tip

Prefix your prompts with "Use stats compass to..." so the model picks these tools instead of generating ad-hoc Python.

Use Cases

  • Load a local CSV, profile the schema, and generate an automated EDA report with descriptive stats, correlations, and missing data analysis.
  • Run a preprocessing pipeline (impute nulls, handle outliers, dedupe) and save a cleaned CSV for downstream use.
  • Train a classification or regression model on a tabular dataset and return confusion matrix, ROC curve, and feature importance plots.
  • Build ARIMA time series forecasts from a date-indexed dataset, with optional stationarity testing.
  • Produce reproducible chart exports (PNG/SVG) for histograms, scatter plots, and heatmaps from agent-driven analysis sessions.

Example Prompts

  • "Use stats compass to load sales.csv and run an EDA report."
  • "Use stats compass to impute missing values in the customers DataFrame and save the cleaned version to customers_clean.csv."
  • "Use stats compass to train a classification model on the churn dataset with churned as the target, and show me the ROC curve and feature importances."
  • "Use stats compass to forecast the next 30 days of daily_revenue with ARIMA, and plot the result."
  • "Use stats compass to compute correlations between numeric columns in the active DataFrame and render a heatmap."

Pros

  • Broad coverage across the data science lifecycle: loading, cleaning, EDA, visualization, and ML in one server.
  • Deterministic tool calls backed by pandas and scikit-learn, with no arbitrary code execution by the model.
  • Ships a CLI that writes the MCP config for Claude Desktop and VS Code, with a one-line claude mcp add command for Claude Code.
  • Supports both local file mode and remote HTTP/Docker deployment with browser uploads.

Limitations

  • Community project from a single maintainer, marked Alpha on PyPI, so APIs may change.
  • Tool surface uses a describe_* / execute_* parent pattern that adds an extra call per operation and can be less discoverable than flat tool lists.
  • Partial support for non-Claude clients (GPT/Gemini).
  • The time series workflow caps datasets at 500 rows for performance.
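One way to work within the 500-row cap on the time series workflow is to aggregate a longer series before loading it. The sketch below averages fixed-size buckets so the result never exceeds the cap. The downsample helper is hypothetical, and the listing does not say whether the server truncates, samples, or rejects larger inputs, so this is a client-side workaround rather than a description of server behaviour.

```python
from statistics import fmean

MAX_ROWS = 500  # cap stated in the limitations above


def downsample(values: list[float], max_rows: int = MAX_ROWS) -> list[float]:
    """Average consecutive buckets so that at most max_rows points remain."""
    n = len(values)
    if n <= max_rows:
        return list(values)
    bucket = -(-n // max_rows)  # ceiling division: smallest bucket size that fits
    return [fmean(values[i:i + bucket]) for i in range(0, n, bucket)]


daily = [float(day) for day in range(1200)]  # made-up series: 1200 daily values
reduced = downsample(daily)
print(len(reduced), reduced[:3])
```

Averaging changes the sampling frequency (here, 1200 daily points become 400 three-day means), so the forecast horizon has to be rescaled to match: 30 days ahead becomes 10 steps.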
Alternatives