Windows-MCP MCP Server

Lightweight open-source MCP server that lets AI agents control Windows: UI automation, app launching, PowerShell, file ops, and browser DOM access.

Category: Automation
Maintainer: CursorTouch (community)
Auth: None (local stdio) / Bearer Token (network transports)
Status: active

Overview

Windows-MCP is an open-source MCP server (MIT licensed, maintained by CursorTouch) that bridges LLM agents and the Windows operating system. It exposes native Windows UI automation, keyboard and mouse input, application and window management, file system operations, PowerShell execution, clipboard access, registry operations, and browser DOM extraction across Chrome, Edge, and Firefox. Unlike vision-only computer-use agents, Windows-MCP works without specialized vision models by providing structured snapshots of the desktop with interactive element IDs, while still offering screenshots when needed.
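
To make the contrast with coordinate-based control concrete, here is a minimal Python sketch of how a client might pick an element out of a snapshot and build the follow-up Click request. The snapshot layout (`id`, `name`, `control_type`), the `element_id` argument name, and both helper functions are assumptions for illustration, not the server's documented schema. Only the `tools/call` envelope follows the MCP JSON-RPC request format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One interactive element from a snapshot (hypothetical shape)."""
    id: int
    name: str
    control_type: str


def find_element(elements, name, control_type=None):
    """Return the first element whose name matches case-insensitively, else None."""
    wanted = name.casefold()
    for element in elements:
        if element.name.casefold() != wanted:
            continue
        if control_type is not None and element.control_type != control_type:
            continue
        return element
    return None


def build_tool_call(request_id, tool, arguments):
    """Wrap a tool invocation in an MCP JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Hypothetical snapshot contents; a real Snapshot result will look different.
snapshot = [
    Element(id=4, name="Cancel", control_type="Button"),
    Element(id=7, name="Submit", control_type="Button"),
]

target = find_element(snapshot, "submit", control_type="Button")
# "element_id" is an assumed argument name, used here only for illustration.
request = build_tool_call(1, "Click", {"element_id": target.id})
print(json.dumps(request))
```

The point of the sketch is that the model only has to name an element, not estimate pixel positions, so a text-only LLM can drive the loop.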

The server runs as a local Python process via the uv package manager. It supports stdio (default for desktop clients), Server-Sent Events, and streamable HTTP transports, with optional bearer token auth, IP allowlisting, TLS, and OAuth 2.0 with PKCE for networked deployments. Action latency is typically 0.2 to 0.5 seconds, and the project ships an installable background service.

Windows-MCP is community-maintained, not an official Microsoft project, but it has seen wide adoption (reportedly 2M+ users via Claude Desktop Extensions) and is listed in the official MCP Registry. It is compatible with Claude Desktop, Claude Code, Cursor, Perplexity Desktop, Gemini CLI, and Qwen Code.

Tools

Tool          Description
Click         Click on screen coordinates or UI elements
Type          Type text into focused or specified UI element
Scroll        Perform vertical or horizontal scrolling
Move          Move the pointer or perform drag operations
Shortcut      Execute keyboard shortcut combinations (e.g., Ctrl+C)
Wait          Pause execution for a specified duration
Screenshot    Capture a fast visual screenshot with cursor and window data
Snapshot      Return full desktop state with interactive element IDs and DOM extraction for browsers
App           Launch applications and resize/move windows
PowerShell    Execute PowerShell commands on the host
FileSystem    Read, write, copy, move, delete, list, and search files
Scrape        Extract content from a webpage (with SSRF protection)
MultiSelect   Select multiple items in bulk
MultiEdit     Enter text across multiple input fields in one call
Clipboard     Read from or write to the Windows clipboard
Process       List and manage running processes
Notification  Send Windows toast notifications
Registry      Read and modify Windows Registry keys and values

Setup Guide

Prerequisites

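  • Windows 7, 8, 8.1, 10, or 11 (as listed by the project; in practice the Python requirement below restricts you to Windows versions that Python 3.13 itself supports)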
  • Python 3.13 or newer
  • uv package manager: pip install uv
  • English as the default system language (recommended for the App tool)

Quick start

Run the server directly from PyPI:

uvx windows-mcp serve

Claude Desktop configuration

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "windows-mcp": {
      "command": "uvx",
      "args": ["windows-mcp", "serve"]
    }
  }
}
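
If the config file already lists other servers, pasting the snippet over it would drop them. The following Python sketch merges the entry instead. The function name is hypothetical and the file location is left as a parameter, because the path to claude_desktop_config.json depends on the installation.

```python
import json
from pathlib import Path


def add_windows_mcp(config_path):
    """Add the windows-mcp entry to a Claude Desktop config, keeping other entries."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}

    # Only the "windows-mcp" key is written; every other server is left untouched.
    servers = config.setdefault("mcpServers", {})
    servers["windows-mcp"] = {"command": "uvx", "args": ["windows-mcp", "serve"]}

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the path to your own config file, then restart the client so it picks up the change.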

Run as a background service

windows-mcp install

This registers Windows-MCP as a scheduled task so it starts automatically.

Transports

  • stdio (default, recommended for local desktop clients)
  • sse (Server-Sent Events for network access)
  • streamable-http (recommended for production network deployments)

Optional config file location: ~/.windows-mcp/config.toml. CLI flags and environment variables override file settings.
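
The override order can be pictured as a layered merge. This Python sketch is illustrative only: the setting names (`transport`, `port`), the `WINDOWS_MCP_` environment prefix, and the function are hypothetical, and the text above does not say whether CLI flags or environment variables win when both are set, so letting CLI flags win is an assumption.

```python
def resolve_settings(file_settings, environ, cli_flags, env_prefix="WINDOWS_MCP_"):
    """Merge settings so later layers override earlier ones: file, env, CLI."""
    settings = dict(file_settings)  # lowest priority: values parsed from config.toml

    # Environment variables with the (assumed) prefix override file values.
    for key, value in environ.items():
        if key.startswith(env_prefix):
            settings[key[len(env_prefix):].lower()] = value

    # Flags the user did not pass are None and must not mask lower layers.
    settings.update({k: v for k, v in cli_flags.items() if v is not None})
    return settings


file_settings = {"transport": "stdio", "port": 8000}
environ = {"WINDOWS_MCP_TRANSPORT": "sse", "PATH": "C:\\Windows"}
cli_flags = {"transport": None, "port": 9000}

resolved = resolve_settings(file_settings, environ, cli_flags)
print(resolved)  # {'transport': 'sse', 'port': 9000}
```

Here the file's `transport` is replaced by the environment value, and the file's `port` is replaced by the CLI flag. Note that environment values arrive as strings, so a real loader would also need to convert types.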

Security (for non-stdio transports)

Network transports support bearer token auth, IP allowlist/CIDR restrictions, TLS/HTTPS, OAuth 2.0 with PKCE, CORS origin configuration, and tool-level access control. SSRF protection is enabled for the Scrape tool.
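
To show what three of these controls amount to, here is a short Python sketch using only the standard library. It is not the server's implementation; the function names are hypothetical and the checks are simplified versions of the general techniques (constant-time token comparison, CIDR matching, and rejecting non-public scrape targets).

```python
import hmac
import ipaddress


def token_matches(presented, expected):
    """Compare bearer tokens in constant time to avoid timing side channels."""
    return hmac.compare_digest(presented.encode(), expected.encode())


def ip_allowed(client_ip, allowlist):
    """True if client_ip equals, or falls inside, any allowlisted IP or CIDR range."""
    address = ipaddress.ip_address(client_ip)
    for entry in allowlist:
        network = ipaddress.ip_network(entry, strict=False)
        if address.version == network.version and address in network:
            return True
    return False


def is_public_target(ip):
    """SSRF-style guard: reject loopback, private, link-local, and reserved addresses."""
    return ipaddress.ip_address(ip).is_global


allowlist = ["10.0.0.0/8", "192.168.1.0/24"]
print(ip_allowed("192.168.1.42", allowlist))  # True
print(ip_allowed("203.0.113.9", allowlist))   # False
print(is_public_target("127.0.0.1"))          # False
```

A production SSRF guard would also need to resolve hostnames and apply the check to every resolved address, including after redirects; the sketch only covers literal IP addresses.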

Source install

Clone the repo and install its dependencies, then point your MCP client at the local uv invocation:

git clone https://github.com/CursorTouch/Windows-MCP
cd Windows-MCP
uv sync
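
The client-side half of this step is not shown above. As a rough sketch, the following Python snippet builds an `mcpServers` entry that runs the server from the checkout through uv. The argument list is an assumption modeled on the `uvx windows-mcp serve` command and uv's `--directory` option, the function name is hypothetical, and the path is a placeholder, so check the repository README for the exact invocation.

```python
import json


def local_server_entry(checkout_dir):
    """Build an mcpServers entry that runs the server from a local checkout."""
    return {
        "command": "uv",
        # `--directory` makes uv behave as if started inside the checkout.
        # The trailing arguments are assumed, mirroring `uvx windows-mcp serve`.
        "args": ["--directory", str(checkout_dir), "run", "windows-mcp", "serve"],
    }


# Placeholder path; replace it with the full path to your clone.
entry = {"mcpServers": {"windows-mcp": local_server_entry("C:/path/to/Windows-MCP")}}
print(json.dumps(entry, indent=2))
```

The printed JSON has the same shape as the Claude Desktop configuration shown earlier, with `uv` in place of `uvx`.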

Use Cases

  • Drive end-to-end QA tests across native Windows apps and browsers using Snapshot + Click + Type instead of brittle coordinate-based scripts.
  • Automate repetitive desktop workflows like opening Outlook, copying values from Excel, and pasting into a line-of-business app.
  • Let an AI agent triage files: search the file system, read documents, and move or rename them based on content.
  • Run PowerShell-based system administration tasks (service restarts, disk checks, registry tweaks) through an LLM chat interface.
  • Scrape a webpage in Chrome/Edge/Firefox via DOM mode, then trigger a Windows toast Notification when conditions are met.

Example Prompts

  • "Open Excel, create a new workbook, and fill A1:A5 with the numbers 1 through 5."
  • "Take a Snapshot of the active window, find the Submit button, and click it."
  • "Use PowerShell to list all services that are stopped and report their names."
  • "Search my Documents folder for any PDF containing the word 'invoice' and move them to D:\Archive\Invoices."
  • "Scrape the top story from news.ycombinator.com and send me a Windows notification with the headline."

Pros

  • Broad capability surface: UI input, screenshots, structured snapshots, PowerShell, files, registry, and DOM all in one server.
  • Works with non-vision LLMs thanks to Snapshot returning interactive element IDs, reducing model cost and latency.
  • Multiple transports (stdio, SSE, streamable HTTP) plus production security options (bearer tokens, TLS, OAuth 2.0 PKCE, IP allowlists).
  • Active project with PyPI distribution, uvx one-line install, and listing in the official MCP Registry.

Limitations

  • Community-maintained, not an official Microsoft or Anthropic project, so production support is community-driven.
  • Requires Python 3.13 or newer, which is ahead of many corporate baselines, and works only on Windows.
  • Highly privileged: PowerShell, Registry, and FileSystem tools give broad host access, so misconfiguration or prompt injection can be dangerous.

Alternatives