LMArena
LMArena helps AI labs evaluate models through crowdsourced human preference data.
LMArena is a crowdsourced AI evaluation platform where users compare anonymous model responses through blind pairwise voting, producing a public leaderboard ranked by human preference rather than static benchmarks. The platform serves AI labs, enterprises, and developers seeking production-ready model evaluations across domains including law and medicine. Its commercial AI Evaluations service, which gives enterprises access to community feedback data for model improvement, generates a $30M+ annualized run rate. The platform has become the most-cited 'real user preference' leaderboard in AI, with 5M+ monthly users generating 60M+ conversations per month.
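How does blind pairwise voting become a ranked leaderboard? Arena-style rankings are typically produced by fitting a Bradley-Terry model (an Elo-like rating) to the accumulated votes, and LMArena's published methodology is based on this family of models. The sketch below, using invented vote data and illustrative hyperparameters rather than LMArena's actual implementation, shows the core computation:

```python
import math
from collections import defaultdict

# Hypothetical vote log: (model_a, model_b, winner), winner in {"a", "b", "tie"}.
votes = [
    ("model-x", "model-y", "a"),
    ("model-x", "model-z", "a"),
    ("model-y", "model-z", "b"),
    ("model-x", "model-y", "tie"),
]

models = sorted({m for a, b, _ in votes for m in (a, b)})
scores = {m: 0.0 for m in models}  # Bradley-Terry log-strength per model

def p_win(sa: float, sb: float) -> float:
    """Probability that A beats B under the Bradley-Terry model."""
    return 1.0 / (1.0 + math.exp(sb - sa))

# Fit log-strengths by gradient ascent on the log-likelihood,
# counting a tie as half a win for each side.
lr = 0.1
for _ in range(2000):
    grads = defaultdict(float)
    for a, b, w in votes:
        outcome = {"a": 1.0, "b": 0.0, "tie": 0.5}[w]
        err = outcome - p_win(scores[a], scores[b])
        grads[a] += err
        grads[b] -= err
    for m in models:
        scores[m] += lr * grads[m]

# Report on an Elo-like scale (400 points per factor-of-10 odds, anchored at 1000).
for m in sorted(models, key=scores.get, reverse=True):
    print(f"{m}: {1000 + 400 * scores[m] / math.log(10):.0f}")
```

Running the sketch prints the three hypothetical models on an Elo-like scale; production systems layer confidence intervals and vote-quality filtering on top of this core fit.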
Problem solved
Static AI benchmarks don't reflect real user preferences or production performance, leaving model makers without ground truth feedback on which models actually perform better for end users.
Target customer
AI model developers (OpenAI, Google, xAI, Anthropic), AI research labs, and enterprises building or evaluating large language models for production use cases
Founders
Anastasios Angelopoulos
CEO
Postdoctoral scholar at UC Berkeley EECS and a Stanford University graduate; previously part of UC Berkeley's Large Model Systems Organization (LMSYS), the academic project from which LMArena grew.
Wei-Lin Chiang
CTO
Co-founder of LMArena; expertise in large model systems and evaluation infrastructure.
Ion Stoica
Co-founder and Advisor
Co-founder of Databricks and professor of computer science at UC Berkeley; has worked with the Laude Ventures team on multiple ventures.
Funding history
Seed
$100M
May 2025
Led by Andreessen Horowitz (a16z) and UC Investments, with participation from Laude Ventures, Lightspeed, Felicis, Kleiner Perkins, and The House Fund
Series A
$150M
January 2026
Led by Felicis and UC Investments, with participation from Andreessen Horowitz, The House Fund, LDVP, Kleiner Perkins, Lightspeed Venture Partners, and Laude Ventures
Total raised: $250M
Pricing
Free public platform (no signup required). Enterprise tier available for the AI Evaluations commercial service with contact-based custom pricing; AI Evaluations has a $30M+ annualized consumption run rate.
Notable customers
OpenAI (tested GPT-5 as 'summit'), Google DeepMind (Gemini 2.5 Flash Image as 'Nano Banana'), xAI, Anthropic
Integrations
OpenAI, Google, Anthropic (model availability partnerships); AWS, Cloudflare, reCAPTCHA (infrastructure)
Tech stack
HTTP/3
Cloudflare Browser Insights (RUM)
reCAPTCHA (Security)
Google Workspace (Email)
cdnjs (CDN)
Amazon S3 (object storage)
Cloudflare (CDN)
Amazon Web Services (PaaS)
Website
lmarena.ai
Competitors
Artificial Analysis
Broader benchmarking platform with cost and latency metrics; LMArena focuses specifically on human preference data for conversational and domain-specific tasks.
Hugging Face Open LLM Leaderboard
Static benchmark leaderboard; LMArena provides dynamic, crowdsourced human preference rankings across multiple domains, including production use cases.
Why this matters: LMArena has become the de facto standard for AI model evaluation based on real user preference, with rankings that move markets as companies like OpenAI pre-test flagship models on the platform before release. The $250M raised at a $1.7B valuation in under a year, combined with a $30M+ commercial run rate, signals strong product-market fit in an essential infrastructure category for the rapidly expanding AI industry.
Best for: AI model developers and enterprises that need production-ready evaluation data grounded in real user preferences rather than synthetic benchmarks.
Use cases
Pre-release Model Validation
AI labs test unreleased models on LMArena under anonymized codenames (e.g., OpenAI's GPT-5 as 'summit', Google's Gemini 2.5 Flash Image as 'Nano Banana') to gather human preference feedback before public launch. This validates production readiness and identifies areas for improvement based on actual user voting patterns rather than synthetic metrics.
Domain-Specific Model Performance Assessment
Enterprises in law, medicine, and other specialized fields use LMArena's evaluation service to benchmark models against their specific use cases and requirements. The platform provides both evaluation results and underlying feedback data to verify performance claims and understand real-world suitability.
Competitive Model Ranking and Market Positioning
AI model companies reference LMArena's public leaderboard rankings to position their models in the market, as the leaderboard has become the most-cited 'real user preference' benchmark in AI. Results directly influence purchasing decisions and perception of model quality among developers and enterprises.
Alternatives
Artificial Analysis
Broader benchmarking platform covering cost-per-token and latency metrics; choose if you need comprehensive cost and performance analysis alongside human preference data.
Hugging Face Open LLM Leaderboard
Free, static benchmark leaderboard using synthetic tasks; choose if you need open-source model rankings or don't require real user preference feedback.
FAQ
What does LMArena do?
LMArena is a crowdsourced AI evaluation platform where users vote on anonymous pairwise model comparisons to generate a human-preference-based leaderboard. The platform also offers AI Evaluations, a commercial service for enterprises seeking production-ready model assessments with access to underlying feedback data. The public platform is free, with 5M+ monthly users generating 60M+ conversations each month.
How much does LMArena cost?
The public leaderboard and comparison tool are completely free and require no signup. The enterprise AI Evaluations service uses contact-based custom pricing; current annualized consumption run rate exceeds $30M. Contact sales for specific enterprise pricing.
What are alternatives to LMArena?
Artificial Analysis provides broader benchmarking including cost and latency metrics alongside model comparisons. Hugging Face Open LLM Leaderboard offers free static benchmarks using synthetic tasks. Both lack LMArena's focus on real user preference data and production domain-specific evaluation.
Who uses LMArena?
AI model developers including OpenAI, Google DeepMind, xAI, and Anthropic use LMArena to evaluate and validate models. The 5M+ monthly user community spans 150 countries and includes enterprises evaluating models for production use in specialized domains like law and medicine.
How does LMArena compare to Hugging Face Open LLM Leaderboard?
LMArena focuses on dynamic, crowdsourced human preference voting across domains (law, medicine, general conversation) with production-ready evaluation results. Hugging Face uses static synthetic benchmarks designed primarily for open-source model rankings. LMArena's results directly influence market perception and purchasing decisions, while Hugging Face serves primarily as a reference benchmark for open models.
Tags
AI evaluation
model benchmarking
crowdsourced feedback
LLM comparison
human preferences
production evaluation
leaderboards