LMArena
LMArena helps AI labs evaluate models through crowdsourced human preference data.
LMArena is a crowdsourced AI evaluation platform where users compare anonymous model responses through blind pairwise voting, producing a public leaderboard ranked by human preference rather than static benchmarks. The platform serves AI labs, enterprises, and developers seeking production-ready model evaluations across domains including law and medicine. Its commercial AI Evaluations service, which gives enterprises access to community feedback data for model improvement, generates a $30M+ annualized run rate. The platform has become the most-cited 'real user preference' leaderboard in AI, with 5M+ monthly users generating 60M+ conversations per month.
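How does blind pairwise voting become a ranked leaderboard? Arena-style rankings are typically produced by fitting a Bradley-Terry model (an Elo-like rating) to the accumulated votes, and LMArena's published methodology is based on this family of models. The sketch below, using invented vote data and illustrative hyperparameters rather than LMArena's actual implementation, shows the core computation:

```python
import math
from collections import defaultdict

# Hypothetical vote log: (model_a, model_b, winner), winner in {"a", "b", "tie"}.
votes = [
    ("model-x", "model-y", "a"),
    ("model-x", "model-z", "a"),
    ("model-y", "model-z", "b"),
    ("model-x", "model-y", "tie"),
]

models = sorted({m for a, b, _ in votes for m in (a, b)})
scores = {m: 0.0 for m in models}  # Bradley-Terry log-strength per model

def p_win(sa: float, sb: float) -> float:
    """Probability that A beats B under the Bradley-Terry model."""
    return 1.0 / (1.0 + math.exp(sb - sa))

# Fit log-strengths by gradient ascent on the log-likelihood,
# counting a tie as half a win for each side.
lr = 0.1
for _ in range(2000):
    grads = defaultdict(float)
    for a, b, w in votes:
        outcome = {"a": 1.0, "b": 0.0, "tie": 0.5}[w]
        err = outcome - p_win(scores[a], scores[b])
        grads[a] += err
        grads[b] -= err
    for m in models:
        scores[m] += lr * grads[m]

# Report on an Elo-like scale (400 points per factor-of-10 odds, anchored at 1000).
for m in sorted(models, key=scores.get, reverse=True):
    print(f"{m}: {1000 + 400 * scores[m] / math.log(10):.0f}")
```

Running the sketch prints the three hypothetical models on an Elo-like scale; production systems layer confidence intervals and vote-quality filtering on top of this core fit.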
Problem solved
Static AI benchmarks don't reflect real user preferences or production performance, leaving model makers without ground truth feedback on which models actually perform better for end users.
Target customer
AI model developers (OpenAI, Google, xAI, Anthropic), AI research labs, and enterprises building or evaluating large language models for production use cases
Founders
Anastasios Angelopoulos
CEO
Postdoctoral scholar at UC Berkeley EECS and a Stanford University graduate; previously part of UC Berkeley's Large Model Systems Organization (LMSYS), the academic project from which LMArena grew.
Wei-Lin Chiang
CTO
Co-founder of LMArena; expertise in large model systems and evaluation infrastructure.
Ion Stoica
Co-founder and Advisor
Co-founder of Databricks and professor of computer science at UC Berkeley; has worked with the Laude Ventures team on multiple ventures.
Funding history
Seed
$100M
May 2025
Led by Andreessen Horowitz (a16z) and UC Investments, with participation from Laude Ventures, Lightspeed, Felicis, Kleiner Perkins, and The House Fund
Series A
$150M
January 2026
Led by Felicis and UC Investments, with participation from Andreessen Horowitz, The House Fund, LDVP, Kleiner Perkins, Lightspeed Venture Partners, and Laude Ventures
Total raised: $250M
Pricing
Free public platform (no signup required). Enterprise tier available for the AI Evaluations commercial service with contact-based custom pricing; AI Evaluations has a $30M+ annualized consumption run rate.
Notable customers
OpenAI (tested GPT-5 as 'summit'), Google DeepMind (Gemini 2.5 Flash Image as 'Nano Banana'), xAI, Anthropic
Integrations
OpenAI, Google, Anthropic (model availability partnerships); AWS, Cloudflare, reCAPTCHA (infrastructure)
Tech stack
HTTP/3
Cloudflare Browser Insights (RUM)
reCAPTCHA (Security)
Google Workspace (Email)
cdnjs (CDN)
Amazon S3 (object storage)
Cloudflare (CDN)
Amazon Web Services (PaaS)
Website
lmarena.ai
Competitors
Artificial Analysis
Broader benchmarking platform with cost and latency metrics; LMArena focuses specifically on human preference data for conversational and domain-specific tasks.
Hugging Face Open LLM Leaderboard
Static benchmark leaderboard; LMArena provides dynamic, crowdsourced human preference rankings across multiple domains, including production use cases.
Why this matters: LMArena has become the de facto standard for AI model evaluation based on real user preference, with rankings that move markets as companies like OpenAI pre-test flagship models on the platform before release. The $250M raised at a $1.7B valuation in under a year, combined with a $30M+ commercial run rate, signals strong product-market fit in an essential infrastructure category for the rapidly expanding AI industry.
Best for: AI model developers and enterprises that need production-ready evaluation data grounded in real user preferences rather than synthetic benchmarks.
Use cases
Pre-release Model Validation
AI labs test unreleased models on LMArena under anonymized codenames (e.g., OpenAI's GPT-5 as 'summit', Google's Gemini 2.5 Flash Image as 'Nano Banana') to gather human preference feedback before public launch. This validates production readiness and identifies areas for improvement based on actual user voting patterns rather than synthetic metrics.
Domain-Specific Model Performance Assessment
Enterprises in law, medicine, and other specialized fields use LMArena's evaluation service to benchmark models against their specific use cases and requirements. The platform provides both evaluation results and underlying feedback data to verify performance claims and understand real-world suitability.
Competitive Model Ranking and Market Positioning
AI model companies reference LMArena's public leaderboard rankings to position their models in the market, as the leaderboard has become the most-cited 'real user preference' benchmark in AI. Results directly influence purchasing decisions and perception of model quality among developers and enterprises.
Alternatives
Artificial Analysis
Broader benchmarking platform covering cost-per-token and latency metrics; choose if you need comprehensive cost and performance analysis alongside human preference data.
Hugging Face Open LLM Leaderboard
Free, static benchmark leaderboard using synthetic tasks; choose if you need open-source model rankings or don't require real user preference feedback.
FAQ
What does LMArena do?
LMArena is a crowdsourced AI evaluation platform where users vote on anonymous pairwise model comparisons to generate a human-preference-based leaderboard. The platform also offers AI Evaluations, a commercial service for enterprises seeking production-ready model assessments with access to underlying feedback data. The public platform is free, with 5M+ monthly users generating 60M+ conversations each month.
How much does LMArena cost?
The public leaderboard and comparison tool are completely free and require no signup. The enterprise AI Evaluations service uses contact-based custom pricing; current annualized consumption run rate exceeds $30M. Contact sales for specific enterprise pricing.
What are alternatives to LMArena?
Artificial Analysis provides broader benchmarking including cost and latency metrics alongside model comparisons. Hugging Face Open LLM Leaderboard offers free static benchmarks using synthetic tasks. Both lack LMArena's focus on real user preference data and production domain-specific evaluation.
Who uses LMArena?
AI model developers including OpenAI, Google DeepMind, xAI, and Anthropic use LMArena to evaluate and validate models. The 5M+ monthly user community spans 150 countries and includes enterprises evaluating models for production use in specialized domains like law and medicine.
How does LMArena compare to Hugging Face Open LLM Leaderboard?
LMArena focuses on dynamic, crowdsourced human preference voting across domains (law, medicine, general conversation) with production-ready evaluation results. Hugging Face uses static synthetic benchmarks designed primarily for open-source model rankings. LMArena's results directly influence market perception and purchasing decisions, while Hugging Face serves primarily as a reference benchmark for open models.
Tags
AI evaluation
model benchmarking
crowdsourced feedback
LLM comparison
human preferences
production evaluation
leaderboards