Baseten

Baseten helps ML teams deploy production AI models without managing infrastructure.
Venture-backed · $585M total raised · Founded 2019 · San Francisco, California · 186 employees
Baseten is a serverless inference platform that converts machine learning models into production-ready APIs with auto-scaling GPU access across multiple cloud providers. It abstracts away infrastructure complexity—from GPU management and autoscaling to observability and billing—enabling ML teams to deploy and scale models without building custom infrastructure. The platform delivers up to 40% cost savings compared to in-house solutions and targets four to five nines of availability through model optimization and multi-cloud capacity management.
Problem solved
ML teams waste engineering cycles managing GPU infrastructure, autoscaling, observability, and cost optimization instead of focusing on model quality and user experience.
Target customer
ML engineers and AI teams at mid-market to enterprise companies building AI applications, including generative AI startups, ML-heavy SaaS platforms, and companies deploying open-source or proprietary models at scale.
Founders
Tuhin Srivastava
CEO
Data scientist and full-stack engineer at Gumroad who built fraud-detection and content-moderation systems; holds a PhD in Mathematics from the University of Sydney.
Amir Haghighat
CTO
Led data platform engineering at Clover Health and was Head of Engineering at Gumroad.
Philip Howes
Chief Scientist
Co-founder of Shape; previously worked on ML at Gumroad and Skulpt.
Pankaj Gupta
Co-founder
Former Software Engineer at Uber.
Funding history
Seed · $5–10M · April 2022 · Led by AI Fund and Caffeinated Capital
Series A · $20M · 2022 · Led by Greylock and South Park Commons
Series B · $40M · March 2024 · Led by IVP
Series C · $75M · February 2025 · Led by IVP and Spark Capital · With Greylock, Conviction, South Park Commons, Basecase, Lachy Groom, Adam Bain, Dick Costolo (01a)
Series D · $150M · September 2025 · Led by Bond · With CapitalG, Premji, Scribble, Conviction, 01a, IVP, Spark, Greylock
Series E · $300M · January 2026 · Led by IVP, CapitalG, and NVIDIA
Total raised: $585M
Pricing
Usage-based pricing model. Dedicated deployments: per-minute GPU pricing with scale-to-zero, so idle time is never billed. Model APIs: token-based pricing advertised at 50%+ below comparable OpenAI rates. Plan tiers: Basic (pay-as-you-go), Pro (volume discounts), Enterprise (custom terms, starting around $5,000/month).
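As a rough illustration of how scale-to-zero changes the bill under per-minute GPU pricing, here is a minimal sketch; the per-minute rate is a hypothetical placeholder, not a published Baseten price:

```python
# Sketch: always-on GPU vs. a scale-to-zero dedicated deployment.
# GPU_RATE_PER_MIN is a hypothetical placeholder, not a published price.

GPU_RATE_PER_MIN = 0.10  # hypothetical $/minute for one dedicated GPU


def monthly_cost(active_minutes_per_day: float, scale_to_zero: bool) -> float:
    """Estimate monthly GPU cost; with scale-to-zero, idle minutes are free."""
    billed_minutes = active_minutes_per_day if scale_to_zero else 24 * 60
    return billed_minutes * 30 * GPU_RATE_PER_MIN


# A workload that is only active 4 hours (240 minutes) per day:
always_on = monthly_cost(active_minutes_per_day=240, scale_to_zero=False)
pay_per_use = monthly_cost(active_minutes_per_day=240, scale_to_zero=True)

print(f"always-on:     ${always_on:,.2f}/month")    # bills all 1,440 min/day
print(f"scale-to-zero: ${pay_per_use:,.2f}/month")  # bills only 240 min/day
```

At these assumed numbers the scale-to-zero deployment bills one sixth of the always-on cost; the ratio simply tracks the active fraction of the day.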
Notable customers
Patreon, Pipe, Laurel, Rime, Stability AI, Writer, Prediction Guard, Motive
Integrations
AWS, Google Cloud, Azure, multiple GPU providers (NVIDIA, AMD); Stripe (payments), HubSpot, Vercel, analytics platforms
Tech stack
Frontend: React, Next.js, dc.js (JavaScript graphics), core-js, Webpack, Node.js; CMS: DatoCMS; Analytics: Vercel Analytics, HubSpot Analytics, PostHog, Matomo, LinkedIn Insight Tag, Leadfeeder, Google Analytics, Google Tag Manager; Marketing automation: HubSpot; Payments: Stripe; Email: Google Workspace, SendGrid; Hosting/PaaS: Vercel, Amazon Web Services; Other: Open Graph, HSTS (security), Priority Hints (performance)
Competitors
AWS SageMaker
Broader ML platform with training and data prep; Baseten focuses purely on inference scaling with better cost efficiency.
Modal
General-purpose serverless compute; Baseten specializes in GPU-accelerated inference with optimized multi-cloud routing.
Replicate
Model marketplace and API hosting; Baseten enables private deployment of custom or open-source models with full infrastructure control.
Together AI
Managed inference API for open-source models; Baseten offers self-hosted inference with multi-cloud flexibility and custom model support.
Why this matters: Baseten has become the critical infrastructure layer for the AI economy, raising $585M (including $300M Series E with NVIDIA) at a $5B valuation. With backing from top-tier VCs and NVIDIA itself, the company is positioned as a key beneficiary of enterprise AI adoption and infrastructure consolidation around specialized inference platforms.
Best for: ML teams and AI startups that need to deploy models to production quickly, scale inference workloads cost-efficiently, and avoid building custom infrastructure.
Use cases
Deploying Open-Source LLMs at Scale
Companies hosting Llama, DeepSeek, or Mistral models can deploy them as APIs in minutes without managing GPU infrastructure. Baseten handles autoscaling, multi-cloud failover, and cost optimization transparently.
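A minimal sketch of what calling such a deployed model looks like, assuming an OpenAI-compatible chat-completions endpoint; the URL and model id below are illustrative placeholders (not verified Baseten values), and the request body is only constructed here, not sent:

```python
import json

# Placeholder endpoint and model id -- assumptions for illustration only.
BASE_URL = "https://example-inference-host/v1/chat/completions"
MODEL_ID = "meta-llama/Llama-3-8B-Instruct"  # placeholder model id

# Standard OpenAI-compatible chat-completions payload:
payload = {
    "model": MODEL_ID,
    "messages": [{"role": "user", "content": "Summarize this support ticket."}],
    "max_tokens": 256,
}

# Serialized request body that any OpenAI-compatible server would accept;
# in practice this would be POSTed to BASE_URL with an API-key header.
body = json.dumps(payload)
print(body)
```

Because the wire format matches the OpenAI chat-completions schema, existing client libraries can typically be pointed at a self-hosted endpoint by swapping the base URL and API key.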
Cost-Optimized Inference for Generative AI Products
Startups building on top of open-source models reduce inference costs by 40% through Baseten's quantization, speculative decoding, and multi-cloud arbitrage. Scale-to-zero eliminates idle GPU charges.
Private Model Deployment for Enterprise ML Teams
Enterprise companies can deploy proprietary models with full data privacy, observability, and SLA guarantees (99.99%+ uptime) without running their own GPU clusters.
Burst Capacity for Peak Demand
Applications experiencing variable inference load (seasonal spikes, event-driven workloads) automatically scale across Baseten's multi-cloud infrastructure, paying only for used capacity.
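The burst-capacity behavior above reduces to simple arithmetic: replicas scale with current load and drop to zero when idle. A sketch, with hypothetical throughput numbers (per-replica capacity and limits are assumptions, not Baseten defaults):

```python
import math


def replicas_needed(requests_per_sec: float, per_replica_rps: float,
                    min_replicas: int = 0, max_replicas: int = 20) -> int:
    """Replicas required for the current load; scales to zero when idle.

    per_replica_rps and max_replicas are hypothetical illustration values.
    """
    if requests_per_sec <= 0:
        return min_replicas  # scale-to-zero: no traffic, no GPUs billed
    needed = math.ceil(requests_per_sec / per_replica_rps)
    return max(min_replicas, min(needed, max_replicas))


print(replicas_needed(0, 4))    # idle: scaled to zero
print(replicas_needed(9, 4))    # 9 rps at 4 rps/replica: 3 replicas
print(replicas_needed(500, 4))  # burst capped at max_replicas
```

The `max_replicas` cap models a capacity or budget ceiling; in a multi-cloud setup the platform can satisfy the computed replica count from whichever provider has available GPUs.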
Alternatives
Anyscale Broader distributed compute platform for training and inference; Baseten is more specialized and cost-optimized for inference workloads.
Hugging Face Inference Managed API for Hugging Face models; Baseten supports custom models and offers more granular cost control through scale-to-zero.
Lambda Labs Raw GPU cloud compute; Baseten adds inference-specific optimizations, autoscaling, and multi-cloud orchestration.
FAQ
What does Baseten do?
Baseten is a serverless inference platform that deploys machine learning models as scalable APIs. It abstracts GPU infrastructure management, autoscaling, observability, and billing so ML teams can focus on model quality instead of ops. The platform spans multiple cloud providers and delivers up to 40% cost savings versus in-house infrastructure.
How much does Baseten cost?
Pricing is usage-based. Dedicated deployments charge per-minute for GPU time with no idle charges; Model APIs use token-based pricing advertised at 50%+ below comparable OpenAI rates. Plan tiers include Basic (pay-as-you-go), Pro (volume discounts), and Enterprise (custom, starting around $5,000/month). Contact sales for exact pricing.
What are alternatives to Baseten?
AWS SageMaker (broader ML platform but less inference-focused), Modal (general serverless compute), Replicate (model marketplace), Together AI (managed open-source LLM APIs), Anyscale (distributed compute), and raw GPU providers like Lambda Labs.
Who uses Baseten?
ML engineers and AI teams at mid-market to enterprise companies, including Patreon, Stability AI, Writer, Prediction Guard, and Motive. Customers range from generative AI startups to enterprise teams deploying proprietary models.
How does Baseten compare to AWS SageMaker?
SageMaker is a comprehensive ML platform covering training, data prep, and inference; Baseten specializes exclusively in inference scaling with better cost efficiency and faster deployment. Baseten's multi-cloud approach and scale-to-zero pricing are advantages for cost-conscious inference workloads. SageMaker offers deeper ecosystem integration within AWS.
Tags
GPU inference · serverless compute · machine learning deployment · multi-cloud · cost optimization · AI infrastructure · MLOps