Baseten

Baseten helps ML teams deploy production AI models without managing infrastructure.
Venture-backed · $585M total raised · Founded 2019 · San Francisco, California · 186 employees
Baseten is a serverless inference platform that converts machine learning models into production-ready APIs with auto-scaling GPU access across multiple cloud providers. It abstracts away infrastructure complexity—from GPU management and autoscaling to observability and billing—enabling ML teams to deploy and scale models without building custom infrastructure. The platform delivers up to 40% cost savings compared to in-house solutions and targets four to five nines of availability through model optimization and multi-cloud capacity management.
Problem solved
ML teams waste engineering cycles managing GPU infrastructure, autoscaling, observability, and cost optimization instead of focusing on model quality and user experience.
Target customer
ML engineers and AI teams at mid-market to enterprise companies building AI applications, including generative AI startups, ML-heavy SaaS platforms, and companies deploying open-source or proprietary models at scale.
Founders
Tuhin Srivastava
CEO
Data scientist and full-stack engineer at Gumroad who built fraud-detection and content-moderation systems; holds a PhD in Mathematics from the University of Sydney.
Amir Haghighat
CTO
Led data platform engineering at Clover Health and was Head of Engineering at Gumroad.
Philip Howes
Chief Scientist
Co-founder of Shape; previously worked on ML at Gumroad and Skulpt.
Pankaj Gupta
Co-founder
Former Software Engineer at Uber.
Funding history
Seed · $5–10M · April 2022 · Led by AI Fund and Caffeinated Capital
Series A · $20M · 2022 · Led by Greylock and South Park Commons
Series B · $40M · March 2024 · Led by IVP
Series C · $75M · February 2025 · Led by IVP and Spark Capital · With Greylock, Conviction, South Park Commons, Basecase, Lachy Groom, Adam Bain, Dick Costolo (01a)
Series D · $150M · September 2025 · Led by Bond · With CapitalG, Premji, Scribble, Conviction, 01a, IVP, Spark, Greylock
Series E · $300M · January 2026 · Led by IVP, CapitalG, and NVIDIA
Total raised: $585M
Pricing
Usage-based pricing model. Dedicated deployments: per-minute GPU pricing with scale-to-zero, so idle time is never billed. Model APIs: token-based pricing advertised at 50%+ below comparable OpenAI rates. Plan tiers: Basic (pay-as-you-go), Pro (volume discounts), Enterprise (custom terms, starting around $5,000/month).
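As a rough illustration of how scale-to-zero changes the bill under per-minute GPU pricing, here is a minimal sketch; the per-minute rate is a hypothetical placeholder, not a published Baseten price:

```python
# Sketch: always-on GPU vs. a scale-to-zero dedicated deployment.
# GPU_RATE_PER_MIN is a hypothetical placeholder, not a published price.

GPU_RATE_PER_MIN = 0.10  # hypothetical $/minute for one dedicated GPU


def monthly_cost(active_minutes_per_day: float, scale_to_zero: bool) -> float:
    """Estimate monthly GPU cost; with scale-to-zero, idle minutes are free."""
    billed_minutes = active_minutes_per_day if scale_to_zero else 24 * 60
    return billed_minutes * 30 * GPU_RATE_PER_MIN


# A workload that is only active 4 hours (240 minutes) per day:
always_on = monthly_cost(active_minutes_per_day=240, scale_to_zero=False)
pay_per_use = monthly_cost(active_minutes_per_day=240, scale_to_zero=True)

print(f"always-on:     ${always_on:,.2f}/month")    # bills all 1,440 min/day
print(f"scale-to-zero: ${pay_per_use:,.2f}/month")  # bills only 240 min/day
```

At these assumed numbers the scale-to-zero deployment bills one sixth of the always-on cost; the ratio simply tracks the active fraction of the day.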
Notable customers
Patreon, Pipe, Laurel, Rime, Stability AI, Writer, Prediction Guard, Motive
Integrations
AWS, Google Cloud, Azure, multiple GPU providers (NVIDIA, AMD); Stripe (payments), HubSpot, Vercel, analytics platforms
Tech stack
Frontend: React, Next.js, dc.js (JavaScript graphics), core-js, Webpack, Node.js; CMS: DatoCMS; Analytics: Vercel Analytics, HubSpot Analytics, PostHog, Matomo, LinkedIn Insight Tag, Leadfeeder, Google Analytics, Google Tag Manager; Marketing automation: HubSpot; Payments: Stripe; Email: Google Workspace, SendGrid; Hosting/PaaS: Vercel, Amazon Web Services; Other: Open Graph, HSTS (security), Priority Hints (performance)
Competitors
AWS SageMaker
Broader ML platform with training and data prep; Baseten focuses purely on inference scaling with better cost efficiency.
Modal
General-purpose serverless compute; Baseten specializes in GPU-accelerated inference with optimized multi-cloud routing.
Replicate
Model marketplace and API hosting; Baseten enables private deployment of custom or open-source models with full infrastructure control.
Together AI
Managed inference API for open-source models; Baseten offers self-hosted inference with multi-cloud flexibility and custom model support.
Why this matters: Baseten has become the critical infrastructure layer for the AI economy, raising $585M (including $300M Series E with NVIDIA) at a $5B valuation. With backing from top-tier VCs and NVIDIA itself, the company is positioned as a key beneficiary of enterprise AI adoption and infrastructure consolidation around specialized inference platforms.
Best for: ML teams and AI startups that need to deploy models to production quickly, scale inference workloads cost-efficiently, and avoid building custom infrastructure.
Use cases
Deploying Open-Source LLMs at Scale
Companies hosting Llama, DeepSeek, or Mistral models can deploy them as APIs in minutes without managing GPU infrastructure. Baseten handles autoscaling, multi-cloud failover, and cost optimization transparently.
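A minimal sketch of what calling such a deployed model looks like, assuming an OpenAI-compatible chat-completions endpoint; the URL and model id below are illustrative placeholders (not verified Baseten values), and the request body is only constructed here, not sent:

```python
import json

# Placeholder endpoint and model id -- assumptions for illustration only.
BASE_URL = "https://example-inference-host/v1/chat/completions"
MODEL_ID = "meta-llama/Llama-3-8B-Instruct"  # placeholder model id

# Standard OpenAI-compatible chat-completions payload:
payload = {
    "model": MODEL_ID,
    "messages": [{"role": "user", "content": "Summarize this support ticket."}],
    "max_tokens": 256,
}

# Serialized request body that any OpenAI-compatible server would accept;
# in practice this would be POSTed to BASE_URL with an API-key header.
body = json.dumps(payload)
print(body)
```

Because the wire format matches the OpenAI chat-completions schema, existing client libraries can typically be pointed at a self-hosted endpoint by swapping the base URL and API key.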
Cost-Optimized Inference for Generative AI Products
Startups building on top of open-source models reduce inference costs by 40% through Baseten's quantization, speculative decoding, and multi-cloud arbitrage. Scale-to-zero eliminates idle GPU charges.
Private Model Deployment for Enterprise ML Teams
Enterprise companies can deploy proprietary models with full data privacy, observability, and SLA guarantees (99.99%+ uptime) without running their own GPU clusters.
Burst Capacity for Peak Demand
Applications experiencing variable inference load (seasonal spikes, event-driven workloads) automatically scale across Baseten's multi-cloud infrastructure, paying only for used capacity.
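The burst-capacity behavior above reduces to simple arithmetic: replicas scale with current load and drop to zero when idle. A sketch, with hypothetical throughput numbers (per-replica capacity and limits are assumptions, not Baseten defaults):

```python
import math


def replicas_needed(requests_per_sec: float, per_replica_rps: float,
                    min_replicas: int = 0, max_replicas: int = 20) -> int:
    """Replicas required for the current load; scales to zero when idle.

    per_replica_rps and max_replicas are hypothetical illustration values.
    """
    if requests_per_sec <= 0:
        return min_replicas  # scale-to-zero: no traffic, no GPUs billed
    needed = math.ceil(requests_per_sec / per_replica_rps)
    return max(min_replicas, min(needed, max_replicas))


print(replicas_needed(0, 4))    # idle: scaled to zero
print(replicas_needed(9, 4))    # 9 rps at 4 rps/replica: 3 replicas
print(replicas_needed(500, 4))  # burst capped at max_replicas
```

The `max_replicas` cap models a capacity or budget ceiling; in a multi-cloud setup the platform can satisfy the computed replica count from whichever provider has available GPUs.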
Alternatives
Anyscale Broader distributed compute platform for training and inference; Baseten is more specialized and cost-optimized for inference workloads.
Hugging Face Inference Managed API for Hugging Face models; Baseten supports custom models and offers more granular cost control through scale-to-zero.
Lambda Labs Raw GPU cloud compute; Baseten adds inference-specific optimizations, autoscaling, and multi-cloud orchestration.
FAQ
What does Baseten do?
Baseten is a serverless inference platform that deploys machine learning models as scalable APIs. It abstracts GPU infrastructure management, autoscaling, observability, and billing so ML teams can focus on model quality instead of ops. The platform spans multiple cloud providers and delivers up to 40% cost savings versus in-house infrastructure.
How much does Baseten cost?
Pricing is usage-based. Dedicated deployments charge per-minute for GPU time with no idle charges; Model APIs use token-based pricing advertised at 50%+ below comparable OpenAI rates. Plan tiers include Basic (pay-as-you-go), Pro (volume discounts), and Enterprise (custom, starting around $5,000/month). Contact sales for exact pricing.
What are alternatives to Baseten?
AWS SageMaker (broader ML platform but less inference-focused), Modal (general serverless compute), Replicate (model marketplace), Together AI (managed open-source LLM APIs), Anyscale (distributed compute), and raw GPU providers like Lambda Labs.
Who uses Baseten?
ML engineers and AI teams at mid-market to enterprise companies, including Patreon, Stability AI, Writer, Prediction Guard, and Motive. Customers range from generative AI startups to enterprise teams deploying proprietary models.
How does Baseten compare to AWS SageMaker?
SageMaker is a comprehensive ML platform covering training, data prep, and inference; Baseten specializes exclusively in inference scaling with better cost efficiency and faster deployment. Baseten's multi-cloud approach and scale-to-zero pricing are advantages for cost-conscious inference workloads. SageMaker offers deeper ecosystem integration within AWS.
Tags
GPU inference · serverless compute · machine learning deployment · multi-cloud · cost optimization · AI infrastructure · MLOps