Groq

Groq builds AI inference chips that deliver up to 10x the throughput of comparable GPUs.
Series E · $1.75B total raised · Founded 2016 · Mountain View, California · 560 employees
Groq builds Language Processing Units (LPUs)—purpose-built AI inference chips that deliver up to 10x the throughput of comparable GPUs with ultra-low latency. The company offers GroqCloud, a fully managed inference platform with OpenAI-compatible APIs, and GroqRack for on-premises deployments. Founded by former Google TPU engineers, Groq has raised $1.75B and reached a $6.9B valuation, with a landmark $20B licensing deal with Nvidia announced in December 2025.
Problem solved
AI inference at scale suffers from high latency and energy consumption on general-purpose GPUs, limiting real-time AI applications and increasing computational costs.
Target customer
AI infrastructure teams, LLM application developers, enterprises requiring low-latency inference, cloud service providers, and inference-heavy AI platforms.
Founders
Jonathan Ross
Founder & CEO
Former Google engineer and primary designer of Google's Tensor Processing Unit (TPU); studied mathematics and computer science at NYU's Courant Institute.
Douglas Wightman
Co-founder, First CEO
Entrepreneur and former engineer at Google X (X Development).
Funding history
Series A · $10.3M · December 2016 · Led by Social Capital (Chamath Palihapitiya)
Series C · $300M · April 2021 · Led by Tiger Global Management and D1 Capital Partners
Series D · $640M · August 2024 · Led by BlackRock Private Equity Partners, with various institutional investors
Series E · $750M · September 2025 · Led by Disruptive, with BlackRock, Neuberger Berman, Samsung, Cisco, D1, Altimeter, Infinitum, and 1789 Capital
Total raised: $1.75B
Pricing
Freemium model: free tier with 14,400 API requests/day and 30 requests/minute rate limit. Pay-as-you-go for production: token-based pricing starting from $0.06 per 1M tokens (Llama 3.1 8B) up to $0.66 per 1M tokens for larger models. Enterprise GroqRack pricing available via custom quotes.
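To make the token math concrete, here is a minimal cost-estimation sketch using the rates quoted above. Actual GroqCloud rates vary by model and may price input and output tokens separately; the figures and model names below are illustrative.

```python
# Illustrative sketch: estimating GroqCloud pay-as-you-go cost from the
# per-1M-token rates quoted above. Check the current price list before
# relying on these numbers.
PRICE_PER_1M_TOKENS_USD = {
    "llama-3.1-8b": 0.06,  # entry-level rate quoted above
    "larger-model": 0.66,  # upper-end rate quoted above (placeholder name)
}

def estimate_cost_usd(model: str, tokens: int) -> float:
    """Estimated cost in USD for processing `tokens` tokens on `model`."""
    return tokens / 1_000_000 * PRICE_PER_1M_TOKENS_USD[model]

# Example: 50M tokens per month on the 8B model costs about $3.00.
print(f"${estimate_cost_usd('llama-3.1-8b', 50_000_000):.2f}")
```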
Notable customers
Not publicly disclosed; Groq serves enterprise and startup customers through the GroqCloud API and on-premises GroqRack deployments.
Integrations
OpenAI-compatible REST API, Hugging Face models, major open-weight LLMs (Meta's Llama, Mistral, and others), and cloud integrations via the GroqCloud platform
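Because the API is OpenAI-compatible, existing OpenAI SDK code can usually be repointed at GroqCloud by swapping the base URL and API key. A minimal sketch follows; the base URL reflects Groq's public documentation, while the model id is an example and availability changes over time.

```python
# Minimal sketch: calling GroqCloud via the OpenAI Python SDK (openai>=1.0).
# base_url per Groq's public docs; model id is an example that may change.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="YOUR_GROQ_API_KEY",  # placeholder; use a real key from GroqCloud
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Explain what an LPU is in one sentence."}],
)
print(response.choices[0].message.content)
```

The same pattern applies to any OpenAI-compatible client library; only the endpoint and credentials change.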
Tech stack
Automated website-technology scan (marketing site only): WordPress, Cloudflare, Google Cloud, Next.js, Vercel, React, HubSpot, Marketo, Okta, Stripe, and several hundred other detections.
Website: groq.com
Competitors
NVIDIA
NVIDIA dominates general-purpose GPU computing with broader ecosystem; Groq specializes in purpose-built inference-only silicon with lower latency and energy efficiency.
Cerebras
Cerebras focuses on wafer-scale processor design for training; Groq targets inference-optimized LPU architecture for throughput and latency.
Graphcore
Graphcore builds IPUs for both training and inference; Groq focuses exclusively on inference with deterministic latency guarantees.
SambaNova
SambaNova offers a dataflow architecture for AI; Groq uses statically scheduled tensor streaming for predictable, ultra-low-latency inference.
Why this matters: Groq represents a fundamental rethinking of AI inference silicon, moving from general-purpose GPUs to deterministic, latency-optimized LPU architectures. The $20B Nvidia licensing deal validates its technical approach, while its continued independence, $1.75B raised, and $6.9B valuation signal serious infrastructure-scale ambitions in an increasingly commoditized inference market.
Best for: AI teams building latency-sensitive applications, LLM inference platforms, enterprises requiring on-premises AI deployment, and developers seeking predictable sub-100ms inference performance.
Use cases
Real-time Conversational AI
Customer support chatbots and conversational agents require sub-second response times. Groq's LPU delivers up to 10x higher throughput with deterministic latency, enabling smooth real-time conversations without noticeable delays, so companies can serve more concurrent users per hardware unit.
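For chat workloads, streaming the response token by token keeps perceived latency low regardless of total generation time. A sketch using the same OpenAI-compatible client pattern as above (the model id is again an assumption):

```python
# Sketch: streaming a chat completion so tokens render as they arrive.
from openai import OpenAI

client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key="YOUR_GROQ_API_KEY")

stream = client.chat.completions.create(
    model="llama-3.1-8b-instant",   # example model id
    messages=[{"role": "user", "content": "Hi! What can you do?"}],
    stream=True,                     # server sends incremental chunks
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```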
High-throughput Batch Inference
Content moderation platforms, recommendation engines, and document processing systems need to process thousands of items per minute. Groq's tensor streaming architecture handles batch inference with predictable performance, supporting latency SLAs that are difficult to guarantee on GPUs.
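A common client-side pattern for this kind of workload is fanning requests out concurrently. A sketch, where the classifier prompt, model id, and worker count are illustrative assumptions rather than Groq recommendations:

```python
# Sketch: concurrent batch inference for content moderation.
# Prompt, model id, and max_workers are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key="YOUR_GROQ_API_KEY")

def moderate(text: str) -> str:
    """Classify one item; returns the model's SAFE/UNSAFE label."""
    resp = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[{"role": "user", "content": f"Answer SAFE or UNSAFE only: {text}"}],
    )
    return resp.choices[0].message.content.strip()

comments = ["great product!", "spam link here", "see you tomorrow"]
with ThreadPoolExecutor(max_workers=8) as pool:
    labels = list(pool.map(moderate, comments))
print(dict(zip(comments, labels)))
```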
On-premises AI Deployment
Enterprises with data residency requirements or sensitive workloads deploy GroqRack in colocation or private data centers. The deterministic performance and energy efficiency reduce cooling and power costs compared to GPU clusters while maintaining ultra-low latency.
Inference-heavy API Services
SaaS platforms monetizing AI (AI-as-a-service) use GroqCloud to minimize per-token inference costs and latency. The freemium tier with generous rate limits enables developers to prototype; pay-as-you-go scales with production traffic without engineering overhead.
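On the free tier's quoted 30 requests/minute cap, client-side pacing with backoff avoids burning quota on rejected calls. A generic sketch; the retry policy is an assumption, not documented Groq behavior:

```python
# Sketch: pacing calls to stay under the quoted 30 req/min free-tier cap,
# with exponential backoff on failure. Policy is illustrative, not official.
import time

MIN_INTERVAL_S = 60 / 30  # 2 seconds between calls keeps us under 30/min

def paced_call(fn, *args, retries=3, **kwargs):
    """Call `fn`, pacing successes and backing off on errors (e.g. HTTP 429)."""
    for attempt in range(retries):
        try:
            result = fn(*args, **kwargs)
            time.sleep(MIN_INTERVAL_S)
            return result
        except Exception:
            time.sleep(2 ** attempt)  # back off 1s, 2s, 4s before retrying
    raise RuntimeError("retries exhausted")
```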
Alternatives
NVIDIA GPUs + TensorRT General-purpose, broader ecosystem, higher power consumption; best for training and diverse workloads, not latency-optimized inference.
AWS Inferentia & Trainium AWS-managed inference accelerators with lower cost but weaker latency guarantees; tightly coupled to the AWS ecosystem.
Together AI Hosted inference API service; easier onboarding but not purpose-built hardware; runs on various backends including GPUs.
Cerebras Systems Supports both training and inference; wafer-scale chips with broader use cases beyond pure inference optimization.
FAQ
What does Groq do?
Groq designs and manufactures Language Processing Units (LPUs)—specialized AI inference chips that deliver up to 10x higher throughput and ultra-low latency compared to GPUs. The company offers GroqCloud, a managed inference platform with OpenAI-compatible APIs, and GroqRack for on-premises deployments scaling to 576+ LPUs per cluster.
How much does Groq cost?
GroqCloud offers a free tier with 14,400 daily API requests. Production pricing is pay-as-you-go based on tokens, starting at $0.06 per 1M tokens for Llama 3.1 8B and scaling up to ~$0.66 per 1M tokens for larger models. GroqRack on-premises clusters require custom enterprise quotes.
What are alternatives to Groq?
Main alternatives include NVIDIA GPUs with TensorRT (broader ecosystem, less latency-optimized), AWS Inferentia (AWS-managed, lower cost), Together AI (hosted inference API), and Cerebras (supports training and inference). Choice depends on whether you prioritize latency, cost, ecosystem lock-in, or training capability.
Who uses Groq?
Target customers include AI infrastructure teams, LLM application developers, enterprises requiring sub-100ms inference, cloud service providers, and SaaS platforms monetizing AI. Specific customer names are not publicly disclosed, though the company serves both startups and enterprises via GroqCloud and on-premises GroqRack.
How does Groq compare to NVIDIA?
NVIDIA dominates general-purpose GPU compute with a mature ecosystem for training and inference. Groq builds inference-only hardware with deterministic, ultra-low latency and 10x higher per-token throughput, making it better for latency-sensitive applications but narrower in scope. Notably, Nvidia and Groq announced a $20B licensing partnership in December 2025, with Groq remaining independent.
Was Groq acquired by Nvidia?
No. In December 2025, Nvidia and Groq announced a non-exclusive $20B licensing agreement for Groq's inference technology, with founder Jonathan Ross and president Sunny Madra joining Nvidia. Groq stated it would continue operating as an independent company with its own hardware and GroqCloud service.
Tags
AI inference · semiconductor · LLM · low latency · hardware acceleration · GroqCloud · ASIC · tensor streaming · edge AI · LPU