Groq
Groq builds AI inference chips that deliver up to 10x the throughput of GPUs.
Groq builds Language Processing Units (LPUs)—purpose-built AI inference chips that deliver up to 10x the throughput of comparable GPUs with ultra-low latency. The company offers GroqCloud, a fully managed inference platform with OpenAI-compatible APIs, and GroqRack for on-premises deployments. Founded by former Google TPU engineers, Groq has raised $1.75B and reached a $6.9B valuation, with a landmark $20B licensing deal with Nvidia announced in December 2025.
Problem solved
AI inference at scale suffers from high latency and energy consumption on general-purpose GPUs, limiting real-time AI applications and increasing computational costs.
Target customer
AI infrastructure teams, LLM application developers, enterprises requiring low-latency inference, cloud service providers, and inference-heavy AI platforms.
Founders
Jonathan Ross
Founder & CEO
Former Google engineer and primary designer of Google's Tensor Processing Unit (TPU); studied mathematics and computer science at NYU's Courant Institute.
Douglas Wightman
Co-founder, First CEO
Entrepreneur and former engineer at Google X (X Development).
Funding history
Series A
$10.3M
December 2016
Led by Social Capital
· Chamath Palihapitiya
Series C
$300M
April 2021
Led by Tiger Global Management
· D1 Capital Partners
Series D
$640M
August 2024
Led by BlackRock Private Equity Partners
· Various institutional investors
Series E
$750M
September 2025
Led by Disruptive
· BlackRock, Neuberger Berman, Samsung, Cisco, D1, Altimeter, Infinitum, 1789 Capital
Total raised:
$1.75B
Pricing
Freemium model: free tier with 14,400 API requests/day and a 30 requests/minute rate limit. Pay-as-you-go for production: token-based pricing from $0.06 per 1M tokens (Llama 3.1 8B) up to $0.66 per 1M tokens for larger models. Enterprise GroqRack pricing is available via custom quote.
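As a back-of-the-envelope sketch of how the token-based pricing above translates into monthly spend: the per-token rates are the ones quoted in this section, while the request size and traffic volume below are purely hypothetical.

```python
# Rough monthly cost estimate for GroqCloud pay-as-you-go pricing.
# Rates are the figures quoted above (USD per 1M tokens); actual pricing
# varies by model and may change, so treat this as illustrative only.

PRICE_PER_1M_TOKENS = {
    "llama-3.1-8b": 0.06,  # low end of the quoted range
    "larger-model": 0.66,  # high end of the quoted range
}

def monthly_cost(model: str, tokens_per_request: int, requests_per_day: int) -> float:
    """Estimate monthly spend from average request size and daily volume."""
    tokens_per_month = tokens_per_request * requests_per_day * 30
    return tokens_per_month / 1_000_000 * PRICE_PER_1M_TOKENS[model]

# Hypothetical workload: 1,500 tokens per request, 50,000 requests per day.
print(f"${monthly_cost('llama-3.1-8b', 1_500, 50_000):,.2f} per month")  # -> $135.00 per month
```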
Notable customers
Not publicly disclosed; enterprise and startup customers using GroqCloud API and on-premises GroqRack deployments
Integrations
OpenAI-compatible REST API, Hugging Face models, major open-weight LLMs (Meta Llama, Mistral, and others), and cloud integrations via the GroqCloud platform
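Because the API is OpenAI-compatible, existing OpenAI client code can usually be pointed at GroqCloud by changing the base URL. A minimal sketch follows; the endpoint URL, model id, and placeholder API key are assumptions to verify against Groq's current documentation.

```python
# Minimal GroqCloud request via the OpenAI Python SDK, relying on the
# OpenAI-compatible API. Base URL and model id are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_GROQ_API_KEY",                # placeholder
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",  # illustrative hosted-model id
    messages=[{"role": "user", "content": "Explain what an LPU is in one sentence."}],
)
print(response.choices[0].message.content)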
Tech stack
Web platform & CMS: WordPress (hosted on WP Engine and WordPress.com), Elementor and Elementor Pro (Hello Elementor theme, Elementor Custom Skin, JetElements, JetTabs, Premium Addons for Elementor), Jetpack and Jetpack Site Accelerator, Contact Form 7, RankMath, WP Rocket, NitroPack, TablePress, ElasticPress, MonsterInsights, Site Kit, WordPress Download Manager, CookieYes, HubSpot CMS Hub, Sanity, MkDocs, Discourse, CKEditor
Frontend frameworks & libraries: React, Next.js, Preact, Vue, Angular and AngularJS, Flutter/Dart, jQuery (versions 1.10–3.7, plus jQuery UI, Migrate, Form, Masonry, and Waypoints), Bootstrap, Modernizr, lodash, Moment.js, core-js, Babel, Webpack, Styled Components, Radix UI, shadcn/ui, MobX, React Redux, GSAP, anime.js, AOS, OWL Carousel, Slick, Magnific Popup, Fancybox, Lightbox, Hammer.js, VideoJS, KaTeX, PrismJS, Clipboard.js, Packery, imagesLoaded, Popper.js, Font Awesome
Hosting, CDN & infrastructure: Google Cloud (Cloud DNS, Cloud Storage, Cloud CDN), Cloudflare (CDN, SSL, Bot Manager, Web Analytics, Network Error Logging), AWS (CloudFront, S3, Route 53, API Gateway), Microsoft Azure DNS, Vercel, Apache, nginx, Envoy, CentOS, PHP 7, Hurricane Electric, Let's Encrypt, jsDelivr, UNPKG, cdnjs, BootstrapCDN, Google Hosted Libraries
Security & identity: Okta, Azure Active Directory, Google Identity Platform, WebAuthn, reCAPTCHA and reCAPTCHA Enterprise, Kasada, BIG-IP, HackerOne, Vanta, HSTS, DNSSEC, SPF, DMARC, Global Privacy Control
Analytics & marketing: Google Analytics 4 and Universal Analytics, Google Tag Manager, Google Ads conversion tracking and remarketing, DoubleClick, Google AdSense, HubSpot (Analytics, Forms, Ads), Marketo (Forms, Mail), Segment, PostHog, Statsig, Intercom, The Trade Desk, Beeswax, AdButler, OpenX, New Relic, Sentry, OpenTelemetry
Business tools & services: Salesforce, Zendesk, Slack, Atlassian Cloud, DocuSign, Greenhouse, BambooHR, Box, Dropbox Business, Webex, Stripe, PayPal, Algolia, Merge, Robly, RegFox, GitHub, OpenAI
Social & media embeds: YouTube, X (Twitter), LinkedIn, Facebook, Instagram, Threads, Spotify, Google Maps, OpenStreetMap, Google Calendar, Unsplash, Getty Images, Gravatar
Website
Competitors
NVIDIA
NVIDIA dominates general-purpose GPU computing with a broader ecosystem; Groq specializes in purpose-built inference-only silicon with lower latency and better energy efficiency.
Cerebras
Cerebras focuses on wafer-scale processor design for training; Groq targets inference-optimized LPU architecture for throughput and latency.
Graphcore
Graphcore builds IPUs for both training and inference; Groq focuses exclusively on inference with deterministic latency guarantees.
SambaNova
SambaNova offers a dataflow architecture for AI; Groq uses statically scheduled tensor streaming for predictable, ultra-low latency inference.
Why this matters: Groq represents a fundamental rethinking of AI inference silicon, moving from general-purpose GPUs to deterministic, latency-optimized LPU architectures. The $20B Nvidia licensing deal validates its technical approach, while its continued independence, with $1.75B raised and a $6.9B valuation, signals serious infrastructure-scale ambitions in an increasingly commoditized inference market.
Best for: AI teams building latency-sensitive applications, LLM inference platforms, enterprises requiring on-premises AI deployment, and developers seeking predictable sub-100ms inference performance.
Use cases
Real-time Conversational AI
Customer support chatbots and conversational agents require sub-second response times. Groq's LPU delivers 10x higher throughput with deterministic latency, enabling smooth real-time conversations without noticeable delays. Companies can serve more concurrent users per hardware unit.
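For chat workloads, streaming the completion token by token keeps perceived latency low even for long answers. The sketch below uses the OpenAI-compatible endpoint in streaming mode; the base URL, model id, and API key are assumptions rather than confirmed values.

```python
# Stream tokens as they are generated so the chat UI can render them
# incrementally. Endpoint and model id are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key="YOUR_GROQ_API_KEY")

stream = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Help me track my order, please."}],
    stream=True,  # emit partial tokens instead of one final response
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # forward each fragment to the UI as it arrives
```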
High-throughput Batch Inference
Content moderation platforms, recommendation engines, and document processing systems need to process thousands of items per minute. Groq's tensor streaming architecture handles batch inference with predictable performance, enabling latency SLAs that are difficult to guarantee on GPUs.
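A common way to exploit that throughput is to fan batch items out over a bounded worker pool, keeping utilization high while staying under per-minute rate limits. The sketch below is illustrative only; the endpoint, model id, and moderation prompt are assumptions.

```python
# Bounded-concurrency batch inference sketch (e.g., content moderation).
# Uses the OpenAI-compatible endpoint; names and prompt are illustrative.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key="YOUR_GROQ_API_KEY")

def moderate(text: str) -> str:
    """Ask the model for a one-word SAFE/UNSAFE label on a single item."""
    resp = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[{"role": "user", "content": f"Answer SAFE or UNSAFE only: {text}"}],
    )
    return resp.choices[0].message.content.strip()

documents = ["first user post ...", "second user post ...", "third user post ..."]

# Cap worker count so the request rate stays within the account's limits.
with ThreadPoolExecutor(max_workers=8) as pool:
    labels = list(pool.map(moderate, documents))
print(dict(zip(documents, labels)))
```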
On-premises AI Deployment
Enterprises with data residency requirements or sensitive workloads deploy GroqRack in colocation or private data centers. The deterministic performance and energy efficiency reduce cooling and power costs compared to GPU clusters while maintaining ultra-low latency.
Inference-heavy API Services
SaaS platforms monetizing AI (AI-as-a-service) use GroqCloud to minimize per-token inference costs and latency. The freemium tier with generous rate limits enables developers to prototype; pay-as-you-go scales with production traffic without engineering overhead.
Alternatives
NVIDIA GPUs + TensorRT
General-purpose, broader ecosystem, higher power consumption; best for training and diverse workloads, not latency-optimized inference.
AWS Inferentia & Trainium
AWS-managed inference accelerators with lower cost but weaker latency guarantees; tightly coupled to the AWS ecosystem.
Together AI
Hosted inference API service; easier onboarding but not purpose-built hardware; runs on various backends including GPUs.
Cerebras Systems
Supports both training and inference; its wafer-scale chips serve broader use cases beyond pure inference optimization.
FAQ
What does Groq do?
Groq designs and manufactures Language Processing Units (LPUs)—specialized AI inference chips that deliver up to 10x higher throughput and ultra-low latency compared to GPUs. The company offers GroqCloud, a managed inference platform with OpenAI-compatible APIs, and GroqRack for on-premises clusters that scale to 576 or more LPUs.
How much does Groq cost?
GroqCloud offers a free tier with 14,400 daily API requests. Production pricing is pay-as-you-go based on tokens, starting at $0.06 per 1M tokens for Llama 3.1 8B and scaling up to ~$0.66 per 1M tokens for larger models. GroqRack on-premises clusters require custom enterprise quotes.
What are alternatives to Groq?
Main alternatives include NVIDIA GPUs with TensorRT (broader ecosystem, less latency-optimized), AWS Inferentia (AWS-managed, lower cost), Together AI (hosted inference API), and Cerebras (supports training and inference). Choice depends on whether you prioritize latency, cost, ecosystem lock-in, or training capability.
Who uses Groq?
Target customers include AI infrastructure teams, LLM application developers, enterprises requiring sub-100ms inference, cloud service providers, and SaaS platforms monetizing AI. Specific customer names are not publicly disclosed, though the company serves both startups and enterprises via GroqCloud and on-premises GroqRack.
How does Groq compare to NVIDIA?
NVIDIA dominates general-purpose GPU compute with a mature ecosystem for training and inference. Groq builds inference-only hardware with deterministic, ultra-low latency and 10x higher per-token throughput, making it better for latency-sensitive applications but narrower in scope. Notably, Nvidia and Groq announced a $20B licensing partnership in December 2025, with Groq remaining independent.
Has Groq been acquired by Nvidia?
No. In December 2025, Nvidia and Groq announced a non-exclusive $20B licensing agreement for Groq's inference technology, with founder Jonathan Ross and president Sunny Madra joining Nvidia. Groq stated it would continue operating as an independent company with its own hardware and GroqCloud service.
Tags
AI inference
semiconductor
LLM
low latency
hardware acceleration
GroqCloud
ASIC
tensor streaming
edge AI
LPU