Positron AI

Positron builds memory-optimized inference hardware for generative AI at scale.
Series B · $305M total raised · Founded 2023 · Reno, Nevada · 62 employees
Positron AI designs purpose-built FPGA-based hardware (Atlas) optimized for generative AI inference workloads, achieving 93% memory bandwidth utilization versus the 10-30% typical of GPU systems while consuming less than a third of the power of Nvidia H100s. The company serves enterprises and cloud providers running transformer-based models at scale, from content moderation to token-as-a-service platforms. Its next-generation chip (Asimov, codenamed Titan) targets memory-intensive applications such as video and trading, as well as multi-trillion-parameter models, with 6x more RAM than competing solutions.
Problem solved
Transformer inference workloads are memory-bandwidth constrained, and existing GPU architectures deliver only 10-30% memory utilization while consuming excessive power, making inference inefficient and costly.
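A rough way to see the bottleneck: at small batch sizes, decode throughput is approximately effective memory bandwidth divided by the bytes of weights streamed per generated token. The sketch below uses illustrative numbers only (the peak-bandwidth and model-size figures are assumptions, not vendor specifications) to show that a device with modest peak bandwidth but 93% utilization can keep pace with a much faster device stuck near 25% utilization.

```python
# Back-of-envelope decode throughput for a dense transformer, assuming each
# generated token requires streaming every model weight from memory once
# (the usual memory-bound regime at small batch sizes). All numbers are
# illustrative assumptions, not published Positron or Nvidia figures.

def decode_tokens_per_sec(peak_bw_gb_s: float, utilization: float, weight_gb: float) -> float:
    """Tokens/sec ~= usable memory bandwidth / bytes of weights read per token."""
    return (peak_bw_gb_s * utilization) / weight_gb

weights_gb = 140.0  # e.g. a 70B-parameter model in FP16 (~2 bytes per parameter)

# GPU-class part: high peak bandwidth but low realized utilization.
gpu = decode_tokens_per_sec(peak_bw_gb_s=3350.0, utilization=0.25, weight_gb=weights_gb)
# Memory-optimized part: lower peak bandwidth (hypothetical) but 93% utilization.
opt = decode_tokens_per_sec(peak_bw_gb_s=1000.0, utilization=0.93, weight_gb=weights_gb)

print(f"GPU-class (25% util):        {gpu:5.1f} tok/s per device")
print(f"Memory-optimized (93% util): {opt:5.1f} tok/s per device")
```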
Target customer
Enterprise cloud providers, neocloud operators, and AI infrastructure companies deploying transformer models at scale; primarily Series B+ infrastructure and AI-native companies.
Founders
Thomas Sohmers
Co-Founder, CTO
Thiel Fellow (2013) who started REX Computing designing processors for HPC at age 17, worked on crypto ASICs, then joined Lambda as principal hardware architect before founding Positron in spring 2023.
Edward Kmett
Co-Founder, CTO
Software architecture specialist with deep expertise in compiler and runtime optimization who joined Positron from Lambda.
Barrett Woodside
Co-Founder, VP of Product
Product leadership background from Lambda, focusing on hardware-software integration and customer deployment.
Mitesh Agrawal
CEO
Former COO of Lambda who joined Positron to scale commercial operations and enterprise sales.
Funding history
Seed · $23.5M · February 2025 · Led by Flume Ventures, Valor Equity Partners · Other investors: Atreides Management, Resilience Reserve
Series A · $51.6M · July 2025 · Led by Valor Equity Partners, Atreides Management, DFJ Growth · Other investors: Flume Ventures, Resilience Reserve, 1517 Fund, Unless
Series B · $230M · February 2026 · Led by ARENA, Jump Trading, Unless · Other investors: QIA, Arm Holdings, Helena, Valor Equity Partners, Atreides Management, DFJ Growth, Resilience Reserve, Flume Ventures, 1517 Fund
Total raised: $305M
Pricing
Not publicly disclosed. Free trial model available: customers can access dedicated Atlas server instances in Positron's engineering facility for testing before purchase ('try before you buy').
Notable customers
Jump Trading, Parasail (SnapServe), Cloudflare, Oracle (partnership announced), unnamed major enterprises and leading neocloud providers in networking, gaming, content moderation, CDN, and Token-as-a-Service sectors
Integrations
Hugging Face Transformers library (drag-and-drop .pt and .safetensors model upload), Tailscale (network infrastructure), Oracle Cloud (partnership)
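Because the integration point is standard Hugging Face checkpoint formats, preparing a model for upload amounts to exporting it as .safetensors. A minimal sketch using the public Transformers API follows; the model name and output directory are placeholders, and the drag-and-drop upload itself happens in Positron's web workflow, which is not shown.

```python
# Minimal sketch: exporting an existing Hugging Face checkpoint to .safetensors,
# one of the formats (alongside .pt) the Atlas upload flow accepts.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # example model; any Hugging Face causal LM works the same way
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# safe_serialization=True writes model.safetensors instead of pytorch_model.bin.
model.save_pretrained("./atlas-export", safe_serialization=True)
tokenizer.save_pretrained("./atlas-export")
```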
Tech stack
React (JavaScript frameworks) · Next.js (Web servers) · GSAP (JavaScript frameworks) · Webpack · PWA · Open Graph · Vercel Analytics (Analytics) · Google Analytics (Analytics) · HSTS (Security) · Node.js (Programming languages) · Apple iCloud Mail (Webmail) · Google Workspace (Email) · HubSpot (Marketing automation) · Vercel (PaaS) · Vultr (PaaS) · Priority Hints (Performance)
Competitors
Cerebras
Cerebras focuses on wafer-scale processors for dense compute, while Positron emphasizes memory architecture optimization for bandwidth-constrained inference.
d-Matrix
d-Matrix targets inference with custom silicon, but Positron's FPGA approach offers faster iteration and broader model compatibility through HuggingFace integration.
Nvidia
Nvidia's GPUs are general-purpose compute accelerators; Positron's specialized architecture delivers 3x lower latency and higher bandwidth utilization for inference-specific workloads.
EnCharge AI
EnCharge focuses on edge inference optimization, while Positron targets large-scale cloud and enterprise deployment with massive context windows and parameter counts.
Why this matters: Positron is tackling the inference-efficiency bottleneck that Nvidia's general-purpose GPUs leave unaddressed, and it is backed by $305M from institutional and strategic investors (Jump Trading, ARENA, Arm Holdings). With demonstrated 3x latency improvements over H100s and partnerships with Oracle and leading cloud providers, Positron is a differentiated architecture play that could reshape AI infrastructure economics at scale.
Best for: Enterprise cloud providers and AI infrastructure operators deploying transformer inference at scale who need lower power consumption, higher memory bandwidth utilization, and support for trillion-parameter models.
Use cases
High-throughput content moderation at CDN scale
Content delivery networks need to run inference on video and text across globally distributed servers. Positron Atlas reduces power consumption by 66% versus H100s while delivering 93% memory utilization, enabling cost-effective moderation at scale without rebuilding infrastructure.
Trading and financial modeling with extended context
Trading firms require low-latency inference on multi-trillion-parameter models analyzing extended market history. Positron's Asimov, with 2.3TB of RAM per device, can process massive financial datasets in a single inference pass rather than spreading them across multiple GPU clusters.
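As a back-of-envelope check on the single-device claim, the sketch below assumes 8-bit (FP8) weights and ignores KV cache and activations; the 2-trillion-parameter model size is a hypothetical example.

```python
import math

# Rough weight-memory footprint versus per-device memory. FP8 (1 byte/param)
# is an assumption; KV cache and activations are ignored, so real deployments
# need extra headroom.
params = 2e12                   # hypothetical 2-trillion-parameter model
weights_tb = params * 1 / 1e12  # 1 byte per parameter -> 2.0 TB of weights

asimov_tb = 2.3                 # per-device RAM cited for Asimov
rubin_tb = 0.384                # per-device RAM cited for Rubin

print(f"Weights: {weights_tb:.1f} TB")
print(f"Asimov devices (weights only): {math.ceil(weights_tb / asimov_tb)}")       # 1
print(f"Rubin-class devices (weights only): {math.ceil(weights_tb / rubin_tb)}")   # 6
```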
Token-as-a-Service platforms
Token-as-a-Service providers sell LLM inference priced per token and need high throughput at a low cost per token. Positron's memory-optimized architecture absorbs bursty inference loads with minimal power overhead, reducing operational cost per token served.
Alternatives
Nvidia H100 GPUs
General-purpose compute accelerators with a broader software ecosystem, but they use 3x more power and deliver only 10-30% memory bandwidth utilization on inference workloads.
AWS Trainium/Inferentia
AWS's custom silicon is tied to its cloud ecosystem and offers less flexibility; Positron provides hardware-software co-optimization built specifically for inference, with broader model compatibility.
Cerebras Wafer Scale Engine
Wafer-scale architecture optimized for compute-dense workloads and better suited to training; Positron targets memory-intensive inference with a superior bandwidth-to-power ratio.
FAQ
What does Positron AI do?
Positron designs specialized FPGA-based hardware (Atlas) and next-generation chips (Asimov/Titan) optimized for transformer model inference. Their systems achieve 93% memory bandwidth utilization while consuming 66% less power than Nvidia H100 GPUs, supporting models up to half a trillion parameters on a single 2kW server.
How much does Positron cost?
Pricing is not publicly disclosed. Positron offers a free trial program where customers can test dedicated Atlas instances in their engineering facility before committing to purchase.
What are alternatives to Positron?
Alternatives include Nvidia H100/H200 GPUs (broader software support but less efficient for inference), Cerebras Wafer Scale Engine (compute-optimized vs. memory-optimized), and d-Matrix (custom inference silicon with different architectural approach).
Who uses Positron?
Customers include Jump Trading (co-investor and 3x latency improvement case study), Cloudflare, Parasail/SnapServe, and Oracle. Target customers are enterprise cloud providers and AI infrastructure operators deploying inference at scale in networking, content moderation, CDN, trading, and token-as-a-service sectors.
How does Positron compare to Nvidia?
Positron's FPGA architecture achieves 3x lower latency and 66% lower power consumption than H100s on inference workloads by optimizing memory bandwidth (93% vs. 10-30%). Nvidia GPUs remain superior for general compute and training; Positron is purpose-built for inference efficiency at scale.
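To put the power claim in energy-per-token terms, a small sketch follows; the throughput figure is a placeholder assumption, and only the power ratio and the 2kW server figure come from the numbers cited in this profile.

```python
# Energy per generated token = server power (W) / throughput (tokens/s).
# The throughput value is a placeholder purely to make the units concrete;
# the power ratio reflects the "3x more power" comparison cited above.
throughput_tok_s = 1000.0           # hypothetical, assumed equal for both systems
atlas_power_w = 2000.0              # the 2kW Atlas server cited earlier in this FAQ
h100_power_w = atlas_power_w * 3.0  # H100-class system drawing 3x the power

print(f"Atlas:      {atlas_power_w / throughput_tok_s:.1f} J/token")
print(f"H100-class: {h100_power_w / throughput_tok_s:.1f} J/token")
```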
What is Positron's next product?
Asimov (codename Titan) will deliver 5x more tokens-per-watt than Nvidia's Rubin GPU, 2.3TB RAM per device (vs. 384GB for Rubin), and support for multi-trillion parameter models with massive context windows—targeting video, trading, and next-generation AI applications.
Tags
inference acceleration · hardware · FPGA · transformer models · memory bandwidth · AI infrastructure · semiconductor