Voltron Data
Voltron Data accelerates petabyte-scale queries using GPU compute and Apache Arrow.
Voltron Data builds Theseus, a GPU-accelerated query engine that enables petabyte-scale data analytics directly on data lakes, lakehouses, and warehouses using Apache Arrow standards. The platform lets organizations query massive datasets a reported 10-100x faster while reducing infrastructure costs; one retailer cut its server count from 1,400 CPU machines to 14 GPU servers. Rather than replacing existing systems such as Spark and DuckDB, Theseus is designed to complement them, letting users scale without vendor lock-in.
Problem solved
Organizations with 30-100 terabyte query workloads face prohibitively high infrastructure costs and slow query performance using traditional CPU-based analytics engines.
Target customer
Enterprise organizations with petabyte-scale data analytics workloads in financial services, government, and tech sectors; data warehouse and lakehouse operators.
Founders
Joshua Patterson
CEO & Co-Founder
Led software engineering at NVIDIA, where he created the RAPIDS GPU computing ecosystem; served as a White House Presidential Innovation Fellow and worked at Accenture; holds a BA in economics from UNC Chapel Hill and an MA from the University of South Carolina Moore School of Business.
Wes McKinney
CTO & Co-Founder
Co-creator and lead of Apache Arrow, the industry standard for accelerated data interchange and in-memory computing across multiple languages.
Rodrigo Aramburu
Chief Product Officer & Co-Founder
Co-creator of BlazingSQL, a distributed GPU-accelerated SQL query engine.
Darren Haas
Co-Founder
Early employee at Siri before its acquisition by Apple; held roles at GE, Apple, and AWS.
Keith Kraus
Co-Founder
William Malpica
Co-Founder
Funding history
Seed
$22M
February 2022
Led by BlackRock, Walden Catalyst
Other investors: unknown
Series A
$88M
February 2022
Led by Walden Catalyst, Lightspeed Venture Partners
Other investors: Battery Ventures, Coatue Management
Series A-II
Undisclosed
February 2025
Led by Accenture Ventures
Other investors: unknown
Total raised:
$110M+
Pricing
Enterprise subscription service for Apache Arrow-based solutions with three editions including a free tier; specific pricing not publicly available. Contact sales for custom enterprise pricing.
Notable customers
Two large U.S. federal government agencies, Meta, Snowflake, DataStax, hedge funds, and a major retailer (14+ enterprise customers total)
Integrations
Apache Arrow, Apache Spark, DuckDB, Apache Iceberg, Parquet file format
Competitors
Redis
Redis is an in-memory data store focused on caching and real-time applications; Voltron targets large-scale analytical queries on data lakes.
GridGain
GridGain provides distributed in-memory computing for Java applications; Voltron is GPU-accelerated and designed for petabyte-scale data lake queries.
Zilliz
Zilliz focuses on vector databases for AI/ML similarity search; Voltron handles general SQL analytics at massive scale.
Dremio
Dremio provides SQL query acceleration on data lakes but uses CPU-based compute; Voltron uses GPUs for superior performance on very large datasets.
InfluxData
InfluxData specializes in time-series data; Voltron is a general-purpose GPU-accelerated query engine for petabyte-scale analytics.
Why this matters: Voltron Data makes the case that GPU acceleration is practical for petabyte-scale workloads on open data formats, which would shift the economics of data analytics. The team's deep expertise in Apache Arrow and GPU computing (the CEO built NVIDIA RAPIDS, the CTO co-created Arrow) positions the company to capture a meaningful share of the market for cost-optimized enterprise analytics as petabyte-scale data becomes more common.
Best for: Enterprise data teams managing petabyte-scale data lakes or warehouses who need dramatically faster query performance and lower infrastructure costs, especially in financial services and government.
Use cases
Fraud Detection at Scale
Financial services firms use Theseus to run complex fraud detection queries across petabytes of transaction data in minutes instead of hours. This enables real-time risk assessment and faster regulatory compliance reporting without massive CPU infrastructure investments.
Government Data Analytics
U.S. federal agencies process petabyte-sized classified and unclassified datasets for intelligence, national security, and policy analysis using Voltron's GPU acceleration. The reduced server footprint addresses physical space and power constraints in secure facilities.
Risk Modeling for Hedge Funds
Hedge funds run complex portfolio risk simulations and backtests across massive historical datasets. Voltron's GPU acceleration reduces query time from hours to under an hour, enabling faster investment decisions and more frequent model retraining.
Data Lake Infrastructure Consolidation
Large retailers and tech companies can consolidate analytics infrastructure onto far fewer machines; in the cited example, a retailer replaced 1,400 CPU servers with 14 GPU machines while maintaining or improving query performance. This reduces operational costs, power consumption, and data center footprint.
Alternatives
Apache Spark
Spark is a CPU-based distributed computing framework; Voltron is GPU-accelerated and optimized for petabyte-scale interactive queries rather than batch processing.
Snowflake
Snowflake is a managed cloud data warehouse with CPU compute; Voltron is designed to complement warehouses by offering GPU-accelerated queries on open formats without vendor lock-in.
DuckDB
DuckDB is a lightweight in-process OLAP database for single machines; Voltron scales to petabyte datasets across distributed GPU clusters.
Presto
Presto is a distributed SQL query engine for multi-source data; Voltron is GPU-specialized for maximum performance on single large datasets.
FAQ
What does Voltron Data do?
Voltron Data builds Theseus, a GPU-accelerated query engine that processes petabyte-scale data analytics directly on data lakes, lakehouses, and warehouses. Using Apache Arrow open standards, it enables organizations to run complex queries 10-100x faster and with significantly fewer servers than CPU-based alternatives, while avoiding vendor lock-in.
How much does Voltron Data cost?
Voltron offers an enterprise subscription service with three editions including a free tier, but specific pricing is not publicly disclosed. Organizations should contact sales for custom pricing based on deployment scale and support needs.
What are alternatives to Voltron Data?
Apache Spark is a popular distributed computing framework but uses CPU-based compute; Snowflake is a managed cloud warehouse but doesn't offer GPU acceleration; DuckDB is lightweight but limited to single machines; Presto is a distributed SQL engine but not GPU-optimized. Each suits different scale and architecture requirements.
Who uses Voltron Data?
Target customers are enterprise organizations with petabyte-scale analytics workloads. Known users include U.S. federal government agencies, Meta, Snowflake, DataStax, hedge funds, and a major retailer that reduced its server infrastructure from 1,400 to 14 machines.
How does Voltron Data compare to Snowflake?
Snowflake is a managed cloud data warehouse with CPU-based compute optimized for multi-user concurrency. Voltron is designed to complement existing systems like Snowflake by offering GPU-accelerated interactive queries on open data formats, providing better performance for single large analytical queries without requiring data migration or long-term vendor commitment.
What data formats does Voltron support?
Theseus supports Apache Iceberg, Parquet, and standard file formats commonly found in data lakes. It's built on Apache Arrow, an open standard that enables code portability across analytics platforms.
How much faster is Voltron than traditional databases?
Voltron has completed the TPC-H 100TB benchmark on unsorted Parquet files in under an hour using only 6TB of GPU memory. Reported speedups range from 10x to 100x depending on workload, and one customer reduced its server count from 1,400 to 14 machines.
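To put the quoted figures in proportion, the arithmetic can be sketched in a few lines of Python. The four constants come from the text above; the function names are hypothetical and purely illustrative. Note that a reduction in server count is not the same thing as a query speedup, so the 100x result below should not be read as confirmation of the 10-100x performance claim.

```python
# Figures quoted in the profile; helper names are illustrative assumptions.
CPU_SERVERS_BEFORE = 1_400  # retailer's CPU fleet before consolidation
GPU_SERVERS_AFTER = 14      # GPU servers after consolidation
DATASET_TB = 100            # TPC-H data size quoted for the benchmark run
GPU_MEMORY_TB = 6           # GPU memory quoted for the same run


def reduction_factor(before: int, after: int) -> float:
    """Return how many times smaller the server fleet became."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


def data_to_memory_ratio(dataset_tb: float, memory_tb: float) -> float:
    """Return how many times larger the dataset is than GPU memory."""
    if memory_tb <= 0:
        raise ValueError("memory_tb must be positive")
    return dataset_tb / memory_tb


server_factor = reduction_factor(CPU_SERVERS_BEFORE, GPU_SERVERS_AFTER)
memory_ratio = data_to_memory_ratio(DATASET_TB, GPU_MEMORY_TB)

print(f"Server count reduction: {server_factor:.0f}x")
print(f"Dataset is {memory_ratio:.1f}x larger than GPU memory")
```

Run as written, this prints a 100x server count reduction and a dataset roughly 16.7x larger than the GPU memory used, which suggests the benchmark data could not have been held in GPU memory all at once.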
Tags
GPU acceleration
data analytics
petabyte-scale
Apache Arrow
data lakes
lakehouses
query engine
cost optimization
infrastructure