
Voltron Data

Voltron Data accelerates petabyte-scale queries using GPU compute and Apache Arrow.

Series A · $110M+ total raised · Founded 2021 · Mountain View, California

Voltron Data builds Theseus, a GPU-accelerated query engine that runs petabyte-scale data analytics directly on data lakes, lakehouses, and warehouses using Apache Arrow standards. The platform lets organizations query massive datasets 10-100x faster while dramatically reducing infrastructure costs; one retailer cut its server count from 1,400 CPU machines to 14 GPU servers. Rather than replacing existing CPU-based analytics platforms, Theseus is designed to complement systems like Spark and DuckDB, letting users scale without vendor lock-in.

Problem solved

Organizations with 30-100 terabyte query workloads face prohibitively high infrastructure costs and slow query performance using traditional CPU-based analytics engines.

Target customer

Enterprise organizations with petabyte-scale data analytics workloads in financial services, government, and tech sectors; data warehouse and lakehouse operators.


Founders

Joshua Patterson

CEO & Co-Founder

Led software engineering at NVIDIA, where he created the RAPIDS GPU computing ecosystem; served as a White House Presidential Innovation Fellow and worked at Accenture; holds a BA in economics from UNC Chapel Hill and an MA from the University of South Carolina Moore School of Business.

Wes McKinney

CTO & Co-Founder

Co-creator and lead of Apache Arrow, the industry standard for accelerated data interchange and in-memory computing across multiple languages.

Rodrigo Aramburu

Chief Product Officer & Co-Founder

Co-creator of BlazingSQL, a distributed GPU-accelerated SQL query engine.

Darren Haas

Co-Founder

Early employee at Siri before its acquisition by Apple; held roles at GE, Apple, and AWS.

Keith Kraus

Co-Founder

William Malpica

Co-Founder

Funding history

Seed · $22M · February 2022 · Led by BlackRock and Walden Catalyst · Other participants: unknown

Series A · $88M · February 2022 · Led by Walden Catalyst and Lightspeed Venture Partners · Participants: Battery Ventures, Coatue Management

Series A-II · Undisclosed · February 2025 · Led by Accenture Ventures · Other participants: unknown

Total raised: $110M+

Industries

Analytics, Hardware, Software

Pricing

Enterprise subscription service for Apache Arrow-based solutions with three editions including a free tier; specific pricing not publicly available. Contact sales for custom enterprise pricing.

Notable customers

Two large U.S. federal government agencies, Meta, Snowflake, DataStax, hedge funds, and a major retailer (14+ enterprise customers in total)

Integrations

Apache Arrow, Apache Spark, DuckDB, Apache Iceberg, Parquet file format

Website

voltrondata.com/

Competitors

Redis

Redis is an in-memory data store focused on caching and real-time applications; Voltron targets large-scale batch analytics on data lakes.

GridGain

GridGain provides distributed in-memory computing for Java applications; Voltron is GPU-accelerated and designed for petabyte-scale data lake queries.

Zilliz

Zilliz focuses on vector databases for AI/ML similarity search; Voltron handles general SQL analytics at massive scale.

Dremio

Dremio provides SQL query acceleration on data lakes using CPU-based compute; Voltron uses GPUs to target higher performance on very large datasets.

InfluxData

InfluxData specializes in time-series data; Voltron is a general-purpose GPU-accelerated query engine for petabyte-scale analytics.

Why this matters: Voltron Data represents a fundamental shift in data analytics economics by demonstrating that GPU acceleration is practical for petabyte-scale workloads on open data formats. The team's deep expertise in Apache Arrow and GPU computing (the CEO built NVIDIA RAPIDS; the CTO co-created Arrow) positions it to capture a significant share of the market for cost-optimized enterprise analytics as petabyte-scale data becomes increasingly common.

Best for: Enterprise data teams managing petabyte-scale data lakes or warehouses who need dramatically faster query performance and lower infrastructure costs, especially in financial services and government.

Use cases

Fraud Detection at Scale

Financial services firms use Theseus to run complex fraud detection queries across petabytes of transaction data in minutes instead of hours. This enables faster risk assessment and regulatory compliance reporting without massive CPU infrastructure investments.

Government Data Analytics

U.S. federal agencies process petabyte-sized classified and unclassified datasets for intelligence, national security, and policy analysis using Voltron's GPU acceleration. The reduced server footprint addresses physical space and power constraints in secure facilities.

Risk Modeling for Hedge Funds

Hedge funds run complex portfolio risk simulations and backtests across massive historical datasets. Voltron's GPU acceleration cuts query times from hours to under an hour, enabling faster investment decisions and more frequent model retraining.

Data Lake Infrastructure Consolidation

Large retailers and tech companies consolidate analytics infrastructure onto far fewer GPU servers while maintaining or improving query performance; one major retailer replaced 1,400 CPU servers with 14 GPU machines. This dramatically reduces operational costs, power consumption, and data center footprint.
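Headline server counts are easy to misread, so a quick calculation helps put the retailer example in proportion. The sketch below assumes only the two figures quoted in this profile (1,400 CPU servers before, 14 GPU servers after) and computes the consolidation ratio and the percentage drop in machine count. The function names are hypothetical, and the result describes server count only, not cost, power, or rack space, which the profile does not quantify.

```python
# Server counts quoted in this profile for one retailer's migration.
CPU_SERVERS_BEFORE = 1_400
GPU_SERVERS_AFTER = 14


def consolidation_ratio(before: int, after: int) -> float:
    """Return how many old servers each new server replaces (hypothetical helper)."""
    if after <= 0:
        raise ValueError("after must be a positive server count")
    return before / after


def footprint_reduction_pct(before: int, after: int) -> float:
    """Percentage drop in server count; says nothing about cost or power."""
    if before <= 0:
        raise ValueError("before must be a positive server count")
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    ratio = consolidation_ratio(CPU_SERVERS_BEFORE, GPU_SERVERS_AFTER)
    pct = footprint_reduction_pct(CPU_SERVERS_BEFORE, GPU_SERVERS_AFTER)
    print(f"{ratio:.0f} CPU servers per GPU server")  # prints "100 CPU servers per GPU server"
    print(f"{pct:.0f}% fewer servers")  # prints "99% fewer servers"
```

In other words, each GPU server stands in for about 100 of the original CPU servers, a 99% reduction in machine count; any cost comparison would also need per-server prices, which are not given here.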

Alternatives

Apache Spark: Spark is CPU-based distributed computing; Voltron is GPU-accelerated and optimized for petabyte-scale queries.

Snowflake: Snowflake is a managed cloud data warehouse with CPU compute; Voltron is designed to complement warehouses by offering GPU-accelerated queries on open formats without vendor lock-in.

DuckDB: DuckDB is a lightweight in-process OLAP database for single machines; Voltron scales to petabyte datasets across distributed GPU clusters.

Presto: Presto is a distributed SQL query engine for multi-source data; Voltron is GPU-specialized for maximum performance on single large datasets.

FAQ

What does Voltron Data do?

Voltron Data builds Theseus, a GPU-accelerated query engine that processes petabyte-scale data analytics directly on data lakes, lakehouses, and warehouses. Using Apache Arrow open standards, it enables organizations to run complex queries 10-100x faster and with significantly fewer servers than CPU-based alternatives, while avoiding vendor lock-in.

How much does Voltron Data cost?

Voltron offers an enterprise subscription service with three editions including a free tier, but specific pricing is not publicly disclosed. Organizations should contact sales for custom pricing based on deployment scale and support needs.

What are alternatives to Voltron Data?

Apache Spark is a popular distributed computing framework but uses CPU-based compute; Snowflake is a managed cloud warehouse but doesn't offer GPU acceleration; DuckDB is lightweight but limited to single machines; Presto is a distributed SQL engine but not GPU-optimized. Each suits different scale and architecture requirements.

Who uses Voltron Data?

Target customers are enterprise organizations with petabyte-scale analytics workloads. Known users include U.S. federal government agencies, Meta, Snowflake, Datastax, hedge funds, and a major retailer that reduced its server infrastructure from 1,400 to 14 machines.

How does Voltron Data compare to Snowflake?

Snowflake is a managed cloud data warehouse with CPU-based compute optimized for multi-user concurrency. Voltron is designed to complement existing systems like Snowflake by offering GPU-accelerated interactive queries on open data formats, providing better performance for single large analytical queries without requiring data migration or long-term vendor commitment.

What data formats does Voltron support?

Theseus supports Apache Iceberg, Parquet, and other standard file formats commonly found in data lakes. It is built on Apache Arrow, an open columnar in-memory data standard that enables code portability across analytics platforms.

How much faster is Voltron than traditional databases?

Voltron has completed the TPC-H 100TB benchmark on unsorted Parquet files in under an hour using only 6TB of GPU memory. Reported speedups range from 10x to 100x depending on workload, and one customer reduced its server count from 1,400 to 14 machines.
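The 100TB and 6TB figures from the TPC-H run imply that the dataset is much larger than the GPU memory available to process it. The sketch below computes that ratio, assuming 100TB is the nominal benchmark size (bytes actually stored in Parquet may differ after compression) and 6TB is the aggregate GPU memory across the cluster. The helper name is hypothetical, and the calculation says nothing about how Theseus moves data in and out of GPU memory, which this profile does not describe.

```python
# Figures quoted in this profile for Voltron's TPC-H run.
DATASET_TB = 100.0    # nominal TPC-H benchmark size (assumed, not on-disk bytes)
GPU_MEMORY_TB = 6.0   # aggregate GPU memory reported for the run


def data_to_memory_ratio(data_tb: float, memory_tb: float) -> float:
    """How many times larger the dataset is than the available GPU memory."""
    if memory_tb <= 0:
        raise ValueError("memory_tb must be positive")
    return data_tb / memory_tb


ratio = data_to_memory_ratio(DATASET_TB, GPU_MEMORY_TB)
print(f"Dataset is about {ratio:.1f}x the aggregate GPU memory")  # about 16.7x
```

A ratio of roughly 16.7 means the full dataset could not sit in GPU memory at once, so the engine presumably streams or spills data during execution rather than holding everything resident.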