RoboCache: A Distributed Key-Value Store for Petabyte-Scale Multimodal Robot Learning Datasets
Tejas Patel, Sandeep Shivam, Amit Kumar Padhy, Bharadwaj Vulugunda, Chaitanya Kulkarni, Chandrashekhar Medicherla
- Year: 2025
- Citations: 6
Abstract
RoboCache is a high-performance, fault-tolerant distributed key-value store designed to store, version, and serve petabyte-scale multimodal robot learning datasets, including images, point clouds, proprioceptive signals, actions, and language annotations. Existing cloud object stores and research datasets such as RoboNet and RT-X exhibit high latency, weak random-access performance, and poor versioning semantics, all of which bottleneck the training of robot foundation models. RoboCache introduces (1) multimodal-aware partitioning that co-locates temporally and semantically related robot traces, (2) a hierarchical tiered engine that enables zero-copy data movement across NVMe, GPU memory, and remote nodes, (3) built-in dataset versioning and time-travel queries based on snapshot-based copy-on-write, and (4) a robot-native query API that supports filtering by embodiment, timestamp, task domain, or natural-language annotation. Deployments on a 128-node cluster sustain 1.8 million random reads/sec and reduce data-loading time by 42× relative to S3+SQLite baselines and by 12× relative to state-of-the-art multimodal dataset systems. RoboCache is open source and currently powers multiple large-scale robot foundation model training pipelines.
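To make the versioning idea in the abstract concrete, the following is a minimal in-memory sketch of snapshot-based copy-on-write with time-travel reads and a simple embodiment filter. It is not RoboCache's API: the class `SnapshotStore`, its methods, and the `(embodiment, episode_id, step)` key layout are all hypothetical names chosen for illustration, and the sketch ignores distribution, tiering, and persistence entirely.

```python
from types import MappingProxyType


class SnapshotStore:
    """Toy key-value store with copy-on-write snapshots (illustrative only)."""

    def __init__(self):
        self._head = {}        # mutable working version
        self._snapshots = []   # read-only views; list index is the version id
        self._shared = False   # True while _head is also referenced by a snapshot

    def put(self, key, value):
        if self._shared:
            # First write after a snapshot: copy the key map (values are shared
            # by reference) so the snapshot keeps seeing the old contents.
            self._head = dict(self._head)
            self._shared = False
        self._head[key] = value

    def snapshot(self):
        # Freeze the current map without copying it; return its version id.
        self._snapshots.append(MappingProxyType(self._head))
        self._shared = True
        return len(self._snapshots) - 1

    def _table(self, version):
        return self._head if version is None else self._snapshots[version]

    def get(self, key, version=None):
        # version=None reads the latest state; an integer reads a past snapshot.
        return self._table(version)[key]

    def scan(self, embodiment, version=None):
        # Hypothetical filter: keys are (embodiment, episode_id, step) tuples.
        return sorted(k for k in self._table(version) if k[0] == embodiment)


store = SnapshotStore()
store.put(("franka", 0, 0), b"frame-v1")
v0 = store.snapshot()
store.put(("franka", 0, 0), b"frame-v2")   # triggers the copy-on-write
store.put(("ur5", 1, 0), b"frame-a")

latest = store.get(("franka", 0, 0))             # current version
past = store.get(("franka", 0, 0), version=v0)   # time-travel read
ur5_now = store.scan("ur5")
ur5_then = store.scan("ur5", version=v0)
```

Taking a snapshot here costs nothing beyond recording a reference; the copy is deferred until the next write, which is the property that makes frequent dataset versioning cheap. A real system would copy at the granularity of pages or partitions rather than the whole key map.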