
RoboCache: A Distributed Key–Value Store for Petabyte–Scale Multimodal Robot Learning Datasets

Tejas Patel, Sandeep Shivam, Amit Kumar Padhy, Bharadwaj Vulugunda, Chaitanya Kulkarni, Chandrashekhar Medicherla

Year
2025
Citations
6

Abstract

RoboCache is a high-performance, fault-tolerant distributed key-value store designed to store, version, and serve petabyte-scale multimodal robot learning datasets, including images, point clouds, proprioceptive signals, actions, and language annotations. Existing cloud object stores and research datasets such as RoboNet and RT-X exhibit high latency, weak random-access performance, and poor versioning semantics, all of which bottleneck the training of robot foundation models. RoboCache introduces (1) multimodal-aware partitioning that co-locates temporally and semantically related robot traces, (2) a hierarchical tiered engine that enables zero-copy data movement across NVMe, GPU memory, and remote nodes, (3) built-in dataset versioning and time-travel queries using snapshot-based copy-on-write, and (4) a robot-native query API that supports filtering by embodiment, timestamp, task domain, or natural-language annotations. Deployments on a 128-node cluster sustain 1.8 million random reads/sec and reduce data-loading time by 42× compared to S3+SQLite baselines and by 12× compared to state-of-the-art multimodal dataset systems. RoboCache is open source and currently powers multiple large-scale robot foundation model training pipelines.
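
As an illustration of item (3), the following is a minimal single-process sketch of snapshot-based copy-on-write versioning with time-travel reads. It is not RoboCache's implementation or API: the class `VersionedStore`, its methods `commit` and `get`, and the key names such as `ep0/rgb` are hypothetical and chosen only for this example. Each commit records just the keys it changed, so unchanged data is shared between versions rather than copied, and a read at an older version walks back through the recorded deltas.

```python
from typing import Dict, Iterable, List, Optional


class VersionedStore:
    """Toy copy-on-write store (hypothetical, not the RoboCache API).

    Each commit stores only the keys it changed; older versions stay readable.
    """

    _TOMBSTONE = object()  # marks a key as deleted in a given version

    def __init__(self) -> None:
        # _deltas[v] holds the keys written by version v; version 0 is empty.
        self._deltas: List[Dict[str, object]] = [{}]

    def commit(self, writes: Dict[str, bytes], deletes: Iterable[str] = ()) -> int:
        """Record a new snapshot and return its version number."""
        delta: Dict[str, object] = dict(writes)
        for key in deletes:
            delta[key] = self._TOMBSTONE
        self._deltas.append(delta)
        return len(self._deltas) - 1

    def get(self, key: str, version: Optional[int] = None) -> Optional[bytes]:
        """Read `key` as of `version` (latest if omitted); None if absent."""
        latest = len(self._deltas) - 1
        if version is None:
            version = latest
        if not 0 <= version <= latest:
            raise ValueError(f"unknown version {version}")
        # Walk back from the requested snapshot to the most recent write.
        for v in range(version, -1, -1):
            if key in self._deltas[v]:
                value = self._deltas[v][key]
                return None if value is self._TOMBSTONE else value
        return None


store = VersionedStore()
v1 = store.commit({"ep0/rgb": b"frame-a", "ep0/action": b"act-a"})
v2 = store.commit({"ep0/rgb": b"frame-b"})      # overwrite one key only
v3 = store.commit({}, deletes=["ep0/action"])   # delete in the latest snapshot

old_frame = store.get("ep0/rgb", version=v1)    # time-travel read
new_frame = store.get("ep0/rgb")                # read at the latest version
```

A real system would bound the cost of the backward walk, for example by compacting deltas or indexing the latest writer per key; the sketch omits this for brevity.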

Keywords

Robot, Feature (linguistics), Human–robot interaction, Robotics, Robot learning
