Ingesting Data
`kino ingest` normalizes external robot datasets into a single `.kdb` file. The current CLI supports:
- `--format hdf5`
- `--format lerobot`
- `--format rlds` or `--format tfrecord`
Command Shape
```sh
kino ingest <SRC> \
  --format <hdf5|lerobot|rlds> \
  --output <OUT.kdb> \
  --embodiment <robot-name> \
  --task "<optional task>" \
  --fps <hz> \
  --max-episodes <N> \
  --compress <jpeg-quality>
```

`--compress` is available for HDF5 and LeRobot ingest. It JPEG-encodes raw image frames at the given quality; payloads that are already JPEG or PNG are passed through unchanged.
HDF5
The HDF5 ingester targets robomimic/LIBERO-style files:

```text
data/
  demo_0/
    actions                (T, action_dim)  float32
    rewards                (T,)             float32    optional
    dones                  (T,)             float/int  optional
    obs/
      agentview_image      (T, H, W, C)     uint8      optional camera
      robot0_eef_pos       (T, D)           float32    optional state
      robot0_eef_quat      (T, D)           float32    optional state
      robot0_gripper_qpos  (T, D)           float32    optional state
```

Example:

```sh
kino ingest lift.hdf5 \
  --format hdf5 \
  --output lift.kdb \
  --embodiment franka \
  --task "lift the cube" \
  --fps 20.0 \
  --compress 85
```

Discovery rules:
| Data | Rule |
|---|---|
| Episodes | Groups named demo_* under data/, sorted numerically |
| Actions | Required actions dataset |
| State | Any obs/* dataset with 2 dimensions, concatenated in sorted key order |
| Cameras | Any obs/* dataset with 4 dimensions |
| Success | Last reward greater than zero when rewards exist |
| Terminal | dones[t] > 0.5, otherwise the final frame |
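The state, camera, success, and terminal rules above can be sketched as plain functions over dataset shapes and values. This is a minimal illustration, not kinodb's implementation: `classify_obs`, `state_dim`, `episode_success`, and `terminal_flags` are hypothetical names, the inputs are plain Python values rather than `h5py` datasets, and returning `None` when an episode has no rewards is an assumption.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Shape = Tuple[int, ...]


def classify_obs(obs_shapes: Dict[str, Shape]) -> Tuple[List[str], List[str]]:
    """Split obs/* keys into state keys (2-D) and camera keys (4-D)."""
    # State datasets are concatenated in sorted key order.
    state = sorted(k for k, s in obs_shapes.items() if len(s) == 2)
    # Cameras are sorted here only to make the output deterministic (assumption).
    cameras = sorted(k for k, s in obs_shapes.items() if len(s) == 4)
    return state, cameras


def state_dim(obs_shapes: Dict[str, Shape], state_keys: Sequence[str]) -> int:
    """Width of the concatenated state vector: the sum of D over each (T, D) dataset."""
    return sum(obs_shapes[k][1] for k in state_keys)


def episode_success(rewards: Optional[Sequence[float]]) -> Optional[bool]:
    """Success means the last reward is > 0; None when there are no rewards (assumption)."""
    if rewards is None or len(rewards) == 0:
        return None
    return float(rewards[-1]) > 0


def terminal_flags(num_frames: int, dones: Optional[Sequence[float]] = None) -> List[bool]:
    """Per-frame terminal flags: dones[t] > 0.5 if dones exist, else only the final frame."""
    if dones is not None:
        return [float(d) > 0.5 for d in dones]
    return [t == num_frames - 1 for t in range(num_frames)]
```

For the example layout, the three `robot0_*` datasets become the state (in sorted key order) and `agentview_image` becomes the only camera.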
The numeric sort matters. A benchmark correctness issue was traced to the native comparison code sorting `demo_10` before `demo_2` (plain string order); kinodb sorts episode keys numerically, so `demo_2` comes before `demo_10`.
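The difference between string order and numeric order is easy to reproduce. A minimal sketch; `sorted_demo_keys` is a hypothetical helper, not a kinodb function. It keeps only keys of the form `demo_<integer>` and orders them by the integer suffix.

```python
import re
from typing import Iterable, List

# Episode groups look like demo_0, demo_1, ..., demo_10.
_DEMO_KEY = re.compile(r"demo_(\d+)")


def sorted_demo_keys(keys: Iterable[str]) -> List[str]:
    """Return demo_* keys ordered by their integer suffix, ignoring other keys."""
    numbered = []
    for key in keys:
        match = _DEMO_KEY.fullmatch(key)
        if match is not None:
            numbered.append((int(match.group(1)), key))
    return [key for _, key in sorted(numbered)]
```

Plain `sorted()` on `["demo_10", "demo_2", "demo_1"]` yields `demo_1, demo_10, demo_2`; `sorted_demo_keys` yields `demo_1, demo_2, demo_10`.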
LeRobot
The LeRobot ingester supports v2/v3-style directories with `meta/`, `data/`, and optionally `videos/`.
```text
dataset/
  meta/
    info.json
    tasks.jsonl or tasks.parquet
  data/
    chunk-000/
      file-000.parquet
  videos/
```

Example:

```sh
kino ingest ./lerobot_pusht \
  --format lerobot \
  --output pusht.kdb
```

With overrides:

```sh
kino ingest ./aloha_sim \
  --format lerobot \
  --output aloha.kdb \
  --embodiment aloha \
  --task "insert the peg" \
  --max-episodes 100 \
  --compress 85
```

Discovery rules:
| Data | Rule |
|---|---|
| FPS | meta/info.json field fps, default 30.0 |
| Embodiment | meta/info.json field robot_type, unless overridden |
| Tasks | meta/tasks.jsonl or meta/tasks.parquet |
| Episodes | Rows grouped by episode_index |
| Actions | Column named `action`, or columns prefixed with `action.` |
| State | Column named `observation.state`, or columns prefixed with `observation.state.` |
| Images | Struct columns containing image bytes, stored as ImageObs |
LeRobot list columns are supported. This matters because common datasets store actions and states as `FixedSizeList`/`List` arrays rather than as scalar float columns.
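The FPS/embodiment defaults, column matching, list flattening, and episode grouping above can be sketched over rows that have already been read from Parquet into Python dicts (for example with `pyarrow`'s `Table.to_pylist()`, which turns list columns into Python lists). This is an illustration under assumptions, not kinodb's code: the helper names are hypothetical, matched columns are concatenated in the order they appear, and rows keep their file order within an episode.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]


def read_info(info: Dict[str, Any], embodiment: Optional[str] = None) -> Tuple[float, Optional[str]]:
    """FPS from info.json (default 30.0); embodiment from robot_type unless overridden."""
    fps = float(info.get("fps", 30.0))
    return fps, embodiment or info.get("robot_type")


def match_columns(columns: Sequence[str], base: str) -> List[str]:
    """Columns named exactly `base` or prefixed with `base.`."""
    return [c for c in columns if c == base or c.startswith(base + ".")]


def to_vector(row: Row, columns: Sequence[str]) -> List[float]:
    """Flatten scalar and list-valued cells (e.g. FixedSizeList) into one float vector."""
    out: List[float] = []
    for column in columns:
        value = row[column]
        if isinstance(value, (list, tuple)):
            out.extend(float(v) for v in value)
        else:
            out.append(float(value))
    return out


def group_episodes(rows: Sequence[Row]) -> Dict[int, Dict[str, List[List[float]]]]:
    """Group rows by episode_index into per-episode action and state sequences."""
    if not rows:
        return {}
    columns = list(rows[0])
    action_cols = match_columns(columns, "action")
    state_cols = match_columns(columns, "observation.state")
    episodes: Dict[int, Dict[str, List[List[float]]]] = defaultdict(
        lambda: {"actions": [], "states": []}
    )
    for row in rows:
        episode = episodes[int(row["episode_index"])]
        episode["actions"].append(to_vector(row, action_cols))
        episode["states"].append(to_vector(row, state_cols))
    return dict(sorted(episodes.items()))
```

Note that `match_columns(..., "action")` does not pick up an unrelated column such as `actionable`, because the prefix includes the trailing dot.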
RLDS / TFRecord
The RLDS ingester parses TFRecord files directly rather than importing TensorFlow.
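Skipping TensorFlow is feasible because the TFRecord container is simple: each record is a little-endian `uint64` length, a masked CRC-32C of those 8 bytes, the payload, and a masked CRC-32C of the payload. The sketch below reads and writes that framing with the standard library only. It is not kinodb's parser; `read_records`, `write_record`, and `encode_records` are hypothetical names, and each payload (typically a serialized `tf.train.Example` in TFDS/RLDS shards) still needs protobuf decoding, which is not shown.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, List


def _crc32c_table() -> List[int]:
    # Lookup table for reflected CRC-32C (Castagnoli polynomial).
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    # TFRecord stores each CRC rotated right by 15 bits plus a constant.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_record(f: BinaryIO, payload: bytes) -> None:
    header = struct.pack("<Q", len(payload))
    f.write(header)
    f.write(struct.pack("<I", masked_crc(header)))
    f.write(payload)
    f.write(struct.pack("<I", masked_crc(payload)))


def encode_records(payloads: Iterable[bytes]) -> bytes:
    """Serialize payloads into TFRecord framing (handy for test fixtures)."""
    buf = io.BytesIO()
    for payload in payloads:
        write_record(buf, payload)
    return buf.getvalue()


def read_records(f: BinaryIO, verify: bool = True) -> Iterator[bytes]:
    """Yield raw record payloads from an open TFRecord file."""
    while True:
        header = f.read(12)
        if not header:
            return  # clean end of file
        if len(header) != 12:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header[:8])
        (length_crc,) = struct.unpack("<I", header[8:])
        if verify and masked_crc(header[:8]) != length_crc:
            raise ValueError("corrupt record length")
        payload = f.read(length)
        footer = f.read(4)
        if len(payload) != length or len(footer) != 4:
            raise ValueError("truncated record body")
        if verify and masked_crc(payload) != struct.unpack("<I", footer)[0]:
            raise ValueError("corrupt record payload")
        yield payload
```

For a sharded layout like the one below, one would open each `train.tfrecord-*` file in shard order and pass its payloads to a protobuf decoder.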
```sh
kino ingest ./bridge_rlds \
  --format rlds \
  --output bridge.kdb \
  --embodiment widowx \
  --fps 3.0
```

Expected layout:

```text
dataset/
  1.0.0/
    dataset_info.json
    train.tfrecord-00000-of-00005
    train.tfrecord-00001-of-00005
```

Parsing rules:
| Data | Rule |
|---|---|
| Episode boundaries | `is_first`, `is_last`, and `is_terminal` flags |
| Actions | `action` float list |
| State | Sorted `observation/*` float fields, excluding image keys |
| Language | `language_instruction` or `observation/language_instruction` |
| Images | `observation/*image*` byte features, decoded to RGB when possible |
| Reward | Per-step reward, summed into episode metadata |
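Once steps are decoded, the boundary, language, state, and reward rules can be sketched as follows. This assumes each step has already been decoded into a flat dict keyed by the feature names in the table (for example `observation/state`); the helper names are hypothetical, and flushing an episode that is missing `is_last` (instead of raising an error) is an assumption about how malformed streams are handled.

```python
from typing import Any, Dict, List, Optional, Sequence

Step = Dict[str, Any]


def language_of(step: Step) -> Optional[str]:
    """language_instruction, falling back to observation/language_instruction."""
    return step.get("language_instruction") or step.get("observation/language_instruction")


def state_of(step: Step) -> List[float]:
    """Concatenate observation/* float lists in sorted key order, skipping image keys."""
    keys = sorted(
        k for k, v in step.items()
        if k.startswith("observation/") and "image" not in k and isinstance(v, (list, tuple))
    )
    return [float(x) for k in keys for x in step[k]]


def finish(steps: Sequence[Step]) -> Dict[str, Any]:
    return {
        "actions": [[float(a) for a in s["action"]] for s in steps],
        "states": [state_of(s) for s in steps],
        "language": language_of(steps[0]),
        # Per-step rewards are summed into episode metadata.
        "total_reward": sum(float(s.get("reward", 0.0)) for s in steps),
        # is_terminal on the last step separates a true terminal state from truncation.
        "terminal": bool(steps[-1].get("is_terminal", False)),
    }


def split_episodes(steps: Sequence[Step]) -> List[Dict[str, Any]]:
    """Cut a decoded step stream into episodes using is_first / is_last."""
    episodes: List[Dict[str, Any]] = []
    current: List[Step] = []
    for step in steps:
        if step.get("is_first") and current:
            episodes.append(finish(current))  # previous episode lacked is_last
            current = []
        current.append(step)
        if step.get("is_last"):
            episodes.append(finish(current))
            current = []
    if current:
        episodes.append(finish(current))
    return episodes
```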
Image Storage
The writer handles image payloads as follows:
| Input | Storage behavior |
|---|---|
| Raw RGB bytes and `--compress Q` | JPEG-encode at quality Q |
| JPEG/PNG bytes | Pass through the compressed payload |
| Raw RGB bytes without compression | Store raw pixels |
Benchmarking showed that JPEG pass-through is the right default for LeRobot image datasets today: it keeps the `.kdb` file close to the size of the native dataset instead of expanding compressed images into raw RGB.
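The storage table can be sketched as a small decision function that sniffs each payload's leading bytes. This is illustrative only: `choose_storage` is a hypothetical name, it returns a label instead of encoding anything, and real ingest code would also need the frame shape to interpret raw RGB bytes. The signatures checked are the standard JPEG (`FF D8 FF`) and PNG (`89 50 4E 47 0D 0A 1A 0A`) file headers.

```python
from typing import Optional

JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def is_encoded_image(payload: bytes) -> bool:
    """True when the bytes already start with a JPEG or PNG signature."""
    return payload.startswith(JPEG_MAGIC) or payload.startswith(PNG_MAGIC)


def choose_storage(payload: bytes, quality: Optional[int] = None) -> str:
    """Mirror the three rows of the storage table."""
    if is_encoded_image(payload):
        return "passthrough"      # keep the compressed payload as-is
    if quality is not None:
        return f"jpeg:{quality}"  # raw RGB + --compress Q: encode at quality Q
    return "raw"                  # raw RGB without --compress: store pixels
```

Checking for an existing signature before looking at `quality` is what makes pass-through win even when `--compress` is set. With Pillow, one possible implementation of the `jpeg:Q` branch is `Image.frombytes("RGB", (width, height), payload).save(buffer, format="JPEG", quality=Q)`.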
After Ingest
Always inspect, validate, and query the new file:

```sh
kino info data.kdb
kino schema data.kdb
kino validate data.kdb --verbose
kino query data.kdb "num_frames > 50"
```

For archival or release workflows:

```sh
kino export data.kdb --format numpy --output export/
kino export data.kdb --format json --output metadata/
```