Training Pipeline
The training benchmarks answer the question that matters most for robot learning: when data loading is measured inside the training loop, does kinodb reduce wall-clock time?
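The latest experiment run says yes. For CNN and MLP policies, kinodb cuts per-epoch data time from about 34-45s to about 3.6-5.7s, for a 6.6-7.7x total speedup. Data loading still takes most of each step, but a much smaller share than with the native loader. For ViT policies the win is smaller (2.2-2.4x) because, once loading is fast, model compute dominates the step.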
Training Curves
Setup:

- datasets: `pusht_image` and `libero_spatial`
- policies: `cnn_bc`, `mlp`, and `vit`
- seeds: `42`, `123`, and `456`
- skipped: `robomimic_lift`
- metric: native loader vs kinodb loader inside the measured training loop
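The metric above splits each epoch into time spent waiting on the loader and time spent in the training step. A minimal sketch of that split, assuming a plain Python iterator as the loader and a synchronous `train_step` callable; `time_epoch` and `slow_loader` are hypothetical names, not kinodb's benchmark harness. With asynchronous GPU kernels, the device would need to be synchronized before each clock read, otherwise compute time is under-counted.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def time_epoch(batches: Iterable, train_step: Callable[[object], object]) -> Tuple[float, float]:
    """Return (data_seconds, compute_seconds) for one pass over `batches`.

    Data time is spent waiting on the loader for the next batch;
    compute time is spent inside `train_step`.
    """
    data_s = 0.0
    compute_s = 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)
        except StopIteration:
            # The final, empty fetch still counts as loader time.
            data_s += time.perf_counter() - t0
            break
        t1 = time.perf_counter()
        data_s += t1 - t0
        train_step(batch)  # with GPU work, synchronize the device before reading the clock
        compute_s += time.perf_counter() - t1
    return data_s, compute_s


def slow_loader(n: int, delay_s: float) -> Iterator[int]:
    """Hypothetical stand-in for a loader that blocks for `delay_s` per batch."""
    for i in range(n):
        time.sleep(delay_s)
        yield i


data_s, compute_s = time_epoch(slow_loader(5, 0.01), lambda batch: sum(range(1000)))
share = data_s / (data_s + compute_s)
print(f"data {data_s:.3f}s, compute {compute_s:.3f}s, data share {share:.0%}")
```

Running the same function once with the native loader and once with the kinodb loader, with identical `train_step`, gives the two columns that the speedup and data-share figures are built from.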
Total Speedup By Dataset And Policy
| Dataset | Policy | Init speedup | Total speedup |
|---|---|---|---|
| libero_spatial | cnn_bc | 1394x | 7.1x +/- 0.0 |
| libero_spatial | mlp | 1364x | 7.7x +/- 0.2 |
| libero_spatial | vit | 2828x | 2.2x +/- 0.0 |
| pusht_image | cnn_bc | 26x | 6.8x +/- 0.0 |
| pusht_image | mlp | 31x | 6.6x +/- 0.0 |
| pusht_image | vit | 31x | 2.4x +/- 0.0 |
What The Curve Logs Show
The losses match between native and kinodb for the same dataset, policy, seed, and epoch. The win is wall-clock time, not a different learning target.
| Run | Native data per epoch | kinodb data per epoch | Native compute | kinodb compute | Total speedup | Data share shift |
|---|---|---|---|---|---|---|
| PushT image + CNN BC | ~44-45s | ~5.4-5.5s | ~1.7-1.9s | ~1.2-1.5s | 6.7-6.8x | 96% native to 79-80% kinodb |
| PushT image + MLP | ~44-45s | ~5.5-5.7s | ~1.5-1.7s | ~1.2-1.4s | 6.6-6.7x | 96% native to 80-81% kinodb |
| PushT image + ViT | ~45s | ~5.8s | ~23s | ~22.6s | 2.4x | 66% native to 20% kinodb |
| LIBERO spatial + CNN BC | ~34s | ~3.6-3.7s | ~1.7-1.9s | ~1.3-1.5s | 7.1x | 93% native to 71% kinodb |
| LIBERO spatial + MLP | ~34s | ~3.6-3.8s | ~1.2-1.5s | ~0.9-1.2s | 7.6-8.0x | 89-94% native to 76% kinodb |
| LIBERO spatial + ViT | ~34-35s | ~3.8-3.9s | ~22.6-24.6s | ~23s | 2.2-2.3x | 57-59% native to 14% kinodb |
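The total speedup and data-share columns follow from the per-epoch times. A minimal sketch of that arithmetic, assuming step time is just data time plus compute time and using midpoints of the approximate ranges in the table above; `summarize` is a hypothetical helper. Under that assumption it reproduces the PushT rows to within rounding.

```python
def summarize(native_data: float, native_compute: float,
              kino_data: float, kino_compute: float) -> dict:
    """Per-epoch speedup and data share, assuming step time = data + compute."""
    native_total = native_data + native_compute
    kino_total = kino_data + kino_compute
    return {
        "total_speedup": native_total / kino_total,
        "native_data_share": native_data / native_total,
        "kinodb_data_share": kino_data / kino_total,
    }


# Midpoints of the approximate per-epoch times (seconds) from the table above.
rows = {
    "PushT image + CNN BC": summarize(44.5, 1.8, 5.45, 1.35),
    "PushT image + ViT": summarize(45.0, 23.0, 5.8, 22.6),
}
for name, r in rows.items():
    print(f"{name}: {r['total_speedup']:.1f}x, "
          f"data share {r['native_data_share']:.0%} -> {r['kinodb_data_share']:.0%}")
# PushT image + CNN BC: 6.8x, data share 96% -> 80%
# PushT image + ViT: 2.4x, data share 66% -> 20%
```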
Representative matching losses:
| Dataset / policy / seed | Native epoch 20 | kinodb epoch 20 |
|---|---|---|
| PushT image / CNN BC / 42 | 1445.6413 | 1445.6413 |
| PushT image / MLP / 123 | 1294.5226 | 1294.5226 |
| LIBERO spatial / CNN BC / 42 | 0.0481 | 0.0481 |
| LIBERO spatial / ViT / 456 | 0.0403 | 0.0403 |
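Matching losses are what separate "faster" from "different training." A minimal sketch of checking that claim across whole runs, assuming each run's per-epoch losses are available as a dict keyed by epoch (a hypothetical log format, not kinodb's); the default tolerance of zero requires the logged values to be identical, as in the table above.

```python
from typing import Dict, List, Tuple


def loss_mismatches(native: Dict[int, float], kinodb: Dict[int, float],
                    tol: float = 0.0) -> List[Tuple[int, float, float]]:
    """Return (epoch, native_loss, kinodb_loss) for shared epochs that differ by more than `tol`."""
    shared = sorted(native.keys() & kinodb.keys())
    return [(e, native[e], kinodb[e]) for e in shared
            if abs(native[e] - kinodb[e]) > tol]


# Epoch-20 losses for PushT image / CNN BC / seed 42, from the table above.
print(loss_mismatches({20: 1445.6413}, {20: 1445.6413}))  # []
```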
Interoperability
The interoperability experiment tests the main systems claim: after conversion, training code can sample from multiple robotics formats through one kinodb path.
Mixed sources:
| Source | Weight |
|---|---|
| lerobot_pusht_image.kdb | 0.25 |
| lerobot_libero_spatial_image.kdb | 0.25 |
| lerobot_pusht.kdb | 0.25 |
| lerobot_aloha_sim_insertion_scripted.kdb | 0.25 |
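Equal weights mean each draw picks a source with probability 0.25. A minimal sketch of weighted source selection using Python's `random.Random.choices`; it illustrates weighted mixing in general, not kinodb's mixing API, and `sample_sources` is a hypothetical helper that only picks a source name per draw (choosing an episode or frame inside that source would be a separate step).

```python
import random
from collections import Counter
from typing import Dict, List

# Source files and weights from the mixed-source table.
SOURCES: Dict[str, float] = {
    "lerobot_pusht_image.kdb": 0.25,
    "lerobot_libero_spatial_image.kdb": 0.25,
    "lerobot_pusht.kdb": 0.25,
    "lerobot_aloha_sim_insertion_scripted.kdb": 0.25,
}


def sample_sources(weights: Dict[str, float], n: int, seed: int = 0) -> List[str]:
    """Draw `n` source names with probability proportional to their weights."""
    rng = random.Random(seed)  # seeded so the draw sequence is reproducible
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=n)


counts = Counter(sample_sources(SOURCES, 10_000))
for name, count in counts.most_common():
    print(f"{name}: {count}")
```

With 10,000 draws each source lands near 2,500; unequal weights would skew the counts proportionally without changing the training loop that consumes the batches.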
Mixed-source training log:
| Epoch | Loss | Time |
|---|---|---|
| 0 | 5559.1095 | 0.642s |
| 5 | 360.9029 | 0.179s |
| 10 | 57.7633 | 0.182s |
| 15 | 33.6219 | 0.179s |
KQL Query Latency
KQL scans over these converted datasets complete in 10 µs to about 2.9 ms. On every dataset, `num_frames > 50` took longer than `num_frames > 100`.
| Dataset | Query | Matches | Time (µs) |
|---|---|---|---|
| ALOHA insertion | num_frames > 50 | 50/50 | 1620 |
| ALOHA insertion | num_frames > 100 | 50/50 | 10 |
| LIBERO spatial image | num_frames > 50 | 432/432 | 2890 |
| LIBERO spatial image | num_frames > 100 | 354/432 | 106 |
| PushT state | num_frames > 50 | 205/206 | 97 |
| PushT state | num_frames > 100 | 152/206 | 33 |
| PushT image | num_frames > 50 | 206/206 | 1205 |
| PushT image | num_frames > 100 | 201/206 | 33 |
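The queries in the table are simple comparisons on per-episode metadata. To make the operation concrete, here is a toy scan that evaluates a whitespace-separated `field op value` predicate over in-memory episode records; the predicate grammar, the record layout, and the episode lengths are all made-up assumptions for illustration, not kinodb's KQL parser, storage format, or data.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Comparison operators this toy predicate language accepts.
OPS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le, "==": operator.eq,
}


def parse_predicate(query: str) -> Tuple[str, Callable[[float, float], bool], float]:
    """Split 'field op value' into its parts, e.g. 'num_frames > 50'."""
    field, op, value = query.split()
    return field, OPS[op], float(value)


def scan(episodes: List[Dict[str, float]], query: str) -> List[int]:
    """Return indices of episodes whose metadata satisfies the predicate."""
    field, op, value = parse_predicate(query)
    return [i for i, ep in enumerate(episodes) if op(ep[field], value)]


# Made-up episode lengths, not taken from any dataset above.
episodes = [{"num_frames": n} for n in (40, 75, 120, 300)]
print(len(scan(episodes, "num_frames > 50")), len(scan(episodes, "num_frames > 100")))  # 3 2
```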
Interpretation
kinodb helps most when the native loader is the bottleneck. That is exactly what the data-share percentages show: CNN/MLP runs spend roughly 89-96% of native time in data loading, then drop to 71-81% with kinodb. ViT runs are still faster, but their compute budget is so large that loader gains have a smaller ceiling.
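The "smaller ceiling" is Amdahl's law: even if loading became free, an epoch could not get faster than its compute time. A minimal sketch of that bound, holding compute fixed at the native value and using midpoints from the per-epoch table; `loader_speedup_ceiling` is a hypothetical helper, and because measured compute also shifts slightly between native and kinodb runs, the bound is approximate.

```python
def loader_speedup_ceiling(native_data: float, native_compute: float) -> float:
    """Upper bound on total speedup if data time dropped to zero and compute stayed fixed."""
    return (native_data + native_compute) / native_compute


# Midpoints of the approximate native per-epoch times (seconds).
print(f"PushT + CNN BC: {loader_speedup_ceiling(44.5, 1.8):.1f}x")  # 25.7x
print(f"PushT + ViT:    {loader_speedup_ceiling(45.0, 23.0):.1f}x")  # 3.0x
```

The measured 2.4x for PushT + ViT is already most of its roughly 3x ceiling, while the CNN run's 6.8x leaves far more headroom, which is why the remaining data share stays high for CNN/MLP.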
The mixed-source experiment is equally important: the result is not only faster loading, but fewer custom loaders and one training path across datasets with different state/action dimensions.