Training Pipeline

The training benchmarks answer the question that matters most for robot learning: when data loading is measured inside the training loop, does kinodb move wall-clock time?

The latest experiment run says yes. For CNN and MLP policies, kinodb cuts data loading time by roughly 8-9x and total time by 6.6-8.0x, although loading remains the largest share of the step. For ViT policies, the win is smaller (2.2-2.4x) because model compute dominates.

Setup:

  • datasets: pusht_image and libero_spatial;
  • policies: cnn_bc, mlp, and vit;
  • seeds: 42, 123, and 456;
  • skipped: robomimic_lift;
  • metric: native loader vs kinodb loader inside the measured training loop.
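
Measuring the loader inside the training loop means splitting each step into time spent waiting for a batch and time spent computing on it. The sketch below shows one way to do that split; `timed_epoch`, `slow_loader`, and `fake_step` are hypothetical stand-ins, not the benchmark's actual harness or any kinodb API.

```python
import time


def timed_epoch(loader, train_step):
    """Run one epoch and return (data_seconds, compute_seconds)."""
    data_s = 0.0
    compute_s = 0.0
    it = iter(loader)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent waiting on the loader
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)  # stands in for forward/backward/optimizer work
        t2 = time.perf_counter()
        data_s += t1 - t0
        compute_s += t2 - t1
    return data_s, compute_s


# Hypothetical stand-ins for a real loader and a real model step.
def slow_loader(n_batches=5, delay=0.01):
    for i in range(n_batches):
        time.sleep(delay)  # simulated decode / IO cost
        yield [i] * 4


def fake_step(batch):
    return sum(x * x for x in batch)


data_s, compute_s = timed_epoch(slow_loader(), fake_step)
share = data_s / (data_s + compute_s)
print(f"data {data_s:.3f}s, compute {compute_s:.3f}s, data share {share:.0%}")
```

If the step runs on a GPU with asynchronous kernel launches, the compute timer would also need a device synchronization before each clock read; that detail is omitted here.
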

Headline results:

  • Best total speedup: 8.0x (LIBERO spatial + MLP, seed 42);
  • Data time cut: ~9x (LIBERO data step: ~34s native to ~3.6s kinodb);
  • Image data cut: ~8x (PushT image data step: ~45s native to ~5.6s kinodb);
  • Compute-heavy floor: 2.2-2.4x (ViT still improves, but compute dominates the step).

Total speedup by dataset and policy:

| Dataset | Policy | Init speedup | Total speedup |
|---|---|---|---|
| libero_spatial | cnn_bc | 1394x | 7.1x +/- 0.0 |
| libero_spatial | mlp | 1364x | 7.7x +/- 0.2 |
| libero_spatial | vit | 2828x | 2.2x +/- 0.0 |
| pusht_image | cnn_bc | 26x | 6.8x +/- 0.0 |
| pusht_image | mlp | 31x | 6.6x +/- 0.0 |
| pusht_image | vit | 31x | 2.4x +/- 0.0 |

The losses match between native and kinodb for the same dataset, policy, seed, and epoch. The win is wall-clock time, not a different learning target.

| Run | Native data per epoch | kinodb data per epoch | Native compute | kinodb compute | Total speedup | Data share shift |
|---|---|---|---|---|---|---|
| PushT image + CNN BC | ~44-45s | ~5.4-5.5s | ~1.7-1.9s | ~1.2-1.5s | 6.7-6.8x | 96% native to 79-80% kinodb |
| PushT image + MLP | ~44-45s | ~5.5-5.7s | ~1.5-1.7s | ~1.2-1.4s | 6.6-6.7x | 96% native to 80-81% kinodb |
| PushT image + ViT | ~45s | ~5.8s | ~23s | ~22.6s | 2.4x | 66% native to 20% kinodb |
| LIBERO spatial + CNN BC | ~34s | ~3.6-3.7s | ~1.7-1.9s | ~1.3-1.5s | 7.1x | 93% native to 71% kinodb |
| LIBERO spatial + MLP | ~34s | ~3.6-3.8s | ~1.2-1.5s | ~0.9-1.2s | 7.6-8.0x | 89-94% native to 76% kinodb |
| LIBERO spatial + ViT | ~34-35s | ~3.8-3.9s | ~22.6-24.6s | ~23s | 2.2-2.3x | 57-59% native to 14% kinodb |

Representative matching losses:

| Dataset / policy / seed | Native epoch 20 | kinodb epoch 20 |
|---|---|---|
| PushT image / CNN BC / 42 | 1445.6413 | 1445.6413 |
| PushT image / MLP / 123 | 1294.5226 | 1294.5226 |
| LIBERO spatial / CNN BC / 42 | 0.0481 | 0.0481 |
| LIBERO spatial / ViT / 456 | 0.0403 | 0.0403 |
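
A parity check of this kind could be automated along the following lines. This is a minimal sketch: the dictionary holds only the four representative epoch-20 values listed above, and the `mismatches` helper and its tolerance are assumptions, not part of the benchmark code.

```python
import math

# (dataset, policy, seed) -> (native loss, kinodb loss) at epoch 20,
# copied from the representative rows above.
epoch20 = {
    ("pusht_image", "cnn_bc", 42): (1445.6413, 1445.6413),
    ("pusht_image", "mlp", 123): (1294.5226, 1294.5226),
    ("libero_spatial", "cnn_bc", 42): (0.0481, 0.0481),
    ("libero_spatial", "vit", 456): (0.0403, 0.0403),
}


def mismatches(losses, abs_tol=1e-4):
    """Return the run keys whose native and kinodb losses differ by more than abs_tol."""
    return [
        key
        for key, (native, kinodb) in losses.items()
        if not math.isclose(native, kinodb, rel_tol=0.0, abs_tol=abs_tol)
    ]


print(mismatches(epoch20))  # prints []
```

The tolerance of 1e-4 matches the four decimal places reported in the logs; a stricter bitwise comparison would need the unrounded values.
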

The interoperability experiment tests the main systems claim: after conversion, training code can sample from multiple robotics formats through one kinodb path.

  • Loader code: 8 LOC for the kinodb mixed-source loader vs 26 LOC native;
  • Sources mixed: 4 (PushT image, LIBERO image, PushT state, ALOHA insertion);
  • Schema padding: 14 / 14 (max state and action dimensions across sources);
  • Final logged loss: 33.6219 after 15 logged epochs on the mixed dataset.
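
To make the schema padding concrete, the sketch below pads a shorter state or action vector up to the shared width of 14 and builds a mask marking the real entries. How kinodb pads internally (zero fill, masking, or something else) is not stated here, so the zero fill, the mask, and the 2-dimensional example vector are all assumptions for illustration.

```python
MAX_DIM = 14  # max state and action dimension across the mixed sources


def pad_to(vec, width, fill=0.0):
    """Right-pad vec with fill values up to width (assumed zero fill)."""
    if len(vec) > width:
        raise ValueError(f"vector of length {len(vec)} exceeds width {width}")
    return list(vec) + [fill] * (width - len(vec))


def padding_mask(vec, width):
    """1.0 for real entries, 0.0 for padded entries."""
    if len(vec) > width:
        raise ValueError(f"vector of length {len(vec)} exceeds width {width}")
    return [1.0] * len(vec) + [0.0] * (width - len(vec))


# Illustrative low-dimensional action; the real per-source dimensions may differ.
small_action = [0.3, -0.1]
padded = pad_to(small_action, MAX_DIM)
mask = padding_mask(small_action, MAX_DIM)
print(len(padded), sum(mask))
```

A mask like this lets a loss function ignore padded entries, which matters when sources with different dimensions share one batch.
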

Mixed sources:

| Source | Weight |
|---|---|
| lerobot_pusht_image.kdb | 0.25 |
| lerobot_libero_spatial_image.kdb | 0.25 |
| lerobot_pusht.kdb | 0.25 |
| lerobot_aloha_sim_insertion_scripted.kdb | 0.25 |
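
Equal weights mean each batch element is drawn from any of the four sources with probability 0.25. The following sketch illustrates that sampling rule with the standard library only; `sample_sources` is a hypothetical helper and does not represent kinodb's mixed-source loader.

```python
import random
from collections import Counter

# Source files and weights as listed above.
SOURCES = {
    "lerobot_pusht_image.kdb": 0.25,
    "lerobot_libero_spatial_image.kdb": 0.25,
    "lerobot_pusht.kdb": 0.25,
    "lerobot_aloha_sim_insertion_scripted.kdb": 0.25,
}


def sample_sources(weights, n, seed=0):
    """Pick a source name for each of n samples, proportionally to its weight."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)


n = 10_000
counts = Counter(sample_sources(SOURCES, n))
for name in SOURCES:
    print(f"{name}: {counts[name] / n:.3f}")
```

Over many draws, each source's empirical fraction should land close to its configured weight.
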

Mixed-source training log:

| Epoch | Loss | Time |
|---|---|---|
| 0 | 5559.1095 | 0.642s |
| 5 | 360.9029 | 0.179s |
| 10 | 57.7633 | 0.182s |
| 15 | 33.6219 | 0.179s |

KQL scans take between 10us and about 2.9ms on these converted datasets; the num_frames > 100 queries all finish in roughly 100us or less.

| Dataset | Query | Matches | Time |
|---|---|---|---|
| ALOHA insertion | num_frames > 50 | 50/50 | 1620us |
| ALOHA insertion | num_frames > 100 | 50/50 | 10us |
| LIBERO spatial image | num_frames > 50 | 432/432 | 2890us |
| LIBERO spatial image | num_frames > 100 | 354/432 | 106us |
| PushT state | num_frames > 50 | 205/206 | 97us |
| PushT state | num_frames > 100 | 152/206 | 33us |
| PushT image | num_frames > 50 | 206/206 | 1205us |
| PushT image | num_frames > 100 | 201/206 | 33us |
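
For readers unfamiliar with what the Matches column counts, the sketch below applies the same kind of predicate to in-memory episode metadata and reports matches over total. It is plain Python, not KQL, and says nothing about KQL's syntax or engine; the episode records are invented for illustration.

```python
import time


def scan(episodes, min_frames):
    """Return episodes whose num_frames is strictly greater than min_frames."""
    return [ep for ep in episodes if ep["num_frames"] > min_frames]


# Made-up metadata: frame counts of 40, 50, ..., 130.
episodes = [{"episode_id": i, "num_frames": 40 + 10 * i} for i in range(10)]

t0 = time.perf_counter()
hits = scan(episodes, 100)
elapsed_us = (time.perf_counter() - t0) * 1e6
print(f"{len(hits)}/{len(episodes)} matches in {elapsed_us:.0f}us")
```
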

kinodb helps most when the native loader is the bottleneck. That is exactly what the data-share percentages show: CNN/MLP runs spend roughly 89-96% of native time in data loading, then drop to 71-81% with kinodb. ViT runs are still faster, but their compute budget is so large that loader gains have a smaller ceiling.
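
The arithmetic behind the data-share and speedup columns can be reproduced from the per-epoch times. In the sketch below, the ViT row uses the reported values directly, while the CNN BC row uses midpoints of the reported ranges, which is an assumption; the function names are illustrative.

```python
def total_speedup(native_data, native_compute, kdb_data, kdb_compute):
    """Ratio of native epoch time to kinodb epoch time."""
    return (native_data + native_compute) / (kdb_data + kdb_compute)


def data_share(data, compute):
    """Fraction of epoch time spent in data loading."""
    return data / (data + compute)


def speedup_ceiling(native_data, native_compute, kdb_compute):
    """Upper bound on total speedup if kinodb data time were zero."""
    return (native_data + native_compute) / kdb_compute


# (native data, native compute, kinodb data, kinodb compute), in seconds per epoch.
runs = {
    "PushT image + CNN BC": (44.5, 1.8, 5.45, 1.35),  # midpoints of reported ranges
    "PushT image + ViT": (45.0, 23.0, 5.8, 22.6),     # reported values
}

for name, (nd, nc, kd, kc) in runs.items():
    print(
        f"{name}: speedup {total_speedup(nd, nc, kd, kc):.1f}x, "
        f"data share {data_share(nd, nc):.0%} -> {data_share(kd, kc):.0%}, "
        f"ceiling {speedup_ceiling(nd, nc, kc):.1f}x"
    )
```

The ceiling makes the ViT result easier to read: with about 23s of compute per epoch, even a loader that took no time at all could reach only about 3x, so 2.4x is already close to the limit.
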

The mixed-source experiment is equally important: the result is not only faster loading, but fewer custom loaders and one training path across datasets with different state/action dimensions.