Dataset Mixtures
Modern VLA training often mixes demonstrations from different robots, tasks, and source formats. kinodb gives you two paths:
- kino merge creates one physical .kdb file.
- kino mix and from_mixture() create weighted virtual mixtures.
Physical Merge
Use merge when you want a single file to ship, archive, upload, or train sequentially.

```bash
kino merge lift.kdb pusht.kdb aloha.kdb --output combined.kdb
```

Filter while merging:

```bash
kino merge lift.kdb pusht.kdb \
  --output successful.kdb \
  --filter "success = true"
```

This reads each input episode, applies the optional KQL filter, then writes matching episodes to a new database.
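As a mental model only, the read, filter, and write loop can be sketched in plain Python. This does not call kinodb: the `merge` function, the dict-shaped episodes, and the `keep` predicate standing in for a KQL filter are all hypothetical, and the input-order output is an assumption rather than documented behavior.

```python
def merge(inputs, keep=None):
    """Concatenate episodes from several sources, keeping those that pass `keep`.

    Hypothetical illustration: `inputs` is a list of episode lists and
    `keep` plays the role of the optional filter.
    """
    merged = []
    for episodes in inputs:
        for episode in episodes:
            # With no filter, every episode is written to the output.
            if keep is None or keep(episode):
                merged.append(episode)
    return merged


lift = [{"id": "lift-0", "success": True}, {"id": "lift-1", "success": False}]
pusht = [{"id": "pusht-0", "success": True}]

# Rough equivalent of --filter "success = true"
successful = merge([lift, pusht], keep=lambda ep: ep["success"])
print([ep["id"] for ep in successful])  # ['lift-0', 'pusht-0']
```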
Weighted Mixture CLI
Use mix when you want training-time sampling proportions.

```bash
kino mix \
  --source bridge.kdb:0.4 \
  --source aloha.kdb:0.3 \
  --source libero.kdb:0.3
```

Sample a distribution:

```bash
kino mix \
  --source bridge.kdb:0.4 \
  --source aloha.kdb:0.3 \
  --source libero.kdb:0.3 \
  --sample 1000 \
  --seed 42
```

Weights are relative. 4:3:3 and 0.4:0.3:0.3 are equivalent.
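Because weights are relative, only their normalized values matter. The sketch below is plain Python and does not call kinodb. `normalize` and `sample_sources` are hypothetical helpers, used only to show why the two spellings describe the same distribution and what a seeded draw of 1000 source picks might look like. kinodb's own sampler may differ in its details.

```python
import math
import random
from collections import Counter


def normalize(weights):
    """Scale relative weights so they sum to 1 (hypothetical helper)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: w / total for name, w in weights.items()}


def sample_sources(weights, n, seed):
    """Draw n source names with probability proportional to their weight."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)


ratio = normalize({"bridge.kdb": 4, "aloha.kdb": 3, "libero.kdb": 3})
fraction = normalize({"bridge.kdb": 0.4, "aloha.kdb": 0.3, "libero.kdb": 0.3})

# Both spellings give the same probabilities (up to float rounding).
same = all(math.isclose(ratio[k], fraction[k]) for k in ratio)

counts = Counter(sample_sources(ratio, n=1000, seed=42))
print(same, dict(counts))
```

With a fixed seed the draw is reproducible. The per-source counts should land near 400, 300, and 300 without matching them exactly.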
Python Mixtures
```python
from kinodb.torch import from_mixture
from torch.utils.data import DataLoader

dataset = from_mixture(
    {
        "bridge.kdb": 0.4,
        "aloha.kdb": 0.3,
        "libero.kdb": 0.3,
    },
    seed=42,
    image_size=(224, 224),
)

loader = DataLoader(dataset, batch_size=8)
```

Each source can come from a different original format. The training code only sees .kdb.
Rust API
```rust
use kinodb_core::Mixture;

let mut mix = Mixture::builder()
    .add("bridge.kdb", 0.4)
    .add("aloha.kdb", 0.3)
    .add("libero.kdb", 0.3)
    .seed(42)
    .build()?;

let episode = mix.sample()?;
let global_episode = mix.read_global(10)?;
let order = mix.weighted_epoch(1000);
```

Mixed Schema Reality
Different datasets can have different action and state dimensions. The experiment history hit this directly when mixing PushT (action_dim = 2) with ALOHA (action_dim = 14): raw torch.stack fails unless the collate function pads or batches by schema.
Common strategies:
| Strategy | When to use |
|---|---|
| Pad state/action vectors to max dimension | One model with source-aware masks |
| Bucket by schema | Multi-embodiment training with separate heads |
| Train separate adapters | Different embodiments have genuinely different action spaces |
| Physical merge only same-schema data | Simplest archival/distribution path |
kinodb preserves dimensions and metadata; it does not hide schema differences from your model.
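The padding strategy can be sketched as a custom collate function. This is a minimal illustration, not kinodb's API. It assumes each sample is a dict holding a 1-D "action" tensor, which may not match what from_mixture() actually yields. The names `pad_collate` and `action_mask` are hypothetical.

```python
import torch


def pad_collate(samples):
    """Pad 1-D action vectors to the largest dimension in the batch.

    Assumes each sample is a dict with a 1-D "action" tensor; that layout
    is an illustration, not necessarily kinodb's sample format.
    """
    max_dim = max(s["action"].shape[0] for s in samples)
    actions = torch.zeros(len(samples), max_dim)
    mask = torch.zeros(len(samples), max_dim, dtype=torch.bool)
    for i, s in enumerate(samples):
        dim = s["action"].shape[0]
        actions[i, :dim] = s["action"]
        mask[i, :dim] = True  # True marks real (unpadded) entries
    return {"action": actions, "action_mask": mask}


batch = pad_collate([
    {"action": torch.randn(2)},   # PushT-like, action_dim = 2
    {"action": torch.randn(14)},  # ALOHA-like, action_dim = 14
])
print(batch["action"].shape)             # torch.Size([2, 14])
print(batch["action_mask"].sum(dim=1))   # tensor([ 2, 14])
```

A function like this would be passed to the loader as `DataLoader(dataset, batch_size=8, collate_fn=pad_collate)`. The mask lets the loss ignore padded entries so that zeros are not treated as real actions.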
When To Merge vs Mix
| Use case | Pick |
|---|---|
| Publish one converted dataset | kino merge |
| Build a filtered release split | kino merge --filter |
| Match OpenVLA-style source proportions | kino mix or from_mixture() |
| Change ratios between runs | virtual mixture |
| Maximize sequential reads from one file | physical merge |