Architecture
kinodb is a Rust workspace with a small Python package around the core reader.
kinodb/
  Cargo.toml
  crates/
    kinodb-core/     # .kdb file format, reader, writer, KQL, mixtures
    kinodb-ingest/   # HDF5, LeRobot, RLDS ingest
    kinodb-cli/      # kino command
    kinodb-serve/    # kino-serve gRPC server
  kinodb-py/         # PyO3 bindings, built separately

Data Flow
HDF5 / LeRobot / RLDS
        │
        ▼
  kinodb-ingest
        │
        ▼
    KdbWriter ──► .kdb file ──► KdbReader
                                    │
                     ┌──────────────┼──────────────┐
                     ▼              ▼              ▼
                 kino CLI      Python API      kino-serve

kinodb-core
Owns the file format and all source-independent behavior.
| Module | Responsibility |
|---|---|
| types | EpisodeMeta, Frame, ImageObs, Episode |
| header | 64-byte file header |
| index | fixed-size episode index entries |
| writer | streaming .kdb writer |
| reader | mmap-backed .kdb reader |
| kql | parser and evaluator |
| mixture | weighted multi-file sampling |
| prefetch | prefetch-related helpers |
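As a rough picture of what weighted multi-file sampling means for the mixture module, the sketch below draws a source file according to its weight and then picks an episode uniformly inside that file. The function name `sample_mixture`, the file names, and the two-step sampling scheme are assumptions for illustration; the actual mixture module may sample differently.

```python
import random
from collections import Counter


def sample_mixture(sources, weights, n, seed=0):
    """Draw n (source_name, episode_index) pairs.

    sources maps a file name to its episode count (hypothetical values),
    weights maps the same names to relative sampling weights.
    """
    rng = random.Random(seed)
    names = list(sources)
    # Step 1: choose which file each draw comes from, proportional to its weight.
    picks = rng.choices(names, weights=[weights[name] for name in names], k=n)
    # Step 2: choose an episode uniformly within the chosen file.
    return [(name, rng.randrange(sources[name])) for name in picks]


# Hypothetical mixture: a large simulated set and a small real-robot set.
sources = {"sim.kdb": 1000, "real.kdb": 50}
weights = {"sim.kdb": 0.25, "real.kdb": 0.75}

draws = sample_mixture(sources, weights, n=2000)
counts = Counter(name for name, _ in draws)
print(counts)
```

The weights decouple how often a dataset is seen from how large it is, which is why a 50-episode file can dominate the stream here.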
Design decisions:
- Keep file layout explicit and little-endian.
- Store episode payloads contiguously.
- Put the index at the end so the writer can stream episodes.
- Use mmap for low open cost and OS-managed paging.
- Keep KQL metadata-only so filtering does not decode frames.
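The design decisions above can be pictured with a small sketch. It is not the actual .kdb format: the magic value, every header field beyond the 64-byte size, the 24-byte entry layout, and the names `write_file`, `read_index`, and `long_episodes` are assumptions made for illustration. Storing the frame count in the index entry is likewise only a guess at where filterable metadata might live.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout, little-endian throughout ("<" disables padding).
MAGIC = b"DEMO"
HEADER = struct.Struct("<4sIQ48x")  # magic, episode count, index offset: 64 bytes
ENTRY = struct.Struct("<QQI4x")     # payload offset, payload length, frame count: 24 bytes


def write_file(path, episodes):
    """Stream (num_frames, payload) pairs to disk, then append the index."""
    entries = []
    with open(path, "wb") as f:
        f.write(bytes(HEADER.size))  # placeholder header, patched below
        for num_frames, payload in episodes:
            entries.append((f.tell(), len(payload), num_frames))
            f.write(payload)  # each episode payload is one contiguous run
        index_offset = f.tell()
        for entry in entries:
            f.write(ENTRY.pack(*entry))
        f.seek(0)  # totals are only known now, so patch the header last
        f.write(HEADER.pack(MAGIC, len(entries), index_offset))


def read_index(buf):
    """Decode the header and every fixed-size index entry from a buffer."""
    magic, count, index_offset = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("not a demo file")
    return [ENTRY.unpack_from(buf, index_offset + i * ENTRY.size) for i in range(count)]


def long_episodes(path, min_frames):
    """Return payloads of episodes with at least min_frames frames.

    Filtering reads only index entries; payload bytes are touched
    only for the episodes that pass the filter.
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
        index = read_index(buf)
        return [buf[off:off + length] for off, length, frames in index if frames >= min_frames]


path = os.path.join(tempfile.mkdtemp(), "demo.bin")
write_file(path, [(2, b"aa"), (5, b"bbbbb"), (3, b"ccc")])
selected = long_episodes(path, min_frames=3)
print(selected)
```

Because the index is written after the payloads, the writer never needs the episode count up front; in this sketch it seeks back once to patch the header.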
kinodb-ingest
Normalizes external formats into Episode objects and writes through KdbWriter.
| Module | Source |
|---|---|
| hdf5 | robomimic/LIBERO-style data/demo_* files |
| lerobot | LeRobot v2/v3 Parquet datasets |
| rlds | TFRecord files with RLDS-style features |
The ingesters make format-specific choices, such as concatenating HDF5 obs/* state keys in sorted order and grouping LeRobot rows by episode_index.
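The two normalization choices named above can be sketched in plain Python. This is a minimal illustration, not the ingest code itself: the helper names `concat_state` and `group_by_episode`, the observation key names, and the `frame_index` column are assumptions, and plain lists and dicts stand in for HDF5 datasets and Parquet rows.

```python
from collections import defaultdict


def concat_state(obs):
    """Concatenate per-key state vectors in sorted key order.

    Sorting makes the resulting vector layout independent of the
    order in which keys happen to be stored in the source file.
    """
    state = []
    for key in sorted(obs):
        state.extend(obs[key])
    return state


def group_by_episode(rows):
    """Group flat rows into per-episode lists keyed by episode_index."""
    episodes = defaultdict(list)
    for row in rows:
        episodes[row["episode_index"]].append(row)  # row order is preserved
    return dict(episodes)


# Hypothetical obs/* keys for a single frame.
obs = {"robot0_gripper_qpos": [0.04, -0.04], "robot0_eef_pos": [0.1, 0.2, 0.3]}
state = concat_state(obs)

# Hypothetical flat rows, as they might come out of a Parquet table.
rows = [
    {"episode_index": 0, "frame_index": 0},
    {"episode_index": 0, "frame_index": 1},
    {"episode_index": 1, "frame_index": 0},
]
grouped = group_by_episode(rows)

print(state)
print({episode: len(frames) for episode, frames in grouped.items()})
```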
kinodb-cli
Wraps core and ingest functionality in kino.
Implemented commands:
- create-test
- ingest
- info
- schema
- validate
- query
- mix
- merge
- export
- bench
kinodb-serve
Provides a gRPC service through tonic and prost.
Backends:
- single .kdb file,
- weighted Mixture.
RPCs:
- ServerInfo
- GetMeta
- GetEpisode
- GetBatch
- Query
kinodb-py
Provides the Python bridge:
- kinodb.open()
- Database
- NumPy-returning episode reads
- KQL queries
- PyTorch helpers in kinodb.torch
- gRPC client helpers in kinodb.remote
The Python package is excluded from the workspace root because it is built with maturin.
Why Not Just HDF5?
HDF5 is excellent at hierarchical arrays. kinodb adds trajectory-specific behavior:
- one episode index across the whole dataset,
- uniform metadata filtering,
- cross-format ingest,
- dataset mixtures,
- CLI validation/schema tools,
- Python training integration over the same .kdb layout.
Why Not Just Parquet?
Parquet is strong for columnar analytics. Robot training often asks for all data for one episode or sampled windows from episodes. kinodb stores an episode contiguously and keeps a direct byte index for that access pattern.
Why Rust?
The hot path needs:
- memory safety around mmap and binary parsing,
- predictable performance,
- native CLI distribution,
- async gRPC serving,
- Python bindings without putting the engine in Python.
Rust gives the project a systems layer while still letting researchers train from Python.
Current Technical Debt
The docs intentionally preserve the honest current state:
- HDF5 build compatibility needs pinning or a static build strategy.
- Python dataset handles need more ergonomic multi-worker behavior.
- Image reads currently decode into NumPy arrays; lazy/raw compressed image paths would improve image-heavy workflows.
- Shared-memory serving and hardware decode remain roadmap items from the original PDF blueprint.