Architecture

kinodb is a Rust workspace with a small Python package around the core reader.

kinodb/
  Cargo.toml
  crates/
    kinodb-core/      # .kdb file format, reader, writer, KQL, mixtures
    kinodb-ingest/    # HDF5, LeRobot, RLDS ingest
    kinodb-cli/       # kino command
    kinodb-serve/     # kino-serve gRPC server
    kinodb-py/        # PyO3 bindings, built separately

        HDF5 / LeRobot / RLDS
            kinodb-ingest
KdbWriter ──► .kdb file ──► KdbReader
   ┌──────────────┼──────────────┐
   ▼              ▼              ▼
kino CLI     Python API     kino-serve

kinodb-core owns the file format and all source-independent behavior.

| Module | Responsibility |
|---|---|
| types | EpisodeMeta, Frame, ImageObs, Episode |
| header | 64-byte file header |
| index | fixed-size episode index entries |
| writer | streaming .kdb writer |
| reader | mmap-backed .kdb reader |
| kql | parser and evaluator |
| mixture | weighted multi-file sampling |
| prefetch | prefetch-related helpers |

Design decisions:

  • Keep file layout explicit and little-endian.
  • Store episode payloads contiguously.
  • Put the index at the end so the writer can stream episodes.
  • Use mmap for low open cost and OS-managed paging.
  • Keep KQL metadata-only so filtering does not decode frames.
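The layout decisions above can be made concrete with a small sketch. It is not the real .kdb format: the 64-byte header and the fixed-size index entries come from the module table, but the field names, the `DEMO` magic, and the `write_file` / `read_episode` helpers are assumptions made for illustration. The writer streams episode payloads, appends the index, and patches the header last; the reader maps the file and slices out one episode without touching the others.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout, for illustration only (not the real .kdb format):
#   header : 64 bytes = magic (4s) + episode_count (u32) + index_offset (u64) + 48 pad bytes
#   payload: episode blobs stored back to back
#   index  : one fixed-size entry per episode = offset (u64) + length (u64)
HEADER = struct.Struct("<4sIQ48x")  # little-endian, 64 bytes
ENTRY = struct.Struct("<QQ")        # little-endian, 16 bytes


def write_file(path, episodes):
    """Stream episodes to disk, then append the index and patch the header."""
    entries = []
    with open(path, "wb") as f:
        f.write(HEADER.pack(b"DEMO", 0, 0))  # placeholder header
        for blob in episodes:                # episodes may arrive one at a time
            entries.append((f.tell(), len(blob)))
            f.write(blob)
        index_offset = f.tell()              # the index starts after the last payload
        for offset, length in entries:
            f.write(ENTRY.pack(offset, length))
        f.seek(0)                            # patch the header once the index is known
        f.write(HEADER.pack(b"DEMO", len(entries), index_offset))


def read_episode(path, i):
    """Map the file and return episode i using only the header and the index."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        magic, count, index_offset = HEADER.unpack_from(m, 0)
        if magic != b"DEMO" or not 0 <= i < count:
            raise IndexError(i)
        offset, length = ENTRY.unpack_from(m, index_offset + i * ENTRY.size)
        return bytes(m[offset:offset + length])


path = os.path.join(tempfile.mkdtemp(), "demo.bin")
write_file(path, [b"episode-0", b"ep1", b"third episode"])
print(read_episode(path, 1))  # b'ep1'
```

Because the index is written last, the writer never needs to know the episode count up front. The cost is one extra seek to patch the header when the file is closed.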

kinodb-ingest normalizes external formats into Episode objects and writes them through KdbWriter.

| Module | Source |
|---|---|
| hdf5 | robomimic/LIBERO-style data/demo_* files |
| lerobot | LeRobot v2/v3 Parquet datasets |
| rlds | TFRecord files with RLDS-style features |

The ingesters make format-specific choices, such as concatenating HDF5 obs/* state keys in sorted order and grouping LeRobot rows by episode_index.
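The two choices named above can be sketched with plain Python containers. This is not the ingester code, which lives in the Rust crate. The function names, the example observation keys, and the row fields other than `episode_index` are hypothetical.

```python
from collections import defaultdict


def concat_state(obs):
    """Concatenate per-frame state vectors from obs keys, in sorted key order."""
    keys = sorted(obs)  # a fixed order keeps the state layout reproducible
    num_frames = len(obs[keys[0]])
    return [
        [x for key in keys for x in obs[key][t]]
        for t in range(num_frames)
    ]


def group_by_episode(rows):
    """Group flat table rows into per-episode lists keyed by episode_index."""
    episodes = defaultdict(list)
    for row in rows:
        episodes[row["episode_index"]].append(row)
    return dict(episodes)


# Hypothetical obs/* keys with two frames each.
obs = {
    "robot0_gripper_qpos": [[0.1], [0.2]],
    "robot0_eef_pos": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
}
state = concat_state(obs)
print(state)  # [[1.0, 2.0, 3.0, 0.1], [4.0, 5.0, 6.0, 0.2]]

rows = [
    {"episode_index": 0, "frame_index": 0},
    {"episode_index": 0, "frame_index": 1},
    {"episode_index": 1, "frame_index": 0},
]
groups = group_by_episode(rows)
print({k: len(v) for k, v in groups.items()})  # {0: 2, 1: 1}
```

Sorting the keys means the state vector has the same layout regardless of the order in which the source file happens to list them.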

kinodb-cli wraps core and ingest functionality in the kino command.

Implemented commands:

  • create-test
  • ingest
  • info
  • schema
  • validate
  • query
  • mix
  • merge
  • export
  • bench

kinodb-serve provides the kino-serve gRPC service, built on tonic and prost.

Backends:

  • single .kdb file,
  • weighted Mixture.
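As a rough picture of what weighted multi-file sampling means, the sketch below draws episodes from several sources in proportion to per-source weights. It illustrates the general technique only. The `sample_mixture` function, the file names, and the choice to sample uniformly within a file are assumptions, not the behavior of the Mixture type.

```python
import random


def sample_mixture(sources, weights, n, seed=0):
    """Draw n (source_name, episode_index) pairs.

    sources maps a source name to its episode count;
    weights maps the same names to non-negative sampling weights.
    """
    rng = random.Random(seed)
    names = list(sources)
    # Pick a source for each draw in proportion to its weight.
    picks = rng.choices(names, weights=[weights[name] for name in names], k=n)
    # Then pick an episode uniformly within the chosen source.
    return [(name, rng.randrange(sources[name])) for name in picks]


sources = {"a.kdb": 100, "b.kdb": 20}  # hypothetical files and episode counts
weights = {"a.kdb": 3.0, "b.kdb": 1.0}
batch = sample_mixture(sources, weights, n=8)
print(batch)
```

Weighting by source rather than by episode count lets a small dataset be oversampled relative to a large one without duplicating it on disk.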

RPCs:

  • ServerInfo
  • GetMeta
  • GetEpisode
  • GetBatch
  • Query

kinodb-py provides the Python bridge:

  • kinodb.open()
  • Database
  • NumPy-returning episode reads
  • KQL queries
  • PyTorch helpers in kinodb.torch
  • gRPC client helpers in kinodb.remote

The Python package is excluded from the workspace root because it is built with maturin.

HDF5 is excellent at hierarchical arrays. kinodb adds trajectory-specific behavior:

  • one episode index across the whole dataset,
  • uniform metadata filtering,
  • cross-format ingest,
  • dataset mixtures,
  • CLI validation/schema tools,
  • Python training integration over the same .kdb layout.

Parquet is strong for columnar analytics. Robot training often asks for all data for one episode or sampled windows from episodes. kinodb stores an episode contiguously and keeps a direct byte index for that access pattern.
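The arithmetic behind that access pattern is short. The sketch below assumes fixed-size frame records inside an episode, which may not hold for the real format (compressed images, for instance, vary in size). The function name and parameters are hypothetical.

```python
def window_byte_range(episode_offset, frame_size, num_frames, start, length):
    """Return the (begin, end) byte range covering frames [start, start + length)."""
    if start < 0 or length <= 0 or start + length > num_frames:
        raise ValueError("window falls outside the episode")
    begin = episode_offset + start * frame_size
    return begin, begin + length * frame_size


# An episode stored at byte 4096 with 200 frames of 56 bytes each:
# a 16-frame window starting at frame 10 is a single contiguous read.
print(window_byte_range(4096, 56, 200, start=10, length=16))  # (4656, 5552)
```

With a columnar layout the same window would touch one region per column. Here it is a single byte range resolved from one index entry.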

The hot path needs:

  • memory safety around mmap and binary parsing,
  • predictable performance,
  • native CLI distribution,
  • async gRPC serving,
  • Python bindings without putting the engine in Python.

Rust gives the project a systems layer while still letting researchers train from Python.

The docs state the current limitations plainly:

  • HDF5 build compatibility needs pinning or a static build strategy.
  • Python dataset handles need more ergonomic multi-worker behavior.
  • Image reads currently decode into NumPy arrays; lazy/raw compressed image paths would improve image-heavy workflows.
  • Shared-memory serving and hardware decode remain roadmap items from the original PDF blueprint.