IO Performance

This page presents the systems benchmarks from the latest experiment log. It covers open time, sequential read time, KQL latency, storage size, write speed, and image-throughput validation.

Synthetic datasets were generated at 100, 500, 1K, 5K, 10K, and 50K episodes, with 50 frames per episode. Each run generated HDF5 and Parquet files, ingested the same data into kinodb, and then measured open time, sequential read time, and KQL query latency.
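The measurement loop described above can be sketched as a small timing harness. This is a minimal sketch, not the harness used for this run: the function names, the median-of-repeats policy, and the placeholder workloads are all assumptions, and no kinodb, HDF5, or Parquet calls are shown because the page does not list them.

```python
import statistics
import time
from typing import Callable, Dict


def time_call(fn: Callable[[], object], repeats: int = 5) -> float:
    """Median wall-clock seconds over `repeats` calls (repeat count is an assumption)."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def format_duration(seconds: float) -> str:
    """Render seconds in the us / ms / s style used for the timings on this page."""
    if seconds < 1e-3:
        return f"{seconds * 1e6:.0f}us"
    if seconds < 1.0:
        return f"{seconds * 1e3:.1f}ms"
    return f"{seconds:.2f}s"


def run_benchmark(steps: Dict[str, Callable[[], object]], repeats: int = 5) -> Dict[str, str]:
    """Time each named step and return its formatted median duration."""
    return {name: format_duration(time_call(fn, repeats)) for name, fn in steps.items()}


if __name__ == "__main__":
    # Placeholder workloads standing in for open, sequential read, and query steps.
    episodes = [list(range(50)) for _ in range(1_000)]
    steps = {
        "open": lambda: len(episodes),
        "sequential_read": lambda: sum(sum(ep) for ep in episodes),
        "kql": lambda: [i for i, ep in enumerate(episodes) if ep[0] == 0],
    }
    for name, shown in run_benchmark(steps).items():
        print(f"{name}: {shown}")
```

In a real run, each entry in `steps` would wrap the format-specific open, read, or query call, and the same dictionary shape would be reused for every dataset size.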

  • 50K open time: 1.2ms for kinodb vs 158.1ms for HDF5 and 2.87s for Parquet;
  • 50K sequential read: 1.26s for kinodb vs 13.36s for HDF5 and 118.40s for Parquet;
  • 50K KQL scan: 31.7ms for a metadata query across 50K episodes;
  • Parquet sequential gap: 94x at 50K episodes (118.40s vs 1.26s).

| Operation (50K episodes) | HDF5 | Parquet | kinodb |
|---|---|---|---|
| Open | 158.1ms | 2.87s | 1.2ms |
| Sequential read | 13.36s | 118.40s | 1.26s |

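| Episodes | HDF5 open | Parquet open | kinodb open | HDF5 seq | Parquet seq | kinodb seq | KQL |
|---|---|---|---|---|---|---|---|
| 100 | 912us | 18.1ms | 51us | 26.0ms | 50.1ms | 2.2ms | 88us |
| 500 | 1.6ms | 26.9ms | 133us | 127.5ms | 226.7ms | 10.7ms | 259us |
| 1,000 | 2.4ms | 48.9ms | 82us | 251.8ms | 480.2ms | 18.8ms | 507us |
| 5,000 | 10.0ms | 312.8ms | 158us | 1.30s | 3.22s | 90.1ms | 2.9ms |
| 10,000 | 19.0ms | 666.8ms | 262us | 2.61s | 8.30s | 181.4ms | 5.7ms |
| 50,000 | 158.1ms | 2.87s | 1.2ms | 13.36s | 118.40s | 1.26s | 31.7ms |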

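The scaling shape is the result: kinodb keeps open and metadata access near-index-bound, while Parquet open/read costs grow sharply with many small trajectory groups. Across a 500x increase in episodes (100 to 50K), kinodb open time grows about 24x (51us to 1.2ms), whereas Parquet sequential read time grows about 2,360x (50.1ms to 118.40s).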
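The ratios behind these statements can be recomputed directly from the 100-episode and 50,000-episode rows of the table above. The sketch below only does arithmetic on the reported timings; the dictionary layout and helper names are illustrative choices, not part of the benchmark code.

```python
# Timings in seconds, copied from the 100-episode and 50,000-episode table rows.
T_100 = {
    "HDF5 open": 912e-6, "Parquet open": 18.1e-3, "kinodb open": 51e-6,
    "HDF5 seq": 26.0e-3, "Parquet seq": 50.1e-3, "kinodb seq": 2.2e-3,
    "KQL": 88e-6,
}
T_50K = {
    "HDF5 open": 158.1e-3, "Parquet open": 2.87, "kinodb open": 1.2e-3,
    "HDF5 seq": 13.36, "Parquet seq": 118.40, "kinodb seq": 1.26,
    "KQL": 31.7e-3,
}
EPISODE_GROWTH = 50_000 / 100  # dataset size grows 500x between the two rows


def growth(name: str) -> float:
    """How many times slower a measurement is at 50K episodes than at 100."""
    return T_50K[name] / T_100[name]


def speedup(baseline: str, op: str = "seq") -> float:
    """Ratio of a baseline format's 50K timing to kinodb's for the same operation."""
    return T_50K[f"{baseline} {op}"] / T_50K[f"kinodb {op}"]


if __name__ == "__main__":
    for name in T_100:
        # A growth factor well below 500x is sublinear in episode count.
        print(f"{name:13s} grows {growth(name):8.1f}x over {EPISODE_GROWTH:.0f}x more episodes")
    for baseline in ("HDF5", "Parquet"):
        for op in ("open", "seq"):
            print(f"kinodb vs {baseline} ({op}): {speedup(baseline, op):.1f}x")
```

A growth factor far below 500x indicates cost that is sublinear in episode count, and a factor far above it indicates superlinear cost.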

The storage experiment tested state-only data and image-heavy data across HDF5, compressed HDF5, NPY directory layouts, Parquet, and kinodb.

| Dataset size | HDF5 | HDF5 compressed | NPY dir | Parquet | kinodb |
|---|---|---|---|---|---|
| 100 eps x 50 frames | 0.64 MB | 1.40 MB | 0.48 MB | 0.91 MB | 0.45 MB |
| 500 eps x 50 frames | 3.19 MB | 6.98 MB | 2.39 MB | 4.02 MB | 2.26 MB |
| 1,000 eps x 50 frames | 6.37 MB | 13.96 MB | 4.78 MB | 7.45 MB | 4.52 MB |
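To compare formats independently of dataset size, the state-only sizes above can be normalized to bytes per frame and to a ratio against kinodb. This is a sketch over the reported numbers only; it assumes 1 MB means 10^6 bytes, which the page does not state, and the helper names are illustrative.

```python
FRAMES_PER_EPISODE = 50
BYTES_PER_MB = 1_000_000  # assumption: the page does not say whether MB is 10^6 or 2^20 bytes

# Reported state-only sizes in MB, keyed by episode count.
SIZES_MB = {
    100: {"HDF5": 0.64, "HDF5 compressed": 1.40, "NPY dir": 0.48, "Parquet": 0.91, "kinodb": 0.45},
    500: {"HDF5": 3.19, "HDF5 compressed": 6.98, "NPY dir": 2.39, "Parquet": 4.02, "kinodb": 2.26},
    1000: {"HDF5": 6.37, "HDF5 compressed": 13.96, "NPY dir": 4.78, "Parquet": 7.45, "kinodb": 4.52},
}


def bytes_per_frame(episodes: int, fmt: str) -> float:
    """Average on-disk bytes per frame for one format at one dataset size."""
    return SIZES_MB[episodes][fmt] * BYTES_PER_MB / (episodes * FRAMES_PER_EPISODE)


def size_vs_kinodb(episodes: int, fmt: str) -> float:
    """Size of a format relative to kinodb at the same dataset size."""
    return SIZES_MB[episodes][fmt] / SIZES_MB[episodes]["kinodb"]


if __name__ == "__main__":
    for fmt in SIZES_MB[1000]:
        print(f"{fmt:16s} {bytes_per_frame(1000, fmt):7.1f} B/frame, "
              f"{size_vs_kinodb(1000, fmt):.2f}x kinodb")
```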

Write time for the 1,000-episode state-only case:

| Format | Write time |
|---|---|
| HDF5 | 0.542s |
| HDF5 compressed | 1.250s |
| NPY dir | 0.247s |
| Parquet | 0.400s |
| kinodb | 0.018s |

For image-heavy synthetic data, kinodb lands at storage parity with raw layouts and writes faster than Parquet. HDF5 compression is not helpful on these synthetic images; it increases size slightly and makes writes much slower.

| Dataset size | HDF5 | HDF5 compressed | NPY dir | Parquet | kinodb |
|---|---|---|---|---|---|
| 84x84, 100 eps x 30 frames | 64.10 MB | 72.08 MB | 63.82 MB | 64.06 MB | 63.78 MB |
| 84x84, 500 eps x 30 frames | 320.48 MB | 360.38 MB | 319.10 MB | 320.17 MB | 318.90 MB |
| 224x224, 50 eps x 30 frames | 226.09 MB | 227.73 MB | 225.95 MB | 226.08 MB | 225.93 MB |
| 224x224, 200 eps x 30 frames | 904.35 MB | 910.91 MB | 903.80 MB | 904.32 MB | 903.72 MB |
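One way to sanity-check the parity claim is to compare the reported kinodb sizes with the raw pixel payload. The sketch below assumes 3-channel, 8-bit images and 1 MB = 10^6 bytes; none of these are stated on this page, so treat the result as a consistency check under those assumptions rather than a description of the actual data.

```python
BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes
CHANNELS = 3              # assumption: RGB images
BYTES_PER_SAMPLE = 1      # assumption: uint8 pixels
FRAMES_PER_EPISODE = 30   # from the table rows

# Reported kinodb sizes in MB, keyed by (image side, episode count).
KINODB_MB = {(84, 100): 63.78, (84, 500): 318.90, (224, 50): 225.93, (224, 200): 903.72}


def raw_image_mb(side: int, episodes: int) -> float:
    """Raw pixel bytes for square images under the assumptions above, in MB."""
    pixels = side * side * CHANNELS * BYTES_PER_SAMPLE
    return pixels * FRAMES_PER_EPISODE * episodes / BYTES_PER_MB


def overhead_pct(side: int, episodes: int) -> float:
    """Percent by which the reported kinodb size exceeds the raw pixel payload."""
    raw = raw_image_mb(side, episodes)
    return (KINODB_MB[(side, episodes)] - raw) / raw * 100


if __name__ == "__main__":
    for (side, episodes), reported in KINODB_MB.items():
        print(f"{side}x{side}, {episodes} eps: raw {raw_image_mb(side, episodes):.2f} MB, "
              f"reported {reported:.2f} MB, overhead {overhead_pct(side, episodes):.2f}%")
```

Under these assumptions, the reported kinodb sizes come out within about 0.5% of the raw pixel payload in all four cases. If the images use a different channel count or dtype, the comparison does not apply.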

Representative write times:

| Case | HDF5 | HDF5 compressed | NPY dir | Parquet | kinodb |
|---|---|---|---|---|---|
| 84x84, 500 episodes | 0.680s | 13.014s | 0.410s | 1.364s | 0.269s |
| 224x224, 200 episodes | 0.835s | 25.685s | 0.705s | 4.867s | 0.808s |
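Dividing the reported sizes by the write times gives an effective write throughput per format. This sketch assumes the two write-time cases correspond to the 30-frame rows with the same resolution and episode count in the image size table; the helper name is illustrative.

```python
FORMATS = ("HDF5", "HDF5 compressed", "NPY dir", "Parquet", "kinodb")

# Reported on-disk sizes in MB (assumed to match the write-time cases below).
SIZE_MB = {
    "84x84, 500 episodes": (320.48, 360.38, 319.10, 320.17, 318.90),
    "224x224, 200 episodes": (904.35, 910.91, 903.80, 904.32, 903.72),
}
# Reported write times in seconds.
WRITE_S = {
    "84x84, 500 episodes": (0.680, 13.014, 0.410, 1.364, 0.269),
    "224x224, 200 episodes": (0.835, 25.685, 0.705, 4.867, 0.808),
}


def write_throughput(case: str, fmt: str) -> float:
    """Effective write throughput in MB/s: bytes on disk divided by write time."""
    i = FORMATS.index(fmt)
    return SIZE_MB[case][i] / WRITE_S[case][i]


if __name__ == "__main__":
    for case in SIZE_MB:
        print(case)
        for fmt in FORMATS:
            print(f"  {fmt:16s} {write_throughput(case, fmt):8.1f} MB/s")
```

Throughput is computed against each format's own output size, so a format that writes more bytes in the same time scores slightly higher; comparing raw write times from the table avoids that effect.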

The strongest systems claims from this run are:

  • kinodb opens 50K-episode synthetic datasets in 1.2ms;
  • kinodb sequentially reads the same 50K run in 1.26s vs 13.36s for HDF5 and 118.40s for Parquet;
  • KQL metadata queries stay below 32ms at 50K episodes;
  • state-only .kdb storage is the smallest format tested in the run;
  • image-heavy .kdb storage is at native-size parity, with faster writes than Parquet in the reported cases.