IO Performance

This page presents the systems benchmarks from the latest experiment log. It covers open time, sequential read time, KQL query latency, storage size, write speed, and image-throughput validation.

Scaling

Synthetic datasets were generated at 100, 500, 1K, 5K, 10K, and 50K episodes, with 50 frames per episode. Each run wrote the dataset as HDF5 and Parquet, ingested it into kinodb, and then measured open time, sequential read time, and KQL query latency.
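The measurement loop described above can be sketched as a small timing helper. This is only an illustration: the log does not show the harness that was actually used, so the names `time_call` and `format_duration`, the choice of the median, and the repeat count are all assumptions.

```python
import statistics
import time


def time_call(fn, repeats=5):
    """Return the median wall-clock time in seconds over `repeats` calls of fn().

    Hypothetical helper: taking the median of 5 runs is an assumption,
    not the procedure from the experiment log.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def format_duration(seconds):
    """Format a duration using the units that appear in the tables (us, ms, s)."""
    if seconds < 1e-3:
        return f"{seconds * 1e6:.0f}us"
    if seconds < 1.0:
        return f"{seconds * 1e3:.1f}ms"
    return f"{seconds:.2f}s"


# Stand-in workload; a real run would time a dataset open or a full read.
elapsed = time_call(lambda: sum(range(10_000)))
print("stand-in workload:", format_duration(elapsed))
```

In a real run, the callable passed to `time_call` would open a file in one of the three formats, iterate over every episode, or execute a KQL query.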

50K open time: 1.2ms for kinodb vs 158.1ms for HDF5 and 2.87s for Parquet

50K sequential read: 1.26s for kinodb vs 13.36s for HDF5 and 118.40s for Parquet

50K KQL scan: 31.7ms for a metadata query across 50K episodes

Parquet sequential-read gap: 94x at 50K episodes (118.40s vs 1.26s)
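The headline ratios can be re-derived from the 50K figures with a few lines of arithmetic. The sketch below only divides the reported timings; the function name `speedup_vs_kinodb` is illustrative.

```python
# Timings at 50K episodes, converted to seconds.
open_s = {"kinodb": 1.2e-3, "HDF5": 158.1e-3, "Parquet": 2.87}
seq_s = {"kinodb": 1.26, "HDF5": 13.36, "Parquet": 118.40}


def speedup_vs_kinodb(timings):
    """Return how many times slower each format is than kinodb."""
    base = timings["kinodb"]
    return {name: t / base for name, t in timings.items() if name != "kinodb"}


open_speedup = speedup_vs_kinodb(open_s)
seq_speedup = speedup_vs_kinodb(seq_s)

for label, ratios in (("open", open_speedup), ("sequential read", seq_speedup)):
    for name, ratio in ratios.items():
        print(f"{label}: {name} takes {ratio:.1f}x as long as kinodb")
```

The sequential-read ratios come out to roughly 10.6x for HDF5 and 94x for Parquet, and the open-time ratios to roughly 132x and 2,392x.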

50K Episode Snapshot

Metric	HDF5	Parquet	kinodb
Open	158.1ms	2.87s	1.2ms
Sequential read	13.36s	118.40s	1.26s

Full Scaling Table

Episodes	HDF5 open	Parquet open	kinodb open	HDF5 seq read	Parquet seq read	kinodb seq read	KQL query
100	912us	18.1ms	51us	26.0ms	50.1ms	2.2ms	88us
500	1.6ms	26.9ms	133us	127.5ms	226.7ms	10.7ms	259us
1,000	2.4ms	48.9ms	82us	251.8ms	480.2ms	18.8ms	507us
5,000	10.0ms	312.8ms	158us	1.30s	3.22s	90.1ms	2.9ms
10,000	19.0ms	666.8ms	262us	2.61s	8.30s	181.4ms	5.7ms
50,000	158.1ms	2.87s	1.2ms	13.36s	118.40s	1.26s	31.7ms

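The key result is the scaling shape: kinodb keeps open and metadata access close to index-bound (open time stays at or below 1.2ms and KQL latency below 32ms through 50K episodes), while Parquet open and sequential-read costs grow sharply as the number of small trajectory groups increases.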
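One way to make the scaling shape concrete is to normalize the sequential-read times by episode count. The sketch below uses only the values from the full scaling table; the variable names are illustrative.

```python
episodes = [100, 500, 1_000, 5_000, 10_000, 50_000]

# Sequential-read times from the full scaling table, in seconds.
seq_seconds = {
    "HDF5": [26.0e-3, 127.5e-3, 251.8e-3, 1.30, 2.61, 13.36],
    "Parquet": [50.1e-3, 226.7e-3, 480.2e-3, 3.22, 8.30, 118.40],
    "kinodb": [2.2e-3, 10.7e-3, 18.8e-3, 90.1e-3, 181.4e-3, 1.26],
}


def per_episode_us(times):
    """Convert total read times to microseconds per episode."""
    return [t / n * 1e6 for t, n in zip(times, episodes)]


per_episode = {fmt: per_episode_us(times) for fmt, times in seq_seconds.items()}

# Ratio of per-episode cost at 50K episodes to per-episode cost at 100 episodes.
growth = {fmt: vals[-1] / vals[0] for fmt, vals in per_episode.items()}

for fmt, vals in per_episode.items():
    print(f"{fmt}: {vals[0]:.0f}us -> {vals[-1]:.0f}us per episode "
          f"({growth[fmt]:.2f}x)")
```

A growth factor near 1 means the read cost is close to linear in the number of episodes. HDF5 and kinodb stay near 1, while the Parquet per-episode cost grows by roughly 4.7x between 100 and 50K episodes.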

Storage Efficiency

The storage experiment tested state-only data and image-heavy data across HDF5, compressed HDF5, NPY directory layouts, Parquet, and kinodb.

State-Only Storage

Dataset size	HDF5	HDF5 compressed	NPY dir	Parquet	kinodb
100 eps x 50 frames	0.64 MB	1.40 MB	0.48 MB	0.91 MB	0.45 MB
500 eps x 50 frames	3.19 MB	6.98 MB	2.39 MB	4.02 MB	2.26 MB
1,000 eps x 50 frames	6.37 MB	13.96 MB	4.78 MB	7.45 MB	4.52 MB

Write time for the 1,000-episode state-only case:

HDF5	HDF5 compressed	NPY dir	Parquet	kinodb
0.542s	1.250s	0.247s	0.400s	0.018s

Image Storage

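For image-heavy synthetic data, kinodb reaches storage parity with the raw layouts (within about 0.5% of uncompressed HDF5 and the NPY directory) and writes faster than Parquet. HDF5 compression does not help on these synthetic images: it increases file size by about 0.7% to 12% and makes writes roughly 19x to 31x slower in the reported cases.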

Dataset size	HDF5	HDF5 compressed	NPY dir	Parquet	kinodb
84x84, 100 eps x 30 frames	64.10 MB	72.08 MB	63.82 MB	64.06 MB	63.78 MB
84x84, 500 eps x 30 frames	320.48 MB	360.38 MB	319.10 MB	320.17 MB	318.90 MB
224x224, 50 eps x 30 frames	226.09 MB	227.73 MB	225.95 MB	226.08 MB	225.93 MB
224x224, 200 eps x 30 frames	904.35 MB	910.91 MB	903.80 MB	904.32 MB	903.72 MB
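The parity claim can be sanity-checked against the raw pixel payload. The log does not state the pixel format or the definition of MB, so the sketch below assumes 3-channel `uint8` images and decimal megabytes (10^6 bytes); both are assumptions made for illustration.

```python
CHANNELS = 3         # assumption: RGB images
BYTES_PER_VALUE = 1  # assumption: uint8 pixels
MB = 1_000_000       # assumption: decimal megabytes

# (height, width, episodes, frames per episode, reported kinodb size in MB)
cases = [
    (84, 84, 100, 30, 63.78),
    (84, 84, 500, 30, 318.90),
    (224, 224, 50, 30, 225.93),
    (224, 224, 200, 30, 903.72),
]


def raw_pixel_mb(height, width, n_episodes, frames):
    """Size of the uncompressed pixel payload under the assumptions above."""
    n_bytes = height * width * CHANNELS * BYTES_PER_VALUE * n_episodes * frames
    return n_bytes / MB


overheads = []
for height, width, n_episodes, frames, kinodb_mb in cases:
    raw = raw_pixel_mb(height, width, n_episodes, frames)
    overhead_pct = (kinodb_mb - raw) / raw * 100
    overheads.append(overhead_pct)
    print(f"{height}x{width}, {n_episodes} eps: raw {raw:.2f} MB, "
          f"kinodb {kinodb_mb:.2f} MB, overhead {overhead_pct:.2f}%")
```

Under these assumptions the reported kinodb sizes sit less than 0.5% above the raw pixel payload in all four cases. The remainder would be whatever non-image data and metadata the file carries, which the log does not break down.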

Representative write times:

Case	HDF5	HDF5 compressed	NPY dir	Parquet	kinodb
84x84, 500 episodes	0.680s	13.014s	0.410s	1.364s	0.269s
224x224, 200 episodes	0.835s	25.685s	0.705s	4.867s	0.808s
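The cost of HDF5 compression in these two cases can be quantified from the size and write-time tables. The sketch below only restates the reported numbers as ratios; the dictionary layout is illustrative.

```python
# (HDF5, HDF5 compressed) pairs taken from the image storage tables.
sizes_mb = {
    "84x84, 500 episodes": (320.48, 360.38),
    "224x224, 200 episodes": (904.35, 910.91),
}
write_s = {
    "84x84, 500 episodes": (0.680, 13.014),
    "224x224, 200 episodes": (0.835, 25.685),
}

# Percentage size increase and write-time slowdown caused by compression.
size_increase_pct = {
    case: (compressed / plain - 1) * 100
    for case, (plain, compressed) in sizes_mb.items()
}
write_slowdown = {
    case: compressed / plain
    for case, (plain, compressed) in write_s.items()
}

for case in sizes_mb:
    print(f"{case}: +{size_increase_pct[case]:.1f}% size, "
          f"{write_slowdown[case]:.1f}x slower writes")
```

Compression adds about 12% to the 84x84 files and under 1% to the 224x224 files, while making writes roughly 19x and 31x slower respectively.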

Summary

The strongest systems claims from this run are:

kinodb opens 50K-episode synthetic datasets in 1.2ms;
kinodb sequentially reads the same 50K run in 1.26s vs 13.36s for HDF5 and 118.40s for Parquet;
KQL metadata queries stay below 32ms at 50K episodes;
state-only .kdb storage is the smallest format tested in the run;
image-heavy .kdb storage is at size parity with the raw layouts, with faster writes than Parquet in the reported cases.