KQL Queries

KQL is the small metadata query language built into kinodb-core. It is intentionally narrow: it filters episodes by the metadata fields needed for robot dataset selection, curriculum construction, validation, and mixed-source training.

<field> <operator> <value> [AND <field> <operator> <value> ...]

Examples:

kino query data.kdb "success = true"
kino query data.kdb "embodiment = 'franka' AND num_frames > 50"
kino query data.kdb "task CONTAINS 'pick' AND fps >= 10.0"
kino query data.kdb "total_reward != null" --limit 25
| Field | Type | Example |
| --- | --- | --- |
| embodiment | string | embodiment = 'franka' |
| task | string | task CONTAINS 'drawer' |
| success | bool or null | success = true |
| num_frames | int | num_frames >= 100 |
| action_dim | int | action_dim = 7 |
| fps | float | fps >= 10.0 |
| total_reward | float or null | total_reward > 0.5 |

task maps to EpisodeMeta.language_instruction in Rust and to meta["task"] in Python.

| Operator | Meaning | Applies to |
| --- | --- | --- |
| = | equals | all fields |
| != | not equals | all fields |
| > | greater than | numeric fields |
| < | less than | numeric fields |
| >= | greater than or equal | numeric fields |
| <= | less than or equal | numeric fields |
| CONTAINS | substring match | string fields |

KQL currently supports only AND between conditions. OR, parentheses, projections, and joins are intentionally out of scope for the current implementation.
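The field and operator tables above can be combined into a small applicability check. The following is a minimal Python sketch, not kinodb's implementation: `FIELD_TYPES` and `operator_applies` are hypothetical names that only restate the two tables, and the error messages are illustrative.

```python
# Hypothetical restatement of the field table; not part of the kinodb API.
FIELD_TYPES = {
    "embodiment": "string",
    "task": "string",
    "success": "bool",
    "num_frames": "int",
    "action_dim": "int",
    "fps": "float",
    "total_reward": "float",
}

EQUALITY = {"=", "!="}
ORDERING = {">", "<", ">=", "<="}


def operator_applies(field: str, op: str) -> bool:
    """Return True if `op` is listed as applicable to the type of `field`."""
    if field not in FIELD_TYPES:
        raise ValueError(f"unknown field: {field}")
    kind = FIELD_TYPES[field]
    if op in EQUALITY:
        return True  # = and != apply to all fields
    if op in ORDERING:
        return kind in ("int", "float")  # numeric fields only
    if op == "CONTAINS":
        return kind == "string"  # substring match on string fields
    raise ValueError(f"unknown operator: {op}")
```

Under this reading, `task > 'a'` and `num_frames CONTAINS '5'` fall outside the documented combinations. How kinodb itself reports such a mismatch is not covered by this sketch.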

'single quoted string'
"double quoted string"
bare_string
true
false
null
none
123
12.5

Bare words are accepted as strings:

kino query data.kdb "embodiment = franka"

For launch docs and scripts, quote strings anyway. It makes examples easier to read.
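To make the literal forms concrete, here is a minimal Python sketch of how a value token could be classified. It is an illustration under stated assumptions, not the kinodb parser: `parse_value` is a hypothetical name, keywords are assumed to be lower-case only, and numbers are assumed to be plain integers or decimals as in the list above.

```python
import re


def parse_value(token: str):
    """Classify one KQL value token (hypothetical helper, not kinodb API)."""
    # Quoted strings: strip the matching pair of single or double quotes.
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
        return token[1:-1]
    # Keywords; assumed here to be lower-case only.
    if token == "true":
        return True
    if token == "false":
        return False
    if token in ("null", "none"):
        return None
    # Numbers: integers first, then simple decimals.
    if re.fullmatch(r"[+-]?\d+", token):
        return int(token)
    if re.fullmatch(r"[+-]?\d+\.\d+", token):
        return float(token)
    # Anything else is treated as a bare-word string.
    return token
```

In this sketch a bare `true` becomes a boolean while `'true'` stays a string, which is one more reason to quote string values explicitly.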

kino query data.kdb "success = true AND task CONTAINS 'pick'"
kino query data.kdb "num_frames > 100" --limit 10

The command prints the positions and metadata of matching episodes. Positions are zero-based and can be passed to read_episode(position) in Python.

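import kinodb

db = kinodb.open("data.kdb")

# query() returns the zero-based positions of matching episodes.
positions = db.query("success = true AND num_frames > 100")

# Inspect metadata for the first five matches without loading frames.
for pos in positions[:5]:
    meta = db.read_meta(pos)
    print(pos, meta["task"], meta["num_frames"])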
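Because KQL has no OR, a union of several filters has to be assembled on the caller's side. One possible workaround, sketched below, runs each AND-only expression separately and merges the positions. `query_any` is a hypothetical helper, and the sketch assumes the query function returns an iterable of integer positions, as the example above suggests.

```python
from typing import Callable, Iterable, List


def query_any(query: Callable[[str], Iterable[int]],
              expressions: Iterable[str]) -> List[int]:
    """Return sorted positions matched by at least one expression.

    `query` is any function mapping a KQL string to episode positions,
    for example the bound method `db.query`.
    """
    seen = set()
    for expr in expressions:
        seen.update(query(expr))  # set union removes duplicate positions
    return sorted(seen)
```

With an open database this could be called as `query_any(db.query, ["task CONTAINS 'pick'", "task CONTAINS 'drawer'"])`. Each expression costs one metadata scan.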

KQL filters can be used directly in KinoDataset:

from kinodb.torch import KinoDataset

# Only episodes matching the KQL filter are included in the dataset.
dataset = KinoDataset(
    "data.kdb",
    kql_filter="success = true AND action_dim = 7",
)

Create a smaller physical dataset:

kino merge raw.kdb --output successful.kdb --filter "success = true"

This is useful when distributing a curated dataset split.
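When several splits are produced from a script, one option is to build each `kino merge` invocation as an argument list, so the KQL expression travels as a single argument and its quotes are not reinterpreted by a shell. This is a minimal sketch: `merge_command` and `run_all` are hypothetical helpers, only the `--output` and `--filter` flags shown above are used, and the second split name is illustrative.

```python
import subprocess
from typing import List


def merge_command(source: str, output: str, kql_filter: str) -> List[str]:
    """Build the argv for one filtered merge (hypothetical helper)."""
    return ["kino", "merge", source, "--output", output, "--filter", kql_filter]


# Output file -> KQL filter. The file names are examples only.
splits = {
    "successful.kdb": "success = true",
    "franka_long.kdb": "embodiment = 'franka' AND num_frames > 50",
}

commands = [merge_command("raw.kdb", out, kql) for out, kql in splits.items()]


def run_all(argv_list: List[List[str]]) -> None:
    """Run each merge in turn; raises CalledProcessError on failure."""
    for argv in argv_list:
        subprocess.run(argv, check=True)
```

Calling `run_all(commands)` requires the `kino` CLI on the PATH; nothing is executed when the block is merely imported.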

KQL parses the expression into a small AST and then scans episode metadata with read_meta, so it never decodes frames or images. This is why metadata scans show the largest speedups in the benchmarks: native HDF5/Parquet/RLDS loaders usually have to walk their own source structures, while .kdb keeps episode metadata addressable through the index.
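The scan step can be pictured with a short Python sketch. It is not the kinodb implementation: `Condition`, `matches`, and `scan` are hypothetical names, the AST is reduced to a list of conditions joined by an implicit AND, parsing is omitted, and the rule that ordering and CONTAINS comparisons never match a null value is an assumption. Metadata is read through a callable, so plain dicts can stand in for read_meta.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Operator symbol -> comparison function.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "CONTAINS": lambda haystack, needle: needle in haystack,
}


@dataclass(frozen=True)
class Condition:
    field: str
    op: str
    value: Any


def matches(meta: Dict[str, Any], conditions: List[Condition]) -> bool:
    """True if the metadata dict satisfies every condition (implicit AND)."""
    for cond in conditions:
        actual = meta.get(cond.field)
        if cond.op in ("=", "!="):
            ok = OPS[cond.op](actual, cond.value)
        elif actual is None or cond.value is None:
            ok = False  # assumption: null never satisfies >, <, >=, <=, CONTAINS
        else:
            ok = OPS[cond.op](actual, cond.value)
        if not ok:
            return False
    return True


def scan(read_meta: Callable[[int], Dict[str, Any]],
         num_episodes: int,
         conditions: List[Condition]) -> List[int]:
    """Return positions whose metadata matches; frames are never touched."""
    return [pos for pos in range(num_episodes)
            if matches(read_meta(pos), conditions)]


# Stand-in metadata; a real scan would call the database's read_meta.
episodes = [
    {"task": "pick up the cube", "success": True, "num_frames": 120, "total_reward": None},
    {"task": "open the drawer", "success": False, "num_frames": 300, "total_reward": 0.8},
    {"task": "pick the mug", "success": True, "num_frames": 40, "total_reward": 0.2},
]
query = [Condition("success", "=", True), Condition("num_frames", ">", 100)]
positions = scan(episodes.__getitem__, len(episodes), query)
print(positions)  # [0]
```

The cost of such a scan grows with the number of episodes and conditions, not with the size of the frame data, which is the property the paragraph above relies on.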

Recorded benchmark summary:

| Dataset class | Metadata scan result |
| --- | --- |
| 10 tabular datasets | Median 375x faster, range 48-612x |
| 5 image datasets | 605-2,648x faster |

Common errors:

| Error | Cause |
| --- | --- |
| empty query | The string is empty or whitespace |
| unknown field | Field is not one of the supported KQL fields |
| expected operator | Missing =, !=, comparison, or CONTAINS |
| unterminated string | Missing closing quote |
| expected AND | KQL currently only supports AND between conditions |
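These messages line up with the stages of a small hand-written parser. The Python sketch below shows one way such a parser could be organized. It is an assumption, not the kinodb implementation: `tokenize`, `parse`, `FIELDS`, and `OPERATORS` are hypothetical names, values are kept as raw tokens, the keywords AND and CONTAINS are assumed to be upper-case only, and "expected value" is an extra message that does not appear in the table.

```python
from typing import List, Tuple

FIELDS = {"embodiment", "task", "success", "num_frames",
          "action_dim", "fps", "total_reward"}
OPERATORS = {"=", "!=", ">", "<", ">=", "<=", "CONTAINS"}


def tokenize(query: str) -> List[str]:
    """Split a query into quoted strings, operators, and bare words."""
    tokens, i = [], 0
    while i < len(query):
        ch = query[i]
        if ch.isspace():
            i += 1
        elif ch in "'\"":
            end = query.find(ch, i + 1)
            if end == -1:
                raise ValueError("unterminated string")
            tokens.append(query[i:end + 1])  # keep the quotes on the token
            i = end + 1
        elif ch in "=!<>":
            two = query[i:i + 2]
            op = two if two in ("!=", ">=", "<=") else ch
            tokens.append(op)
            i += len(op)
        else:
            j = i
            while (j < len(query) and not query[j].isspace()
                   and query[j] not in "=!<>'\""):
                j += 1
            tokens.append(query[i:j])
            i = j
    return tokens


def parse(query: str) -> List[Tuple[str, str, str]]:
    """Parse `field op value [AND ...]` into (field, op, raw_value) triples."""
    tokens = tokenize(query)
    if not tokens:
        raise ValueError("empty query")
    conditions, i = [], 0
    while True:
        # A missing field (e.g. after a trailing AND) is reported as unknown.
        if i >= len(tokens) or tokens[i] not in FIELDS:
            raise ValueError("unknown field")
        if i + 1 >= len(tokens) or tokens[i + 1] not in OPERATORS:
            raise ValueError("expected operator")
        if i + 2 >= len(tokens):
            raise ValueError("expected value")  # hypothetical, not in the table
        conditions.append((tokens[i], tokens[i + 1], tokens[i + 2]))
        i += 3
        if i == len(tokens):
            return conditions
        if tokens[i] != "AND":
            raise ValueError("expected AND")
        i += 1
```

In this sketch, `parse("success = true OR fps > 1")` raises "expected AND" because OR appears where only AND is allowed, and `parse("task = 'pick")` raises "unterminated string" during tokenization.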