One manifest per episode, hashed to the episode id. The same contract governs every stage — ingest, preprocess, privacy, enrichment, episodeize, release. Gates fail loud. Releases are signed.
Every top-level block carries required and optional fields with explicit types. Hashing the manifest yields the episode id.
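A minimal sketch of how hashing a manifest could yield the episode id. The source does not say how the manifest is canonicalized or which hash is used, so the sorted-key compact JSON form, the choice of SHA-256, and the removal of `episode_id` and `lineage.manifest_signature` before hashing are all assumptions made for illustration; `episode_id_for` is a hypothetical helper name.

```python
import copy
import hashlib
import json


def episode_id_for(manifest: dict) -> str:
    """Return a hex digest over a canonical form of the manifest.

    Assumptions (not stated in the contract text): SHA-256, sorted-key
    compact JSON, and self-referential fields removed before hashing.
    """
    body = copy.deepcopy(manifest)
    body.pop("episode_id", None)  # the id cannot be part of its own input
    body.get("lineage", {}).pop("manifest_signature", None)  # added at release
    canonical = json.dumps(
        body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative manifest fragment; values are made up.
manifest = {
    "dataset_id": "00000000-0000-0000-0000-000000000001",
    "operator_id": "00000000-0000-0000-0000-000000000002",
    "timebase": "monotonic",
    "streams": {"rgb_head": {"type": "rgb", "fps": 30}},
}
eid = episode_id_for(manifest)
```

Under these assumptions the id is independent of key order and changes whenever any hashed field changes, which is the property a content-addressed episode id needs.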
Top-level blocks
identity: dataset, episode, operator, robot embodiment
time: monotonic timebase + per-stream clock offsets
streams: rgb, depth, proprio, action, each with shape + fps
calibration: intrinsics + extrinsics + quality_score per camera
task: natural-language instruction, sub-tasks, success label
consent: jurisdiction, PII state, blur tool version
quality: motion blur, exposure, drift, operator tier
lineage: Merkle chain + signed JWS release manifest

    # Black Robotics Ingest Contract v0
    # Hashing the manifest yields episode_id. Missing required fields → hard reject.

    # Identity
    dataset_id: uuid
    episode_id: hash(manifest)
    operator_id: uuid
    robot:
      serial: str
      embodiment_ref: hash(urdf+srdf)
      firmware_version: semver
    capture_device:
      kind: robot | aria | mocap | handheld
      serial: str
      firmware_version: semver

    # Time
    episode_start_utc: iso8601
    episode_end_utc: iso8601
    timebase: monotonic | wall
    clock_offsets_ns:
      rgb_head: int
      depth_head: int
      imu: int

    # Sensors: one block per stream
    streams:
      rgb_head:
        type: rgb
        resolution: [1280, 720]
        fps: 30
        shutter: global | rolling
        exposure_mode: auto | fixed
        frame_uri: str
      depth_head:
        type: depth
        resolution: [1280, 720]
        fps: 30
        aligned_to: rgb_head
        frame_uri: str
      proprio:
        type: proprioception
        joint_names: [...]
        units: rad | m | N
        hz: 200
      action:
        type: action
        space: delta_joint | abs_joint | ee_pose | vla_tokens
        joint_mapping: [...]
        dt_ms: 33

    # Calibration (required per camera)
    calibration:
      rgb_head:
        intrinsics: { fx, fy, cx, cy, distortion }
        extrinsics_ref_frame: head_link
        extrinsics_SE3: [...]
        quality_score: 0.0
        source: checkerboard | apriltag | vio
        timestamp: iso8601

    # Task
    task:
      nl_instruction: "pick up the red cup and place it on the shelf"
      ontology_ref: uri
      subtasks:
        - { start_ts, end_ts, label }
      success: true
      failure_mode: null

    # Consent / Privacy
    consent:
      environment: lab | enterprise_consented | public | private_home
      pii_state: raw | blurred | reviewed
      blur_tool: egoblur_gen2 | none
      blur_tool_version: semver
      jurisdiction: us | eu | uk
      retention_policy_ref: uri

    # Quality
    quality:
      motion_blur_score: 0.0
      exposure_health: 0.0
      dropped_frames: int
      operator_tier: novice | trained | expert
      sensor_drift_flags: [...]

    # Lineage: Merkle chain, signed at release
    lineage:
      parent_refs: [...]
      processing_steps:
        - { tool, version, input_hash, output_hash, ts }
      manifest_signature: jws

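The "missing required fields → hard reject" rule can be sketched as a small check that raises a structured error. This is illustrative only: the contract says each block carries required and optional fields but this page does not list which are which, so `REQUIRED_PATHS`, the `HardReject` class, and the `missing_required_fields` code are hypothetical.

```python
class HardReject(Exception):
    """Raised when a manifest cannot enter the pipeline."""

    def __init__(self, code: str, missing: list[str]):
        super().__init__(f"{code}: {', '.join(missing)}")
        self.code = code          # machine-readable reason
        self.missing = missing    # dotted paths that were absent


# Hypothetical required paths; the real contract defines required vs optional.
REQUIRED_PATHS = [
    "dataset_id",
    "operator_id",
    "robot.serial",
    "streams",
    "consent.pii_state",
    "consent.jurisdiction",
]


def has_path(doc: dict, dotted: str) -> bool:
    """True if every key along the dotted path exists and the leaf is not None."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return node is not None


def check_required(manifest: dict) -> None:
    """Raise HardReject listing every missing required path, in declared order."""
    missing = [p for p in REQUIRED_PATHS if not has_path(manifest, p)]
    if missing:
        raise HardReject("missing_required_fields", missing)
```

Collecting every missing path before raising, rather than stopping at the first, gives the operator one complete error instead of a fix-and-retry loop.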
Stages are the pipeline’s spine; these dimensions are how we score output at each stage. Thresholds are profile-dependent; failure reasons are always machine-readable.
01. SHA-256 checksums verified against manifest. Missing or corrupted files hard-rejected.
02. Required fields present and correctly typed against the canonical contract definition.
03. Timestamps monotonically increasing. Clock offsets within ±100 ms threshold. Frame gaps flagged.
04. All declared streams have corresponding data. Episode count matches manifest.
05. Action vectors within declared joint limits. Action-proprio latency within profile threshold.
06. Consent field present. PII state matches jurisdiction policy. Blur tool version recorded.
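As an illustration of one of the checks above, here is a rough sketch of the timing check (03) that returns machine-readable reasons. The ±100 ms offset limit comes from the text; everything else is assumed: timestamps and offsets in nanoseconds, a frame gap defined as more than 1.5× the nominal frame period, gaps treated as flags rather than failures, and the `GateResult` shape and reason codes.

```python
from dataclasses import dataclass, field

MAX_OFFSET_NS = 100_000_000  # ±100 ms, from the gate description


@dataclass
class GateResult:
    gate: str
    passed: bool
    reasons: list = field(default_factory=list)  # list of machine-readable dicts


def timing_gate(timestamps_ns, offsets_ns, fps, gap_factor=1.5):
    """Check monotonic timestamps, per-stream offsets, and frame gaps."""
    reasons = []
    period_ns = 1_000_000_000 / fps
    for i in range(1, len(timestamps_ns)):
        delta = timestamps_ns[i] - timestamps_ns[i - 1]
        if delta <= 0:
            reasons.append({"code": "non_monotonic", "index": i, "severity": "fail"})
        elif delta > gap_factor * period_ns:
            reasons.append(
                {"code": "frame_gap", "index": i, "gap_ns": delta, "severity": "flag"}
            )
    for stream, off in offsets_ns.items():
        if abs(off) > MAX_OFFSET_NS:
            reasons.append(
                {
                    "code": "clock_offset_exceeded",
                    "stream": stream,
                    "offset_ns": off,
                    "severity": "fail",
                }
            )
    # Flags are reported but only "fail" reasons block the episode.
    passed = not any(r["severity"] == "fail" for r in reasons)
    return GateResult("timing", passed, reasons)


# A 30 fps stream with one dropped stretch between the third and fourth frame.
ts = [0, 33_333_333, 66_666_666, 166_666_666]
result = timing_gate(ts, {"rgb_head": 0, "imu": 2_000_000}, fps=30)
```

Because thresholds are profile-dependent, `gap_factor` and `MAX_OFFSET_NS` would more plausibly come from a profile object than from constants; they are hard-coded here only to keep the sketch short.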
Each stage reads the previous stage’s contract-conformant output and writes its own. Hard rejects return a machine-readable error at the CLI; soft rejects route to a review queue with a scored reason; release produces a signed manifest and nothing reaches training without one.
Ingest. Accepts MP4 or MCAP/ROS 2 today; other containers stay on the roadmap. The manifest is parsed and hashed to episode_id; an optional URDF feeds embodiment_ref.
Preprocess. Decode, downsample to canonical resolution + fps, extract frames. Exposure and Laplacian-variance blur are scored per clip. The monotonic clock and per-stream offsets are verified.
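A minimal sketch of per-clip scoring, assuming grayscale `uint8` frames held as NumPy arrays. The Laplacian here is a plain 4-neighbour stencil written in NumPy rather than whatever decoder or vision library the pipeline actually uses, and `exposure_health` (the fraction of pixels that are neither crushed nor clipped) is a hypothetical metric, as are the thresholds and the median/mean aggregation.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian over the image interior.

    Higher values mean more high-frequency detail, i.e. a sharper frame.
    """
    g = gray.astype(np.float64)
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    )
    return float(lap.var())


def exposure_health(gray: np.ndarray, low: int = 5, high: int = 250) -> float:
    """Fraction of pixels strictly between the low and high thresholds."""
    ok = (gray > low) & (gray < high)
    return float(ok.mean())


def clip_scores(frames) -> dict:
    """Aggregate per-frame scores into one record per clip."""
    sharp = [laplacian_variance(f) for f in frames]
    expo = [exposure_health(f) for f in frames]
    return {
        "blur_laplacian_var": float(np.median(sharp)),
        "exposure_health": float(np.mean(expo)),
    }


# Synthetic frames: random noise is "sharp", a constant image has no edges.
rng = np.random.default_rng(0)
sharp_frame = rng.integers(0, 256, size=(72, 128)).astype(np.uint8)
flat_frame = np.full((72, 128), 128, dtype=np.uint8)
scores = clip_scores([sharp_frame, flat_frame])
```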
Privacy. The shipped demo uses a deterministic black-band blur on sampled frames plus consent metadata on the manifest. Real EgoBlur + OCR is planned once signing + ingest are default-safe.
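One way a deterministic black-band pass could look, as a sketch only. The band position and height, the sampling stride, and the tool label passed to `mark_consent` are assumptions; the contract's `blur_tool` enum lists only `egoblur_gen2 | none`, so a demo label such as `demo_black_band` is hypothetical.

```python
import numpy as np


def black_band(
    frame: np.ndarray, top_frac: float = 0.0, height_frac: float = 0.25
) -> np.ndarray:
    """Return a copy of the frame with a fixed horizontal band set to black."""
    h = frame.shape[0]
    y0 = int(round(top_frac * h))
    y1 = min(h, y0 + int(round(height_frac * h)))
    out = frame.copy()  # never mutate the raw frame
    out[y0:y1, ...] = 0
    return out


def redact_sampled(frames, every: int = 10):
    """Apply the band to every `every`-th frame; no randomness is involved."""
    return [black_band(f) for f in frames[::every]]


def mark_consent(manifest: dict, tool: str, version: str) -> dict:
    """Return a new manifest whose consent block records the blur pass."""
    consent = dict(manifest.get("consent", {}))
    consent.update(
        {"pii_state": "blurred", "blur_tool": tool, "blur_tool_version": version}
    )
    return {**manifest, "consent": consent}


frame = np.full((100, 50, 3), 200, dtype=np.uint8)
redacted = black_band(frame)
updated = mark_consent(
    {"consent": {"jurisdiction": "eu", "pii_state": "raw"}},
    tool="demo_black_band",  # hypothetical label, not in the contract enum
    version="0.1.0",
)
```

Because the band is fixed and the sampling is a plain stride, rerunning the stage on the same input yields byte-identical frames, which keeps downstream hashes stable.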
Enrichment. CPU enrichers that ship today: Depth Anything V2 (small ONNX), DIS optical flow, MediaPipe Tasks hands + HOI heuristics, LK trajectories, simulated proprio. SAM2 / MANO / SLAM / VLM are frozen until the P0 ingest + export path is boringly reliable.
Episodeize. Packages sampled metadata + proprio into `episode.yaml` / `episode_000.json`. Manual bounds JSON is recorded for future multi-episode slicing; motion/HOI auto-segmentation remains stubbed.
Release. Hashes every artifact, validates the manifest against JSON Schema, then signs with Ed25519 (`ed25519-jws-v1`) or a simulated stamp for demos. `br-pipeline verify` reproduces the canonical digest offline.
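The hashing half of this step can be sketched with the standard library alone. This is not the `br-pipeline` implementation: the digest layout (SHA-256 over a sorted JSON listing of relative path to file hash) and the helper names are assumptions, and the Ed25519 signing step is left out because it needs a third-party library.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path, chunk: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large artifacts never sit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def release_digest(root: Path) -> tuple[str, dict]:
    """Hash every file under root, then hash the sorted path -> hash listing."""
    artifacts = {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    listing = json.dumps(artifacts, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(listing.encode("utf-8")).hexdigest(), artifacts


def verify(root: Path, expected_digest: str) -> bool:
    """Recompute the digest from disk and compare; needs no network access."""
    digest, _ = release_digest(root)
    return digest == expected_digest
```

Hashing a sorted listing rather than a concatenation of file bytes makes the digest sensitive to renames and independent of filesystem traversal order, which is what lets a second machine reproduce it offline.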
MAJOR: Breaking schema changes. New required fields. Migration tool + deprecation window.
MINOR: Additive fields. Backward-compatible. No migration needed.
PATCH: Clarifications, typos, enum additions that don't affect validation.
Every release of the contract ships with a signed schema artifact (protobuf descriptor + JSON Schema), a migration script from the prior major, and linked conformance fixtures. Clients pin a version.
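A small sketch of how a client that pins a contract version might classify a version change under the MAJOR / MINOR / PATCH rules. The function names are hypothetical, and the policy that only a MAJOR difference requires migration is an inference from those rules, not a documented client behaviour.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse(text: str) -> Version:
    """Parse a plain 'X.Y.Z' string; pre-release tags are not handled here."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def bump_kind(old: str, new: str) -> str:
    """Classify the difference between two contract versions."""
    a, b = parse(old), parse(new)
    if b.major != a.major:
        return "MAJOR"
    if b.minor != a.minor:
        return "MINOR"
    return "PATCH" if b.patch != a.patch else "NONE"


def needs_migration(pinned: str, manifest_version: str) -> bool:
    """Only a MAJOR difference calls for the migration script (assumed policy)."""
    return parse(pinned).major != parse(manifest_version).major
```

For example, a client pinned at `1.4.0` reading a `1.9.2` manifest would see a MINOR difference and proceed, while a `2.0.0` manifest would route through the migration script first.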