BR · ingest-to-episode contract v0

The schema. The stages. The signed release.

One manifest per episode, hashed to the episode id. The same contract governs every stage — ingest, preprocess, privacy, enrichment, episodeize, release. Gates fail loud. Releases are signed.

01 / schema

The ingest contract v0.

Every top-level block carries required and optional fields with explicit types. Hashing the manifest yields the episode id.
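A minimal sketch of how hashing the manifest could yield the episode id. The contract does not state the canonicalization or hash function used for `episode_id`, so this assumes compact sorted-key JSON and SHA-256. It also assumes `episode_id` and `lineage.manifest_signature` are left out of the hashed bytes, since a value cannot be part of its own hash input. The name `compute_episode_id` is hypothetical.

```python
import copy
import hashlib
import json


def compute_episode_id(manifest: dict) -> str:
    """Return a hex digest for a manifest (assumed scheme, not the official one)."""
    body = copy.deepcopy(manifest)
    # Assumption: fields derived from the hash are excluded from the hash input.
    body.pop("episode_id", None)
    body.get("lineage", {}).pop("manifest_signature", None)
    # Sorted keys and compact separators make the byte encoding deterministic.
    canonical = json.dumps(
        body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative manifest fragment; the values are made up.
manifest = {
    "dataset_id": "00000000-0000-0000-0000-000000000001",
    "task": {"nl_instruction": "pick up the red cup and place it on the shelf"},
    "lineage": {"parent_refs": []},
}
episode_id = compute_episode_id(manifest)
```

Under these assumptions, key order and the presence of an already-filled `episode_id` do not change the result, while any change to another field does.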

Top-level blocks

identity

dataset, episode, operator, robot embodiment

time

monotonic timebase + per-stream clock offsets

streams

rgb, depth, proprio, action — each with shape + fps

calibration

intrinsics + extrinsics + quality_score per camera

task

natural-language instruction, sub-tasks, success label

consent

jurisdiction, PII state, blur tool version

quality

motion blur, exposure, drift, operator tier

lineage

Merkle chain + signed JWS release manifest

episode.yaml·BR Ingest Contract v0
# Black Robotics Ingest Contract v0
# Hashing the manifest yields episode_id. Missing required fields → hard reject.
# Identity
dataset_id: uuid
episode_id: hash(manifest)
operator_id: uuid
robot:
  serial: str
  embodiment_ref: hash(urdf+srdf)
  firmware_version: semver
capture_device:
  kind: robot | aria | mocap | handheld
  serial: str
  firmware_version: semver
# Time
episode_start_utc: iso8601
episode_end_utc: iso8601
timebase: monotonic | wall
clock_offsets_ns:
  rgb_head: int
  depth_head: int
  imu: int
# Sensors: one block per stream
streams:
  rgb_head:
    type: rgb
    resolution: [1280, 720]
    fps: 30
    shutter: global | rolling
    exposure_mode: auto | fixed
    frame_uri: str
  depth_head:
    type: depth
    resolution: [1280, 720]
    fps: 30
    aligned_to: rgb_head
    frame_uri: str
  proprio:
    type: proprioception
    joint_names: [...]
    units: rad | m | N
    hz: 200
  action:
    type: action
    space: delta_joint | abs_joint | ee_pose | vla_tokens
    joint_mapping: [...]
    dt_ms: 33
# Calibration (required per camera)
calibration:
  rgb_head:
    intrinsics: { fx, fy, cx, cy, distortion }
    extrinsics_ref_frame: head_link
    extrinsics_SE3: [...]
    quality_score: 0.0
    source: checkerboard | apriltag | vio
    timestamp: iso8601
# Task
task:
  nl_instruction: "pick up the red cup and place it on the shelf"
  ontology_ref: uri
  subtasks:
    - { start_ts, end_ts, label }
  success: true
  failure_mode: null
# Consent / Privacy
consent:
  environment: lab | enterprise_consented | public | private_home
  pii_state: raw | blurred | reviewed
  blur_tool: egoblur_gen2 | none
  blur_tool_version: semver
  jurisdiction: us | eu | uk
  retention_policy_ref: uri
# Quality
quality:
  motion_blur_score: 0.0
  exposure_health: 0.0
  dropped_frames: int
  operator_tier: novice | trained | expert
  sensor_drift_flags: [...]
# Lineage: Merkle chain, signed at release
lineage:
  parent_refs: [...]
  processing_steps:
    - { tool, version, input_hash, output_hash, ts }
  manifest_signature: jws
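Each entry in `processing_steps` records an `input_hash` and an `output_hash`, so one plausible lineage check is that every step consumes exactly what the previous step produced. The sketch below assumes a single linear chain; the contract does not spell out the actual Merkle construction, and `chain_is_linked` is a hypothetical name.

```python
def chain_is_linked(processing_steps: list[dict]) -> bool:
    """True when each step's input_hash equals the previous step's output_hash."""
    for prev, curr in zip(processing_steps, processing_steps[1:]):
        if curr["input_hash"] != prev["output_hash"]:
            return False
    return True


# Illustrative steps; tool names, versions, and hashes are made up.
steps = [
    {"tool": "ingest", "version": "0.1.0",
     "input_hash": "aa", "output_hash": "bb", "ts": "2025-01-01T00:00:00Z"},
    {"tool": "preprocess", "version": "0.1.0",
     "input_hash": "bb", "output_hash": "cc", "ts": "2025-01-01T00:01:00Z"},
]
linked = chain_is_linked(steps)
```

An empty or single-step list counts as linked here, which is a design choice of this sketch rather than something the contract specifies.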
02 / validation categories

Six quality dimensions. Cross-cut every stage.

Stages are the pipeline’s spine; these dimensions are how we score output at each stage. Thresholds are profile-dependent; failure reasons are always machine-readable.

01

Integrity

SHA-256 checksums verified against manifest. Missing or corrupted files hard-rejected.
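A minimal sketch of the integrity check, assuming the manifest side can be reduced to a mapping of relative path to expected SHA-256 hex digest. That mapping, the failure codes, and the function names are illustrative assumptions, not the shipped CLI.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large recordings do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def integrity_failures(root: Path, expected: dict[str, str]) -> list[dict]:
    """Return one machine-readable record per missing or corrupted file."""
    failures = []
    for rel_path, want in sorted(expected.items()):
        target = root / rel_path
        if not target.is_file():
            failures.append({"code": "missing_file", "path": rel_path})
        elif sha256_file(target) != want:
            failures.append({"code": "checksum_mismatch", "path": rel_path})
    return failures
```

An empty list means every declared file is present and matches; any entry would map to a hard reject under the rule above.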

02

Schema

Required fields present and correctly typed against the canonical contract definition.

03

Temporal

Timestamps monotonically increasing. Clock offsets within ±100 ms threshold. Frame gaps flagged.
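The three temporal checks can be sketched as below. The ±100 ms offset bound comes from the text; the frame-gap rule (a gap larger than 1.5 times the nominal frame period) and the issue codes are assumptions for illustration.

```python
def temporal_issues(timestamps_ns, clock_offsets_ns, fps,
                    max_offset_ns=100_000_000, gap_factor=1.5):
    """Return machine-readable issues for one stream's timestamps and offsets."""
    issues = []
    # Clock offsets must stay within the +/-100 ms bound.
    for name, offset in sorted(clock_offsets_ns.items()):
        if abs(offset) > max_offset_ns:
            issues.append({"code": "clock_offset_exceeded",
                           "stream": name, "offset_ns": offset})
    expected_dt = 1_000_000_000 / fps  # nominal frame period in ns
    for i in range(1, len(timestamps_ns)):
        dt = timestamps_ns[i] - timestamps_ns[i - 1]
        if dt <= 0:
            # Timestamps must be strictly increasing.
            issues.append({"code": "non_monotonic", "index": i})
        elif dt > gap_factor * expected_dt:
            # Assumed gap rule: flag, do not reject.
            issues.append({"code": "frame_gap", "index": i, "gap_ns": dt})
    return issues
```

Whether a flagged gap is a soft or hard reject would depend on the profile, which this sketch does not model.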

04

Completeness

All declared streams have corresponding data. Episode count matches manifest.

05

Action validity

Action vectors within declared joint limits. Action-proprio latency within profile threshold.
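A sketch of the joint-limit half of this check, assuming an `abs_joint` action space where each action vector lines up with the declared joint names. The joint names and limit values are made up, and the action-proprio latency check is not shown because its threshold is profile-specific.

```python
def actions_out_of_limits(actions, joint_names, limits):
    """Return (step, joint, value) for every action value outside its limits."""
    violations = []
    for step, vector in enumerate(actions):
        for joint, value in zip(joint_names, vector):
            lo, hi = limits[joint]
            if not lo <= value <= hi:
                violations.append((step, joint, value))
    return violations


# Hypothetical two-joint arm; limits in radians.
joint_names = ["shoulder", "elbow"]
limits = {"shoulder": (-1.5, 1.5), "elbow": (0.0, 2.5)}
actions = [[0.1, 1.0], [1.6, 2.6]]
violations = actions_out_of_limits(actions, joint_names, limits)
```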

06

Privacy

Consent field present. PII state matches jurisdiction policy. Blur tool version recorded.
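A sketch of how the privacy dimension might be evaluated against the `consent` block. The jurisdiction policy is not published here, so the rules below are assumptions: raw PII is rejected for `eu` and `uk`, and a named blur tool must come with a recorded version. The result codes are hypothetical.

```python
# Assumption: these jurisdictions do not accept raw (unblurred) footage.
STRICT_JURISDICTIONS = {"eu", "uk"}


def privacy_gate(consent: dict) -> dict:
    """Return a machine-readable pass or hard-reject result for one manifest."""
    if not consent:
        return {"status": "hard_reject", "code": "consent_missing"}
    blur_tool = consent.get("blur_tool", "none")
    if blur_tool != "none" and not consent.get("blur_tool_version"):
        return {"status": "hard_reject", "code": "blur_tool_version_missing"}
    if (consent.get("jurisdiction") in STRICT_JURISDICTIONS
            and consent.get("pii_state") == "raw"):
        return {"status": "hard_reject", "code": "raw_pii_in_strict_jurisdiction"}
    return {"status": "pass", "code": None}
```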

03 / pipeline stages

Six stages, in order, on every episode.

Each stage reads the previous stage’s contract-conformant output and writes its own. Hard rejects return a machine-readable error at the CLI. Soft rejects route to a review queue with a scored reason. Release produces a signed manifest, and nothing reaches training without one.
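The hard/soft split can be sketched as a small result type plus a router. The field names, the JSON-on-stderr format, and the exit codes are assumptions for illustration; the actual CLI output is not documented here.

```python
import json
import sys
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class GateResult:
    stage: str
    status: str  # "pass" | "soft_reject" | "hard_reject"
    code: str = ""
    score: Optional[float] = None


def route(result: GateResult, review_queue: list) -> int:
    """Return a process exit code and enqueue soft rejects for review."""
    if result.status == "hard_reject":
        # Machine-readable error: one JSON object on stderr.
        print(json.dumps(asdict(result)), file=sys.stderr)
        return 1
    if result.status == "soft_reject":
        # Scored reason goes to the review queue; the pipeline keeps going.
        review_queue.append(asdict(result))
    return 0
```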

01

Ingest

Accepts MP4 or MCAP/ROS 2 today; other containers stay on the roadmap. Manifest parsed and hashed to episode_id; optional URDF feeds embodiment_ref.

Hard reject
02

Preprocess

Decode, downsample to canonical resolution + fps, extract frames. Exposure and Laplacian-variance blur scored per clip. Monotonic clock + per-stream offsets verified.

QC
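For reference, Laplacian-variance blur scoring works as follows: convolve a grayscale frame with a Laplacian kernel and take the variance of the response, so sharper frames score higher. The sketch below is a dependency-free version using the 4-neighbour kernel on a list-of-rows image; a production pipeline would more likely use an image library, and how per-frame scores are pooled per clip is not specified here.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    `gray` is a 2-D list of numbers with at least 3 rows and 3 columns.
    """
    h, w = len(gray), len(gray[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


# A flat frame has no edges; a checkerboard is all edges.
flat = [[10] * 5 for _ in range(5)]
checker = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]
flat_score = laplacian_variance(flat)
checker_score = laplacian_variance(checker)
```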
03

Privacy

Shipped demo uses deterministic black-band blur on sampled frames plus consent metadata on the manifest. Real EgoBlur + OCR is planned once signing + ingest are default-safe.

Hard reject (EU/UK)
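A deterministic black-band blur can be as simple as zeroing a fixed horizontal band of each sampled frame. In the sketch below the band position (`top_frac`, `bottom_frac`) is an assumption, since the demo's actual band geometry is not documented here, and frames are plain lists of pixel rows.

```python
def apply_black_band(frame, top_frac=0.0, bottom_frac=0.25):
    """Return a copy of `frame` with rows in the band set to black (0).

    The band covers rows from int(h * top_frac) up to, but not including,
    int(h * bottom_frac), so the same input always yields the same output.
    """
    h = len(frame)
    start, stop = int(h * top_frac), int(h * bottom_frac)
    return [
        [0] * len(row) if start <= y < stop else list(row)
        for y, row in enumerate(frame)
    ]


frame = [[255, 255, 255] for _ in range(4)]
masked = apply_black_band(frame, top_frac=0.0, bottom_frac=0.5)
```

Because no detector is involved, the output depends only on the frame size and the band fractions, which is what makes this suitable for a reproducible demo and unsuitable as real PII redaction.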
04

Enrichment

CPU enrichers that ship today: Depth Anything V2 (small ONNX), DIS optical flow, MediaPipe Tasks hands + HOI heuristics, LK trajectories, simulated proprio. SAM2 / MANO / SLAM / VLM are frozen until P0 ingest+export is boring.

Workers
05

Episodeize

Packages sampled metadata + proprio into `episode.yaml` / `episode_000.json`. Manual bounds JSON is recorded for future multi-episode slicing; motion/HOI auto-segmentation remains stubbed.

Package
06

Release

Hashes every artifact, validates the manifest against JSON Schema, then signs with Ed25519 (`ed25519-jws-v1`) or a simulated stamp for demos. `br-pipeline verify` reproduces the canonical digest offline.

Signed
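A sketch of the compact JWS shape a signed release manifest could take: base64url header, base64url payload, base64url signature, joined by dots. The header uses `EdDSA`, the registered JWS algorithm name for Ed25519; whether `ed25519-jws-v1` uses exactly this header and payload layout is an assumption. The signer is passed in as a callable so the example runs on the standard library alone.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Base64url without padding, as JWS requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def canonical_json(obj: dict) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def jws_signing_input(payload: dict) -> bytes:
    header = {"alg": "EdDSA"}
    return f"{b64url(canonical_json(header))}.{b64url(canonical_json(payload))}".encode("ascii")


def compact_jws(payload: dict, sign) -> str:
    """`sign` maps the signing input bytes to signature bytes."""
    signing_input = jws_signing_input(payload)
    return signing_input.decode("ascii") + "." + b64url(sign(signing_input))


# Simulated stamp for demos: a plain hash, NOT a signature and not secure.
def simulated_stamp(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


payload = {"episode_id": "abc", "artifacts": {"episode.yaml": "00"}}
token = compact_jws(payload, simulated_stamp)
```

With a real key, `sign` could be the `sign` method of an Ed25519 private key from a library such as `cryptography`. Verification would recompute the signing input from the manifest and check the signature against the public key.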
04 / formats

Format support matrix.

| Format | Stage | Status | Notes |
| --- | --- | --- | --- |
| MP4 (head-cam / demo) | Input | shipped | Primary path today; OpenCV decode + duration gate. |
| MCAP / rosbag2 | Input | shipped | ROS 2 sensor_msgs/Image topics via `rosbags` → H.264 bridge. |
| LeRobot v3 export | Output | shipped | Optional `br-pipeline --export lerobot_v3` Parquet + meta tree. |
| RLDS / TFRecord | Output | planned | Not implemented in the open CLI until ingest is proven with partners. |
| HDF5 / WebDataset | Input | planned | Frozen per skip-list; file issues if your buyer mandates it. |
| ZED SVO / Aria VRS | Input | planned | Adapter backlog; ask before assuming support. |
| OpenLineage | Output | planned | Catalog hooks only after release signing is table stakes. |
05 / versioning

SemVer. Migrations. Signed schemas.

MAJOR

Breaking schema changes. New required fields. Migration tool + deprecation window.

MINOR

Additive fields. Backward-compatible. No migration needed.

PATCH

Clarifications, typos, enum additions that don't affect validation.

Every release of the contract ships with a signed schema artifact (protobuf descriptor + JSON Schema), a migration script from the prior major, and linked conformance fixtures. Clients pin a version.
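A small sketch of how a client that pins a contract version might decide whether a migration is needed, following the MAJOR / MINOR / PATCH rules above. It handles plain `MAJOR.MINOR.PATCH` strings only (no pre-release or build tags), and the function names are hypothetical.

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_bump(old: str, new: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for the first field that differs."""
    o, n = parse_semver(old), parse_semver(new)
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    if n[2] != o[2]:
        return "patch"
    return "none"


def needs_migration(old: str, new: str) -> bool:
    # Only a major change is breaking under the rules above.
    return classify_bump(old, new) == "major"
```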

research preview

The schema is public. The CLI is private. For now.