BR · ingest-to-episode contract v0

The schema. The stages. The signed release.

One manifest per episode, hashed to the episode id. The same contract governs every stage — ingest, preprocess, privacy, enrichment, episodeize, release. Gates fail loud. Releases are signed.


01 / schema

The ingest contract v0.

Every top-level block carries required and optional fields with explicit types. Hashing the manifest yields the episode id.
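
A minimal sketch of how hashing the manifest could yield the episode id. The contract only says `hash(manifest)`, so everything specific here is an assumption: SHA-256 as the hash, canonical JSON (sorted keys, compact separators) as the serialization, and the exclusion of `episode_id` and `lineage.manifest_signature` from the hashed payload, since both are derived from the manifest and cannot be part of their own input. `canonical_bytes` and `compute_episode_id` are hypothetical helper names.

```python
import hashlib
import json


def canonical_bytes(manifest: dict) -> bytes:
    """Serialize a manifest deterministically, minus self-referential fields."""
    # Assumption: drop the fields that are themselves derived from the manifest.
    payload = {k: v for k, v in manifest.items() if k != "episode_id"}
    if isinstance(payload.get("lineage"), dict):
        payload["lineage"] = {
            k: v for k, v in payload["lineage"].items() if k != "manifest_signature"
        }
    # Sorted keys + compact separators make the bytes independent of key order.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def compute_episode_id(manifest: dict) -> str:
    """Assumption: SHA-256 hex digest; the contract does not name the hash."""
    return hashlib.sha256(canonical_bytes(manifest)).hexdigest()


manifest = {
    "dataset_id": "00000000-0000-0000-0000-000000000001",
    "operator_id": "00000000-0000-0000-0000-000000000002",
    "timebase": "monotonic",
    "streams": {"rgb_head": {"type": "rgb", "resolution": [1280, 720], "fps": 30}},
    "lineage": {"parent_refs": [], "processing_steps": []},
}
episode_id = compute_episode_id(manifest)

# Filling in the derived fields afterwards must not change the id.
stamped = dict(manifest, episode_id=episode_id)
stamped["lineage"] = dict(manifest["lineage"], manifest_signature="<jws>")
assert compute_episode_id(stamped) == episode_id
```

Excluding the derived fields is one way to avoid a circular definition; a real implementation may resolve it differently.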

Top-level blocks

identity

dataset, episode, operator, robot embodiment

time

monotonic timebase + per-stream clock offsets

streams

rgb, depth, proprio, action — each with shape + fps

calibration

intrinsics + extrinsics + quality_score per camera

task

natural-language instruction, sub-tasks, success label

consent

jurisdiction, PII state, blur tool version

quality

motion blur, exposure, drift, operator tier

lineage

merkle chain + signed JWS release manifest

episode.yaml · BR Ingest Contract v0

# Black Robotics Ingest Contract v0
# Hashing the manifest yields episode_id. Missing required fields → hard reject.
 
# Identity
dataset_id: uuid
episode_id: hash(manifest)
operator_id: uuid
 
robot:
  serial: str
  embodiment_ref: hash(urdf+srdf)
  firmware_version: semver
 
capture_device:
  kind: robot | aria | mocap | handheld
  serial: str
  firmware_version: semver
 
# Time
episode_start_utc: iso8601
episode_end_utc: iso8601
timebase: monotonic | wall
clock_offsets_ns:
  rgb_head: int
  depth_head: int
  imu: int
 
# Sensors — one block per stream
streams:
  rgb_head:
    type: rgb
    resolution: [1280, 720]
    fps: 30
    shutter: global | rolling
    exposure_mode: auto | fixed
    frame_uri: str
  depth_head:
    type: depth
    resolution: [1280, 720]
    fps: 30
    aligned_to: rgb_head
    frame_uri: str
  proprio:
    type: proprioception
    joint_names: [...]
    units: rad | m | N
    hz: 200
  action:
    type: action
    space: delta_joint | abs_joint | ee_pose | vla_tokens
    joint_mapping: [...]
    dt_ms: 33
 
# Calibration (required per camera)
calibration:
  rgb_head:
    intrinsics: { fx, fy, cx, cy, distortion }
    extrinsics_ref_frame: head_link
    extrinsics_SE3: [...]
    quality_score: 0.0
    source: checkerboard | apriltag | vio
    timestamp: iso8601
 
# Task
task:
  nl_instruction: "pick up the red cup and place it on the shelf"
  ontology_ref: uri
  subtasks:
    - { start_ts, end_ts, label }
  success: true
  failure_mode: null
 
# Consent / Privacy
consent:
  environment: lab | enterprise_consented | public | private_home
  pii_state: raw | blurred | reviewed
  blur_tool: egoblur_gen2 | none
  blur_tool_version: semver
  jurisdiction: us | eu | uk
  retention_policy_ref: uri
 
# Quality
quality:
  motion_blur_score: 0.0
  exposure_health: 0.0
  dropped_frames: int
  operator_tier: novice | trained | expert
  sensor_drift_flags: [...]
 
# Lineage — Merkle chain, signed at release
lineage:
  parent_refs: [...]
  processing_steps:
    - { tool, version, input_hash, output_hash, ts }
  manifest_signature: jws
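
The `processing_steps` entries above carry `input_hash` and `output_hash`, which is enough to check that steps link up. The sketch below is a simple linear hash chain, not necessarily the Merkle construction the contract uses: it assumes each step's `input_hash` must equal the previous step's `output_hash`, and it folds every step into a running SHA-256 head. `step_digest` and `verify_chain` are hypothetical names, and the short hashes are placeholders.

```python
import hashlib
import json


def step_digest(prev_link: str, step: dict) -> str:
    """Fold one processing step into the running chain head."""
    body = json.dumps(step, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(prev_link.encode("ascii") + body).hexdigest()


def verify_chain(steps: list, root_input_hash: str) -> tuple:
    """Return (ok, head). ok is False when a step does not consume the prior output."""
    expected_input = root_input_hash
    link = hashlib.sha256(root_input_hash.encode("ascii")).hexdigest()
    for step in steps:
        if step["input_hash"] != expected_input:
            return False, link
        link = step_digest(link, step)
        expected_input = step["output_hash"]
    return True, link


steps = [
    {"tool": "decode", "version": "0.1.0", "input_hash": "aaa",
     "output_hash": "bbb", "ts": "2025-01-01T00:00:00Z"},
    {"tool": "blur", "version": "0.2.0", "input_hash": "bbb",
     "output_hash": "ccc", "ts": "2025-01-01T00:01:00Z"},
]
ok, head = verify_chain(steps, "aaa")

# A step whose input does not match the previous output breaks the chain.
broken = [steps[0], dict(steps[1], input_hash="zzz")]
broken_ok, _ = verify_chain(broken, "aaa")
```

Because each link depends on the previous head, editing any earlier step changes the final head, which is the property a signed release would want to pin.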

02 / validation categories

Six quality dimensions. Cross-cut every stage.

Stages are the pipeline’s spine; these dimensions are how we score output at each stage. Thresholds are profile-dependent; failure reasons are always machine-readable.

Integrity

SHA-256 checksums verified against manifest. Missing or corrupted files hard-rejected.

Schema

Required fields present and correctly typed against the canonical contract definition.

Temporal

Timestamps monotonically increasing. Clock offsets within ±100 ms threshold. Frame gaps flagged.

Completeness

All declared streams have corresponding data. Episode count matches manifest.

Action validity

Action vectors within declared joint limits. Action-proprio latency within profile threshold.

Privacy

Consent field present. PII state matches jurisdiction policy. Blur tool version recorded.
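
As an illustration of one dimension, the Temporal rules above can be sketched as a function that returns machine-readable reasons. The ±100 ms bound comes from the text and is expressed in nanoseconds to match `clock_offsets_ns`. The reason codes, the `gap_factor` of 1.5× the nominal frame interval, and the choice to treat equal timestamps as a failure are assumptions for illustration.

```python
MAX_OFFSET_NS = 100_000_000  # ±100 ms, per the Temporal rule


def check_temporal(timestamps_ns, clock_offsets_ns, fps, gap_factor=1.5):
    """Return a list of failure reasons; an empty list means the check passed."""
    reasons = []
    nominal_ns = 1_000_000_000 / fps
    for i in range(1, len(timestamps_ns)):
        delta = timestamps_ns[i] - timestamps_ns[i - 1]
        if delta <= 0:
            # Assumption: timestamps must be strictly increasing.
            reasons.append({"code": "temporal.non_monotonic", "index": i})
        elif delta > gap_factor * nominal_ns:
            reasons.append({"code": "temporal.frame_gap", "index": i, "gap_ns": delta})
    for stream, offset in clock_offsets_ns.items():
        if abs(offset) > MAX_OFFSET_NS:
            reasons.append(
                {"code": "temporal.offset_out_of_range", "stream": stream, "offset_ns": offset}
            )
    return reasons


clean = check_temporal([0, 33_333_333, 66_666_666], {"rgb_head": 0}, fps=30)

# One dropped-frame gap, one backwards timestamp, one bad clock offset.
faulty = check_temporal(
    [0, 33_333_333, 66_666_666, 166_666_666, 160_000_000],
    {"rgb_head": 0, "depth_head": 2_000_000, "imu": -150_000_000},
    fps=30,
)
codes = [r["code"] for r in faulty]
```

Returning structured reasons rather than a boolean keeps the result usable for both hard rejects and scored review.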

03 / pipeline stages

Six stages, in order, on every episode.

Each stage reads the previous stage’s contract-conformant output and writes its own. Hard rejects return a machine-readable error at the CLI; soft rejects route to a review queue with a scored reason; release produces a signed manifest and nothing reaches training without one.
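
The hard/soft split described above could be routed roughly as follows. This is a sketch under stated assumptions, not the CLI's behavior: `GateResult`, `route`, the outcome strings, and the reason codes are all hypothetical, and the review queue is modeled as a plain list.

```python
import json
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class GateResult:
    stage: str
    outcome: str                   # "pass" | "soft_reject" | "hard_reject"
    code: str = ""                 # machine-readable reason
    score: Optional[float] = None  # populated for soft rejects


def route(result: GateResult, review_queue: list) -> str:
    """Return the next action: "continue", "review", or "abort"."""
    if result.outcome == "hard_reject":
        # Hard rejects surface as a machine-readable error on stderr.
        error = {"stage": result.stage, "code": result.code}
        print(json.dumps(error), file=sys.stderr)
        return "abort"
    if result.outcome == "soft_reject":
        # Soft rejects keep their scored reason for a human reviewer.
        review_queue.append(result)
        return "review"
    return "continue"


queue: list = []
actions = [
    route(GateResult("ingest", "hard_reject", code="schema.missing_field"), queue),
    route(GateResult("preprocess", "soft_reject", code="quality.motion_blur", score=0.42), queue),
    route(GateResult("enrichment", "pass"), queue),
]
```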

Ingest

Accepts MP4 or MCAP/ROS 2 today; other containers stay on the roadmap. Manifest parsed and hashed to episode_id; optional URDF feeds embodiment_ref.

Hard reject

Preprocess

Decode, downsample to canonical resolution + fps, extract frames. Exposure and Laplacian-variance blur scored per clip. Monotonic clock + per-stream offsets verified.

Privacy

Shipped demo uses deterministic black-band blur on sampled frames plus consent metadata on the manifest. Real EgoBlur + OCR is planned once signing + ingest are default-safe.

Hard reject (EU/UK)

Enrichment

CPU enrichers that ship today: Depth Anything V2 (small ONNX), DIS optical flow, MediaPipe Tasks hands + HOI heuristics, LK trajectories, simulated proprio. SAM2 / MANO / SLAM / VLM are frozen until P0 ingest+export is boring.

Workers

Episodeize

Packages sampled metadata + proprio into `episode.yaml` / `episode_000.json`. Manual bounds JSON is recorded for future multi-episode slicing; motion/HOI auto-segmentation remains stubbed.

Package

Release

Hashes every artifact, validates the manifest against JSON Schema, then signs with Ed25519 (`ed25519-jws-v1`) or a simulated stamp for demos. `br-pipeline verify` reproduces the canonical digest offline.

Signed
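
To make the Release step concrete, here is a sketch of hashing every artifact and deriving one reproducible digest from the result. The digest layout (SHA-256 over a sorted path-to-hash JSON listing) is an assumption, not the format `br-pipeline verify` uses, and the Ed25519 signing step is omitted because it needs a third-party library. `file_sha256` and `release_digest` are hypothetical names.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Stream a file through SHA-256 so large artifacts are not loaded whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def release_digest(root: Path) -> tuple:
    """Return (artifacts, digest) where artifacts maps relative path -> sha256."""
    artifacts = {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    listing = json.dumps(artifacts, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return artifacts, hashlib.sha256(listing).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "episode.yaml").write_text("dataset_id: demo\n")
    (root / "frames").mkdir()
    (root / "frames" / "000000.bin").write_bytes(b"\x00" * 16)

    artifacts, digest = release_digest(root)
    _, digest_again = release_digest(root)  # same inputs, same digest

    (root / "frames" / "000000.bin").write_bytes(b"\x01" * 16)
    _, digest_tampered = release_digest(root)  # any changed byte changes the digest
```

Recomputing the digest from the files alone is what makes offline verification possible; the signature then only has to cover that one value.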

04 / formats

Format support matrix.

| Format | Stage | Status | Notes |
| --- | --- | --- | --- |
| MP4 (head-cam / demo) | Input | shipped | Primary path today; OpenCV decode + duration gate. |
| MCAP / rosbag2 | Input | shipped | ROS 2 sensor_msgs/Image topics via `rosbags` → H.264 bridge. |
| LeRobot v3 export | Output | shipped | Optional `br-pipeline --export lerobot_v3` Parquet + meta tree. |
| RLDS / TFRecord | Output | planned | Not implemented in the open CLI until ingest is proven with partners. |
| HDF5 / WebDataset | Input | planned | Frozen per skip-list; file issues if your buyer mandates it. |
| ZED SVO / Aria VRS | Input | planned | Adapter backlog; ask before assuming support. |
| OpenLineage | Output | planned | Catalog hooks only after release signing is table stakes. |

05 / versioning

SemVer. Migrations. Signed schemas.

MAJOR

Breaking schema changes. New required fields. Migration tool + deprecation window.

MINOR

Additive fields. Backward-compatible. No migration needed.

PATCH

Clarifications, typos, enum additions that don't affect validation.

Every release of the contract ships with a signed schema artifact (protobuf descriptor + JSON Schema), a migration script from the prior major, and linked conformance fixtures. Clients pin a version.
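
A client that pins a version can decide from the version numbers alone whether a new contract release needs a migration. The sketch below follows the rules above (MAJOR requires migration, MINOR and PATCH do not) and assumes plain `MAJOR.MINOR.PATCH` strings without pre-release or build tags. `parse`, `classify_bump`, and `needs_migration` are hypothetical helpers.

```python
def parse(version: str) -> tuple:
    """Split "MAJOR.MINOR.PATCH" into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_bump(old: str, new: str) -> str:
    """Name the most significant component that differs between two versions."""
    o, n = parse(old), parse(new)
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    if n[2] != o[2]:
        return "patch"
    return "none"


def needs_migration(pinned: str, released: str) -> bool:
    """Only a newer MAJOR version requires running the migration script."""
    return parse(released)[0] > parse(pinned)[0]


bumps = [
    classify_bump("1.2.3", "2.0.0"),
    classify_bump("1.2.3", "1.3.0"),
    classify_bump("1.2.3", "1.2.4"),
]
```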

research preview

The schema is public. The CLI is private. For now.
