Ingest gateway
Shipped path: MP4 plus an MCAP/ROS 2 bridge via `rosbags` (sensor_msgs/Image → H.264). Other containers (HDF5, VRS, SVO, WebDataset) stay roadmap until design partners pull them forward.
Black Robotics is the data infrastructure layer between source intake and training-ready release, so aggregators and model teams hand off data with less friction and without rebuilding ingest plumbing.
Raw head-cam footage or MCAP enters at the top; a schema-gated `episode.yaml`, CPU enrichments, and a signed manifest leave at the bottom. Optional LeRobot exports are additive; they do not by themselves guarantee coverage of every buyer's requirements.
Decodes the working MP4, samples frames for QC, scores exposure + Laplacian blur, and records thumbnails before enrichment.
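The QC scoring described above can be sketched in a few lines. This is a minimal illustration, not the shipped implementation: it assumes frames are already decoded to 8-bit grayscale NumPy arrays, and the function names, clipping thresholds, and even-spacing sampler are hypothetical choices made for the example.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian; low values suggest a blurry frame."""
    g = gray.astype(np.float64)
    # Discrete Laplacian on interior pixels: up + down + left + right - 4 * centre.
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    )
    return float(lap.var())


def exposure_score(gray: np.ndarray, low: int = 16, high: int = 239) -> dict:
    """Mean luma plus the fraction of pixels clipped dark or bright.

    The `low` and `high` thresholds are illustrative assumptions.
    """
    g = gray.astype(np.float64)
    return {
        "mean_luma": float(g.mean()),
        "clipped_dark": float((g <= low).mean()),
        "clipped_bright": float((g >= high).mean()),
    }


def sample_indices(n_frames: int, n_samples: int) -> list:
    """Pick up to n_samples evenly spaced frame indices, including both ends."""
    if n_frames <= 0 or n_samples <= 0:
        return []
    n = min(n_frames, n_samples)
    return sorted({int(round(float(i))) for i in np.linspace(0, n_frames - 1, n)})


if __name__ == "__main__":
    # A checkerboard is maximally sharp; a flat grey frame has no edges at all.
    sharp = ((np.indices((32, 32)).sum(axis=0) % 2) * 255).astype(np.uint8)
    flat = np.full((32, 32), 128, dtype=np.uint8)
    print(laplacian_variance(sharp) > laplacian_variance(flat))
    print(exposure_score(flat))
```

A threshold on the Laplacian variance is a common blur heuristic, but the right cutoff depends on resolution and content, so it is left out of the sketch.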
Today: deterministic black-band simulation with full-frame coverage on sampled thumbnails plus consent metadata on the manifest. EgoBlur Gen 2 + OCR redaction is the planned upgrade path.
CPU stack that ships today: Depth Anything V2 (ONNX), DIS optical flow, MediaPipe Tasks hands + HOI heuristics, LK point trajectories, and a simulated proprio stream. SAM2 / MANO / SLAM / VLM captioning are explicitly frozen until the ingest+export path is boringly reliable.
Emits `episode.yaml` + `episodes/episode_000.json`, optional manual episode bounds metadata, and an optional LeRobot v3-shaped Parquet export — not a full training SKU marketplace.
Validates against `br-ingest-v0`, hashes artifacts, and signs `release_manifest.json` with either a simulated stamp or Ed25519 (`br-pipeline verify --pubkey …`).
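To make the hash-then-sign step concrete, here is a minimal sketch of the hashing half using only the Python standard library. The manifest field names (`schema`, `artifacts`, `path`, `sha256`, `bytes`) are assumptions for illustration and are not the actual `release_manifest.json` layout; signing is deliberately omitted, since an Ed25519 signature would be computed over canonical bytes like the ones produced here.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(artifacts: dict, schema: str = "br-ingest-v0") -> dict:
    """Hash every artifact and list the entries in sorted path order.

    `artifacts` maps a relative path to the artifact's bytes (hypothetical shape).
    """
    entries = [
        {"path": path, "sha256": sha256_hex(blob), "bytes": len(blob)}
        for path, blob in sorted(artifacts.items())
    ]
    return {"schema": schema, "artifacts": entries}


def canonical_bytes(manifest: dict) -> bytes:
    """Serialize deterministically so the same manifest always yields the same bytes."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def changed_paths(manifest: dict, artifacts: dict) -> list:
    """Return paths that are missing or whose content no longer matches the manifest."""
    bad = []
    for entry in manifest["artifacts"]:
        blob = artifacts.get(entry["path"])
        if blob is None or sha256_hex(blob) != entry["sha256"]:
            bad.append(entry["path"])
    return bad


if __name__ == "__main__":
    files = {"episode.yaml": b"schema: br-ingest-v0\n", "episodes/episode_000.json": b"{}"}
    manifest = build_manifest(files)
    print(sha256_hex(canonical_bytes(manifest)))  # digest a signer would sign
    files["episode.yaml"] = b"tampered\n"
    print(changed_paths(manifest, files))
```

Sorting entries and keys before serializing matters because a signature covers exact bytes: two semantically equal manifests with different key order would otherwise fail verification.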
Profiles stack by inheritance: Production inherits Base, and Compliance Enhanced inherits Production.
Acceptance profile · Base
Validates that every episode has the required shape before anything else. Hard reject on structural failure.
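One way to picture the stacked profiles and the hard reject is the sketch below. Everything in it is hypothetical: the profile keys, the check names other than the structural one, and the required episode fields are placeholders chosen for illustration, not the actual `br-ingest-v0` contract.

```python
class StructuralReject(Exception):
    """Raised when an episode fails the Base structural check (hard reject)."""


# Hypothetical profile table: each profile names its parent and its own checks.
PROFILES = {
    "base": {"inherits": None, "checks": ["episode_shape"]},
    "production": {"inherits": "base", "checks": ["placeholder_production_check"]},
    "compliance_enhanced": {
        "inherits": "production",
        "checks": ["placeholder_compliance_check"],
    },
}


def resolve_checks(name: str, profiles: dict = PROFILES) -> list:
    """Walk the inheritance chain and return checks ordered from Base outward."""
    chain, seen = [], set()
    while name is not None:
        if name in seen:
            raise ValueError(f"inheritance cycle at profile {name!r}")
        seen.add(name)
        chain.append(name)
        name = profiles[name]["inherits"]
    checks = []
    for profile_name in reversed(chain):
        checks.extend(profiles[profile_name]["checks"])
    return checks


# Placeholder required fields; the real schema is not given in this document.
REQUIRED_KEYS = ("episode_id", "frames")


def check_episode_shape(episode: dict) -> None:
    """Reject immediately if any required key is absent."""
    missing = [key for key in REQUIRED_KEYS if key not in episode]
    if missing:
        raise StructuralReject(f"missing required keys: {missing}")


if __name__ == "__main__":
    print(resolve_checks("compliance_enhanced"))
    check_episode_shape({"episode_id": "demo", "frames": []})  # passes silently
```

Resolving the chain from Base outward means the structural check always runs first, which matches the "before anything else" ordering described above.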
Source lane
Point your existing POST endpoint at the BR ingest gateway. Contract validation and content-addressed episode IDs come back in the response.
Processing lane
Trigger normalization and conformance runs from existing orchestration (Airflow, Dagster, Step Functions, GitHub Actions).
Release lane
Subscribe training orchestrators to signed release events. Buyers get a verifiable manifest, not a Slack ping.
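The content-addressed episode IDs returned by the ingest gateway can be illustrated with a short sketch. It assumes the ID is derived from a SHA-256 digest of the uploaded bytes; the `ep_` prefix, the truncation length, and the function names are hypothetical and do not describe the gateway's actual ID format.

```python
import hashlib
from typing import Iterable


def episode_id(payload: bytes, prefix: str = "ep_", length: int = 16) -> str:
    """Derive an ID from the content itself: identical bytes give an identical ID."""
    return prefix + hashlib.sha256(payload).hexdigest()[:length]


def episode_id_from_chunks(
    chunks: Iterable[bytes], prefix: str = "ep_", length: int = 16
) -> str:
    """Same ID as episode_id, computed incrementally for large uploads."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return prefix + digest.hexdigest()[:length]


if __name__ == "__main__":
    clip = b"fake mp4 bytes for illustration"
    first = episode_id(clip)
    again = episode_id(clip)
    streamed = episode_id_from_chunks([clip[:10], clip[10:]])
    print(first == again == streamed)  # re-uploading the same file is idempotent
    print(first != episode_id(clip + b"\x00"))  # any change yields a new ID
```

Because the ID depends only on content, a client that retries a POST after a timeout gets the same ID back and downstream stages can deduplicate without a separate lookup key.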