Product

Faster release workflow for robotics data operations.

Black Robotics is the data infrastructure layer between source intake and training-ready release — so aggregators and model teams ship with less handoff friction and zero rebuilt plumbing.

01 / modules

Six pipeline stages, one contract.

Raw head-cam footage or MCAP enters at the top; a schema-gated `episode.yaml`, CPU enrichments, and a signed manifest leave at the bottom. Optional LeRobot exports are additive: they add an output format and do not guarantee coverage of every buyer's requirements.

Ingest gateway

Shipped path: MP4 plus an MCAP/ROS 2 bridge via `rosbags` (sensor_msgs/Image → H.264). Other containers (HDF5, VRS, SVO, WebDataset) stay on the roadmap until design partners pull them forward.

Preprocess & QC

Decodes the working MP4, samples frames for QC, scores exposure + Laplacian blur, and records thumbnails before enrichment.
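
As a rough illustration of the exposure and blur scoring described above, here is a minimal, dependency-free Python sketch. It assumes frames arrive as grayscale grids of 0..255 intensities. The function names and the 4-neighbour Laplacian kernel are illustrative choices, not the pipeline's actual implementation.

```python
from statistics import mean, pvariance

# Assumption: a frame is a grayscale grid, rows of pixel intensities in 0..255.
Frame = list[list[float]]


def exposure_score(frame: Frame) -> float:
    """Mean intensity scaled to 0..1 (0 = black, 1 = fully saturated)."""
    return mean(p for row in frame for p in row) / 255.0


def laplacian_variance(frame: Frame) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Low values mean few sharp edges, which usually indicates blur.
    """
    h, w = len(frame), len(frame[0])
    responses = [
        frame[y - 1][x] + frame[y + 1][x] + frame[y][x - 1] + frame[y][x + 1]
        - 4 * frame[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


# A flat grey frame has no edges; a checkerboard is all edges.
flat = [[128.0] * 8 for _ in range(8)]
checker = [[255.0 if (x + y) % 2 else 0.0 for x in range(8)] for y in range(8)]
```

A QC step along these lines would compare both scores against thresholds tuned per camera; the thresholds themselves are not specified here.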

Privacy & consent

Today: deterministic black-band simulation with full-frame coverage on sampled thumbnails plus consent metadata on the manifest. EgoBlur Gen 2 + OCR redaction is the planned upgrade path.

Enrichment workers

CPU stack that ships today: Depth Anything V2 (ONNX), DIS optical flow, MediaPipe Tasks hands + HOI heuristics, LK point trajectories, and a simulated proprio stream. SAM2 / MANO / SLAM / VLM captioning are explicitly frozen until the ingest+export path is boringly reliable.

Episodeize & exports

Emits `episode.yaml` + `episodes/episode_000.json`, optional manual episode bounds metadata, and an optional LeRobot v3-shaped Parquet export — not a full training SKU marketplace.

Signed release registry

Validates against `br-ingest-v0`, hashes artifacts, and signs `release_manifest.json` with either a simulated stamp or Ed25519 (`br-pipeline verify --pubkey …`).
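
The hashing half of this stage can be sketched in a few lines of Python. This is an illustration under stated assumptions: the `schema` and `artifacts` field names are hypothetical, since the real layout of `release_manifest.json` is not described here, and the signing step is left out. An Ed25519 signature would plausibly be computed over the canonical bytes produced below.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(artifacts: dict[str, bytes]) -> dict:
    """Map each artifact name to its SHA-256 digest.

    Field names here are hypothetical, not the real manifest schema.
    """
    return {
        "schema": "br-ingest-v0",
        "artifacts": {name: sha256_hex(blob) for name, blob in sorted(artifacts.items())},
    }


def canonical_bytes(manifest: dict) -> bytes:
    """Serialize deterministically so the same manifest always yields the same bytes."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
```

Deterministic serialization matters because a signature covers exact bytes: two manifests with the same content but different key order would otherwise fail verification.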

02 / acceptance profiles

Pick the acceptance profile. We enforce it.

Profiles stack. Production inherits Base. Compliance Enhanced inherits Production.
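
One way to picture the stacking is a parent chain that is walked from the selected profile back to Base. The sketch below is illustrative only: the profile keys and check names are placeholders, since only the Base checks are described on this page.

```python
from typing import Optional

# Parent of each profile, following the inheritance stated above.
PARENT: dict[str, Optional[str]] = {
    "base": None,
    "production": "base",
    "compliance_enhanced": "production",
}

# Check names are placeholders (assumption); they stand in for the real rule sets.
OWN_CHECKS: dict[str, list[str]] = {
    "base": ["schema", "integrity"],
    "production": ["production_placeholder"],
    "compliance_enhanced": ["compliance_placeholder"],
}


def resolve_checks(profile: str) -> list[str]:
    """Return inherited checks first, then the profile's own checks."""
    chain = []
    current: Optional[str] = profile
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return [check for name in reversed(chain) for check in OWN_CHECKS[name]]
```

Running inherited checks first means a structural failure at the Base level is reported before any stricter profile rule is evaluated.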

Acceptance profile · Base

Schema and integrity — the minimum viable release.

Validates that every episode has the required shape before anything else. Hard reject on structural failure.

  • All required manifest fields present and correctly typed
  • Clock timebase monotonically increasing
  • Capture device serial on the allowlist
  • File checksums match manifest SHA-256 hashes
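
The four Base checks above could be expressed roughly as follows. This is a sketch, not the shipped validator: the manifest field names are hypothetical, and "monotonically increasing" is assumed to mean strictly increasing.

```python
import hashlib

# Hypothetical required fields and their expected types.
REQUIRED_FIELDS = {
    "episode_id": str,
    "device_serial": str,
    "timestamps_ns": list,
    "files": dict,  # file name -> expected SHA-256 hex digest
}


def validate_base(manifest: dict, blobs: dict[str, bytes], allowlist: set[str]) -> list[str]:
    """Return a list of failure reasons; an empty list means the episode passes."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in manifest:
            errors.append(f"missing field: {field}")
        elif not isinstance(manifest[field], expected):
            errors.append(f"wrong type: {field}")
    if errors:
        return errors  # structural failure: stop before the remaining checks

    ts = manifest["timestamps_ns"]
    if any(b <= a for a, b in zip(ts, ts[1:])):
        errors.append("timebase not strictly increasing")
    if manifest["device_serial"] not in allowlist:
        errors.append("device serial not on allowlist")
    for name, expected_hash in manifest["files"].items():
        blob = blobs.get(name)
        if blob is None or hashlib.sha256(blob).hexdigest() != expected_hash:
            errors.append(f"checksum mismatch: {name}")
    return errors
```

Returning early on a structural failure mirrors the hard reject described above: the later checks read fields that a malformed manifest may not have.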

03 / data lifecycle

The path every episode takes. Three lanes. No exceptions.

Source lane

  • Head-cam MP4 · MCAP / rosbag2
  • Collector manifest + SHA-256 episode id
  • Optional URDF → embodiment_ref
  • Local or S3/GCS upload hooks (content-addressed keys)
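
Content addressing means the storage key is derived from the bytes themselves, so identical uploads collapse to the same key. A minimal Python sketch follows, assuming the episode id is the SHA-256 of the source file's bytes. The `episodes/<prefix>/<digest>` key layout is hypothetical.

```python
import hashlib


def episode_id(data: bytes) -> str:
    """SHA-256 hex digest of the source bytes (assumed basis for the episode id)."""
    return hashlib.sha256(data).hexdigest()


def object_key(data: bytes, suffix: str = ".mp4") -> str:
    """Build a content-addressed object key; the path layout is illustrative only."""
    digest = episode_id(data)
    # A two-character prefix spreads objects across key prefixes in the bucket.
    return f"episodes/{digest[:2]}/{digest}{suffix}"
```

Uploading the same file twice produces the same key, which makes retries idempotent and duplicates easy to detect.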

Processing lane

  • Decode + normalize
  • Exposure + blur scoring (+ optional checkerboard probe)
  • Privacy pass (simulated black-band redaction today)
  • Depth + flow + hands + trajectories (CPU)
  • Episodeize + schema gate

Release lane

  • Validated episode.yaml
  • Optional LeRobot v3 export
  • Signed release manifest (Ed25519 or simulated stamp)

04 / integration

Drop into your stack. No re-architecture required.

Uploader API

Point your existing uploader's POST requests at the BR ingest gateway. The response returns the contract validation result and a content-addressed episode ID.

Batch pipelines

Trigger normalization and conformance runs from existing orchestration (Airflow, Dagster, Step Functions, GitHub Actions).

Release webhooks

Subscribe training orchestrators to signed release events. Buyers get a verifiable manifest, not a Slack ping.