rskill-omdet-turbo-indoor
OpenRAL rSkill: OmDet-Turbo (Swin-tiny) packaged as an in-process, Apache-2.0 open-vocabulary object detector run over a fixed curated indoor vocabulary (~266 household / kitchen / office / manipulation classes). It is an unprompted background perception producer: it streams
ObjectsMetadata to /openral/perception/objects every frame, giving the world model far more object classes than the 80 COCO categories, without any reasoner prompting. It drives no actuators.
This package wraps hf://omlab/omdet-turbo-swin-tiny-hf with a rskill.yaml
manifest that adds the fixed-vocabulary detector contract, capability checking,
license surfacing, and latency budgets. It does not copy model weights.
What this skill does
Detects objects from a fixed curated indoor vocabulary in every RGB camera
frame and publishes 2D detections (ObjectsMetadata) on the perception bus. It
emits no action chunks, drives no actuators, and has no proprioception
contract; it is a pure ADR-0037 perception producer. Because the class list is fixed
(not query-driven), it behaves like a large closed-vocabulary detector: the
reasoner does not retarget it.
| Field | Value |
|---|---|
| Actions | detect |
| Objects | open-vocabulary indoor objects: kitchenware, tableware, appliances, furniture, tools, containers (~266 classes) |
| Scenes | tabletop, kitchen, indoor, household, office, bathroom |
| Embodiment | embodiment-agnostic (any RGB camera ≥ 640×480) |
How it works
OmDet-Turbo is a real-time, transformer-based open-vocabulary detector
(AutoModelForZeroShotObjectDetection). Unlike locateanything-3b-nf4, a heavy
VLM pinned to transformers==4.57.1 that must run out-of-process in a sidecar,
OmDet-Turbo is a first-class transformers architecture that loads under the
OpenRAL runtime's own transformers>=5. It therefore runs in process: no
sidecar venv, no ZMQ.
The OpenRAL backend
(OmDetTurboDetector)
loads the processor + model on first detect(), moves the model to CUDA when
available (CPU fallback otherwise), and runs the manifest's fixed labels
vocabulary against each frame via processor.post_process_grounded_object_detection.
It is selected as DetectorTier.ZEROSHOT_HF by build_manifest_detector for
manifests whose detector.engine is zeroshot_hf, and consumes the same
system-memory BGR camera-tee branch as the CPU ONNX and VLM-sidecar tiers
(ADR-0037 2026-06-12 amendment).
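The load-on-first-detect() behaviour and the CUDA/CPU fallback described above can be sketched as follows. This is a minimal illustration, not the actual OmDetTurboDetector: the class name `LazyDetector`, the `loader` callable, and `pick_device` are hypothetical, and torch is imported opportunistically so the sketch also runs on hosts without it.

```python
from typing import Any, Callable, List, Optional

# A "model" here is any callable taking (frame, labels) and returning detections.
Model = Callable[[Any, List[str]], list]


def pick_device() -> str:
    """Return "cuda" when torch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # optional dependency; absent on minimal hosts
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


class LazyDetector:
    """Hypothetical stand-in that defers the expensive model load to first detect()."""

    def __init__(self, loader: Callable[[str], Model], labels: List[str]) -> None:
        self._loader = loader        # builds the model for a given device string
        self._labels = list(labels)  # fixed vocabulary, e.g. taken from the manifest
        self._model: Optional[Model] = None
        self.load_count = 0          # exposed only so the laziness is observable

    def detect(self, frame: Any) -> list:
        if self._model is None:      # only the first call pays the load cost
            self._model = self._loader(pick_device())
            self.load_count += 1
        return self._model(frame, self._labels)
```

Constructing the object is cheap under this pattern; the first frame absorbs the load time, which is worth keeping in mind when reading first-frame latency.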
Observation → action contract
| Direction | Key | Shape | Notes |
|---|---|---|---|
| in | any RGB camera | (H, W, 3) BGR uint8 | system-memory frame from the camera tee; min 640×480. vla_feature_key is intentionally omitted |
| out | ObjectsMetadata | list of ObjectDetection2D | (label, confidence, bbox_xyxy) per detection on /openral/perception/objects; no action chunk |
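As an illustration of the output side of this contract, the sketch below models one detection as a (label, confidence, bbox_xyxy) record and drops low-confidence results. The dataclass is a stand-in for the real ObjectDetection2D schema, not a copy of it; `to_detections`, its 0.3 default threshold, and the pixel-coordinate interpretation of the box are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Detection2D:
    """Stand-in for ObjectDetection2D, using the three fields named in the contract."""
    label: str
    confidence: float
    bbox_xyxy: Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def to_detections(
    labels: Sequence[str],
    scores: Sequence[float],
    boxes: Sequence[Sequence[float]],
    min_confidence: float = 0.3,  # illustrative threshold, not a documented default
) -> List[Detection2D]:
    """Zip parallel detector outputs into records, keeping scores >= min_confidence."""
    if not (len(labels) == len(scores) == len(boxes)):
        raise ValueError("labels, scores and boxes must have equal length")
    detections = []
    for label, score, box in zip(labels, scores, boxes):
        if score < min_confidence:
            continue
        x0, y0, x1, y1 = (float(v) for v in box)
        detections.append(Detection2D(label, float(score), (x0, y0, x1, y1)))
    return detections
```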
Upstream model and training
This rSkill is a thin wrapper around the upstream Apache-2.0 OmDet-Turbo checkpoint; the weights live upstream and are not copied here.
| Field | Value |
|---|---|
| Source repo | omlab/omdet-turbo-swin-tiny-hf |
| Base model | OmDet-Turbo, Swin-tiny backbone |
| Paper | arxiv:2403.06892: Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head |
| License | apache-2.0 (commercial use permitted) |
| Parameters | ~115 M |
| Training data | upstream: Objects365 / GoldG and grounding data per the OmDet-Turbo release |
Supported robots
This detector is embodiment-agnostic: the only requirement is an RGB camera
stream. All in-tree embodiment tags are declared in rskill.yaml.
| Robot | Embodiment tag | Status | Notes |
|---|---|---|---|
| any with an RGB camera | franka_panda, so100_follower, aloha, … | experimental | camera-only; see rskill.yaml::embodiment_tags for the full list |
Sensors required
Mirrors rskill.yaml::sensors_required.
| Key | Modality | Min resolution | Format |
|---|---|---|---|
| any RGB camera | RGB | 640 × 480 | uint8 BGR frame |
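A quick pre-flight check for this requirement might look like the sketch below. It inspects only a frame's shape tuple, so it needs no image library; `check_frame_shape` is a hypothetical helper and not part of OpenRAL.

```python
from typing import Tuple

MIN_WIDTH, MIN_HEIGHT = 640, 480  # minimum resolution from the table above


def check_frame_shape(shape: Tuple[int, ...]) -> None:
    """Raise ValueError unless shape is (H, W, 3) with at least 640x480 pixels."""
    if len(shape) != 3 or shape[2] != 3:
        raise ValueError(f"expected (H, W, 3), got {shape}")
    height, width = shape[0], shape[1]
    if width < MIN_WIDTH or height < MIN_HEIGHT:
        raise ValueError(
            f"frame {width}x{height} is below the {MIN_WIDTH}x{MIN_HEIGHT} minimum"
        )
```

With a NumPy frame this would be called as `check_frame_shape(frame.shape)`; note that the first axis is height, so a 640 × 480 frame has shape (480, 640, 3).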
Manifest summary
| Field | Value |
|---|---|
| name | OpenRAL/rskill-omdet-turbo-indoor |
| version | 0.1.0 |
| license | apache-2.0 |
| role / kind | s1 / detector |
| embodiment_tags | all in-tree embodiments (camera-only) |
| runtime / quantization.dtype | pytorch / fp16 |
| detector.engine | zeroshot_hf (in-process Transformers zero-shot) |
| weights_uri | hf://omlab/omdet-turbo-swin-tiny-hf |
| latency_budget.per_chunk_ms | 200 ms |
| commercial_use_allowed | yes (Apache-2.0 weights) |
Full schema: openral_core.schemas.RSkillManifest.
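To see whether a detector call stays inside the 200 ms latency_budget.per_chunk_ms on a given host, a simple timing wrapper is enough. The sketch below is one way to measure it; `timed_call` is a hypothetical helper, and it makes no claim about how the OpenRAL runtime itself enforces the budget.

```python
import time
from typing import Any, Callable, Tuple

PER_CHUNK_BUDGET_MS = 200.0  # latency_budget.per_chunk_ms from the manifest summary


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float, bool]:
    """Run fn and return (result, elapsed_ms, within_budget)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms, elapsed_ms <= PER_CHUNK_BUDGET_MS
```

Wrapping a detector's per-frame call this way over a few hundred frames gives a rough picture of headroom; the first call is best excluded if the model loads lazily.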
Quick start
uv sync --group omdet # torch + transformers for the in-process backend
from openral_core.schemas import RSkillManifest, DetectorEngine
manifest = RSkillManifest.from_yaml("rskills/omdet-turbo-indoor/rskill.yaml")
assert manifest.kind == "detector"
assert manifest.detector.engine is DetectorEngine.ZEROSHOT_HF
print(len(manifest.detector.labels), "fixed indoor classes")
Run it on the camera tee in sim (publishes ObjectsMetadata every frame, no
ONNX file and no prompting):
openral deploy sim \
--object-detector-manifest rskills/omdet-turbo-indoor/rskill.yaml
Reproduction
This is a packaging-only wrapper, so there are no trained numbers to reproduce. To validate the wiring (manifest + in-process dispatch) without a GPU:
just bootstrap && uv sync --all-packages
uv run pytest tests/unit/test_omdet_turbo_detector.py
The GPU-gated end-to-end test
(test_e2e_detects_indoor_objects_on_coco_sample) loads the real Apache-2.0
weights and grounds indoor classes on the coco_sample.jpg fixture; it skips on
GPU-less hosts (the legitimate CI skip path, CLAUDE.md §12).
Evaluation
No benchmarks are shipped because this is a packaging-only wrapper; see CLAUDE.md §6.4.
License
This rSkill package (rskill.yaml, README.md) is apache-2.0. The wrapped
weights at hf://omlab/omdet-turbo-swin-tiny-hf are also released under
apache-2.0, so the detector is fully commercial-safe (CLAUDE.md §1.9),
unlike the NVIDIA non-commercial locateanything-3b-nf4 open-vocab detector.
See also
- rskills/locateanything-3b-nf4/: query-driven open-vocab detector (NVIDIA non-commercial; VLM_SIDECAR tier).
- rskills/rtdetr-coco-r18/: fixed 80-class COCO RT-DETR detector (ONNX tier).
- docs/adr/0037-gstreamer-perception-bus-object-detection.md: detector kind + tier contract.
- CLAUDE.md §6.4: rSkill packaging contract.