Detect-crime Miner — Recipe to Beat the King

Target element: manak0/Detect-crime on subnet 423 (open-source / public track).

Read ANALYSIS.md first — it documents the king's model (the manak0 baseline) and where the gap lives.

Current king of record (2026-05-04 leaderboard): hotkey 5CSeBY…tv9f, score 0.576. Detect-crime is uncontested: there is no Detect-crime-winner HF repo, and the king's score sits just below the published baseline's overall_iou (0.597). Anybody who lands a modest improvement takes the throne.

Layout

```text
crime_miner/
├── ANALYSIS.md            ← analysis of the king + scoring + constraints
├── README.md              ← this file
├── miner.py               ← deployable inference (multi-scale TTA + WBF + CLAHE)
├── chute_config.yml       ← chute resource spec (16 GB GPU, matches king's)
├── class_names.txt        ← target class order — DO NOT REORDER
└── training/
    ├── DATASET.md         ← dataset sources + pipeline (start here)
    ├── build_dataset.py   ← end-to-end builder: manako + Roboflow + COCO bat
    ├── poll_manako.py     ← background poller for in-domain frames + king's preds
    ├── train.py           ← two-stage YOLOv11 training (silver → clean fine-tune)
    ├── verify_dataset.py  ← QA over assembled YOLO dirs
    ├── export_onnx.py     ← export with NMS baked in -> [1, 300, 6]
    └── requirements.txt
```

What the miner does differently

miner.py keeps the king's I/O contract (single weights.onnx → TVFrameResult) but adds six concrete improvements over the auto-generated subnet_bridge template the king ships:

Letterboxed input at 1280 instead of stretch-resized 640. Small objects (balaclava ~30 px, glove ~25 px, spray paint can ~20 px) survive; the king's stretch resize destroys them. This alone should lift recall on the four weakest classes (balaclava, glove, bat, spray paint).
Per-class confidence floors. The king uses one global threshold of 0.25 across all six classes; we set balaclava=0.05, bat=0.10, glove=0.05, graffiti=0.20, hoodie=0.20, spray paint=0.10. The king's (baseline) synthetic-benchmark recalls, in the same class order, were 0.034 / 0.143 / 0.064 / 0.321 / 0.274 / 0.161, so the bottleneck is recall, and the FFPI cap has plenty of headroom (~6.5 preds/img today).
Multi-scale TTA at {1280, 1536} × {orig, hflip} = 4 forward passes, collapsed to 2 (orig + hflip at the fixed input size) when the ONNX export is static-shape. The Pro_6000 has the budget: the latency limit is p95 = 10 s.
Weighted Box Fusion across TTA streams. WBF averages each cluster's boxes weighted by score, which yields tighter localizations than always keeping the highest-confidence proposal, and tighter boxes mean more predictions clear the IoU ≥ 0.5 bar the scorer uses.
CLAHE on dark frames only (luma gate). Crime CCTV is night-heavy; the king applies no preprocessing.
Class-aware NMS at IoU=0.45. The king uses class-agnostic NMS, which suppresses one box of a balaclava-on-hoodie or glove-near-bat overlap. Class-aware NMS keeps both.
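The letterbox geometry from the first item can be sketched as follows. This is a minimal illustration, not miner.py's code: it assumes a square canvas with the padding split evenly on both sides, and `Letterbox`, `letterbox_params` and `unletterbox_box` are hypothetical names. Only the coordinate math is shown; the pixel resize and pad (e.g. with OpenCV) are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Letterbox:
    scale: float  # original px -> canvas px (same factor on both axes)
    pad_x: float  # left padding on the canvas, in canvas px
    pad_y: float  # top padding on the canvas, in canvas px
    new_w: int    # resized width before padding
    new_h: int    # resized height before padding


def letterbox_params(orig_w: int, orig_h: int, size: int = 1280) -> Letterbox:
    """Uniform scale that fits the frame inside size x size, plus centered padding."""
    scale = min(size / orig_w, size / orig_h)
    new_w, new_h = round(orig_w * scale), round(orig_h * scale)
    return Letterbox(scale, (size - new_w) / 2, (size - new_h) / 2, new_w, new_h)


def _clip(v: float, hi: float) -> float:
    return max(0.0, min(float(hi), v))


def unletterbox_box(box, lb: Letterbox, orig_w: int, orig_h: int):
    """Map an (x1, y1, x2, y2) box from canvas coords back to the original frame."""
    x1, y1, x2, y2 = box
    x1, x2 = (x1 - lb.pad_x) / lb.scale, (x2 - lb.pad_x) / lb.scale
    y1, y2 = (y1 - lb.pad_y) / lb.scale, (y2 - lb.pad_y) / lb.scale
    # Predictions can spill into the padding, so clip to the real frame.
    return (_clip(x1, orig_w), _clip(y1, orig_h), _clip(x2, orig_w), _clip(y2, orig_h))
```

For a 1920×1080 frame at size 1280, the scale is 2/3 and the frame is padded by 280 px top and bottom, so a 30 px balaclava keeps about 20 px of width, versus about 10 px under a stretch resize to 640.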

Total ONNX inference cost on Pro_6000 with YOLOv11s + 2-scale TTA is well under 1 s/frame.
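The TTA streams are merged with WBF. The sketch below is a plain-Python rendering of the standard weighted-boxes-fusion recipe. It assumes per-class fusion, an IoU threshold of 0.55 (the `WBF_IOU` default in the tuning table), and a confidence that is averaged over each cluster and scaled down when fewer streams agree. It is not taken from miner.py; in practice the `ensemble_boxes` package's `weighted_boxes_fusion` does the same job on normalized coordinates.

```python
from collections import defaultdict


def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_box_fusion(streams, iou_thr=0.55):
    """Fuse detections from several TTA passes.

    streams: one list per TTA pass of (box, score, class_id) tuples, boxes already
    mapped back to original-frame coordinates. Returns fused (box, score, class_id).
    """
    n_streams = len(streams)
    by_class = defaultdict(list)
    for dets in streams:
        for box, score, cls in dets:
            by_class[cls].append((box, score))

    fused = []
    for cls, cls_dets in by_class.items():
        cls_dets.sort(key=lambda d: d[1], reverse=True)  # highest confidence first
        clusters = []  # each: {"members": [(box, score), ...], "box": fused box}
        for box, score in cls_dets:
            # Join the cluster whose current fused box overlaps most, if above iou_thr.
            best, best_iou = None, iou_thr
            for c in clusters:
                overlap = iou(box, c["box"])
                if overlap > best_iou:
                    best, best_iou = c, overlap
            if best is None:
                clusters.append({"members": [(box, score)], "box": tuple(box)})
                continue
            best["members"].append((box, score))
            total = sum(s for _, s in best["members"])
            # Fused coordinates: score-weighted average of every member box.
            best["box"] = tuple(
                sum(b[i] * s for b, s in best["members"]) / total for i in range(4)
            )
        for c in clusters:
            scores = [s for _, s in c["members"]]
            # Mean score, scaled down when fewer streams contributed to the cluster.
            conf = sum(scores) / len(scores) * min(len(scores), n_streams) / n_streams
            fused.append((c["box"], conf, cls))
    return fused
```

With two streams, a box found in only one of them keeps half its mean score, so the per-class floors interact with how many TTA passes actually run.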

How to deploy

You need either a weights.onnx exported in [1, 300, 6] layout (NMS baked in), produced by training/export_onnx.py after training, or the king's raw ONNX, shipped directly to test the inference improvements alone.
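Decoding the [1, 300, 6] output is short. This sketch assumes each row is (x1, y1, x2, y2, score, class_id) in input-canvas pixels with unused rows zero-padded, which is how Ultralytics NMS-baked exports are commonly laid out; confirm against export_onnx.py. It also assumes class ids follow class_names.txt (balaclava, bat, glove, graffiti, hoodie, spray paint) and applies the per-class floors listed above. `decode_nms_output` is a hypothetical helper.

```python
# Assumed index order, matching class_names.txt:
# 0 balaclava, 1 bat, 2 glove, 3 graffiti, 4 hoodie, 5 spray paint.
PER_CLASS_CONF = {0: 0.05, 1: 0.10, 2: 0.05, 3: 0.20, 4: 0.20, 5: 0.10}


def decode_nms_output(rows, floors=PER_CLASS_CONF):
    """Turn rows of an assumed [300, 6] NMS output into (box, score, class_id) tuples.

    Each row is assumed to be (x1, y1, x2, y2, score, class_id) in input-canvas
    pixels; zero-padded filler rows are dropped.
    """
    dets = []
    for x1, y1, x2, y2, score, cls in rows:
        cls = int(cls)
        if cls not in floors or score < floors[cls]:
            continue  # unknown class id, or below this class's confidence floor
        if x2 <= x1 or y2 <= y1:
            continue  # zero-padded filler row or degenerate box
        dets.append(((x1, y1, x2, y2), float(score), cls))
    return dets
```

With onnxruntime, `rows` would be `session.run(None, {input_name: blob})[0][0]`, and the surviving boxes still need mapping back through the letterbox transform.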

Option 1 — drop-in test with the king's weights

Sanity-check that the inference improvements alone help, before training:

```bash
cp /root/turbovision_crime/king_models/Detect-crime/weights.onnx ./weights.onnx
python miner.py   # smoke test on /tmp/crime_proof.png
```

Expected: with the king's weights but our miner.py, you should already see a lift on the rare classes, driven mainly by the lower per-class confidence floors (class-aware NMS, CLAHE, and hflip TTA still apply). The 1280 letterbox and multi-scale TTA need a dynamic-shape ONNX, and the king's published ONNX is static at 640×640, so those paths won't help until you re-export; see Option 2 below.
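Whether the multi-scale path is available comes down to the model's declared input shape. A hypothetical planner, assuming NCHW input and that onnxruntime reports static dimensions as ints and dynamic ones as symbolic strings or None (what `session.get_inputs()[0].shape` typically returns):

```python
def _is_fixed(dim) -> bool:
    # Static dims come back as positive ints; dynamic ones as str (symbolic) or None.
    return isinstance(dim, int) and dim > 0


def plan_tta(input_shape, sizes=(1280, 1536)):
    """Return the list of ((height, width), hflip) forward passes to run."""
    h, w = input_shape[2], input_shape[3]
    if _is_fixed(h) and _is_fixed(w):
        # Static export: only one resolution is possible, so keep orig + hflip (2 passes).
        return [((h, w), flip) for flip in (False, True)]
    # Dynamic export: every size x {orig, hflip} (4 passes with the defaults).
    return [((s, s), flip) for s in sizes for flip in (False, True)]
```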

Option 2 — train a model that beats the king

See training/DATASET.md for full data-pipeline notes. Quick path:

```bash
cd training
pip install -r requirements.txt

# 1) Start the manako poller in the background to accumulate in-domain frames
#    (each rotation surfaces a fresh challenge ~every few minutes during active scoring).
python poll_manako.py --out ../manako_pool --interval 120 --forever &

# 2) Build the silver dataset. Combine manako frames (king-labeled), Roboflow
#    per-class detection sets, and (optional) COCO baseball bat. Roboflow needs
#    ROBOFLOW_API_KEY in env.
python build_dataset.py \
    --out ../data \
    --king-onnx /root/turbovision_crime/king_models/Detect-crime/weights.onnx \
    --manako --manako-polls 30 --manako-poll-delay 120 \
    --roboflow balaclava=brainster/balaclava-detection-v3 \
    --roboflow glove=ppe-detection/gloves-v1 \
    --roboflow graffiti=graffiti-detection/graffiti-v3 \
    --roboflow "spray paint=tools/spray-paint-can-v1" \
    --coco-bat /path/to/coco/instances_train2017.json /path/to/coco/train2017 \
    --extra-dir ../manako_pool/images \
    --min-conf 0.10 --keep-empty --intra-threads 16

# 3) Verify the assembled dataset
python verify_dataset.py --data ../data/data.yaml --visualize 20

# 4) Stage A: silver pretrain
python train.py --data ../data/data.yaml --weights yolo11s.pt \
                --imgsz 1280 --batch 16 --stage A --epochs 200 --name crime_a

# 5) Build a clean set: hand-verify (or LLM-verify) ~300 manako frames into
#    ../data_clean/data.yaml with the same YOLO layout.

# 6) Stage B: clean fine-tune
python train.py --data ../data_clean/data.yaml \
                --weights ../runs/detect/crime_a/weights/best.pt \
                --imgsz 1280 --batch 16 --stage B --epochs 50 --name crime_b

# 7) Export with NMS baked in -> [1, 300, 6]
python export_onnx.py --weights ../runs/detect/crime_b/weights/best.pt \
                      --imgsz 1280 --out ../weights.onnx
```
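For step 5, each image in the clean set needs a matching label file in the standard YOLO layout: one line per box, `class cx cy w h`, with the center and size normalized by image width and height. A minimal converter from pixel xyxy boxes, assuming class ids follow class_names.txt; `to_yolo_line` is a hypothetical helper, not part of build_dataset.py.

```python
def to_yolo_line(cls_id: int, box, img_w: int, img_h: int) -> str:
    """Convert a pixel (x1, y1, x2, y2) box to a normalized YOLO label line."""
    x1, y1, x2, y2 = box
    # Clip to the image so normalized values stay inside [0, 1].
    x1, x2 = max(0.0, min(img_w, x1)), max(0.0, min(img_w, x2))
    y1, y2 = max(0.0, min(img_h, y1)), max(0.0, min(img_h, y2))
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```

Write one such line per verified box into `labels/<image stem>.txt`, mirroring the `images/` directory, as the Ultralytics trainer expects.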

Option 3 — deploy via the turbovision CLI

```bash
cd /root/turbovision_crime
sv -vv deploy-os-miner --model-path scratch/crime_miner --element-id manak0/Detect-crime
```

The CLI uploads miner.py, weights.onnx, class_names.txt, and chute_config.yml to your HF repo, builds the chute, and commits the on-chain pointer.

Tuning knobs (top of `miner.py`)

| Constant | Default | Effect of raising | Effect of lowering |
|---|---|---|---|
| `PER_CLASS_CONF[0]` (balaclava) | 0.05 | fewer FPs (good for FFPI) | more recall (better AP, better IoU) |
| `PER_CLASS_CONF[2]` (glove) | 0.05 | as above | as above |
| `PER_CLASS_CONF[4]` (hoodie) | 0.20 | fewer hoodie FPs | more boxes (may hurt precision) |
| `TTA_SIZES` | (1280, 1536) | better small-object recall | faster inference |
| `WBF_IOU` | 0.55 | more conservative fusion (only heavily overlapping boxes merge) | more aggressive fusion (looser clusters, fewer output boxes) |
| `NMS_IOU` | 0.45 | keeps more near-duplicates | stricter dedup |
| `MAX_DET` | 100 | more boxes survive ranking | tighter cap |
| `CLAHE_DARK_THRESHOLD` | 70 | CLAHE on more frames | only the very dark ones |
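`NMS_IOU` and `MAX_DET` drive the class-aware NMS step. A minimal greedy per-class NMS sketch, assuming detections are (xyxy box, score, class_id) tuples; the names are illustrative, not miner.py's.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_aware_nms(dets, iou_thr=0.45, max_det=100):
    """Greedy NMS run separately per class, then capped at max_det by score."""
    by_class = {}
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        by_class.setdefault(det[2], []).append(det)

    kept = []
    for cls_dets in by_class.values():
        survivors = []
        for det in cls_dets:
            # Suppress only against higher-scoring boxes of the same class.
            if all(iou(det[0], s[0]) <= iou_thr for s in survivors):
                survivors.append(det)
        kept.extend(survivors)

    kept.sort(key=lambda d: d[1], reverse=True)
    return kept[:max_det]
```

Given two near-identical hoodie boxes and a balaclava box on the same region (IoU 0.9 with the hoodie), this keeps the top hoodie and the balaclava; class-agnostic NMS at the same threshold would drop the balaclava.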

When tuning, validate against runs/detect/crime_b/val_batch*.jpg and the latest manako challenge image; don't hill-climb on the synthetic benchmark alone (it's only 50 frames).
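`CLAHE_DARK_THRESHOLD` gates CLAHE on mean frame luma. A dependency-free sketch of the gate, assuming BGR pixels (OpenCV order) and BT.601 luma weights. The equalization itself would typically run on the luminance channel only, e.g. via OpenCV's `cv2.createCLAHE(clipLimit=..., tileGridSize=...).apply(...)`, and is not reproduced here. Function names are hypothetical.

```python
CLAHE_DARK_THRESHOLD = 70  # mean luma (0-255) below which a frame counts as dark


def mean_luma(pixels_bgr) -> float:
    """Mean BT.601 luma of an iterable of (b, g, r) pixels in 0-255."""
    total, n = 0.0, 0
    for b, g, r in pixels_bgr:
        total += 0.299 * r + 0.587 * g + 0.114 * b
        n += 1
    return total / n if n else 0.0


def should_apply_clahe(pixels_bgr, threshold=CLAHE_DARK_THRESHOLD) -> bool:
    """Luma gate: equalize only frames darker than the threshold."""
    return mean_luma(pixels_bgr) < threshold
```

Gating on a global mean keeps bright frames untouched, which is the point of the luma gate; a frame that is dark overall but has a lit region still gets equalized.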

Why these specific choices

The IoU pillar dominates the live score (dashboard 0.576 ≈ baseline overall_iou 0.597). IoU is the label-agnostic AUC-F1 placement metric: what matters most is whether some well-placed box exists for each GT. So the best strategy is likely to flood predictions for the rare classes; the FFPI cap (10 FP/image, against the baseline's current ~6.5 preds/img) gives generous headroom.
mAP@50 matters too, because secondary pillars are likely weighted in. mAP@50 is averaged per class with strict label matching, so raising recall on the four weakest classes even modestly (0.03 → 0.20 on balaclava) lifts the per-class mean by ~0.03 from balaclava alone.
WBF over hard NMS: tighter localizations → more boxes clearing the IoU≥0.5 bar.
Class-aware NMS: balaclava overlaps with hoodie geometry; bat overlaps with glove on a held bat. Class-agnostic NMS would silently kill one of each pair.
CLAHE only on dark frames: applying CLAHE to bright frames hurts hoodie/graffiti texture. Luma gate keeps it surgical.
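The ~0.03 mAP@50 figure is a back-of-envelope mean over the six classes. It assumes balaclava's AP@50 rises roughly in step with its recall and the other five classes are unchanged:

$$
\Delta\,\mathrm{mAP}_{50} = \frac{1}{6}\sum_{c=1}^{6}\Delta\mathrm{AP}_{50}^{(c)} \approx \frac{\Delta\mathrm{AP}_{50}^{(\mathrm{balaclava})}}{6} \approx \frac{0.20 - 0.03}{6} \approx 0.028
$$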

Verifying you're actually beating the king

Before committing on-chain:

Pull the latest annotated challenge image+predictions:

```bash
curl -sL "https://console.scorevision.io/api/v2/elements/manak0%2FDetect-crime?lookback_days=7" \
  | jq '.latestAnnotatedChallenge'
```

Run your miner.py on that image and visually check that your boxes cover at least what the king's do, especially on balaclava, glove, and spray paint.
Run sv -vv run-once (per MINER.md) to score yourself end-to-end on a real challenge without committing — confirms the chute deploys correctly and your output format matches.
Only after the offline score is repeatedly above 0.62 (the king + a comfortable margin) should you deploy and commit.

Open questions / pending work

Live pillar weights for Detect-crime — confirm by reading the active manifest with sv -vv elements list once .env is configured. The recipe above assumes IoU-dominated scoring; if mAP/precision/recall pillars are weighted higher, the per-class confidence floors should be raised (less recall, more precision).
Real GT vs SAM3 PGT — confirm whether elements[].ground_truth = true in the live manifest. If real GT (Manako-internal), the synthetic_fixed dataset on HF is the closest proxy and we should overfit it carefully. If SAM3 PGT, the live targets are whatever SAM3 detects when prompted with the 6 class names — slightly fuzzier.
Manako data pull — poll_manako.py is built but untested for Detect-crime. The endpoint shape is the same as petrol-station's; if Manako gates the API for low-traffic elements, fall back to using the king's ONNX as the silver labeler over Roboflow data.
