LibreRFDETRn-sem (experimental)
⚠️ Experimental preview. This is LibreYOLO's first semantic-segmentation checkpoint. It is an early, under-trained model published to enable the semantic task end-to-end; it is not a competitive or flagship weight.
RF-DETR nano semantic segmentation: a DINOv2-small backbone with a dense decoder, predicting one class per pixel over the 182 COCO-Stuff classes.
Results
| Metric | Value | Eval |
|---|---|---|
| mIoU | 0.215 | COCO-Stuff val (5,000 images) |
| pixel accuracy | 0.55 | COCO-Stuff val |
Single-scale evaluation at 518×518 input; 24.2M parameters. For context, modern specialists in the ~14–28M-parameter range (e.g. SegNeXt-S/B) reach ~0.43–0.46 mIoU on this benchmark. This nano preview reaches roughly half that and is offered as a starting point, not a state-of-the-art result.
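For readers unfamiliar with the two metrics, here is a minimal NumPy sketch of how mIoU and pixel accuracy are commonly computed from a confusion matrix. It is not the evaluation script used for the numbers in the table. The function names are illustrative, the ignore value 255 follows the label convention described under Training, and averaging IoU only over classes that appear in the data is an assumption (some evaluators average over all classes).

```python
import numpy as np

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Rows are ground-truth classes, columns are predicted classes."""
    pred = np.asarray(pred).ravel().astype(np.int64)
    target = np.asarray(target).ravel().astype(np.int64)
    keep = target != ignore_index  # drop unlabeled pixels
    idx = target[keep] * num_classes + pred[keep]
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    """Fraction of labeled pixels whose predicted class is correct."""
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    """Unweighted mean of per-class IoU over classes that occur at all."""
    tp = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    present = union > 0
    return (tp[present] / union[present]).mean()

# Toy example: one dominant class predicted everywhere, two rare classes missed.
target = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 255])
pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
cm = confusion_matrix(pred, target, num_classes=3)
print(pixel_accuracy(cm))  # 0.8
print(mean_iou(cm))        # 0.8 / 3, about 0.267
```

In the toy example the dominant class is predicted well, so pixel accuracy is high, while the two missed rare classes each contribute an IoU of zero to the mean. The same mechanism lets a model score much higher on pixel accuracy than on mIoU.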
Usage
```python
from libreyolo import LibreYOLO

model = LibreYOLO("LibreRFDETRn-sem.pt")
result = model("image.jpg")

result.semantic_mask         # (H, W) per-pixel class IDs
result.semantic_mask.plot()  # colormapped overlay
```
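To make the idea of a colormapped overlay concrete, the sketch below blends one color per class ID onto an RGB image with plain NumPy. It is not LibreYOLO's `plot()` implementation. It assumes you already have the mask as an `(H, W)` integer array (this card does not say how to obtain one from `result.semantic_mask`), and `make_palette` and `overlay_mask` are hypothetical helper names.

```python
import numpy as np

def make_palette(num_classes=182, seed=0):
    """Deterministic pseudo-random RGB color per class ID (illustrative palette)."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 256, size=(num_classes, 3), dtype=np.uint8)

def overlay_mask(image, mask, palette, alpha=0.5, ignore_index=255):
    """Alpha-blend class colors onto an (H, W, 3) image; ignored pixels stay unchanged."""
    image = np.asarray(image, dtype=np.float32)
    mask = np.asarray(mask)
    valid = mask != ignore_index
    colors = np.zeros_like(image)
    colors[valid] = palette[mask[valid]]  # look up one color per labeled pixel
    blended = image.copy()
    blended[valid] = (1 - alpha) * image[valid] + alpha * colors[valid]
    return blended.round().astype(np.uint8)

# Tiny example with a hand-written two-class palette.
image = np.full((2, 2, 3), 100, dtype=np.uint8)
mask = np.array([[0, 1], [255, 1]])
palette = np.array([[200, 0, 0], [0, 200, 0]], dtype=np.uint8)
out = overlay_mask(image, mask, palette)
print(out[0, 0])  # [150  50  50]
print(out[1, 0])  # [100 100 100], ignored pixel left as-is
```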
Training
- Data: COCO-Stuff 164k (118k train / 5k val). Annotations are CC BY 4.0; the images follow the COCO terms. Mask pixel values are the COCO-Stuff label id minus one (0–181); 255 is unlabeled/ignore.
- Recipe: transfer from the detection-pretrained DINOv2 backbone, AdamW, cross-entropy + Lovász-Softmax (IoU-surrogate) loss, EMA.
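The label convention and the loss combination above can be sketched in NumPy as follows. This illustrates the general technique and is not the training code for this checkpoint. It assumes raw COCO-Stuff ids run from 1 to 182 with 0 meaning unlabeled, and the relative weight of the two loss terms (`lovasz_weight`) is a hypothetical parameter that the card does not specify. The Lovász-Softmax part follows the standard formulation: per class, sort the absolute errors in descending order and take their dot product with the gradient of the Jaccard loss.

```python
import numpy as np

IGNORE = 255

def remap_labels(raw_ids):
    """Shift ids 1..182 to 0..181; id 0 (assumed to mean unlabeled) becomes 255."""
    raw_ids = np.asarray(raw_ids, dtype=np.int64)
    return np.where(raw_ids == 0, IGNORE, raw_ids - 1)

def lovasz_grad(gt_sorted):
    """Gradient of the Lovász extension of the Jaccard loss for sorted errors."""
    total = gt_sorted.sum()
    intersection = total - np.cumsum(gt_sorted)
    union = total + np.cumsum(1.0 - gt_sorted)
    jaccard = 1.0 - intersection / union
    jaccard[1:] = jaccard[1:] - jaccard[:-1]
    return jaccard

def lovasz_softmax(probs, labels):
    """probs: (N, C) softmax outputs; labels: (N,) class ids without ignored pixels."""
    losses = []
    for c in np.unique(labels):  # average over classes present in the batch
        fg = (labels == c).astype(np.float64)
        errors = np.abs(fg - probs[:, c])
        order = np.argsort(-errors)  # largest errors first
        losses.append(np.dot(errors[order], lovasz_grad(fg[order])))
    return float(np.mean(losses))

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())

def combined_loss(logits, labels, lovasz_weight=1.0):
    """Cross-entropy plus weighted Lovász-Softmax over non-ignored pixels."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    keep = labels != IGNORE  # ignored pixels contribute to neither term
    logits, labels = logits[keep], labels[keep]
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return cross_entropy(probs, labels) + lovasz_weight * lovasz_softmax(probs, labels)

# Four pixels, three classes; the third pixel is unlabeled and is skipped.
labels = remap_labels([1, 2, 0, 2])
good = np.array([[9, 0, 0], [0, 9, 0], [0, 0, 9], [0, 9, 0]])
bad = np.array([[0, 9, 0], [9, 0, 0], [0, 0, 9], [9, 0, 0]])
loss_good = combined_loss(good, labels)
loss_bad = combined_loss(bad, labels)
```

Cross-entropy optimizes per-pixel likelihood, while the Lovász term is a surrogate for the per-class IoU that the mIoU metric rewards, which is one common motivation for combining the two.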
License
- Code: MIT (LibreYOLO).
- Weights: MIT. Trained on CC BY 4.0 annotations + COCO images; usable commercially. Please cite COCO and COCO-Stuff.
Limitations
Common classes (sky, road, person, buildings) are segmented reasonably well; rare "stuff" classes are weak, and because mIoU weights every class equally, they account for most of the mIoU gap. Not recommended for production semantic segmentation yet.