LibreRFDETRn-sem (experimental)

⚠️ Experimental preview. This is LibreYOLO's first semantic-segmentation checkpoint. It is an early, under-trained model published so that the semantic task works end to end; it is not a competitive or flagship weight.

RF-DETR nano semantic segmentation: a DINOv2-small backbone with a dense decoder, predicting one class per pixel over the 182 COCO-Stuff classes.

Results

Metric	Value	Evaluation set
mIoU	0.215	COCO-Stuff val (5,000 images)
pixel accuracy	0.55	COCO-Stuff val

Single-scale evaluation at 518×518 input; the model has 24.2M parameters. For context, modern specialists with roughly 14–28M parameters (e.g. SegNeXt-S/B) reach about 0.43–0.46 mIoU on this benchmark. This nano preview scores roughly half that and is offered as a starting point, not a state-of-the-art result.
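For readers who want to reproduce numbers of this kind, the following is a minimal sketch of how mIoU and pixel accuracy are commonly computed from a confusion matrix, with unlabeled pixels (255) excluded. The function names are illustrative and are not part of LibreYOLO. Accumulating one matrix over the whole validation set is one common convention; this card does not state exactly how its figures were computed.

```python
import numpy as np

NUM_CLASSES = 182    # COCO-Stuff classes, per this card
IGNORE_INDEX = 255   # unlabeled pixels, per this card


def confusion_matrix(pred, target, num_classes=NUM_CLASSES, ignore_index=IGNORE_INDEX):
    """Return a (num_classes, num_classes) count matrix: rows = ground truth, cols = prediction."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    keep = target != ignore_index  # drop unlabeled pixels
    # Encode each (truth, prediction) pair as a single integer, then count.
    idx = target[keep].astype(np.int64) * num_classes + pred[keep].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def miou_and_pixel_accuracy(cm):
    """Return (mIoU, pixel accuracy) from a confusion matrix."""
    tp = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    present = union > 0  # skip classes absent from both prediction and ground truth
    miou = float((tp[present] / union[present]).mean()) if present.any() else 0.0
    total = cm.sum()
    acc = float(tp.sum() / total) if total > 0 else 0.0
    return miou, acc


# Toy example with 3 classes; the last ground-truth pixel is unlabeled.
pred = [[0, 1, 1], [1, 2, 0]]
target = [[0, 0, 1], [1, 2, 255]]
cm = confusion_matrix(pred, target, num_classes=3)
miou, acc = miou_and_pixel_accuracy(cm)
```

For a dataset-level score under this convention, sum the per-image matrices and call `miou_and_pixel_accuracy` once on the total, rather than averaging per-image scores.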

Usage

from libreyolo import LibreYOLO

model = LibreYOLO("LibreRFDETRn-sem.pt")
result = model("image.jpg")
result.semantic_mask          # (H, W) per-pixel class IDs
result.semantic_mask.plot()   # colormapped overlay
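A per-pixel class-ID mask can be summarized without any plotting. The sketch below assumes the mask can be converted to an (H, W) NumPy integer array (for example via `np.asarray`); whether `result.semantic_mask` supports that conversion directly is an assumption, not something this card confirms. `class_pixel_fractions` is a hypothetical helper, and a small synthetic array stands in for a real prediction.

```python
import numpy as np

IGNORE_INDEX = 255  # only relevant when summarizing ground-truth masks


def class_pixel_fractions(mask, ignore_index=IGNORE_INDEX):
    """Map class ID -> fraction of labeled pixels, most frequent class first."""
    mask = np.asarray(mask)
    labeled = mask[mask != ignore_index]
    if labeled.size == 0:
        return {}
    ids, counts = np.unique(labeled, return_counts=True)
    order = np.argsort(-counts, kind="stable")  # descending by pixel count
    return {int(ids[i]): float(counts[i] / labeled.size) for i in order}


# Synthetic stand-in for a predicted (H, W) mask of class IDs.
demo_mask = np.array([[0, 0, 5], [0, 5, 7]])
fractions = class_pixel_fractions(demo_mask)
```

The returned dictionary is ordered by coverage, so its first key is the dominant class in the image.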

Training

Data: COCO-Stuff 164k (118k train / 5k val). Annotations are CC BY 4.0; the images follow the COCO terms. Mask pixel values are the COCO-Stuff label id minus one (0–181); 255 is unlabeled/ignore.
Recipe: transfer from the detection-pretrained DINOv2 backbone, AdamW, cross-entropy + Lovász-Softmax (IoU-surrogate) loss, EMA.
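The mask convention described under Data can be sketched as a small conversion step. This is an illustration under stated assumptions, not the project's actual preprocessing: it assumes the raw annotation stores COCO-Stuff label ids 1–182 and treats any other value (such as 0) as unlabeled. `to_train_mask` is a hypothetical helper name.

```python
import numpy as np

NUM_CLASSES = 182    # COCO-Stuff classes, per this card
IGNORE_INDEX = 255   # unlabeled/ignore value, per this card


def to_train_mask(label_ids):
    """Map raw label ids 1..182 to 0..181; every other value becomes 255."""
    label_ids = np.asarray(label_ids)
    valid = (label_ids >= 1) & (label_ids <= NUM_CLASSES)
    out = np.full(label_ids.shape, IGNORE_INDEX, dtype=np.uint8)
    out[valid] = (label_ids[valid] - 1).astype(np.uint8)  # label id minus one
    return out


# Assumed raw ids: 0 and 183 fall outside 1..182 and are treated as unlabeled.
raw = [[1, 182], [0, 183]]
train_mask = to_train_mask(raw)
```

Losses that accept an ignore index (cross-entropy implementations commonly do) can then skip the 255 pixels during training.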

License

Code: MIT (LibreYOLO).
Weights: MIT. Trained on CC BY 4.0 annotations + COCO images; usable commercially. Please cite COCO and COCO-Stuff.

Limitations

Common classes (sky, road, person, buildings) segment reasonably; rare "stuff" classes are weak, which dominates the mIoU gap. Not recommended for production semantic segmentation yet.
