---
license: mit
tags:
- object-detection
- yolov8
- grocery
- retail
- onnx
datasets:
- custom
pipeline_tag: object-detection
---
NM i AI 2026 — NorgesGruppen Object Detection
Multi-class YOLOv8x detector for 356 grocery product categories on store shelf images.
Performance
| Method | Leaderboard Score |
|---|---|
| Multi-scale TTA (640+960+1280 + flip) | 0.9230 |
| Single inference | 0.8922 |
Competition scoring:
Model Details
- Architecture: YOLOv8x (68.5M parameters)
- Classes: 356 grocery product categories
- Training data: 248 shelf images with 22,731 COCO-format annotations
- Training resolution: 1280px
- Export format: ONNX (dynamic input, 262 MB)
- Inference: Multi-scale TTA at 640/960/1280px with horizontal flip + WBF fusion
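The inference step in the list above merges detections from several passes over the same image. Before fusion, boxes from a flipped pass have to be mirrored back, and all boxes need a common coordinate frame. The following is a minimal sketch of that bookkeeping, not the submission's actual code: the helper names and the `iou_thr` and `skip_box_thr` values are assumptions chosen for illustration. The only library call is `weighted_boxes_fusion` from `ensemble-boxes`, which expects boxes normalized to [0, 1].

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def unflip_horizontal(box: Box, image_width: float) -> Box:
    """Map a box predicted on a horizontally flipped image back to the original."""
    x1, y1, x2, y2 = box
    return (image_width - x2, y1, image_width - x1, y2)


def normalize(box: Box, image_width: float, image_height: float) -> List[float]:
    """Scale pixel coordinates to [0, 1] and clip them to that range."""
    x1, y1, x2, y2 = box

    def clip(value: float) -> float:
        return min(1.0, max(0.0, value))

    return [clip(x1 / image_width), clip(y1 / image_height),
            clip(x2 / image_width), clip(y2 / image_height)]


def fuse_passes(boxes_list, scores_list, labels_list,
                iou_thr=0.55, skip_box_thr=0.001):
    """Fuse per-pass detections with WBF.

    Each list holds one entry per TTA pass. The thresholds are illustrative
    assumptions, not values taken from the model card.
    """
    # Imported lazily so the helpers above work without the dependency.
    from ensemble_boxes import weighted_boxes_fusion

    return weighted_boxes_fusion(boxes_list, scores_list, labels_list,
                                 iou_thr=iou_thr, skip_box_thr=skip_box_thr)
```

With three scales and a flip at each, `fuse_passes` would receive six entries per list. The fused boxes come back normalized, so they need to be multiplied by the original width and height afterwards.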
Training
- Pretrained on COCO (YOLOv8x), fine-tuned on competition data
- Optimizer: AdamW (lr=0.01, weight_decay=0.0005) with a cosine LR schedule
- Augmentation: mosaic, mixup (0.2), copy-paste (0.15), perspective, rotation (±15°)
- 300 epochs at 1280px, batch=2 on NVIDIA A100 40GB
- Model soup: weights averaged across checkpoints from epochs 240-290 for better generalization
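The model soup step can be illustrated with a small, framework-free sketch. It assumes a uniform average (every checkpoint weighted equally) and represents each checkpoint as a plain dictionary of flattened parameter lists. The card does not say which checkpoints in the range were used or how they were weighted, so `uniform_soup` is a hypothetical helper rather than the author's procedure. In practice the same loop would run over PyTorch state dicts.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flattened values


def uniform_soup(checkpoints: List[Weights]) -> Weights:
    """Average every parameter element-wise across checkpoints, equally weighted."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")

    reference = checkpoints[0]
    for ckpt in checkpoints:
        if ckpt.keys() != reference.keys():
            raise ValueError("checkpoints must share the same parameter names")
        for name, values in ckpt.items():
            if len(values) != len(reference[name]):
                raise ValueError(f"shape mismatch for parameter {name!r}")

    count = len(checkpoints)
    soup: Weights = {}
    for name in reference:
        # zip(*...) groups the i-th value of this parameter from every checkpoint.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        soup[name] = [sum(column) / count for column in columns]
    return soup
```

Averaging only makes sense for checkpoints from the same training run, as here, because their weights stay close enough for the mean to remain a sensible model.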
Submission Contents
The submission contains:
- YOLOv8x model soup in ONNX format, dynamic input (262 MB)
- YOLO class → COCO category_id mapping
- Multi-scale TTA inference pipeline
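The class mapping is needed because YOLO emits contiguous class indices while COCO annotations use their own `category_id` values. The sketch below shows how a detection could be converted into a COCO-style result record. It assumes the mapping is stored as a JSON object keyed by class index, and that predictions follow the standard COCO results layout; neither is confirmed by the card, and both function names are hypothetical.

```python
import json
from typing import Dict, Sequence


def load_class_map(text: str) -> Dict[int, int]:
    """Parse a JSON object such as {"0": 17, "1": 42} into {class_index: category_id}.

    The file layout is an assumption; JSON object keys are always strings,
    so they are converted back to integers here.
    """
    return {int(index): int(category) for index, category in json.loads(text).items()}


def to_coco_result(image_id: int, box_xyxy: Sequence[float], score: float,
                   class_index: int, class_map: Dict[int, int]) -> dict:
    """Convert one detection into a COCO results-format record."""
    x1, y1, x2, y2 = box_xyxy
    return {
        "image_id": image_id,
        "category_id": class_map[class_index],
        # COCO boxes are [x, y, width, height], not corner pairs.
        "bbox": [x1, y1, x2 - x1, y2 - y1],
        "score": float(score),
    }
```

A typical call would be `to_coco_result(7, (10, 20, 110, 220), 0.9, 1, class_map)`, applied to every box that survives fusion.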
Usage
Sandbox Environment
- GPU: NVIDIA L4, 24 GB VRAM
- Runtime: ~113s for test set (300s timeout)
- Dependencies: onnxruntime-gpu, opencv, numpy, ensemble-boxes
Key Learnings
- A single multi-class YOLO model (detection and classification in one step) massively outperformed a two-stage pipeline (detector + kNN classifier)
- Multi-scale TTA improved the leaderboard score by +0.031 (0.8922 → 0.9230) by better detecting small products
- Model soup (weight averaging) improves generalization
- Higher validation mAP does NOT predict a better leaderboard score when training on all data
- Dynamic ONNX export is required for multi-scale inference
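The last point follows from the input shapes involved: each TTA scale feeds the model a tensor of a different size, which a fixed-shape export would reject. The sketch below computes those shapes for one image, under two assumptions: the longer side is resized to the target scale, and both sides are then padded up to a multiple of 32 (the largest YOLOv8 stride) at the right and bottom edges. The function names are hypothetical and the actual preprocessing in the pipeline may differ.

```python
from typing import Sequence, Tuple

STRIDE = 32  # largest feature-map stride in YOLOv8


def tta_input_size(width: int, height: int, target: int,
                   stride: int = STRIDE) -> Tuple[int, int, float]:
    """Return (input_width, input_height, scale) for one TTA pass."""
    scale = target / max(width, height)
    resized_w = round(width * scale)
    resized_h = round(height * scale)
    # Round each side up to the next multiple of the stride (integer ceiling).
    padded_w = -(-resized_w // stride) * stride
    padded_h = -(-resized_h // stride) * stride
    return padded_w, padded_h, scale


def to_original(box: Sequence[float], scale: float) -> Tuple[float, ...]:
    """Map a box from resized-image pixels back to original-image pixels.

    Valid only when padding is added at the right and bottom, so that the
    resized image starts at the origin of the network input.
    """
    return tuple(value / scale for value in box)
```

For a 4000×3000 image, the three scales give network inputs of 640×480, 960×736, and 1280×960, so the width and height axes of the ONNX input must both be dynamic.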
License
MIT