metadata
license: mit
tags:
  - object-detection
  - yolov8
  - grocery
  - retail
  - onnx
datasets:
  - custom
pipeline_tag: object-detection

NM i AI 2026 — NorgesGruppen Object Detection

Multi-class YOLOv8x detector for 356 grocery product categories on store shelf images.

Performance

| Method | Leaderboard Score |
|---|---|
| Multi-scale TTA (640+960+1280 + flip) | 0.9230 |
| Single inference | 0.8922 |

Competition scoring:

Model Details

  • Architecture: YOLOv8x (68.5M parameters)
  • Classes: 356 grocery product categories
  • Training data: 248 shelf images, 22,731 COCO-format annotations
  • Training resolution: 1280px
  • Export format: ONNX (dynamic input, 262 MB)
  • Inference: Multi-scale TTA at 640/960/1280px with horizontal flip + WBF fusion
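
The fusion step can be illustrated with a simplified weighted boxes fusion (WBF) in plain Python. This is a minimal sketch, not the submission's pipeline: the dependency list suggests the `ensemble-boxes` package does the real work, and its implementation differs in detail. The names `simple_wbf`, `fuse_cluster`, and the `iou_thr=0.55` default are assumptions made for illustration. Boxes are assumed to be already mapped back to the original image (flips undone) and normalized to [0, 1].

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized to [0, 1]
Det = Tuple[Box, float, int]             # (box, confidence, class id)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_cluster(cluster: List[Det]) -> Box:
    """Confidence-weighted mean of the box coordinates in one cluster."""
    total = sum(conf for _, conf, _ in cluster)
    return tuple(
        sum(box[i] * conf for box, conf, _ in cluster) / total for i in range(4)
    )


def simple_wbf(passes: List[List[Det]], iou_thr: float = 0.55) -> List[Det]:
    """Fuse detections from several TTA passes (hypothetical, simplified WBF)."""
    n_passes = len(passes)
    # Visit all detections from highest to lowest confidence.
    dets = sorted((d for p in passes for d in p), key=lambda d: d[1], reverse=True)
    clusters: List[List[Det]] = []
    for det in dets:
        box, _, label = det
        best, best_iou = None, iou_thr
        for cluster in clusters:
            if cluster[0][2] != label:
                continue  # only fuse boxes of the same class
            overlap = iou(fuse_cluster(cluster), box)
            if overlap > best_iou:
                best, best_iou = cluster, overlap
        if best is None:
            clusters.append([det])
        else:
            best.append(det)

    fused: List[Det] = []
    for cluster in clusters:
        mean_conf = sum(conf for _, conf, _ in cluster) / len(cluster)
        # Down-weight boxes that only a few passes agreed on.
        score = mean_conf * min(len(cluster), n_passes) / n_passes
        fused.append((fuse_cluster(cluster), score, cluster[0][2]))
    return sorted(fused, key=lambda d: d[1], reverse=True)


# Two hypothetical TTA passes over the same image.
pass_a = [((0.10, 0.10, 0.30, 0.30), 0.9, 5)]
pass_b = [((0.12, 0.10, 0.32, 0.30), 0.6, 5), ((0.60, 0.60, 0.80, 0.80), 0.5, 7)]
fused = simple_wbf([pass_a, pass_b])
```

Unlike NMS, which keeps one box and discards its overlapping neighbours, this averages the overlapping boxes, so every TTA pass contributes to the final coordinates.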

Training

  • Pretrained on COCO (YOLOv8x), fine-tuned on competition data
  • Optimizer: AdamW (lr=0.01, weight_decay=0.0005, cosine LR)
  • Augmentation: mosaic, mixup (0.2), copy-paste (0.15), perspective, rotation (±15°)
  • 300 epochs at 1280px, batch=2 on NVIDIA A100 40GB
  • Model soup: averaged the weights of checkpoints from epochs 240-290 for better generalization
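
The model soup step amounts to an element-wise mean over matching parameters from several checkpoints. The sketch below shows a uniform average, with plain Python lists standing in for tensors. The function name `average_checkpoints` and the uniform weighting are assumptions; the card does not say which checkpoints in the 240-290 range were used or how they were weighted.

```python
from typing import Dict, List

StateDict = Dict[str, List[float]]  # parameter name -> flattened weights


def average_checkpoints(checkpoints: List[StateDict]) -> StateDict:
    """Uniform model soup: element-wise mean of matching parameters."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    keys = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != keys:
            raise ValueError("checkpoints have different parameter names")

    n = len(checkpoints)
    soup: StateDict = {}
    for name in keys:
        lengths = {len(ckpt[name]) for ckpt in checkpoints}
        if len(lengths) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip(*...) groups the i-th weight of every checkpoint together.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        soup[name] = [sum(col) / n for col in columns]
    return soup


# Two tiny hypothetical checkpoints.
ckpt_a = {"w": [1.0, 2.0], "b": [0.0]}
ckpt_b = {"w": [3.0, 4.0], "b": [1.0]}
soup = average_checkpoints([ckpt_a, ckpt_b])
```

With a real framework the same loop would run over tensor state dicts, and non-float buffers such as batch-norm step counters would need to be copied rather than averaged.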

Submission Contents

contains:

  • — YOLOv8x model soup, dynamic input (262 MB)
  • — YOLO class → COCO category_id mapping
  • — Multi-scale TTA inference pipeline

Usage

Sandbox Environment

  • GPU: NVIDIA L4, 24 GB VRAM
  • Runtime: ~113s for test set (300s timeout)
  • Dependencies: onnxruntime-gpu, opencv, numpy, ensemble-boxes

Key Learnings

  1. Multi-class YOLO (detect + classify in one step) massively outperformed two-stage (detector + kNN classifier)
  2. Multi-scale TTA raised the leaderboard score by 0.031 (0.8922 → 0.9230) by better detecting small products
  3. Model soup (weight averaging) improves generalization
  4. Higher validation mAP does NOT predict better leaderboard score when training on all data
  5. Dynamic ONNX export required for multi-scale inference
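
To see why a dynamic export matters for multi-scale inference, consider the input shape each TTA scale produces. The sketch below assumes a common letterbox scheme (resize the longer side to the target, then pad both sides to a multiple of the network stride, 32 for YOLOv8); the actual preprocessing in the submission is not shown in this card, and the 3000×4000 image size and the `letterbox_shape` helper are hypothetical.

```python
import math
from typing import Tuple


def letterbox_shape(
    height: int, width: int, target: int, stride: int = 32
) -> Tuple[int, int]:
    """Return the (H, W) fed to the network for one TTA scale.

    The longer image side is resized to `target`, then both sides are
    padded up to the next multiple of `stride`.
    """
    scale = target / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    pad_h = math.ceil(new_h / stride) * stride
    pad_w = math.ceil(new_w / stride) * stride
    return pad_h, pad_w


# Hypothetical 3000x4000 (H x W) shelf photo at the three TTA scales.
shapes = {s: letterbox_shape(3000, 4000, s) for s in (640, 960, 1280)}
for target, (h, w) in shapes.items():
    print(f"{target}px -> input tensor (1, 3, {h}, {w})")
```

Each scale yields a different tensor shape, so a model exported with one fixed input size could serve only one of them; a dynamic export lets a single ONNX file handle all three.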

License

MIT