FathomNet-CLEF 2026: 4th Place Solution Models

πŸ† 4th / 102 teams (Top 3.9%) | 0.2808 mAP (Private LB) | ~$10 GPU Cost

Competition: FathomNet-CLEF 2026 - Positive-Unlabeled Object Detection in Marine Images
Workshop: CVPR 2026 FGVC Workshop + LifeCLEF 2026
Solution Write-up: GitHub Repository


Intended Use

These models are designed for marine organism object detection in deep-sea imagery, specifically trained for the FathomNet-CLEF 2026 Positive-Unlabeled detection challenge. They detect 32 classes of marine organisms including fish, sponges, sea stars, urchins, corals, and other benthic fauna.

Use cases:

  • Marine biodiversity monitoring from ROV/AUV footage
  • Automated annotation of deep-sea imagery
  • Research on positive-unlabeled object detection methods
  • Benchmarking marine detection models

Limitations:

  • Trained on MBARI deep-sea footage; may not generalize to shallow-water or tropical reef imagery
  • 32 coarse-grained classes; does not provide species-level identification
  • Optimized for the PU setting where training annotations are incomplete

Model Description

Architecture

All YOLO models are based on MBARI-315k-YOLOv8x (68M params, 258 GFLOPs), pretrained on 315,000 marine images from MBARI, then fine-tuned through iterative pseudo-label self-training.

The DINOv3 decoder is a lightweight DETR-style Transformer decoder (7.8M params) trained on frozen DINOv3-7B features.
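
For reference, a DETR-style decoder over frozen features reduces to projecting the patch tokens, cross-attending a set of learned object queries to them, and reading out class logits and normalised boxes per query. The sketch below is a minimal, assumption-laden version: the feature width (4096), query count, layer count, and head sizes are illustrative and will not reproduce the 7.8M-parameter configuration exactly.

import torch
import torch.nn as nn

class FrozenFeatureDETRDecoder(nn.Module):
    """Minimal sketch of a DETR-style decoder head over frozen ViT patch features.
    All dimensions below are illustrative assumptions, not the card's exact config."""

    def __init__(self, feat_dim=4096, d_model=256, num_queries=100,
                 num_classes=32, num_layers=4):
        super().__init__()
        self.input_proj = nn.Linear(feat_dim, d_model)      # project frozen features
        self.queries = nn.Embedding(num_queries, d_model)   # learned object queries
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.class_head = nn.Linear(d_model, num_classes + 1)  # +1 for "no object"
        self.box_head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, 4),  # (cx, cy, w, h), normalised via sigmoid
        )

    def forward(self, patch_feats):
        # patch_feats: (B, N, feat_dim) frozen DINOv3 patch tokens (no gradient)
        memory = self.input_proj(patch_feats)
        q = self.queries.weight.unsqueeze(0).expand(memory.size(0), -1, -1)
        hs = self.decoder(q, memory)
        return self.class_head(hs), self.box_head(hs).sigmoid()

Training such a head would follow the usual DETR recipe (Hungarian matching of queries to ground-truth boxes with classification + box losses); that machinery is omitted here.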

Training Strategy

Our solution combines four complementary strategies:

  1. Phase A: Iterative Pseudo-Label Self-Training - 4-round YOLO training with decreasing confidence thresholds (22K → 77K annotations, 3.45x enrichment); see the self-training sketch after this list
  2. Phase B: DINOv3-7B Unsupervised Object Discovery - TokenCut spectral clustering on 6.7B-param self-supervised ViT features (16,695 new objects discovered); see the TokenCut sketch after this list
  3. Phase C: FathomNet API Database Enrichment - Expert annotations via find_by_uuid with 276-species hierarchical taxonomy mapping (+0.1517 mAP, single largest contribution)
  4. Phase D: Multi-Model TTA Ensemble & WBF Fusion - 3 models × 3 scales × flip augmentation
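
Phase A in essence: predict on the training images with the current model, keep boxes above a round-specific confidence floor, append them to the label files, retrain, and repeat with a lower floor. The loop below is a simplified sketch; the threshold schedule, dataset config name (fathomnet.yaml), image glob, and epoch count are placeholders rather than the competition's exact settings, and the real pipeline also deduplicates pseudo-labels against existing boxes.

from pathlib import Path
from ultralytics import YOLO

# Illustrative round schedule: the confidence floor drops as the model improves,
# admitting more (but noisier) pseudo-labels each round
ROUNDS = [("r1", 0.60), ("r2", 0.45), ("r3", 0.30)]

weights = "mbari_315k_yolov8_pretrained.pt"
for name, conf_thr in ROUNDS:
    model = YOLO(weights)
    # 1) Pseudo-label the training images with the current model
    for img_path in Path("train/images").glob("*.png"):
        r = model(str(img_path), imgsz=1280, conf=conf_thr, verbose=False)[0]
        label_path = Path("train/labels") / f"{img_path.stem}.txt"
        with label_path.open("a") as f:  # append to the incomplete GT labels
            for cls, box in zip(r.boxes.cls.tolist(), r.boxes.xywhn.tolist()):
                f.write(f"{int(cls)} " + " ".join(f"{v:.6f}" for v in box) + "\n")
    # 2) Retrain on GT + accumulated pseudo-labels, then move to the next round
    model = YOLO(weights)
    model.train(data="fathomnet.yaml", epochs=45, imgsz=1280, name=f"selftrain_{name}")
    weights = f"runs/detect/selftrain_{name}/weights/best.pt"

Phase B's unsupervised discovery follows the TokenCut recipe: build a patch-affinity graph from frozen self-supervised ViT features, take the second-smallest eigenvector of the normalised-cut relaxation, and box the connected foreground component around the most salient patch. A minimal single-object NumPy/SciPy sketch, with the similarity threshold and epsilon edge weight as assumed values:

import numpy as np
from scipy.linalg import eigh
from scipy.ndimage import label

def tokencut_box(feats, grid_h, grid_w, tau=0.2, eps=1e-5):
    # feats: (N, D) frozen ViT patch features, N = grid_h * grid_w
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T                          # cosine similarity between patches
    W = np.where(sim > tau, 1.0, eps)      # binarised affinity graph
    D = np.diag(W.sum(axis=1))
    # Normalised-cut relaxation: second-smallest generalised eigenvector
    _, vecs = eigh(D - W, D)
    v = vecs[:, 1]
    fg = v > v.mean()                      # bipartition the patches
    seed = np.abs(v).argmax()              # most salient patch defines foreground
    if not fg[seed]:
        fg = ~fg
    mask = fg.reshape(grid_h, grid_w)
    comps, _ = label(mask)                 # keep the component containing the seed
    sy, sx = np.unravel_index(seed, (grid_h, grid_w))
    ys, xs = np.nonzero(comps == comps[sy, sx])
    # Box in patch-grid coordinates; multiply by the ViT patch size for pixels
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1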

Models Included

File | Description | Params | Size
yolo_r0_best.pt | Round 0: Base MBARI-YOLOv8x fine-tuned on original GT (22K annotations) | 68M | 131 MB
yolo_r1_best.pt | Round 1: Self-trained with pseudo-labels (27K annotations) | 68M | 131 MB
yolo_r2_best.pt | Round 2: Self-trained with more pseudo-labels (39K annotations) | 68M | 131 MB
yolo_r3_best.pt | Round 3: Self-trained with all enriched data incl. DINOv3 discoveries (77K annotations) | 68M | 131 MB
dinov3_decoder_best.pt | DETR decoder on frozen DINOv3-7B features (experimental) | 7.8M | 31 MB
mbari_315k_yolov8_pretrained.pt | Original MBARI-315k pretrained YOLOv8x (starting checkpoint) | 68M | 132 MB

Metrics

Final Submission Score

Metric | Value
COCO mAP (IoU 0.5:0.95), Private LB | 0.2808
COCO mAP (IoU 0.5:0.95), Public LB | 0.2584
Ranking (Private LB) | 4th / 102 teams

Public → Private LB shake-up: our ranking rose from #7 (0.2584 public) to #4 (0.2808 private), a +0.0224 mAP improvement between the two splits. Among the top 10 public LB teams, most saw their score drop on the private LB; only 2 teams (including ours) improved, and ours was the largest gain. This suggests strong generalization: the FathomNet API-based taxonomy mapping and tiered confidence calibration were robust to unseen test data rather than overfit to the public test split.

Note: In Kaggle competitions, the public leaderboard is calculated on a subset of test data (~30%) during the competition, while the private leaderboard uses the remaining ~70% and is only revealed after the competition ends. The private LB determines the final ranking.

Score Progression

Component | mAP | Delta
YOLO R0 baseline | 0.0293 | -
+ R1 self-training | 0.0539 | +0.0246
+ R2 self-training | 0.0699 | +0.0160
+ FathomNet API enrichment | 0.2216 | +0.1517
+ Tiered confidence calibration | 0.2465 | +0.0249
+ R3 + TTA ensemble + dedup | 0.2584 | +0.0119

Per-Round YOLO Training

Round | Epochs | Annotations | Val mAP50
R0 | 40 | 22,225 (GT) | 0.836
R1 | 45 | 27,217 | 0.924
R2 | 50 | 38,969 | ~0.85
R3 | 40 | 76,747 | -

Training and Evaluation Data

  • Training: 6,463 images with 22,225 annotations (32 classes), expanded to 76,747 via self-training
  • Test: 1,425 images with complete annotations
  • External data: FathomNet Database API (permitted by competition rules); see the usage sketch after this list
  • Base model pretraining: MBARI-315k dataset (315,000 marine images)
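
The Phase C enrichment pulls expert bounding boxes for competition images directly from the FathomNet database and remaps their fine-grained concepts onto the 32 coarse competition classes. The sketch below is a minimal illustration, not the competition pipeline: the concept-to-class entries are hypothetical examples, find_by_uuid is assumed to live in fathomnet-py's images module (as named on this card), and the record field names (boundingBoxes, concept, x/y/width/height) should be verified against the installed package version.

from fathomnet.api import images

# Hypothetical entries; the real solution maps 276 concepts hierarchically
CONCEPT_TO_CLASS = {
    "Sebastes": "bony fish",
    "Strongylocentrotus fragilis": "urchin",
    "Funiculina": "sea pen",
    # ... remaining entries of the 276-concept mapping
}

def enrich_image(image_uuid):
    """Fetch expert boxes for one image and remap concepts to coarse classes."""
    record = images.find_by_uuid(image_uuid)
    extra = []
    for bb in record.boundingBoxes or []:
        coarse = CONCEPT_TO_CLASS.get(bb.concept)
        if coarse is None:
            continue  # concept outside the 32-class label space
        # FathomNet boxes are pixel (x, y, width, height); convert as needed
        extra.append((coarse, bb.x, bb.y, bb.width, bb.height))
    return extra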

32 Detection Classes

anemone, barnacle, benthic worm, bony fish, brittle star, bryozoan, cnidarian, coral, crab, dead organism, echinoderm, feather star, gastropod, glass sponge, holothurian, hydromedusa, isopod, jelly, nudibranch, octopus, polychaete, sea cucumber, sea fan, sea pen, sea slug, sea spider, sea star, shrimp, snail, sponge, squat lobster, urchin


Deployment

Quick Start with Ultralytics

from ultralytics import YOLO

# Load any of the 4 trained rounds
model = YOLO("yolo_r1_best.pt")

# Run inference
results = model("deep_sea_image.jpg", imgsz=1280, conf=0.25)

# Visualize
results[0].show()

Multi-Model TTA Ensemble (competition inference)

from ultralytics import YOLO
from ensemble_boxes import weighted_boxes_fusion

models = [
    (YOLO("yolo_r0_best.pt"), 2.0),
    (YOLO("yolo_r1_best.pt"), 3.5),
    (YOLO("yolo_r2_best.pt"), 4.0),
]
scales = [1024, 1280, 1536]

# Run TTA inference for each model at each scale; augment=True enables
# Ultralytics' built-in flip/scale TTA (the conf floor here is illustrative)
boxes_list, scores_list, labels_list, weights = [], [], [], []
for model, weight in models:
    for imgsz in scales:
        r = model("deep_sea_image.jpg", imgsz=imgsz, conf=0.05, augment=True)[0]
        boxes_list.append(r.boxes.xyxyn.cpu().numpy().tolist())
        scores_list.append(r.boxes.conf.cpu().numpy().tolist())
        labels_list.append(r.boxes.cls.cpu().numpy().tolist())
        weights.append(weight)

# Then fuse with WBF (iou_thr=0.55)
boxes, scores, labels = weighted_boxes_fusion(
    boxes_list, scores_list, labels_list, weights=weights, iou_thr=0.55
)
# See full pipeline: https://github.com/fan1344rwere/fathomnet-clef-2026-4th-place
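
Note that weighted_boxes_fusion expects coordinates normalised to [0, 1], which is why the sketch above passes boxes.xyxyn rather than pixel coordinates. The per-model weights (2.0 / 3.5 / 4.0) come from the model list above, while the low confidence floor is an assumed value for the sketch.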

Requirements

torch>=2.0
ultralytics>=8.4
ensemble-boxes
fathomnet  # For API enrichment

Hardware & Cost

Resource | Details
GPU | NVIDIA RTX 5090 (32 GB)
Total GPU hours | ~23 hours
Total cost | 70 CNY (~$10 USD)

Citation

@misc{fathomnet2026_4th,
  title={FathomNetCLEF 2026: 4th Place Solution -- Database-Guided Taxonomy Mapping
         and Self-Supervised Object Discovery for Positive-Unlabeled Marine Detection},
  author={Xiuqi Fan},
  year={2026},
  howpublished={\url{https://github.com/fan1344rwere/fathomnet-clef-2026-4th-place}}
}

Acknowledgments

  • FathomNet & MBARI for open marine data and pretrained models
  • Meta AI for DINOv3-7B
  • Laura Chrobak & Kevin Barnard for organizing the competition