# FathomNet-CLEF 2026: 4th Place Solution Models

4th / 102 teams (Top 3.9%) | 0.2808 mAP (Private LB) | ~$10 GPU Cost

- Competition: FathomNet-CLEF 2026 (Positive-Unlabeled Object Detection in Marine Images)
- Workshop: CVPR 2026 FGVC Workshop + LifeCLEF 2026
- Solution write-up: GitHub Repository
## Intended Use
These models are designed for marine organism object detection in deep-sea imagery, specifically trained for the FathomNet-CLEF 2026 Positive-Unlabeled detection challenge. They detect 32 classes of marine organisms including fish, sponges, sea stars, urchins, corals, and other benthic fauna.
**Use cases:**
- Marine biodiversity monitoring from ROV/AUV footage
- Automated annotation of deep-sea imagery
- Research on positive-unlabeled object detection methods
- Benchmarking marine detection models
**Limitations:**
- Trained on MBARI deep-sea footage; may not generalize to shallow-water or tropical reef imagery
- 32 coarse-grained classes; does not provide species-level identification
- Optimized for the PU setting where training annotations are incomplete
## Model Description

### Architecture
All YOLO models are based on MBARI-315k-YOLOv8x (68M params, 258 GFLOPs), pretrained on 315,000 marine images from MBARI, then fine-tuned through iterative pseudo-label self-training.
The DINOv3 decoder is a lightweight DETR-style Transformer decoder (7.8M params) trained on frozen DINOv3-7B features.
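As a rough illustration of this design, the sketch below builds a DETR-style decoder head over frozen ViT patch features in PyTorch. The layer counts, dimensions, and class names are illustrative assumptions; the actual 7.8M-parameter configuration is not documented in this card.

```python
import torch
import torch.nn as nn

class FrozenFeatureDETRDecoder(nn.Module):
    """DETR-style decoder head over frozen ViT patch features.

    All dimensions here are illustrative, not the checkpoint's real config.
    """
    def __init__(self, feat_dim=1024, d_model=256, num_queries=100,
                 num_classes=32, num_layers=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)            # project frozen features
        self.queries = nn.Embedding(num_queries, d_model)   # learned object queries
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.cls_head = nn.Linear(d_model, num_classes + 1) # +1 for "no object"
        self.box_head = nn.Linear(d_model, 4)               # (cx, cy, w, h), normalized

    def forward(self, patch_feats):                         # (B, N_patches, feat_dim)
        memory = self.proj(patch_feats)
        tgt = self.queries.weight.unsqueeze(0).expand(memory.size(0), -1, -1)
        hs = self.decoder(tgt, memory)
        return self.cls_head(hs), self.box_head(hs).sigmoid()

# Stand-in for frozen DINOv3 patch features (random tensor, 64 patches)
feats = torch.randn(2, 64, 1024)
logits, boxes = FrozenFeatureDETRDecoder()(feats)
```

Only the decoder, queries, and heads are trained; the backbone features are precomputed and frozen, which keeps the trainable parameter count small.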
### Training Strategy
Our solution combines four complementary strategies:
- **Phase A: Iterative Pseudo-Label Self-Training.** Four rounds of YOLO training with decreasing confidence thresholds (22K → 77K annotations, 3.45x enrichment)
- **Phase B: DINOv3-7B Unsupervised Object Discovery.** TokenCut spectral clustering on 6.7B-param self-supervised ViT features (16,695 new objects discovered)
- **Phase C: FathomNet API Database Enrichment.** Expert annotations retrieved via `find_by_uuid`, with a 276-species hierarchical taxonomy mapping (+0.1517 mAP, the single largest contribution)
- **Phase D: Multi-Model TTA Ensemble & WBF Fusion.** 3 models × 3 scales × flip augmentation
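The Phase A filtering step can be sketched as follows. The actual per-round confidence thresholds are not published in this card, so the schedule below is a hypothetical illustration of a decreasing-threshold scheme:

```python
# Hypothetical sketch of Phase A pseudo-label filtering; the threshold
# schedule is illustrative, not the competition's actual values.

def filter_pseudo_labels(detections, round_idx,
                         thresholds=(0.50, 0.40, 0.30, 0.25)):
    """Keep detections above the current round's confidence threshold.

    detections: list of (class_id, confidence, x, y, w, h) tuples produced
    by the previous round's model run over the training images.
    """
    thr = thresholds[round_idx]
    return [d for d in detections if d[1] >= thr]

# Each round, surviving pseudo-labels are merged with the original GT and
# the model is retrained, growing the label pool (22K -> 77K over 4 rounds).
dets = [(3, 0.91, 0.5, 0.5, 0.1, 0.1), (7, 0.33, 0.2, 0.2, 0.05, 0.05)]
kept_r0 = filter_pseudo_labels(dets, round_idx=0)  # strict early round
kept_r3 = filter_pseudo_labels(dets, round_idx=3)  # permissive final round
```

Lowering the threshold each round admits more pseudo-labels as the model improves, which is what drives the annotation growth shown in the per-round table below.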
### Models Included
| File | Description | Params | Size |
|---|---|---|---|
| `yolo_r0_best.pt` | Round 0: Base MBARI-YOLOv8x fine-tuned on original GT (22K annotations) | 68M | 131MB |
| `yolo_r1_best.pt` | Round 1: Self-trained with pseudo-labels (27K annotations) | 68M | 131MB |
| `yolo_r2_best.pt` | Round 2: Self-trained with more pseudo-labels (39K annotations) | 68M | 131MB |
| `yolo_r3_best.pt` | Round 3: Self-trained with all enriched data incl. DINOv3 discoveries (77K annotations) | 68M | 131MB |
| `dinov3_decoder_best.pt` | DETR decoder on frozen DINOv3-7B features (experimental) | 7.8M | 31MB |
| `mbari_315k_yolov8_pretrained.pt` | Original MBARI-315k pretrained YOLOv8x (starting checkpoint) | 68M | 132MB |
## Metrics

### Final Submission Score
| Metric | Value |
|---|---|
| COCO mAP (IoU 0.5:0.95), Private LB | 0.2808 |
| COCO mAP (IoU 0.5:0.95), Public LB | 0.2584 |
| Ranking (Private LB) | 4th / 102 teams |
**Public → Private LB shake-up:** Our ranking jumped from #7 (0.2584) to #4 (0.2808), a +0.0224 mAP gain. Among the top 10 public LB teams, most saw their score drop on the private LB; only 2 teams (including ours) improved, and we had the largest gain. This suggests strong generalization: our FathomNet API-based taxonomy mapping and tiered confidence calibration were robust to unseen test data rather than overfit to the public test split.
Note: In Kaggle competitions, the public leaderboard is calculated on a subset of test data (~30%) during the competition, while the private leaderboard uses the remaining ~70% and is only revealed after the competition ends. The private LB determines the final ranking.
### Score Progression
| Component | mAP | Delta |
|---|---|---|
| YOLO R0 baseline | 0.0293 | - |
| + R1 self-training | 0.0539 | +0.0246 |
| + R2 self-training | 0.0699 | +0.0160 |
| + FathomNet API enrichment | 0.2216 | +0.1517 |
| + Tiered confidence calibration | 0.2465 | +0.0249 |
| + R3 + TTA ensemble + dedup | 0.2584 | +0.0119 |
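The card does not describe how the tiered confidence calibration works. One plausible reading is that classes are grouped into reliability tiers and their detection scores scaled per tier before submission; the sketch below is a hypothetical illustration of that idea, with made-up tier assignments and multipliers:

```python
# Hypothetical sketch only: tiers, multipliers, and class assignments
# below are invented for illustration, not the competition values.

TIER_SCALE = {
    "high": 1.00,   # classes the model detects reliably
    "mid": 0.85,
    "low": 0.60,    # rare or frequently confused classes
}
CLASS_TIER = {"bony fish": "high", "sponge": "mid", "sea spider": "low"}

def calibrate(class_name, score):
    """Scale a detection score by its class tier (unknown classes -> mid)."""
    return score * TIER_SCALE[CLASS_TIER.get(class_name, "mid")]
```

Down-weighting unreliable classes trades a few true positives for fewer confident false positives, which can raise COCO mAP when per-class precision varies widely.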
### Per-Round YOLO Training
| Round | Epochs | Annotations | Val mAP50 |
|---|---|---|---|
| R0 | 40 | 22,225 (GT) | 0.836 |
| R1 | 45 | 27,217 | 0.924 |
| R2 | 50 | 38,969 | ~0.85 |
| R3 | 40 | 76,747 | - |
## Training and Evaluation Data
- Training: 6,463 images with 22,225 annotations (32 classes), expanded to 76,747 via self-training
- Test: 1,425 images with complete annotations
- External data: FathomNet Database API (permitted by competition rules)
- Base model pretraining: MBARI-315k dataset (315,000 marine images)
## 32 Detection Classes
anemone, barnacle, benthic worm, bony fish, brittle star, bryozoan, cnidarian, coral, crab, dead organism, echinoderm, feather star, gastropod, glass sponge, holothurian, hydromedusa, isopod, jelly, nudibranch, octopus, polychaete, sea cucumber, sea fan, sea pen, sea slug, sea spider, sea star, shrimp, snail, sponge, squat lobster, urchin
## Deployment

### Quick Start with Ultralytics
```python
from ultralytics import YOLO

# Load any of the 4 trained rounds
model = YOLO("yolo_r1_best.pt")

# Run inference
results = model("deep_sea_image.jpg", imgsz=1280, conf=0.25)

# Visualize
results[0].show()
```
### Multi-Model TTA Ensemble (competition inference)
```python
from ultralytics import YOLO
from ensemble_boxes import weighted_boxes_fusion

# (model, ensemble weight) pairs
models = [
    (YOLO("yolo_r0_best.pt"), 2.0),
    (YOLO("yolo_r1_best.pt"), 3.5),
    (YOLO("yolo_r2_best.pt"), 4.0),
]
scales = [1024, 1280, 1536]

# Run TTA inference for each model at each scale,
# then fuse with WBF (iou_thr=0.55).
# Full pipeline: https://github.com/fan1344rwere/fathomnet-clef-2026-4th-place
```
## Requirements

```text
torch>=2.0
ultralytics>=8.4
ensemble-boxes
fathomnet  # For API enrichment
```
## Hardware & Cost
| Resource | Details |
|---|---|
| GPU | NVIDIA RTX 5090 32GB |
| Total GPU hours | ~23 hours |
| Total cost | ~$10 |
## Citation

```bibtex
@misc{fathomnet2026_4th,
  title={FathomNetCLEF 2026: 4th Place Solution -- Database-Guided Taxonomy Mapping
         and Self-Supervised Object Discovery for Positive-Unlabeled Marine Detection},
  author={Xiuqi Fan},
  year={2026},
  howpublished={\url{https://github.com/fan1344rwere/fathomnet-clef-2026-4th-place}}
}
```
## Acknowledgments