# FathomNet-CLEF 2026: 4th Place Solution Models

4th / 102 teams (Top 3.9%) | 0.2808 mAP (Private LB) | ~$10 GPU Cost

- Competition: FathomNet-CLEF 2026 (Positive-Unlabeled Object Detection in Marine Images)
- Workshop: CVPR 2026 FGVC Workshop + LifeCLEF 2026
- Solution write-up: GitHub Repository
## Intended Use
These models are designed for marine organism object detection in deep-sea imagery, specifically trained for the FathomNet-CLEF 2026 Positive-Unlabeled detection challenge. They detect 32 classes of marine organisms including fish, sponges, sea stars, urchins, corals, and other benthic fauna.
**Use cases:**
- Marine biodiversity monitoring from ROV/AUV footage
- Automated annotation of deep-sea imagery
- Research on positive-unlabeled object detection methods
- Benchmarking marine detection models
**Limitations:**
- Trained on MBARI deep-sea footage; may not generalize to shallow-water or tropical reef imagery
- 32 coarse-grained classes; does not provide species-level identification
- Optimized for the PU setting where training annotations are incomplete
## Model Description

### Architecture
All YOLO models are based on MBARI-315k-YOLOv8x (68M params, 258 GFLOPs), pretrained on 315,000 marine images from MBARI, then fine-tuned through iterative pseudo-label self-training.
The DINOv3 decoder is a lightweight DETR-style Transformer decoder (7.8M params) trained on frozen DINOv3-7B features.
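As a rough illustration of this design, the sketch below builds a DETR-style decoder head over frozen ViT patch features in PyTorch. The layer counts, dimensions, and class names are illustrative assumptions; the actual 7.8M-parameter configuration is not documented in this card.

```python
import torch
import torch.nn as nn

class FrozenFeatureDETRDecoder(nn.Module):
    """DETR-style decoder head over frozen ViT patch features.

    All dimensions here are illustrative, not the checkpoint's real config.
    """
    def __init__(self, feat_dim=1024, d_model=256, num_queries=100,
                 num_classes=32, num_layers=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)            # project frozen features
        self.queries = nn.Embedding(num_queries, d_model)   # learned object queries
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.cls_head = nn.Linear(d_model, num_classes + 1) # +1 for "no object"
        self.box_head = nn.Linear(d_model, 4)               # (cx, cy, w, h), normalized

    def forward(self, patch_feats):                         # (B, N_patches, feat_dim)
        memory = self.proj(patch_feats)
        tgt = self.queries.weight.unsqueeze(0).expand(memory.size(0), -1, -1)
        hs = self.decoder(tgt, memory)
        return self.cls_head(hs), self.box_head(hs).sigmoid()

# Stand-in for frozen DINOv3 patch features (random tensor, 64 patches)
feats = torch.randn(2, 64, 1024)
logits, boxes = FrozenFeatureDETRDecoder()(feats)
```

Only the decoder, queries, and heads are trained; the backbone features are precomputed and frozen, which keeps the trainable parameter count small.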
### Training Strategy
Our solution combines four complementary strategies:
- **Phase A: Iterative Pseudo-Label Self-Training.** Four rounds of YOLO training with decreasing confidence thresholds (22K → 77K annotations, 3.45x enrichment)
- **Phase B: DINOv3-7B Unsupervised Object Discovery.** TokenCut spectral clustering on 6.7B-param self-supervised ViT features (16,695 new objects discovered)
- **Phase C: FathomNet API Database Enrichment.** Expert annotations retrieved via `find_by_uuid`, with a 276-species hierarchical taxonomy mapping (+0.1517 mAP, the single largest contribution)
- **Phase D: Multi-Model TTA Ensemble & WBF Fusion.** 3 models × 3 scales × flip augmentation
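The Phase A filtering step can be sketched as follows. The actual per-round confidence thresholds are not published in this card, so the schedule below is a hypothetical illustration of a decreasing-threshold scheme:

```python
# Hypothetical sketch of Phase A pseudo-label filtering; the threshold
# schedule is illustrative, not the competition's actual values.

def filter_pseudo_labels(detections, round_idx,
                         thresholds=(0.50, 0.40, 0.30, 0.25)):
    """Keep detections above the current round's confidence threshold.

    detections: list of (class_id, confidence, x, y, w, h) tuples produced
    by the previous round's model run over the training images.
    """
    thr = thresholds[round_idx]
    return [d for d in detections if d[1] >= thr]

# Each round, surviving pseudo-labels are merged with the original GT and
# the model is retrained, growing the label pool (22K -> 77K over 4 rounds).
dets = [(3, 0.91, 0.5, 0.5, 0.1, 0.1), (7, 0.33, 0.2, 0.2, 0.05, 0.05)]
kept_r0 = filter_pseudo_labels(dets, round_idx=0)  # strict early round
kept_r3 = filter_pseudo_labels(dets, round_idx=3)  # permissive final round
```

Lowering the threshold each round admits more pseudo-labels as the model improves, which is what drives the annotation growth shown in the per-round table below.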
### Models Included
| File | Description | Params | Size |
|---|---|---|---|
| `yolo_r0_best.pt` | Round 0: Base MBARI-YOLOv8x fine-tuned on original GT (22K annotations) | 68M | 131MB |
| `yolo_r1_best.pt` | Round 1: Self-trained with pseudo-labels (27K annotations) | 68M | 131MB |
| `yolo_r2_best.pt` | Round 2: Self-trained with more pseudo-labels (39K annotations) | 68M | 131MB |
| `yolo_r3_best.pt` | Round 3: Self-trained with all enriched data incl. DINOv3 discoveries (77K annotations) | 68M | 131MB |
| `dinov3_decoder_best.pt` | DETR decoder on frozen DINOv3-7B features (experimental) | 7.8M | 31MB |
| `mbari_315k_yolov8_pretrained.pt` | Original MBARI-315k pretrained YOLOv8x (starting checkpoint) | 68M | 132MB |
## Metrics

### Final Submission Score
| Metric | Value |
|---|---|
| COCO mAP (IoU 0.5:0.95), Private LB | 0.2808 |
| COCO mAP (IoU 0.5:0.95), Public LB | 0.2584 |
| Ranking (Private LB) | 4th / 102 teams |
**Public → Private LB shake-up:** Our ranking jumped from #7 (0.2584) to #4 (0.2808), a +0.0224 mAP gain. Among the top 10 public LB teams, most saw their score drop on the private LB; only 2 teams (including ours) improved, and we had the largest gain. This suggests strong generalization: our FathomNet API-based taxonomy mapping and tiered confidence calibration were robust to unseen test data rather than overfit to the public test split.
Note: In Kaggle competitions, the public leaderboard is calculated on a subset of test data (~30%) during the competition, while the private leaderboard uses the remaining ~70% and is only revealed after the competition ends. The private LB determines the final ranking.
### Score Progression
| Component | mAP | Delta |
|---|---|---|
| YOLO R0 baseline | 0.0293 | - |
| + R1 self-training | 0.0539 | +0.0246 |
| + R2 self-training | 0.0699 | +0.0160 |
| + FathomNet API enrichment | 0.2216 | +0.1517 |
| + Tiered confidence calibration | 0.2465 | +0.0249 |
| + R3 + TTA ensemble + dedup | 0.2584 | +0.0119 |
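The card does not describe how the tiered confidence calibration works. One plausible reading is that classes are grouped into reliability tiers and their detection scores scaled per tier before submission; the sketch below is a hypothetical illustration of that idea, with made-up tier assignments and multipliers:

```python
# Hypothetical sketch only: tiers, multipliers, and class assignments
# below are invented for illustration, not the competition values.

TIER_SCALE = {
    "high": 1.00,   # classes the model detects reliably
    "mid": 0.85,
    "low": 0.60,    # rare or frequently confused classes
}
CLASS_TIER = {"bony fish": "high", "sponge": "mid", "sea spider": "low"}

def calibrate(class_name, score):
    """Scale a detection score by its class tier (unknown classes -> mid)."""
    return score * TIER_SCALE[CLASS_TIER.get(class_name, "mid")]
```

Down-weighting unreliable classes trades a few true positives for fewer confident false positives, which can raise COCO mAP when per-class precision varies widely.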
### Per-Round YOLO Training
| Round | Epochs | Annotations | Val mAP50 |
|---|---|---|---|
| R0 | 40 | 22,225 (GT) | 0.836 |
| R1 | 45 | 27,217 | 0.924 |
| R2 | 50 | 38,969 | ~0.85 |
| R3 | 40 | 76,747 | - |
## Training and Evaluation Data
- Training: 6,463 images with 22,225 annotations (32 classes), expanded to 76,747 via self-training
- Test: 1,425 images with complete annotations
- External data: FathomNet Database API (permitted by competition rules)
- Base model pretraining: MBARI-315k dataset (315,000 marine images)
## 32 Detection Classes
anemone, barnacle, benthic worm, bony fish, brittle star, bryozoan, cnidarian, coral, crab, dead organism, echinoderm, feather star, gastropod, glass sponge, holothurian, hydromedusa, isopod, jelly, nudibranch, octopus, polychaete, sea cucumber, sea fan, sea pen, sea slug, sea spider, sea star, shrimp, snail, sponge, squat lobster, urchin
## Deployment

### Quick Start with Ultralytics
```python
from ultralytics import YOLO

# Load any of the 4 trained rounds
model = YOLO("yolo_r1_best.pt")

# Run inference
results = model("deep_sea_image.jpg", imgsz=1280, conf=0.25)

# Visualize
results[0].show()
```
### Multi-Model TTA Ensemble (competition inference)
```python
from ultralytics import YOLO
from ensemble_boxes import weighted_boxes_fusion

# (model, ensemble weight) pairs
models = [
    (YOLO("yolo_r0_best.pt"), 2.0),
    (YOLO("yolo_r1_best.pt"), 3.5),
    (YOLO("yolo_r2_best.pt"), 4.0),
]
scales = [1024, 1280, 1536]

# Run TTA inference for each model at each scale,
# then fuse with WBF (iou_thr=0.55).
# Full pipeline: https://github.com/fan1344rwere/fathomnet-clef-2026-4th-place
```
## Requirements

```text
torch>=2.0
ultralytics>=8.4
ensemble-boxes
fathomnet  # For API enrichment
```
## Hardware & Cost
| Resource | Details |
|---|---|
| GPU | NVIDIA RTX 5090 32GB |
| Total GPU hours | ~23 hours |
| Total cost | ~$10 |
## Citation

```bibtex
@misc{fathomnet2026_4th,
  title={FathomNetCLEF 2026: 4th Place Solution -- Database-Guided Taxonomy Mapping
         and Self-Supervised Object Discovery for Positive-Unlabeled Marine Detection},
  author={Xiuqi Fan},
  year={2026},
  howpublished={\url{https://github.com/fan1344rwere/fathomnet-clef-2026-4th-place}}
}
```
## Acknowledgments