BirdCLEF+ 2026 — Improved Pipeline (Target: 0.90+)

This repository contains an improved 4-notebook pipeline for BirdCLEF+ 2026, based on lessons learned from a 0.815 score baseline.

🔗 Competition: https://www.kaggle.com/competitions/birdclef-2026
🔗 Author Repo: https://huggingface.co/hello9972/birdclef-2026-improved

Why You Were Stuck at 0.815

Your original pipeline had these fatal problems that prevented reaching 0.90+:

❌ What Destroyed Your Score

Mistake	Impact	Why
Threshold boosting (`p * 0.85 + mask * 0.15`)	0.815 → 0.52	Thresholds destroy probability ranking. BirdCLEF metric is AUC-based (rank-sensitive).
Mixup + Label Smoothing	Softened outputs to 0.05-0.2 range	Compressed scores leave little separation between positives and negatives, which hurts a rank-based metric like AUC.
Aggressive calibration (`p ** 0.75`)	0.815 → 0.53	A monotonic transform cannot reorder a single model's scores, but when applied before averaging across models it changes the blended ranking.
2-model ensemble only	Ceiling ~0.82	Top solutions use 5-20 models.
No 5-fold CV	Could not ensemble diverse models	Same data, same predictions = no ensemble gain.
No pseudo-labeling	Missing 5-8% boost from test-domain adaptation	Top solutions use noisy student on test predictions.

✅ What Actually Works for BirdCLEF

Raw sigmoid outputs — NO thresholds, NO calibration
Simple ensemble — mean logits, not probabilities
Exact sample submission alignment — sample[["row_id"]].merge(sub, ...)
Pure PyTorch inference — No ONNX in Kaggle submissions
Minimal post-processing — tiny clip only
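
A minimal sketch of the "mean logits, not probabilities" rule, assuming each model returns a logit array of shape (n_rows, n_species). The function name `ensemble_mean_logits` and the toy inputs are illustrative and not taken from `nb04_inference.py`:

```python
import numpy as np

def ensemble_mean_logits(logits_per_model):
    """Average raw logits across models, then apply the sigmoid once."""
    stacked = np.stack(logits_per_model)        # (n_models, n_rows, n_species)
    mean_logits = stacked.mean(axis=0)          # average in logit space
    return 1.0 / (1.0 + np.exp(-mean_logits))   # raw sigmoid, no thresholding

# Two hypothetical models scoring one row with two species.
model_a = np.array([[2.0, -2.0]])
model_b = np.array([[0.0, 0.0]])
probs = ensemble_mean_logits([model_a, model_b])
```

Averaging before the sigmoid keeps a confident model from being flattened by the saturation of the probability scale.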

New Architecture Overview

NB1 → Data Prep + StratifiedKFold(5)
NB2 → 5-Fold Training (AsymmetricLoss, SpecAugment, Energy Crop)
NB3 → Pseudo-Labeling (Noisy Student on train_soundscapes)
NB4 → Inference (10-model ensemble, TTA, rank averaging)

Key Improvements

1. Loss Function: AsymmetricLoss (NOT BCE)

Replaces BCEWithLogitsLoss with AsymmetricLoss from arXiv:2009.14119:

import torch.nn as nn

class AsymmetricLoss(nn.Module):
    def __init__(self, gamma_neg=4, gamma_pos=0, clip=0.05):
        ...

Why: Down-weights easy negatives (background noise, empty segments) while preserving signal for rare species. Does NOT squash logits like label smoothing.
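
To make the down-weighting concrete, here is a NumPy sketch of the forward computation described in arXiv:2009.14119, using the defaults shown above. It is an illustration of the formula, not the repository's `AsymmetricLoss` module: the function name `asymmetric_loss` and the `eps` guard are assumptions, and training would need the PyTorch equivalent.

```python
import numpy as np

def asymmetric_loss(logits, targets, gamma_neg=4.0, gamma_pos=0.0, clip=0.05, eps=1e-8):
    """Sum of asymmetric focal terms over all (sample, class) entries."""
    p = 1.0 / (1.0 + np.exp(-logits))
    # Probability shifting: negatives with p < clip contribute exactly zero loss.
    p_neg = np.minimum(1.0 - p + clip, 1.0)
    log_pos = targets * np.log(np.clip(p, eps, None))
    log_neg = (1.0 - targets) * np.log(np.clip(p_neg, eps, None))
    # Focal weight: gamma_pos for positives, gamma_neg for negatives.
    pt = p * targets + p_neg * (1.0 - targets)
    gamma = gamma_pos * targets + gamma_neg * (1.0 - targets)
    weight = (1.0 - pt) ** gamma
    return -np.sum((log_pos + log_neg) * weight)

easy_negative = asymmetric_loss(np.array([[-5.0]]), np.array([[0.0]]))
hard_negative = asymmetric_loss(np.array([[2.0]]), np.array([[0.0]]))
positive = asymmetric_loss(np.array([[0.0]]), np.array([[1.0]]))
```

With `gamma_pos=0` the positive term reduces to plain binary cross-entropy, while confidently rejected negatives drop out of the loss entirely.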

2. Energy-Based Window Selection (Perch 2.0 Trick)

For training clips longer than 5 seconds, finds the window with highest audio energy instead of random crop:

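def _energy_crop(self, wav):
    energy = librosa.feature.rms(y=wav, frame_length=2048, hop_length=512)[0]
    # Smooth frame energy before picking the peak (5-frame moving average is an assumed choice)
    smoothed_energy = np.convolve(energy, np.ones(5) / 5, mode="same")
    peak_frame = np.argmax(smoothed_energy)
    # center window around peak with jitter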

Why: Bird calls are often brief. Random crops miss them. Energy-based crops hit them 3-4x more often.
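
The excerpt above stops before the window is cut, so here is a self-contained sketch of the whole idea in plain NumPy (frame RMS is computed by hand rather than with librosa). The function name `energy_crop`, the 32 kHz sample rate, the 5-frame smoothing, and the zero-padding of short clips are all assumptions for illustration:

```python
import numpy as np

def energy_crop(wav, sr=32000, window_s=5.0, frame=2048, hop=512,
                smooth=5, jitter_s=0.0, rng=None):
    """Return a fixed-length window centred on the highest-energy region."""
    win = int(sr * window_s)
    if len(wav) <= win:
        return np.pad(wav, (0, win - len(wav)))        # short clip: zero-pad
    # Frame-wise RMS energy.
    n_frames = 1 + (len(wav) - frame) // hop
    idx = np.arange(frame)[None, :] + hop * np.arange(n_frames)[:, None]
    rms = np.sqrt(np.mean(wav[idx] ** 2, axis=1))
    smoothed = np.convolve(rms, np.ones(smooth) / smooth, mode="same")
    # Convert the peak frame to a sample position, optionally jittered.
    peak = int(np.argmax(smoothed)) * hop + frame // 2
    if jitter_s > 0:
        rng = rng or np.random.default_rng()
        peak += int(rng.uniform(-jitter_s, jitter_s) * sr)
    start = int(np.clip(peak - win // 2, 0, len(wav) - win))
    return wav[start:start + win]

# Toy signal: 20 s of silence at 1 kHz with a short burst at 12 s.
wav = np.zeros(20000)
wav[12000:12200] = 1.0
crop = energy_crop(wav, sr=1000, window_s=5.0, frame=256, hop=64)
```

Clamping `start` keeps the window inside the clip when the peak sits near either end.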

3. Augmentations (Waveform + Spectrogram)

Augmentation	Probability	Purpose / Parameters
Cyclic roll	100%	Time-shift invariance
Colored noise	30%	SNR 3-30 dB, spectrum ∝ f^(-decay)
Background noise	50%	Real soundscape mixing
Gain	30%	±12 dB
SpecAugment (freq mask)	50%	24 bins
SpecAugment (time mask)	50%	40 frames

NO mixup. NO label smoothing. Both destroyed your score.
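
A rough NumPy sketch of three of the augmentations in the table: cyclic roll, gain, and the two SpecAugment masks. The function names are hypothetical, the 24-bin and 40-frame figures are treated as maximum mask widths, and filling masked cells with the spectrogram minimum is an assumed choice; the actual `nb02_training.py` code may differ on all three points.

```python
import numpy as np

def augment_waveform(wav, rng, max_gain_db=12.0, p_gain=0.3):
    """Cyclic roll (always) plus random gain (with probability p_gain)."""
    wav = np.roll(wav, int(rng.integers(0, len(wav))))
    if rng.random() < p_gain:
        wav = wav * 10.0 ** (rng.uniform(-max_gain_db, max_gain_db) / 20.0)
    return wav

def spec_augment(spec, rng, max_freq_bins=24, max_time_frames=40, p=0.5):
    """Mask one frequency band and one time span, each with probability p.

    Assumes spec has shape (n_mels, n_frames), both at least as large as the masks.
    """
    spec = spec.copy()
    n_mels, n_frames = spec.shape
    fill = spec.min()  # assumed fill value for masked cells
    if rng.random() < p:
        width = int(rng.integers(1, max_freq_bins + 1))
        start = int(rng.integers(0, n_mels - width + 1))
        spec[start:start + width, :] = fill
    if rng.random() < p:
        width = int(rng.integers(1, max_time_frames + 1))
        start = int(rng.integers(0, n_frames - width + 1))
        spec[:, start:start + width] = fill
    return spec

rng = np.random.default_rng(0)
rolled = augment_waveform(np.arange(8, dtype=float), rng, p_gain=0.0)
base = np.random.default_rng(1).normal(size=(128, 313))
masked = spec_augment(base, rng, p=1.0)
```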

4. 5-Fold StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

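Each fold preserves approximately the same species distribution. Training one model per fold yields five models fitted on different subsets of the data, which is what gives the ensemble its diversity.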
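
As a sketch of how the splitter might be used to write a fold column, assuming one primary label per recording: the helper `assign_folds` and the toy labels are illustrative, not taken from `nb01_data_prep.py`.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def assign_folds(labels, n_splits=5, seed=42):
    """Return an array giving the validation fold index of every sample."""
    labels = np.asarray(labels)
    folds = np.full(len(labels), -1, dtype=int)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # X is unused by the splitter, so a placeholder array is enough.
    for fold, (_, val_idx) in enumerate(skf.split(np.zeros(len(labels)), labels)):
        folds[val_idx] = fold
    return folds

labels = np.array(["a"] * 10 + ["b"] * 5)
folds = assign_folds(labels)
```

Species with fewer recordings than `n_splits` cannot appear in every fold, so rare classes may need separate handling.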

5. Layer-Wise LR Decay

lr_scale = layer_decay ** (num_blocks - layer_idx)

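Earlier layers (closer to the input) get a smaller LR, while layers near the head train at close to the full rate. This helps preserve the pretrained low-level features while the later layers adapt to the new task.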
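
A small worked example of the formula, assuming `layer_idx` runs from 0 (stem, closest to the input) to `num_blocks` (head). The decay value 0.75, the base LR, and the helper name `layer_lr_scales` are illustrative assumptions:

```python
def layer_lr_scales(num_blocks, layer_decay=0.75):
    """Return LR multipliers for layer_idx = 0 (stem) ... num_blocks (head)."""
    return [layer_decay ** (num_blocks - layer_idx)
            for layer_idx in range(num_blocks + 1)]

base_lr = 1e-3
scales = layer_lr_scales(num_blocks=4)
# One learning rate per layer group, smallest at the input side.
lrs = [base_lr * s for s in scales]
```

The head keeps the full base LR (exponent 0), and each step toward the input multiplies the LR by `layer_decay` once more.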

6. Test-Time Augmentation (TTA)

4 variants per chunk:

Original
Time-reversed
+3dB gain
-3dB gain

Average logits across all variants.
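
The four variants and the logit averaging can be sketched as follows. `model_fn` stands in for any callable that maps a waveform chunk to a logit vector, and both function names are hypothetical:

```python
import numpy as np

def tta_variants(chunk):
    """Original, time-reversed, +3 dB, and -3 dB versions of one chunk."""
    gain = 10.0 ** (3.0 / 20.0)  # +3 dB as an amplitude ratio
    return [chunk, chunk[::-1].copy(), chunk * gain, chunk / gain]

def predict_tta(model_fn, chunk):
    """Average the model's logits over all TTA variants."""
    logits = np.stack([model_fn(v) for v in tta_variants(chunk)])
    return logits.mean(axis=0)

# Dummy "model" so the sketch runs without a trained network.
dummy_model = lambda x: np.array([x.sum(), x[0]])
tta_logits = predict_tta(dummy_model, np.array([1.0, 2.0, 3.0]))
```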

7. Pseudo-Labeling (Noisy Student)

Use confident predictions (>0.5) on train_soundscapes as additional training data. Retrain with these pseudo-labels + original data.
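
One way the selection step might look, assuming predictions are stored as an array of shape (n_windows, n_species) and that a window is kept when its most confident species exceeds the threshold. Keeping the soft probabilities as labels is also an assumption, suggested by the output file name `pseudo_labels_soft.csv`:

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.5):
    """Return indices and soft labels of windows with a confident prediction."""
    probs = np.asarray(probs)
    keep = probs.max(axis=1) > threshold
    return np.flatnonzero(keep), probs[keep]

probs = np.array([[0.1, 0.9],
                  [0.2, 0.3],
                  [0.6, 0.4]])
kept_idx, soft_labels = select_pseudo_labels(probs)
```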

Expected boost: 0.84 → 0.88

Expected Score Improvements

Stage	Technique	Expected Score
Baseline	Your 0.815 pipeline	0.815
NB2 improvement	AsymmetricLoss + energy crop + no mixup	0.83-0.85
5-fold ensemble	10 models (5 folds × 2 backbones)	0.85-0.87
TTA	4 variants per chunk	0.86-0.88
Pseudo-labeling	Noisy student on soundscapes	0.88-0.91
+ Better backbone	Bird-MAE or ConvNeXt	0.90-0.93

Files

File	Purpose
`nb01_data_prep.py`	Data cleaning, VAD, StratifiedKFold(5)
`nb02_training.py`	5-fold training with AsymmetricLoss, SpecAugment
`nb03_pseudo_labeling.py`	Generate pseudo-labels, noisy student
`nb04_inference.py`	10-model ensemble, TTA, submission generation

How to Run on Kaggle

Step 1: Create Dataset from NB1 Output

After running nb01_data_prep.py, create a Kaggle dataset from /kaggle/working/:

train_cleaned_stratified.csv
soundscape_labels_with_folds.csv
species_list.csv
rare_species.csv

Step 2: NB2 Training

# In Kaggle notebook, attach:
# - Competition data: birdclef-2026
# - NB1 output dataset
# Run nb02_training.py → produces 10 .pt files in /kaggle/working/models/

Save models as a new Kaggle dataset.

Step 3: NB3 Pseudo-Labeling

# Attach NB2 model dataset + NB1 data
# Run nb03_pseudo_labeling.py → produces pseudo_labels_soft.csv

Step 4: NB4 Inference (Submission)

# Attach NB2 model dataset + competition test data
# Run nb04_inference.py → produces submission.csv
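
The alignment step before writing submission.csv can be sketched with pandas. The merge arguments (`on="row_id"`, `how="left"`), the column reindexing, and filling missing rows with 0.0 are assumptions for illustration; the helper name `align_submission` and the toy frames are hypothetical.

```python
import pandas as pd

def align_submission(sample, sub):
    """Reorder predictions to match the sample submission's rows and columns."""
    aligned = sample[["row_id"]].merge(sub, on="row_id", how="left")
    species_cols = [c for c in sample.columns if c != "row_id"]
    aligned = aligned.reindex(columns=["row_id"] + species_cols)
    return aligned.fillna(0.0)  # assumed default for rows without predictions

sample = pd.DataFrame({"row_id": ["a", "b", "c"], "sp1": 0.0, "sp2": 0.0})
sub = pd.DataFrame({"row_id": ["c", "a"], "sp2": [0.9, 0.2], "sp1": [0.1, 0.7]})
aligned = align_submission(sample, sub)
```

A left merge from the sample's `row_id` column guarantees the output has exactly the expected rows in the expected order, whatever order the predictions were produced in.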

Critical Rules for BirdCLEF

NEVER threshold predictions: binarizing scores creates ties and destroys the ranking that AUC measures.
NEVER apply non-linear calibration (p**0.75, p/(p+1), etc.): it lowered the score in this pipeline, and rescaling before averaging models changes the blended rank order.
NEVER mixup or label-smooth: in this pipeline both compressed outputs into a narrow range, reducing the separation AUC relies on.
ALWAYS align submission with sample_submission.csv: sample[["row_id"]].merge(sub, ...)
ALWAYS ensemble diverse models: same model, same folds = no gain.
ALWAYS use raw sigmoid outputs: AUC depends only on rank order, so calibration adds nothing.
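
To see why rank order is what matters, here is a toy check with a hand-rolled pairwise ROC-AUC (the helper `roc_auc` and the six scores are made up for illustration). It compares raw scores, hard-thresholded scores, and a power-transformed copy of a single score vector:

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 0, 1, 1, 1]
scores = [0.05, 0.10, 0.30, 0.20, 0.40, 0.45]

auc_raw = roc_auc(labels, scores)
auc_thresholded = roc_auc(labels, [float(s > 0.25) for s in scores])
auc_power = roc_auc(labels, [s ** 0.75 for s in scores])
```

Thresholding lowers the AUC here because ordered pairs collapse into ties, while the strictly monotonic power transform leaves the AUC of this single score vector unchanged; the picture differs once transformed scores from several models are averaged.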

References

AsymmetricLoss: arXiv:2009.14119
Bird-MAE: arXiv:2504.12880
sl-BEATs: arXiv:2508.11845
Top solution reference: minalkharat12/birdclef-2026-solution

License

MIT — Competition code for educational purposes.

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Try ML Intern: https://smolagents-ml-intern.hf.space
Source code: https://github.com/huggingface/ml-intern

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hello9972/birdclef-2026-improved"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.
