BirdCLEF+ 2026: Improved Pipeline (Target: 0.90+)
This repository contains an improved 4-notebook pipeline for BirdCLEF+ 2026, based on lessons learned from a 0.815 score baseline.
Competition: https://www.kaggle.com/competitions/birdclef-2026
Author Repo: https://huggingface.co/hello9972/birdclef-2026-improved
Why You Were Stuck at 0.815
Your original pipeline had these fatal problems that prevented reaching 0.90+:
What Destroyed Your Score
| Mistake | Impact | Why |
|---|---|---|
| Threshold boosting (p * 0.85 + mask * 0.15) | 0.815 → 0.52 | Thresholds destroy probability ranking. BirdCLEF metric is AUC-based (rank-sensitive). |
| Mixup + Label Smoothing | Softened outputs to 0.05-0.2 range | Destroyed calibration needed for AUC. AUC needs spread, not softening. |
| Aggressive calibration (p ** 0.75) | 0.815 → 0.53 | Non-linear transforms distort ranking order. |
| 2-model ensemble only | Ceiling ~0.82 | Top solutions use 5-20 models. |
| No 5-fold CV | Could not ensemble diverse models | Same data, same predictions = no ensemble gain. |
| No pseudo-labeling | Missing 5-8% boost from test-domain adaptation | Top solutions use noisy student on test predictions. |
What Actually Works for BirdCLEF
- Raw sigmoid outputs: NO thresholds, NO calibration
- Simple ensemble: mean logits, not probabilities
- Exact sample submission alignment: sample[["row_id"]].merge(sub, ...)
- Pure PyTorch inference: no ONNX in Kaggle submissions
- Minimal post-processing: tiny clip only
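The "mean logits, not probabilities" and "tiny clip only" points above can be sketched as follows. This is a minimal NumPy illustration, not the repository's inference code; the function name `ensemble_mean_logits` and the clip width `eps` are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ensemble_mean_logits(logits_per_model, eps=1e-6):
    """Average logits over models, then apply one sigmoid and a tiny clip.

    logits_per_model: array-like of shape (n_models, n_rows, n_classes).
    eps is a hypothetical clip width, not a value taken from the pipeline.
    """
    stacked = np.asarray(logits_per_model, dtype=np.float64)
    mean_logits = stacked.mean(axis=0)      # ensemble in logit space
    probs = sigmoid(mean_logits)            # single sigmoid at the end
    return np.clip(probs, eps, 1.0 - eps)   # minimal post-processing

# Two toy models, one row, two classes.
logits = [[[2.0, -1.0]], [[0.0, -3.0]]]
probs = ensemble_mean_logits(logits)
```

Averaging in logit space gives a different result from averaging the per-model probabilities, because the sigmoid is non-linear.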
New Architecture Overview
NB1 → Data Prep + StratifiedKFold(5)
NB2 → 5-Fold Training (AsymmetricLoss, SpecAugment, Energy Crop)
NB3 → Pseudo-Labeling (Noisy Student on train_soundscapes)
NB4 → Inference (10-model ensemble, TTA, rank averaging)
Key Improvements
1. Loss Function: AsymmetricLoss (NOT BCE)
Replaces BCEWithLogitsLoss with AsymmetricLoss from arXiv:2009.14119:
    class AsymmetricLoss(nn.Module):
        def __init__(self, gamma_neg=4, gamma_pos=0, clip=0.05):
            ...
Why: Down-weights easy negatives (background noise, empty segments) while preserving signal for rare species. Does NOT squash logits like label smoothing.
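To make the down-weighting concrete, here is a NumPy sketch of the forward computation described in arXiv:2009.14119, using the same default parameters as the class signature above. It only illustrates the arithmetic; the function name `asymmetric_loss` and the `eps` guard are assumptions, and the training notebook would need the equivalent written with PyTorch tensors so that gradients flow.

```python
import numpy as np

def asymmetric_loss(logits, targets, gamma_neg=4.0, gamma_pos=0.0, clip=0.05, eps=1e-8):
    """Summed asymmetric loss for multi-label targets in {0, 1}."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)

    p_pos = 1.0 / (1.0 + np.exp(-logits))   # probability the label is present
    p_neg = 1.0 - p_pos                     # probability the label is absent
    if clip > 0:
        # Probability shifting: very easy negatives reach p_neg = 1 and drop out.
        p_neg = np.minimum(p_neg + clip, 1.0)

    # Plain cross-entropy terms for positives and negatives.
    log_lik = (targets * np.log(np.maximum(p_pos, eps))
               + (1.0 - targets) * np.log(np.maximum(p_neg, eps)))

    # Focal-style weights with separate exponents for positives and negatives.
    pt = p_pos * targets + p_neg * (1.0 - targets)
    gamma = gamma_pos * targets + gamma_neg * (1.0 - targets)
    weight = (1.0 - pt) ** gamma

    return float(-(weight * log_lik).sum())

easy_negative = asymmetric_loss([-5.0], [0.0])   # contributes nothing
hard_negative = asymmetric_loss([2.0], [0.0])    # still penalised, but less than BCE
positive = asymmetric_loss([0.0], [1.0])         # equals BCE when gamma_pos = 0
```

With `gamma_pos=0` the positive term is ordinary cross-entropy, so only the negative side is reshaped.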
2. Energy-Based Window Selection (Perch 2.0 Trick)
For training clips longer than 5 seconds, finds the window with highest audio energy instead of random crop:
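    def _energy_crop(self, wav):
        # Frame-level RMS energy of the waveform
        energy = librosa.feature.rms(y=wav, frame_length=2048, hop_length=512)[0]
        # smoothed_energy is a smoothed copy of energy (smoothing step omitted in this excerpt)
        peak_frame = np.argmax(smoothed_energy)
        # center window around peak with jitter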
Why: Bird calls are often brief. Random crops miss them. Energy-based crops hit them 3-4x more often.
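A self-contained sketch of the idea follows. It is not the repository's `_energy_crop`: it computes frame RMS with plain NumPy instead of `librosa.feature.rms`, and the moving-average width (`smooth_frames`), the jitter range (`max_jitter_sec`), the zero-padding of short clips, and the 32 kHz default sample rate are all assumptions chosen for illustration. It also assumes the crop window is longer than one analysis frame.

```python
import numpy as np

def energy_crop(wav, sr=32000, win_sec=5.0, frame_length=2048, hop_length=512,
                smooth_frames=9, max_jitter_sec=0.5, rng=None):
    """Return a win_sec window centred (with jitter) on the loudest region."""
    wav = np.asarray(wav, dtype=np.float32)
    win = int(round(win_sec * sr))
    if len(wav) <= win:
        # Short clips are zero-padded instead of cropped.
        return np.pad(wav, (0, win - len(wav)))
    if rng is None:
        rng = np.random.default_rng()

    # Frame-wise RMS via a cumulative sum of squared samples.
    csum = np.concatenate([[0.0], np.cumsum(wav.astype(np.float64) ** 2)])
    n_frames = 1 + (len(wav) - frame_length) // hop_length
    starts = np.arange(n_frames) * hop_length
    frame_power = np.maximum(csum[starts + frame_length] - csum[starts], 0.0)
    energy = np.sqrt(frame_power / frame_length)

    # Moving-average smoothing so a single click does not beat a sustained call.
    kernel = np.ones(smooth_frames) / smooth_frames
    smoothed_energy = np.convolve(energy, kernel, mode="same")
    peak_frame = int(np.argmax(smoothed_energy))
    peak_sample = int(starts[peak_frame]) + frame_length // 2

    # Centre the window on the peak, add random jitter, keep it inside the clip.
    max_jitter = int(max_jitter_sec * sr)
    jitter = int(rng.integers(-max_jitter, max_jitter + 1))
    start = int(np.clip(peak_sample - win // 2 + jitter, 0, len(wav) - win))
    return wav[start:start + win]

# Toy clip: 20 s of silence at 1 kHz with a 0.2 s burst starting at 12 s.
sr = 1000
sig = np.zeros(20 * sr, dtype=np.float32)
sig[12000:12200] = 1.0
crop = energy_crop(sig, sr=sr, rng=np.random.default_rng(0))
```

In the toy clip the returned 5-second window contains the whole burst, whereas a uniformly random 5-second crop of the same clip would often miss it.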
3. Augmentations (Waveform + Spectrogram)
| Augmentation | Probability | Purpose / parameters |
|---|---|---|
| Cyclic roll | 100% | Time-shift invariance |
| Colored noise | 30% | SNR 3-30 dB, f^-decay |
| Background noise | 50% | Real soundscape mixing |
| Gain | 30% | ±12 dB |
| SpecAugment (freq mask) | 50% | 24 bins |
| SpecAugment (time mask) | 50% | 40 frames |
NO mixup. NO label smoothing. Both destroyed your score.
4. 5-Fold StratifiedKFold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
Each fold keeps approximately the same species distribution. Training one model per fold yields 5 models that saw different data, which is where the ensemble gain comes from.
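A minimal sketch of assigning fold indices with the splitter above, assuming one primary species label per recording. The toy labels and the `folds` array are illustrative, not the contents of train_cleaned_stratified.csv.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy primary labels: 10 recordings of one species, 5 of another.
labels = np.array(["sp_a"] * 10 + ["sp_b"] * 5)
folds = np.full(len(labels), -1)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# The feature matrix is unused for splitting, so a dummy array is enough.
for fold, (_, val_idx) in enumerate(skf.split(np.zeros(len(labels)), labels)):
    folds[val_idx] = fold   # each recording is validation data in exactly one fold
```

A species with fewer than five recordings cannot appear in every fold's validation split, so rare species may need separate handling.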
5. Layer-Wise LR Decay
lr_scale = layer_decay ** (num_blocks - layer_idx)
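Earlier layers (closer to the input) get a smaller LR, because their exponent is larger and layer_decay is below 1. This keeps the pretrained low-level features close to their initial values while later layers adapt.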
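The formula can be tabulated per layer as in the sketch below. The helper name `layerwise_lrs`, the base LR, and the decay value 0.75 are assumptions for illustration. Index 0 stands for the layer closest to the input and index `num_blocks` for the head.

```python
def layerwise_lrs(base_lr, num_blocks, layer_decay=0.75):
    """Learning rate per layer index, using lr_scale = layer_decay ** (num_blocks - layer_idx)."""
    return [
        base_lr * layer_decay ** (num_blocks - layer_idx)
        for layer_idx in range(num_blocks + 1)
    ]

lrs = layerwise_lrs(base_lr=1e-3, num_blocks=4)
# lrs[0] is the smallest rate (input side); lrs[-1] equals base_lr (head).
```

These values would typically be passed to the optimizer as per-parameter-group learning rates.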
6. Test-Time Augmentation (TTA)
4 variants per chunk:
- Original
- Time-reversed
- +3dB gain
- -3dB gain
Average logits across all variants.
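The four variants and the logit averaging can be sketched as follows. `predict_logits` is a hypothetical callable that maps one waveform chunk to a vector of class logits. It stands in for the real model and is not part of the repository.

```python
import numpy as np

def tta_variants(wav, gain_db=3.0):
    """Original, time-reversed, +gain_db and -gain_db copies of a chunk."""
    wav = np.asarray(wav, dtype=np.float32)
    scale = 10.0 ** (gain_db / 20.0)   # dB to linear amplitude factor
    return [wav, wav[::-1].copy(), wav * scale, wav / scale]

def predict_tta(predict_logits, wav):
    """Average the logits produced for each TTA variant."""
    logits = np.stack([predict_logits(v) for v in tta_variants(wav)])
    return logits.mean(axis=0)

# Dummy "model" that returns fixed logits regardless of the input.
dummy_model = lambda chunk: np.array([1.0, -2.0])
chunk = np.array([1.0, 2.0, 3.0], dtype=np.float32)
avg_logits = predict_tta(dummy_model, chunk)
```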
7. Pseudo-Labeling (Noisy Student)
Use confident predictions (>0.5) on train_soundscapes as additional training data. Retrain with these pseudo-labels + original data.
Expected boost: 0.84 → 0.88
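One way to read the selection step is sketched below. It keeps a soundscape chunk when its most confident class exceeds the threshold, and it keeps the full soft probability vector as the pseudo-label. Both choices are assumptions, as is the function name `select_pseudo_labels`. Note that the threshold here only selects training examples; it is never applied to submitted predictions.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.5):
    """Return (row indices, soft labels) for chunks with a confident prediction."""
    probs = np.asarray(probs, dtype=np.float32)
    keep = probs.max(axis=1) > threshold    # at least one confident class
    return np.flatnonzero(keep), probs[keep]

# Three toy chunks, two classes.
chunk_probs = [[0.9, 0.1], [0.3, 0.4], [0.2, 0.7]]
kept_idx, soft_labels = select_pseudo_labels(chunk_probs)
```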
Expected Score Improvements
| Stage | Technique | Expected Score |
|---|---|---|
| Baseline | Your 0.815 pipeline | 0.815 |
| NB2 improvement | AsymmetricLoss + energy crop + no mixup | 0.83-0.85 |
| 5-fold ensemble | 10 models (5 folds × 2 backbones) | 0.85-0.87 |
| TTA | 4 variants per chunk | 0.86-0.88 |
| Pseudo-labeling | Noisy student on soundscapes | 0.88-0.91 |
| + Better backbone | Bird-MAE or ConvNeXt | 0.90-0.93 |
Files
| File | Purpose |
|---|---|
| nb01_data_prep.py | Data cleaning, VAD, StratifiedKFold(5) |
| nb02_training.py | 5-fold training with AsymmetricLoss, SpecAugment |
| nb03_pseudo_labeling.py | Generate pseudo-labels, noisy student |
| nb04_inference.py | 10-model ensemble, TTA, submission generation |
How to Run on Kaggle
Step 1: Create Dataset from NB1 Output
After running nb01_data_prep.py, create a Kaggle dataset from /kaggle/working/:
train_cleaned_stratified.csv
soundscape_labels_with_folds.csv
species_list.csv
rare_species.csv
Step 2: NB2 Training
# In Kaggle notebook, attach:
# - Competition data: birdclef-2026
# - NB1 output dataset
# Run nb02_training.py → produces 10 .pt files in /kaggle/working/models/
Save models as a new Kaggle dataset.
Step 3: NB3 Pseudo-Labeling
# Attach NB2 model dataset + NB1 data
# Run nb03_pseudo_labeling.py → produces pseudo_labels_soft.csv
Step 4: NB4 Inference (Submission)
# Attach NB2 model dataset + competition test data
# Run nb04_inference.py → produces submission.csv
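Before writing submission.csv, the predictions need the same row order and column order as the sample submission. The sketch below shows one way to do that with the `sample[["row_id"]].merge(sub, ...)` pattern. The helper name `align_to_sample`, the left join, and filling missing rows with 0.0 are assumptions, and the toy frames stand in for the real files.

```python
import pandas as pd

def align_to_sample(sample, sub, fill_value=0.0):
    """Reorder rows and columns of sub to match the sample submission."""
    class_cols = [c for c in sample.columns if c != "row_id"]
    # A left join keeps every sample row, in the sample's order.
    aligned = sample[["row_id"]].merge(sub, on="row_id", how="left")
    # Enforce the sample's column order; absent classes become NaN columns.
    aligned = aligned.reindex(columns=["row_id"] + class_cols)
    aligned[class_cols] = aligned[class_cols].fillna(fill_value)
    return aligned

sample = pd.DataFrame({"row_id": ["a", "b", "c"], "sp1": [0.0] * 3, "sp2": [0.0] * 3})
sub = pd.DataFrame({"row_id": ["c", "a"], "sp2": [0.2, 0.4], "sp1": [0.9, 0.1]})
aligned = align_to_sample(sample, sub)
```

Row "b" has no prediction in the toy `sub`, so it receives the fill value rather than being dropped.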
Critical Rules for BirdCLEF
- NEVER threshold predictions: it destroys AUC ranking.
- NEVER apply non-linear calibration (p**0.75, p/(p+1), etc.): it distorts rank order.
- NEVER mixup or label-smooth: it squashes logits into a narrow range, killing AUC spread.
- ALWAYS align submission with sample_submission.csv: sample[["row_id"]].merge(sub, ...)
- ALWAYS ensemble diverse models: same model, same folds = no gain.
- ALWAYS use raw sigmoid outputs: let the metric handle calibration.
References
- AsymmetricLoss: arXiv:2009.14119
- Bird-MAE: arXiv:2504.12880
- sl-BEATs: arXiv:2508.11845
- Top solution reference: minalkharat12/birdclef-2026-solution
License
MIT. Competition code for educational purposes.
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "hello9972/birdclef-2026-improved"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.