Kaggle BirdCLEF+ 2026 — Inference-only submission

CPU-only inference pipeline for the BirdCLEF+ 2026 competition: 234 species, Pantanal (Brazilian wetland) soundscapes, 90-minute CPU runtime budget at scoring time.

This repository is a mirror of the code-only project: https://github.com/SergheiBrinza/kaggle-hackathon-birdclef-2026

What this repo is (and is not)

This is not a trained or fine-tuned model. It is the inference glue code I wrote and configured around publicly released pre-trained artifacts.

Nothing in this repository was trained or fine-tuned by me.
The pre-trained artifacts (Google Perch 2.0, the chaneyma MoE bundle, the rishikeshjani ONNX export of Perch) are used as released and are not redistributed here.
This matches my published declaration for the submission.
See THIRD_PARTY_LICENSES.md for sources and licenses; download the artifacts yourself from the original locations.

Method

The submission ensembles a frozen audio foundation model with a small mixture of experts:

Perch 2.0 (frozen teacher). Google's bioacoustics foundation model, used via an ONNX export for CPU inference. Outputs per-class logits plus a 1536-dim embedding.
chaneyma MoE. 4× ProtoSSM folds (selective state-space + class prototypes, consuming the Perch embedding) plus a Student CNN and a Student CRNN on log-mel features.
Site / hour prior. Empirical priors over the 234 classes conditioned on recording site (S\d+ parsed from filename) and hour-of-day (24 bins), fit from the BirdCLEF+ 2026 train_soundscapes_labels.csv only.
Temporal smoothing. Per file, over adjacent 5-second windows (12 windows per 60-second file).
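
The site / hour prior described above can be sketched roughly as follows. This is a minimal illustration, not the code from the inference script: the function names, the additive smoothing constant `alpha`, and the example filename are all assumptions made for the sketch. Only the `S\d+` site pattern and the idea of an empirical per-group class prior come from the description above.

```python
import re
import numpy as np

SITE_RE = re.compile(r"S\d+")  # site token pattern, as described in the text


def parse_site(filename):
    """Return the first S<digits> token in the filename, or None if absent."""
    match = SITE_RE.search(filename)
    return match.group(0) if match else None


def log_odds_prior(groups, labels, alpha=1.0):
    """Empirical per-group class prior, returned as log-odds vectors.

    groups: one group key per labelled window (a site string or an hour 0..23).
    labels: (n_windows, n_classes) binary array of class presence.
    alpha:  additive smoothing constant (an assumption; keeps log-odds finite).
    """
    labels = np.asarray(labels, dtype=np.float64)
    groups = np.asarray(groups)
    priors = {}
    for key in np.unique(groups):
        rows = labels[groups == key]
        # Smoothed fraction of windows in this group where each class is present.
        p = (rows.sum(axis=0) + alpha) / (len(rows) + 2.0 * alpha)
        priors[key.item()] = np.log(p) - np.log1p(-p)
    return priors


# Hypothetical filename, used only to show the regex at work.
site = parse_site("soundscape_S07_0530.ogg")
```

The same `log_odds_prior` helper can be called once with site keys and once with hour-of-day keys; how the two resulting priors are combined is not specified here and is left open in this sketch.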

My contributions

ONNX patch of Perch. The upstream chaneyma inference script (CC0) loaded Perch through tf.saved_model.load, which requires TensorFlow and was too heavy for the Kaggle CPU budget. I replaced the TF code path with onnxruntime plus a CC0 ONNX export of Perch (rishikeshjani), removing the TensorFlow dependency entirely.
Blend-weight search over the (perch, student_cnn, student_crnn) mixing weights applied to the pre-sigmoid logits.
Prior-scale sweep over the multiplier applied to the site/hour log-odds prior before it is added to the Perch logits.
Per-file temporal smoothing, averaging each 5-second window with its immediate neighbours.
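
The three tunable steps above (blend, prior scale, smoothing) can be sketched as below. This is a hedged illustration under several assumptions: the function names are hypothetical, the default values are simply the shipped settings from the experiments table, edge windows reuse themselves as the missing neighbour, and smoothing is shown on whatever per-window score array is passed in. The actual script may order or implement these steps differently, and the ProtoSSM folds are left out.

```python
import numpy as np


def add_scaled_prior(perch_logits, prior_log_odds, prior_scale=0.20):
    """Add the site/hour log-odds prior, scaled by prior_scale, to Perch logits."""
    return perch_logits + prior_scale * prior_log_odds


def blend_logits(perch, cnn, crnn, weights=(0.80, 0.13, 0.07)):
    """Weighted sum of pre-sigmoid logits from the three sources."""
    w_perch, w_cnn, w_crnn = weights
    return w_perch * perch + w_cnn * cnn + w_crnn * crnn


def smooth_windows(scores, center=0.8, side=0.1):
    """Average each window with its immediate neighbours within one file.

    scores: (n_windows, n_classes) array for a single file.
    The first and last windows are padded with their own values
    (an assumption about edge handling).
    """
    padded = np.pad(scores, ((1, 1), (0, 0)), mode="edge")
    return side * padded[:-2] + center * padded[1:-1] + side * padded[2:]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Example with random stand-in logits: 12 windows, 234 classes.
rng = np.random.default_rng(0)
perch, cnn, crnn = (rng.normal(size=(12, 234)) for _ in range(3))
prior = rng.normal(size=234)

blended = blend_logits(add_scaled_prior(perch, prior), cnn, crnn)
probs = smooth_windows(sigmoid(blended))
```

Because the three smoothing weights sum to 1, a file with constant scores is left unchanged, and the alternative 0.7 / 0.15 / 0.15 setting corresponds to `smooth_windows(scores, center=0.7, side=0.15)`.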

Inference script originally by chaneyma (CC0), patched for ONNX/CPU inference by Serghei Brinza.

Experiments (public leaderboard)

Metric: macro-averaged ROC-AUC (the competition metric, ranking-based).

Variant	Blend (P / CNN / CRNN)	`--prior-scale`	Postprocessing	Public LB
Prior sweep	0.80 / 0.13 / 0.07	0.60	smoothing 0.8 / 0.1 / 0.1	n/a
Prior sweep	0.80 / 0.13 / 0.07	0.40	smoothing 0.8 / 0.1 / 0.1	n/a
Prior sweep	0.80 / 0.13 / 0.07	0.20	smoothing 0.8 / 0.1 / 0.1	0.914
Prior sweep	0.80 / 0.13 / 0.07	0.10	smoothing 0.8 / 0.1 / 0.1	0.914
Power transform on probs	0.80 / 0.13 / 0.07	0.20	+ power transform	no change
Alt smoothing	0.80 / 0.13 / 0.07	0.20	smoothing 0.7 / 0.15 / 0.15	0.913

Best final public-LB score: 0.914.

A few honest caveats:

The power transform on probabilities had no effect: it is monotonic, so it preserves the ordering of scores within each class, and macro ROC-AUC depends only on that ordering.
The 0.7 / 0.15 / 0.15 temporal-smoothing variant (0.913) was a negative result: slightly worse than the shipped 0.8 / 0.1 / 0.1 version. Logged because honest experiment logs include the runs that did not help.
The prior-scale rows for 0.60 and 0.40 are marked n/a rather than filled in with guessed numbers.
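
The first caveat can be checked numerically. The sketch below computes a pairwise (Mann-Whitney style) ROC-AUC with NumPy and compares the macro average before and after a power transform on random stand-in data. It is not the competition's scoring code; in particular, skipping classes that have only one label value is an assumption made here.

```python
import numpy as np


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=np.float64)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def macro_roc_auc(y_true, probs):
    """Unweighted mean of per-class ROC-AUC.

    Classes with no positives or no negatives are skipped (an assumption).
    """
    y_true = np.asarray(y_true)
    probs = np.asarray(probs)
    aucs = [
        roc_auc(y_true[:, c], probs[:, c])
        for c in range(probs.shape[1])
        if 0 < y_true[:, c].sum() < len(y_true)
    ]
    return float(np.mean(aucs))


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=(200, 5))          # random stand-in labels
p = rng.uniform(0.01, 0.99, size=(200, 5))     # random stand-in probabilities

before = macro_roc_auc(y, p)
after = macro_roc_auc(y, p ** 0.5)             # power transform on probabilities
same = np.isclose(before, after)
```

Any strictly increasing transform applied within a class leaves every pairwise comparison, and therefore the per-class AUC, unchanged, so `same` should hold for any positive exponent.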

Required external artifacts (not redistributed)

You must download these yourself; this repo does not host any third-party weights.

Component	Source	License
Google Perch 2.0 (frozen teacher)	https://huggingface.co/cgeorgiaw/Perch	Apache License 2.0
Perch ONNX export for BirdCLEF+ 2026	https://www.kaggle.com/datasets/rishikeshjani/perch-onnx-for-birdclef-2026	CC0 1.0
chaneyma MoE artifacts (4× ProtoSSM folds + StudentCNN + StudentCRNN)	https://www.kaggle.com/datasets/chaneyma/birdclef-2026-cv9245-moe-artifacts	CC0 1.0

These artifacts are used as released: no fine-tuning, distillation, or re-training was performed in this submission.

Runtime

CPU only. Perch runs on CPU via onnxruntime; ProtoSSM, Student CNN and Student CRNN run on CPU via PyTorch. There is no GPU code path.
Kaggle 90-minute CPU budget. Designed around the BirdCLEF+ 2026 scoring environment.
Audio: 32 kHz, 60-second .ogg test files, processed as 12 non-overlapping 5-second windows per file.
Classes: 234, taken from the competition sample_submission.csv.
Region: Brazilian Pantanal soundscapes.
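
The windowing and CPU execution described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the function names are hypothetical, zero-padding of a short final window is an assumption, and the sketch assumes the ONNX export accepts a float32 batch of shape (n_windows, 160000). Input and output names are read from the model rather than hard-coded, since they are not given here.

```python
import numpy as np

SAMPLE_RATE = 32_000                            # Hz, from the runtime notes
WINDOW_SECONDS = 5
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS   # 160_000 samples per window


def split_windows(audio):
    """Split mono audio into non-overlapping 5-second windows.

    A short final window is zero-padded (an assumption).
    Returns a float32 array of shape (n_windows, WINDOW_SAMPLES).
    """
    audio = np.asarray(audio, dtype=np.float32)
    n_windows = max(1, -(-len(audio) // WINDOW_SAMPLES))  # ceiling division
    padded = np.zeros(n_windows * WINDOW_SAMPLES, dtype=np.float32)
    padded[: len(audio)] = audio
    return padded.reshape(n_windows, WINDOW_SAMPLES)


def run_onnx(model_path, windows):
    """Run an ONNX model on CPU and return its outputs keyed by output name."""
    import onnxruntime as ort  # imported lazily so split_windows works without it

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    output_names = [out.name for out in session.get_outputs()]
    outputs = session.run(output_names, {input_name: windows})
    return dict(zip(output_names, outputs))


# A 60-second file at 32 kHz yields 12 windows.
windows = split_windows(np.zeros(60 * SAMPLE_RATE, dtype=np.float32))
```

Calling `run_onnx` requires the downloaded ONNX export; which of the returned arrays holds the logits and which holds the embedding depends on the export's own output names.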

How to run

python src/infer_moe_onnx.py \
  --blend-perch 0.80 \
  --blend-cnn   0.13 \
  --blend-crnn  0.07 \
  --prior-scale 0.20 \
  --out submission.csv

The script exposes 15 CLI flags in total (paths to artifact directories, fold weight prefix, legacy single-student fallback, proto model dim, etc.). Run python src/infer_moe_onnx.py --help for the full list. Only the five flags shown above were varied during experiments; the rest stayed at their defaults.

You will need to download the pre-trained artifacts yourself from the sources listed above and arrange them under the paths the defaults expect, or override via the path flags.

Considered but not pursued

A short design-space review. These directions are listed because I read about them while preparing this submission; I did not run any of them in this work.

Audio foundation models beyond Perch (BirdMAE, NatureLM-audio). Potentially stronger embeddings, but unclear whether they fit the 90-minute CPU budget without distillation work I did not want to claim.
Semi-supervised distillation from a larger teacher into a smaller CPU student. Out of scope here (no training in this repo).
SED (sound-event-detection) heads with frame-wise localisation rather than per-window logits. Would change the I/O contract; not pursued.

References

van Merrienboer, B. et al. (2025). Perch 2.0: The Bittern Lesson for Bioacoustics. arXiv:2508.04665.
Sydorskyi, V. & Goncalves, F. (2025). BirdCLEF+ 2025: 2nd-place CLEF Working Note. CEUR Workshop Proceedings, Vol. 4038. (Related-work reference; not used as code or data here.)

Licensing

My code in this repository: Apache License 2.0, see LICENSE.
Pre-trained artifacts and the upstream inference script: see THIRD_PARTY_LICENSES.md. They are not redistributed here.

Author

Serghei Brinza (sergheibrinza on GitHub).

Unofficial, independent submission. Not affiliated with or endorsed by Kaggle, Google, or the BirdCLEF organizers.
