Kaggle BirdCLEF+ 2026: Inference-only submission

CPU-only inference pipeline for the BirdCLEF+ 2026 competition: 234 species, Pantanal (Brazilian wetland) soundscapes, 90-minute CPU runtime budget at scoring time.

This repository is a mirror of the code-only project: https://github.com/SergheiBrinza/kaggle-hackathon-birdclef-2026

What this repo is (and is not)

This is not a trained or fine-tuned model. It is the inference glue code I wrote and configured around publicly released pre-trained artifacts.

  • Nothing in this repository was trained or fine-tuned by me.
  • The pre-trained artifacts (Google Perch 2.0, the chaneyma MoE bundle, the rishikeshjani ONNX export of Perch) are used as released and are not redistributed here.
  • This matches my published declaration for the submission.
  • See THIRD_PARTY_LICENSES.md for sources and licenses; download the artifacts yourself from the original locations.

Method

The submission ensembles a frozen audio foundation model with a small mixture of experts:

  • Perch 2.0 (frozen teacher). Google's bioacoustics foundation model, used via an ONNX export for CPU inference. Outputs per-class logits plus a 1536-dim embedding.
  • chaneyma MoE. 4× ProtoSSM folds (selective state-space + class prototypes, consuming the Perch embedding) plus a Student CNN and a Student CRNN on log-mel features.
  • Site / hour prior. Empirical priors over the 234 classes conditioned on recording site (S\d+ parsed from filename) and hour-of-day (24 bins), fit from the BirdCLEF+ 2026 train_soundscapes_labels.csv only.
  • Temporal smoothing. Per file, over adjacent 5-second windows (12 windows per 60-second file).
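As a rough illustration of the site / hour prior described above, the sketch below parses the site token with the S\d+ pattern and turns per-group class frequencies into log-odds. The filename, the additive smoothing constant alpha, the function names, and the way the hour is obtained (passed in directly here) are all assumptions made for the example, not details taken from the actual script.

```python
import re
import numpy as np

SITE_RE = re.compile(r"S\d+")  # site token pattern named in the text


def parse_site(filename):
    """Return the first S<digits> token found in the filename, or None."""
    match = SITE_RE.search(filename)
    return match.group(0) if match else None


def fit_log_odds_prior(groups, labels, n_classes, alpha=1.0):
    """Smoothed per-group class frequencies, returned as log-odds.

    groups: one hashable key per labelled window, e.g. (site, hour)
    labels: one iterable of present class indices per labelled window
    alpha:  additive smoothing constant (hypothetical choice)
    """
    counts, totals = {}, {}
    for key, present in zip(groups, labels):
        c = counts.setdefault(key, np.zeros(n_classes))
        for k in set(present):  # count each class at most once per window
            c[k] += 1
        totals[key] = totals.get(key, 0) + 1
    prior = {}
    for key, c in counts.items():
        p = (c + alpha) / (totals[key] + 2 * alpha)  # strictly inside (0, 1)
        prior[key] = np.log(p) - np.log1p(-p)        # log-odds
    return prior


# Toy example with 3 classes instead of 234 and a made-up filename.
site = parse_site("BC2026_S12_20250101_053000.ogg")
prior = fit_log_odds_prior(
    groups=[(site, 5), (site, 5), ("S03", 18)],
    labels=[[0], [0, 2], []],
    n_classes=3,
)
```

Keying the prior on (site, hour) jointly is one possible reading; separate site and hour priors that are summed would be an equally plausible design.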

My contributions

  1. ONNX patch of Perch. The upstream chaneyma inference script (CC0) loaded Perch through tf.saved_model.load, which requires TensorFlow and was too heavy for the Kaggle CPU budget. I replaced the TF code path with onnxruntime plus a CC0 ONNX export of Perch (rishikeshjani), removing the TensorFlow dependency entirely.
  2. Blend-weight search over (perch, student_cnn, student_crnn) mixing weights of the pre-sigmoid logits.
  3. Prior-scale sweep over the multiplier applied to the site/hour log-odds prior before it is added to the Perch logits.
  4. Per-file temporal smoothing, replacing each 5-second window with a weighted average of itself and its immediate neighbours (weights 0.8 / 0.1 / 0.1 in the shipped version).
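Items 2 to 4 can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the actual script: the function names are hypothetical, the ProtoSSM folds are left out, the smoothing is applied to the blended scores, the first smoothing weight is taken to be the centre window, and file edges are handled by repeating the first and last window.

```python
import numpy as np


def blend_logits(perch, cnn, crnn, prior_log_odds,
                 weights=(0.80, 0.13, 0.07), prior_scale=0.20):
    """Add the scaled prior to the Perch logits, then mix the three models.

    perch, cnn, crnn: arrays of shape (n_windows, n_classes), pre-sigmoid
    prior_log_odds:   array of shape (n_classes,), broadcast over windows
    """
    perch_adj = perch + prior_scale * prior_log_odds
    w_p, w_c, w_r = weights
    return w_p * perch_adj + w_c * cnn + w_r * crnn


def smooth_windows(scores, center=0.8, side=0.1):
    """Weighted average of each window with its two neighbours.

    Edge windows reuse themselves as the missing neighbour (assumption).
    """
    padded = np.pad(scores, ((1, 1), (0, 0)), mode="edge")
    return center * padded[1:-1] + side * padded[:-2] + side * padded[2:]


# Toy shapes: 12 windows per file, 3 classes instead of 234.
rng = np.random.default_rng(0)
perch, cnn, crnn = (rng.normal(size=(12, 3)) for _ in range(3))
prior = np.array([1.0, 0.0, -1.0])
final = smooth_windows(blend_logits(perch, cnn, crnn, prior))
```

Because the smoothing weights sum to 1, a score that is constant across a file is left unchanged.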

Inference script originally by chaneyma (CC0), patched for ONNX/CPU inference by Serghei Brinza.

Experiments (public leaderboard)

Metric: macro-averaged ROC-AUC (the competition metric, ranking-based).

| Variant                  | Blend (P / CNN / CRNN) | --prior-scale | Postprocessing               | Public LB |
|--------------------------|------------------------|---------------|------------------------------|-----------|
| Prior sweep              | 0.80 / 0.13 / 0.07     | 0.60          | smoothing 0.8 / 0.1 / 0.1    | n/a       |
| Prior sweep              | 0.80 / 0.13 / 0.07     | 0.40          | smoothing 0.8 / 0.1 / 0.1    | n/a       |
| Prior sweep              | 0.80 / 0.13 / 0.07     | 0.20          | smoothing 0.8 / 0.1 / 0.1    | 0.914     |
| Prior sweep              | 0.80 / 0.13 / 0.07     | 0.10          | smoothing 0.8 / 0.1 / 0.1    | 0.914     |
| Power transform on probs | 0.80 / 0.13 / 0.07     | 0.20          | + power transform            | no change |
| Alt smoothing            | 0.80 / 0.13 / 0.07     | 0.20          | smoothing 0.7 / 0.15 / 0.15  | 0.913     |

Best final public-LB score: 0.914.

A few honest caveats:

  • The power transform on probabilities had no effect, because macro ROC-AUC is ranking-based and a monotonic transform of the scores leaves every per-class ranking unchanged.
  • The 0.7 / 0.15 / 0.15 temporal-smoothing variant (0.913) was a negative result, slightly worse than the shipped 0.8 / 0.1 / 0.1 version (0.914). It is logged because an honest experiment log includes the runs that did not help.
  • The intermediate prior-scale rows (0.60, 0.40) are marked n/a rather than filled in with guessed numbers.
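The first caveat can be checked on synthetic data. The sketch below computes ROC-AUC for a single class directly from its pairwise-ranking definition and compares raw probabilities against a power-transformed copy. The data is random, the exponent 0.5 is an arbitrary choice for the example, and roc_auc is a hypothetical helper, not the competition's scoring code.

```python
import numpy as np


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in memory, fine for a demo.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)   # synthetic binary labels for one class
p = rng.random(200)                # synthetic "probabilities" in [0, 1)

auc_raw = roc_auc(y, p)
auc_pow = roc_auc(y, p ** 0.5)     # power transform keeps the ordering
print(auc_raw, auc_pow)            # the two values agree
```

Macro ROC-AUC averages this quantity over classes, so a transform that preserves the ordering within each class leaves the macro score unchanged as well.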

Required external artifacts (not redistributed)

You must download these yourself; this repo does not host any third-party weights.

| Component | Source | License |
|-----------|--------|---------|
| Google Perch 2.0 (frozen teacher) | https://huggingface.co/cgeorgiaw/Perch | Apache License 2.0 |
| Perch ONNX export for BirdCLEF+ 2026 | https://www.kaggle.com/datasets/rishikeshjani/perch-onnx-for-birdclef-2026 | CC0 1.0 |
| chaneyma MoE artifacts (4× ProtoSSM folds + StudentCNN + StudentCRNN) | https://www.kaggle.com/datasets/chaneyma/birdclef-2026-cv9245-moe-artifacts | CC0 1.0 |

These artifacts are used as released: no fine-tuning, distillation, or re-training was performed in this submission.

Runtime

  • CPU only. Perch runs on CPU via onnxruntime; ProtoSSM, Student CNN and Student CRNN run on CPU via PyTorch. There is no GPU code path.
  • Kaggle 90-minute CPU budget. Designed around the BirdCLEF+ 2026 scoring environment.
  • Audio: 32 kHz, 60-second .ogg test files, processed as 12 non-overlapping 5-second windows per file.
  • Classes: 234, taken from the competition sample_submission.csv.
  • Region: Brazilian Pantanal soundscapes.
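The audio figures above imply 160,000 samples per window and 12 windows per file. A minimal sketch of that windowing, assuming a mono waveform already decoded to a 1-D NumPy array at 32 kHz; split_windows is a hypothetical helper, and dropping any trailing partial window is an assumption about edge handling.

```python
import numpy as np

SAMPLE_RATE = 32_000                            # Hz, from the Runtime notes
WINDOW_SECONDS = 5
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS   # 160_000 samples per window


def split_windows(audio):
    """Split a 1-D waveform into non-overlapping 5-second windows.

    Samples past the last full window are dropped (assumption).
    """
    n_windows = len(audio) // WINDOW_SAMPLES
    return audio[: n_windows * WINDOW_SAMPLES].reshape(n_windows, WINDOW_SAMPLES)


# A silent 60-second stand-in for a decoded .ogg test file.
audio = np.zeros(60 * SAMPLE_RATE, dtype=np.float32)
windows = split_windows(audio)
print(windows.shape)  # (12, 160000)
```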

How to run

python src/infer_moe_onnx.py \
  --blend-perch 0.80 \
  --blend-cnn   0.13 \
  --blend-crnn  0.07 \
  --prior-scale 0.20 \
  --out submission.csv

The script exposes 15 CLI flags in total (paths to artifact directories, fold weight prefix, legacy single-student fallback, proto model dim, etc.). Run python src/infer_moe_onnx.py --help for the full list. Only the five flags shown above were varied during experiments; the rest stayed at their defaults.

You will need to download the pre-trained artifacts yourself from the sources listed above and arrange them under the paths the defaults expect, or override via the path flags.

Considered but not pursued

A short design-space review. These directions are listed because I read about them while preparing this submission; I did not run any of them in this work.

  • Audio foundation models beyond Perch (BirdMAE, NatureLM-audio). Potentially stronger embeddings, but unclear whether they fit the 90-minute CPU budget without distillation work I did not want to claim.
  • Semi-supervised distillation from a larger teacher into a smaller CPU student. Out of scope here (no training in this repo).
  • SED (sound-event-detection) heads with frame-wise localisation rather than per-window logits. Would change the I/O contract; not pursued.

References

  • van Merrienboer, B. et al. (2025). Perch 2.0. arXiv:2508.04665.
  • Sydorskyi, V. & Goncalves, F. (2025). BirdCLEF+ 2025: 2nd-place CLEF Working Note. CEUR Workshop Proceedings, Vol. 4038. (Related-work reference; not used as code or data here.)

Licensing

  • My code in this repository: Apache License 2.0, see LICENSE.
  • Pre-trained artifacts and the upstream inference script: see THIRD_PARTY_LICENSES.md. They are not redistributed here.

Author

Serghei Brinza (sergheibrinza on GitHub).


Unofficial, independent submission. Not affiliated with or endorsed by Kaggle, Google, or the BirdCLEF organizers.
