Doppelganger — trained heads and fine-tuned encoders
Models for the Doppelganger benchmark (matching a synthetic sound effect to the real recording it was generated from). Code and paper: https://github.com/elliottash/doppelganger · dataset: https://huggingface.co/datasets/elliottash/doppelganger
Contents
- heads/: trained MLP heads (*.head.pt), the paper's models. Each is a compact head (d → 512 → 256, batch-norm, ReLU, ℓ2-normalized) on top of a frozen encoder. Naming: <encoder>_ucs_paired[_<variant>]_<objective>.head.pt, with objective ∈ {instance, class} and per-fold leave-classes-out (kf0..kf4), data-efficiency (sub250..sub4000), and cross-generator (aldm) variants.
  - instance head: the generalizing one (learns the synthetic→real instance correspondence).
  - class head: class-supervised control (collapses to below the frozen-encoder baseline on unseen events).
- ckpts/: fine-tuned encoder checkpoints (BEATs, M2D) used in the six-encoder robustness study.
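The naming pattern above can be split into its parts with a small regular expression. This is a minimal sketch and not part of the released code: the helper name parse_head_name is hypothetical, and the second filename is made up to illustrate a fold variant, so check the actual files in heads/ before relying on it.

```python
import re

# Pattern for <encoder>_ucs_paired[_<variant>]_<objective>.head.pt
# The encoder part may itself contain underscores (e.g. "clap_general").
HEAD_NAME = re.compile(
    r"^(?P<encoder>.+?)_ucs_paired"
    r"(?:_(?P<variant>.+))?"
    r"_(?P<objective>instance|class)\.head\.pt$"
)


def parse_head_name(filename):
    """Return encoder, variant (or None) and objective for a head filename."""
    match = HEAD_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a recognised head filename: {filename!r}")
    return match.groupdict()


# Filename taken from the Use section of this card.
info = parse_head_name("clap_general_ucs_paired_instance.head.pt")

# Hypothetical filename, assumed to follow the same pattern for fold kf0.
fold = parse_head_name("clap_general_ucs_paired_kf0_instance.head.pt")
```

The variant field is None for the default head and holds a tag such as kf0, sub250 or aldm otherwise.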
Use
import torch
head = torch.load("heads/clap_general_ucs_paired_instance.head.pt", map_location="cpu")
# apply to frozen encoder embeddings (see src/apply_head.py in the code repo)
The heads consume frozen-encoder embeddings; the frozen backbones (CLAP, PANNs, AST, AudioMAE) come
from their original releases, and the BEATs/M2D fine-tunes are in ckpts/.
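As a rough illustration of what a head of the described shape (d → 512 → 256, batch-norm, ReLU, ℓ2-normalized) does to frozen-encoder embeddings, here is a minimal sketch. The class name ProjectionHead, the exact layer order, the input dimension of 512, and the random stand-in embeddings are all assumptions. Whether a .head.pt file stores a state dict or a pickled module is not stated here, so src/apply_head.py in the code repo remains the reference for loading the released weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProjectionHead(nn.Module):
    """Hypothetical d -> 512 -> 256 head; layer order is an assumption."""

    def __init__(self, in_dim, hidden_dim=512, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        # l2-normalize so that dot products are cosine similarities
        return F.normalize(self.net(x), dim=-1)


torch.manual_seed(0)
head = ProjectionHead(in_dim=512).eval()  # untrained weights, for shape only

# Random stand-ins for frozen-encoder embeddings (assumed dimension 512).
synthetic = torch.randn(4, 512)   # 4 synthetic sound effects
real = torch.randn(10, 512)       # 10 candidate real recordings

with torch.no_grad():
    z_syn = head(synthetic)
    z_real = head(real)
    sim = z_syn @ z_real.T        # cosine similarity matrix, shape (4, 10)

best_match = sim.argmax(dim=1)    # index of the closest real recording
```

With a trained instance head in place of the random weights, best_match would be the retrieved real recording for each synthetic clip.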
License
MIT (heads and fine-tunes). Frozen backbones follow their original licenses.