Doppelganger — trained heads and fine-tuned encoders

Models for the Doppelganger benchmark (matching a synthetic sound effect to the real recording it was generated from). Code and paper: https://github.com/elliottash/doppelganger · dataset: https://huggingface.co/datasets/elliottash/doppelganger

Contents

  • heads/ — trained MLP heads (*.head.pt), the paper's models. Each is a compact head (d → 512 → 256, batch-norm, ReLU, ℓ2-normalized) on top of a frozen encoder. Naming: <encoder>_ucs_paired[_<variant>]_<objective>.head.pt, with objective ∈ {instance, class} and per-fold leave-classes-out (kf0..kf4), data-efficiency (sub250..sub4000), and cross-generator (aldm) variants.
    • instance head — the generalizing one (learns the synthetic→real instance correspondence).
    • class head — class-supervised control (falls below the frozen-encoder baseline on unseen event classes).
  • ckpts/ — fine-tuned encoder checkpoints (BEATs, M2D) used in the six-encoder robustness study.
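
The naming scheme above can be split into its parts mechanically. The following is a minimal sketch, not code from the repository: `parse_head_name` and `HeadName` are hypothetical helpers, and the regular expression assumes the encoder name never contains the literal `_ucs_paired` and that the variant, when present, is a single token such as kf0, sub250, or aldm.

```python
import re
from typing import NamedTuple, Optional

# Pattern for <encoder>_ucs_paired[_<variant>]_<objective>.head.pt
# (an assumption derived from the naming rule, not taken from the code repo).
HEAD_NAME = re.compile(
    r"^(?P<encoder>.+)_ucs_paired"
    r"(?:_(?P<variant>.+))?"
    r"_(?P<objective>instance|class)\.head\.pt$"
)


class HeadName(NamedTuple):
    encoder: str
    variant: Optional[str]  # e.g. "kf0", "sub250", "aldm"; None if absent
    objective: str          # "instance" or "class"


def parse_head_name(filename: str) -> HeadName:
    """Split a head filename into encoder, optional variant, and objective."""
    m = HEAD_NAME.match(filename)
    if m is None:
        raise ValueError(f"not a recognized head filename: {filename!r}")
    return HeadName(m["encoder"], m["variant"], m["objective"])


parsed = parse_head_name("clap_general_ucs_paired_instance.head.pt")
print(parsed)
```

A helper like this makes it easy to filter heads/ by objective or variant before loading anything.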

Use

import torch
head = torch.load("heads/clap_general_ucs_paired_instance.head.pt", map_location="cpu")
# apply to frozen encoder embeddings (see src/apply_head.py in the code repo)

The heads consume frozen-encoder embeddings; the frozen backbones (CLAP, PANNs, AST, AudioMAE) come from their original releases, and the BEATs/M2D fine-tunes are in ckpts/.
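
To illustrate what "consume frozen-encoder embeddings" means in practice, here is a rough sketch of a head with the stated shape (d → 512 → 256, batch-norm, ReLU, ℓ2-normalized) applied to placeholder embeddings, followed by cosine-similarity matching of synthetic clips against real ones. Everything in it is an assumption for illustration: `SketchHead` is a hypothetical stand-in with random weights, the exact layer order and the input width of 768 are guesses, and the embeddings are random tensors. The actual loading and application logic is in src/apply_head.py in the code repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SketchHead(nn.Module):
    """Hypothetical stand-in: d -> 512 -> 256, then L2-normalize the output."""

    def __init__(self, in_dim: int):
        super().__init__()
        # Layer order is assumed; the released heads may differ in detail.
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),
            nn.Linear(512, 256),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)


torch.manual_seed(0)
in_dim = 768                      # assumed encoder embedding width
head = SketchHead(in_dim).eval()  # eval mode: batch-norm uses running stats

# Placeholder embeddings standing in for frozen-encoder outputs.
synthetic = torch.randn(4, in_dim)   # 4 synthetic sound effects (queries)
real = torch.randn(10, in_dim)       # 10 candidate real recordings

with torch.no_grad():
    q = head(synthetic)
    k = head(real)

# Outputs are unit-norm, so the dot product is the cosine similarity.
scores = q @ k.T
best = scores.argmax(dim=1)  # index of the best-matching real clip per query
```

Because the head output is ℓ2-normalized, ranking by dot product and ranking by cosine similarity give the same result.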

License

MIT (heads and fine-tunes). Frozen backbones follow their original licenses.
