DF Arena 500M — Speech Anti-Spoofing Arena results
RAPTOR universal anti-spoofing model. A wav2vec 2.0 XLS-R 300M self-supervised front-end produces per-layer hidden states, which are combined by learnable attention pooling (a layer-wise sigmoid gate over an attention-pooled summary) and then passed through a 4-block Conformer head with a class token to a 2-way classifier. Inference runs in FP32 on a deterministic window: the first 64,600 samples (~4.04 s at 16 kHz), tile-repeated if the clip is shorter (no random crop; the model does not resample, so audio should be supplied at 16 kHz). The score is softmax(logits)[bonafide]; higher means more bona fide. Official Speech-Arena-2025/DF_Arena_500M_V_1 checkpoint.
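The fixed input window and the score definition above can be sketched in NumPy. This is a minimal illustration, not the checkpoint's own preprocessing code: the names `fix_window` and `bonafide_score` are hypothetical, and the position of the bona fide class in the logit vector (`bonafide_index=1` here) is an assumption to check against the checkpoint's label mapping.

```python
import numpy as np

TARGET_LEN = 64600  # ~4.04 s at 16 kHz, as stated on this card


def fix_window(wave, target_len: int = TARGET_LEN) -> np.ndarray:
    """Take the first `target_len` samples; tile-repeat shorter clips (no random crop)."""
    wave = np.asarray(wave, dtype=np.float32)
    if wave.ndim != 1 or wave.size == 0:
        raise ValueError("expected a non-empty 1-D mono waveform at 16 kHz")
    if wave.size >= target_len:
        return wave[:target_len]
    reps = -(-target_len // wave.size)  # ceiling division
    return np.tile(wave, reps)[:target_len]


def bonafide_score(logits, bonafide_index: int = 1) -> float:
    """softmax(logits)[bonafide]; higher = more bona fide.

    ASSUMPTION: the bona fide class sits at `bonafide_index`; verify this
    against the checkpoint's own label mapping.
    """
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max()  # subtract the max for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return float(p[bonafide_index])
```

For example, a 2-second clip (32,000 samples) is repeated just over twice and truncated to 64,600 samples, so every clip yields exactly one deterministic score regardless of its length.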
Paper: arXiv:2603.06164 · Params: 436M · Checkpoint: SpeechAntiSpoofingBenchmarks/DF_Arena_500M_V_1
Arena standing
Live leaderboard: DF Arena 500M on the Speech Anti-Spoofing Arena
Per-dataset results (24 datasets; mean EER 5.09% over the 22 EER-scored datasets)
| Dataset | Metric | Score |
|---|---|---|
| J-SPAW_LA | EER | 0% |
| ArAD | EER | 9.01% |
| DFADD | EER | 0% |
| SONAR | EER | 2.11% |
| DeepVoice | EER | 9.71% |
| EmoFake_test | EER | 2.63% |
| LibriSeVoc | EER | 0.11% |
| CD-ADD | EER | 2.46% |
| ODSS | EER | 8.4% |
| InTheWild | EER | 1.87% |
| DECRO | EER | 4.33% |
| CFAD | EER | 8% |
| ASVspoof2019_LA | EER | 1.19% |
| HABLA | EER | 3.27% |
| CVoiceFake_small | EER | 7.9% |
| ASVspoof2021_LA | EER | 5.78% |
| PyAra | EER | 15.96% |
| XMAD | EER | 2.83% |
| ASVspoof2021_DF | EER | 3.5% |
| ASVspoof5 | EER | 13.43% |
| ADD22_eval_31 | EER | 1.97% |
| ADD2023_track12_test_r1 | EER | 7.44% |
| EmoSpoofTTS | 1-SRR | 3.1% |
| LRLspoof | 1-SRR | 1.61% |
EER = Equal Error Rate (lower is better). 1-SRR = 1 minus the Spoof Recall Rate, reported for spoof-only datasets (where EER cannot be computed) and measured at the model's own DeepVoice EER operating point (lower is better). All rows are scoring-verified (`reproduce --scoring`, Δ 0.0) and were computed with the TensorRT engine (parity-verified against PyTorch).
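Both metrics can be sketched in NumPy for readers scoring their own outputs. This is a simple threshold-sweep illustration, not the Arena's scoring code: the function names are hypothetical, scores follow this model's convention (higher = more bona fide), a clip is assumed to count as accepted when its score is at or above the threshold, and the Arena may interpolate the EER differently (for example from an ROC curve), so small numerical differences are possible.

```python
import numpy as np


def compute_eer(bona_scores, spoof_scores):
    """Return (EER, threshold) for scores where higher = more bona fide.

    Sweeps every observed score as a candidate threshold and picks the point
    where the false-rejection rate (bona fide scored below the threshold) and
    the false-acceptance rate (spoof scored at or above it) are closest,
    reporting their average as the EER.
    """
    bona = np.asarray(bona_scores, dtype=np.float64)
    spoof = np.asarray(spoof_scores, dtype=np.float64)
    thresholds = np.unique(np.concatenate([bona, spoof]))  # sorted, deduplicated
    frr = np.array([(bona < t).mean() for t in thresholds])
    far = np.array([(spoof >= t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(frr - far)))
    return float((frr[i] + far[i]) / 2), float(thresholds[i])


def one_minus_srr(spoof_scores, threshold):
    """Fraction of spoof clips accepted as bona fide (score >= threshold).

    Spoof recall is the fraction scored below the threshold, so this is
    1 - SRR. Intended for spoof-only sets with a threshold fixed elsewhere.
    """
    spoof = np.asarray(spoof_scores, dtype=np.float64)
    return float((spoof >= threshold).mean())


# Hypothetical usage: fix the threshold on DeepVoice, then apply it to a spoof-only set.
# eer_dv, thr_dv = compute_eer(deepvoice_bona_scores, deepvoice_spoof_scores)
# print(one_minus_srr(spoof_only_scores, thr_dv))
```

Reusing a threshold from a different dataset is what makes 1-SRR comparable across spoof-only sets: no per-dataset tuning is possible when there are no bona fide clips to balance against.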
Usage
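```python
import librosa
from transformers import pipeline

# "antispoofing" is a custom task; trust_remote_code=True lets transformers load it from the model repo
pipe = pipeline("antispoofing", model="SpeechAntiSpoofingBenchmarks/DF_Arena_500M_V_1", trust_remote_code=True, device="cuda")

# Load as mono float32 resampled to 16 kHz (the model itself does not resample)
audio, sr = librosa.load("sample.wav", sr=16000)
print(pipe(audio))  # {'label': 'bonafide'|'spoof', 'all_scores': {...}}
```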
Citation
```bibtex
@misc{kulkarni2026compactsslbackbonesmatter,
  title={Do Compact SSL Backbones Matter for Audio Deepfake Detection? A Controlled Study with RAPTOR},
  author={Ajinkya Kulkarni and Sandipana Dowerah and Atharva Kulkarni and Tanel Alumäe and Mathew Magimai Doss},
  year={2026},
  eprint={2603.06164},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2603.06164}
}
```