AASIST

AASIST is an audio anti-spoofing (voice-deepfake detection) countermeasure from "AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks" (Jung et al., ICASSP 2022). This is the official AASIST variant (not AASIST-L), using the upstream clovaai/aasist ASVspoof2019 LA pretrained checkpoint. The model takes a raw speech waveform and returns a score where higher = more bona fide.

This repo is self-contained for inference: the network definition is in _net.py, and the exact wrapper used to produce the Arena scores is in aasist.py.

Architecture

AASIST operates directly on the raw waveform. A sinc-convolution front-end and a RawNet2-style residual encoder produce a spectro-temporal feature map. Heterogeneous stacking graph attention layers then model this map over spectral and temporal sub-graphs, with a learnable max/average readout feeding a 2-class output (bona fide vs. spoof). The Arena score is the bona-fide logit.
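As a rough illustration of how a 2-class output becomes a single score, the sketch below picks one logit from a pair. The helper names bonafide_score and is_bonafide are hypothetical, and the class order [spoof, bona fide] (bona fide at index 1) is an assumption rather than something this README states; check aasist.py for the actual index.

```python
def bonafide_score(logits, bonafide_index=1):
    """Return the bona fide logit from a 2-class output.

    Assumption: logits are ordered [spoof, bona fide], so index 1 is
    the bona fide class. This ordering is not confirmed by the README.
    """
    if len(logits) != 2:
        raise ValueError("expected exactly two logits (spoof, bona fide)")
    return float(logits[bonafide_index])


def is_bonafide(score, threshold):
    """Accept a trial when its score reaches the chosen threshold."""
    return score >= threshold


# Hypothetical logits for one utterance, purely for illustration.
score = bonafide_score([-2.5, 3.0])
accepted = is_bonafide(score, threshold=0.0)
```

Because the score is a raw logit rather than a probability, any decision threshold has to be calibrated on held-out data; the value 0.0 above is only a placeholder.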

Reproducing the Arena scores

Inference uses a deterministic first-64600-sample window (no random crop), matching the upstream data_utils.pad() used at eval. Audio must be provided as float32 mono at 16 kHz; the wrapper does not resample.
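The fixed-window step can be sketched as follows. This is a minimal, dependency-free illustration rather than the wrapper's actual code: the function name fixed_window is hypothetical, and it assumes that clips shorter than the window are repeated end-to-end until they reach 64600 samples, which is how the upstream pad() is understood to behave.

```python
WINDOW = 64600  # samples, roughly 4 seconds at 16 kHz


def fixed_window(samples, max_len=WINDOW):
    """Return exactly max_len samples, deterministically.

    Long clips keep their first max_len samples (no random crop).
    Short clips are repeated end-to-end and then cut to max_len
    (assumed behaviour, see the note above).
    """
    n = len(samples)
    if n == 0:
        raise ValueError("empty waveform")
    if n >= max_len:
        return list(samples[:max_len])
    repeats = max_len // n + 1  # enough copies to cover max_len
    return (list(samples) * repeats)[:max_len]


# A 3-sample clip stretched to a 7-sample window, for illustration.
short = fixed_window([1, 2, 3], max_len=7)
```

Plain lists are used here only to keep the sketch self-contained; the same logic applies to a float32 array.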

from aasist import AASIST

# wav: a float32 mono waveform sampled at 16 kHz
m = AASIST()
m.load()
scores = m.score_batch([wav], [16000])   # higher = more bona fide

| Dataset | EER % | n_trials |
|---|---|---|
| ASVspoof2019_LA (in-domain) | 0.83 | 71,237 |
| ASVspoof2021_LA | 12.35 | 181,566 |
| ASVspoof2021_DF | 17.04 | 611,829 |
| InTheWild | 43.01 | 31,779 |
| CD-ADD | 51.05 | 20,786 |

The in-domain ASVspoof2019 LA result reproduces the paper's reported EER (~0.83%).
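For readers who want to check an EER figure from their own scores, the sketch below computes the equal error rate from two lists of scores. It is a simplified illustration, not the Arena's evaluation code: the function name compute_eer is hypothetical, tied scores are processed one trial at a time, and the EER is taken at the sweep point where the false-rejection and false-acceptance rates are closest.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return the equal error rate as a fraction in [0, 1].

    Higher scores are assumed to mean "more bona fide".
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Label 1 = bona fide, 0 = spoof; sort all trials by ascending score.
    trials = sorted(
        [(s, 1) for s in bonafide_scores] + [(s, 0) for s in spoof_scores]
    )
    n_bona, n_spoof = len(bonafide_scores), len(spoof_scores)

    # Sweep the threshold upward. Before any trial is rejected,
    # no bona fide is rejected (FRR = 0) and every spoof passes (FAR = 1).
    points = [(0.0, 1.0)]
    rejected_bona = rejected_spoof = 0
    for _, label in trials:
        if label == 1:
            rejected_bona += 1
        else:
            rejected_spoof += 1
        frr = rejected_bona / n_bona          # bona fide wrongly rejected
        far = 1.0 - rejected_spoof / n_spoof  # spoof wrongly accepted
        points.append((frr, far))

    # EER: the operating point where the two error rates are closest.
    frr, far = min(points, key=lambda p: abs(p[0] - p[1]))
    return (frr + far) / 2.0


# Toy scores for illustration only; multiply by 100 to get a percentage.
eer_percent = 100.0 * compute_eer([1.0, 3.0], [2.0, 4.0])
```

An EER near 50%, as in the CD-ADD row, means the scores separate the two classes about as well as chance.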

License

MIT (inherited from clovaai/aasist; see LICENSE).
