ArA-DF-Baseline
Arabic deepfake audio detector for the ArA-DF 2026 shared task, published under the ArabicSpeech organization.
Architecture: wav2vec2 XLS-R 300M frontend + AASIST backend.
Benchmark results
| Split | EER (%, lower is better) | # utterances |
|---|---|---|
| track-1_development_test | 14.65 | 16,023 |
| track-1_test | 14.54 | 144,210 |
| track-2_development_test | 27.21 | 14,193 |
| track-2_test | 27.11 | 127,746 |
track-1_development_test and track-1_test are the Track 1 development and evaluation phases on CodaBench; track-2_development_test and track-2_test are the corresponding Track 2 phases.
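For readers new to the metric, here is a minimal sketch of how an equal error rate can be computed from bonafide-oriented scores. It is an illustration only: the function name `compute_eer` is made up for this example, and CodaBench or DeepFense may use a different procedure (for example, interpolation between thresholds), so small numeric differences are possible.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return the equal error rate as a fraction (0.0 to 1.0).

    Assumes higher score = more likely bonafide.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("need at least one score per class")

    # Tag each score with its class: 1 = bonafide, 0 = spoof.
    scored = sorted(
        [(s, 1) for s in bonafide_scores] + [(s, 0) for s in spoof_scores]
    )
    n_bona, n_spoof = len(bonafide_scores), len(spoof_scores)

    rejected_bona = 0
    rejected_spoof = 0
    # Starting point: threshold below every score, so FRR = 0 and FAR = 1.
    best_gap, eer = 1.0, 0.5

    i = 0
    while i < len(scored):
        threshold = scored[i][0]
        # Reject every utterance whose score equals the current threshold.
        while i < len(scored) and scored[i][0] == threshold:
            if scored[i][1] == 1:
                rejected_bona += 1
            else:
                rejected_spoof += 1
            i += 1
        frr = rejected_bona / n_bona                  # bonafide wrongly rejected
        far = (n_spoof - rejected_spoof) / n_spoof    # spoof wrongly accepted
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

Multiply the result by 100 to compare it with the percentages in the table.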
Repository contents
| File | Description |
|---|---|
| best_model.pth | Trained DeepFense checkpoint |
| xlsr2_300m.pt | Fairseq XLS-R 300M frontend weights |
| config.yaml | DeepFense config; fill in your parquet_files paths before use |
| build_parquets.py | Standalone parquet generation script (no DeepFense install needed) |
| results.json | Published metrics on the two test tracks above |
Architecture
Raw waveform (16 kHz)
        │
        ▼
wav2vec2 XLS-R 300M (xlsr2_300m.pt, fairseq)
frame-level features → (T, 1024)
        │
        ▼
AASIST backend
  Linear(1024 → 128)
  RawNet2 encoder: 6 residual blocks on a 2-D (freq × time) map
  Spectral attention: soft attention along frequency axis → spectral graph nodes
  Temporal attention: soft attention along time axis → temporal graph nodes
  GAT_S / GAT_T: independent graph attention on each graph
  HtrgGAT ×4: heterogeneous cross-graph attention with learnable master nodes
  Graph pooling + readout: max & mean over S, T, and master → 160-d embedding
        │
        ▼
CrossEntropy head → P(bonafide) / P(spoof)
The AASIST backend is from Jung et al., ICASSP 2022. This checkpoint uses mean pooling on XLS-R frame features before the backend.
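To make the readout step in the diagram concrete, the sketch below concatenates the element-wise max and mean over the spectral nodes and the temporal nodes, plus the master node. It assumes 32-dimensional node features (an assumption chosen so that five pooled vectors add up to 160 values) and an arbitrary concatenation order; the function name `readout` is hypothetical and the checkpoint's real implementation lives in DeepFense.

```python
def readout(spectral_nodes, temporal_nodes, master):
    """Pool two node sets and a master node into one flat embedding.

    spectral_nodes / temporal_nodes: lists of equal-length feature lists.
    master: a single feature list of the same length.
    """
    def column_max(nodes):
        # Element-wise maximum across all nodes.
        return [max(column) for column in zip(*nodes)]

    def column_mean(nodes):
        # Element-wise mean across all nodes.
        return [sum(column) / len(column) for column in zip(*nodes)]

    # Concatenation order is an assumption made for this illustration.
    return (
        column_max(temporal_nodes)
        + column_mean(temporal_nodes)
        + column_max(spectral_nodes)
        + column_mean(spectral_nodes)
        + list(master)
    )


# Example with the assumed node size of 32: 5 pooled vectors x 32 = 160 values.
NODE_DIM = 32
spectral = [[float(i + j) for j in range(NODE_DIM)] for i in range(3)]
temporal = [[float(i * j) for j in range(NODE_DIM)] for i in range(4)]
embedding = readout(spectral, temporal, [0.0] * NODE_DIM)
```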
Step 0: Install DeepFense
git clone https://github.com/Yaselley/deepfense-framework.git
cd deepfense-framework
conda create -n deepfense python=3.10 -y
conda activate deepfense
pip install deepfense
Download this model (same root as the dataset; see Step 1):
huggingface-cli download ArabicSpeech/ArA-DF-Baseline --local-dir ArA-DF-2026/models/ArA-DF-Baseline
or
hf download ArabicSpeech/ArA-DF-Baseline --local-dir ArA-DF-2026/models/ArA-DF-Baseline
All tutorial paths use ArA-DF-2026 under your home directory (no root or special write access needed). On a cluster you can substitute, e.g., /data/ArA-DF-2026 if you have write permission there; keep paths consistent across steps.
Step 1: Download the ArA-DF 2026 dataset
The audio is distributed as WebDataset TAR shards on Hugging Face.
huggingface-cli download ArabicSpeech/ArA-DF-2026 \
--repo-type dataset \
--local-dir ArA-DF-2026
or
hf download ArabicSpeech/ArA-DF-2026 \
--repo-type dataset \
--local-dir ArA-DF-2026
After the download, the layout on disk is:
ArA-DF-2026/
├── data/
│   ├── train/
│   ├── dev/
│   ├── track-1_development_test/
│   ├── track-1_test/
│   ├── track-2_development_test/
│   └── track-2_test/
├── metadata/
│   ├── train.parquet
│   ├── dev.parquet
│   ├── track-1_development_test.parquet
│   ├── track-1_test.parquet
│   ├── track-2_development_test.parquet
│   └── track-2_test.parquet
└── models/
    └── ArA-DF-Baseline/    ← from Step 0
        ├── config.yaml
        ├── best_model.pth
        └── xlsr2_300m.pt
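Before moving on, it can help to confirm that the download produced the layout shown above. The following is a small sketch using only the standard library; `missing_paths` is a hypothetical helper, not part of this repo or of DeepFense.

```python
from pathlib import Path

SPLITS = [
    "train",
    "dev",
    "track-1_development_test",
    "track-1_test",
    "track-2_development_test",
    "track-2_test",
]
MODEL_FILES = ["config.yaml", "best_model.pth", "xlsr2_300m.pt"]


def missing_paths(root):
    """Return the expected files/folders that do not exist under root."""
    root = Path(root)
    expected = [root / "data" / split for split in SPLITS]
    expected += [root / "metadata" / f"{split}.parquet" for split in SPLITS]
    expected += [root / "models" / "ArA-DF-Baseline" / name for name in MODEL_FILES]
    return [path for path in expected if not path.exists()]
```

Calling `missing_paths("ArA-DF-2026")` from the directory that contains the download should return an empty list once both the dataset and the model are in place.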
Step 2: Extract the audio
Extract every TAR in-place (same command for all splits):
for split in train dev track-1_development_test track-1_test track-2_development_test track-2_test; do
  cd "ArA-DF-2026/data/$split"
  for tar in *.tar; do tar -xf "$tar"; done
  cd -
done
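If a shell loop is inconvenient (for example on a machine without a POSIX shell), the same in-place extraction can be sketched with Python's standard tarfile module. `extract_split` is a hypothetical helper written for this example.

```python
import tarfile
from pathlib import Path


def extract_split(split_dir):
    """Extract every .tar shard found directly in split_dir, in place.

    Returns the number of shards that were extracted.
    """
    split_dir = Path(split_dir)
    shards = sorted(split_dir.glob("*.tar"))
    for shard in shards:
        with tarfile.open(shard) as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Python releases: refuse unsafe members (absolute paths, etc.).
                tar.extractall(path=split_dir, filter="data")
            else:
                tar.extractall(path=split_dir)
    return len(shards)
```

For example, `extract_split("ArA-DF-2026/data/track-1_test")` unpacks that split's shards next to the TAR files, mirroring the loop above for a single split.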
After extraction, FLACs may land flat (directly in the split folder) or in shard sub-folders. Both work: build_parquets.py auto-detects either layout.
Flat example:
ArA-DF-2026/data/track-1_test/
├── track-1_test-000000.tar
├── test_0000008.flac
├── test_0000031.flac
└── ...
| Argument | Value |
|---|---|
| --data_root | ArA-DF-2026/data |
| --meta_root | ArA-DF-2026/metadata |
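One simple way to handle both the flat and the shard sub-folder layout is to search recursively and key each file by its filename stem. The sketch below illustrates that idea; it is not the code inside build_parquets.py, and `index_flacs` is a hypothetical name.

```python
from pathlib import Path


def index_flacs(split_dir):
    """Map utterance ID (filename stem) to the absolute path of its FLAC file.

    Searches recursively, so flat and shard sub-folder layouts both work.
    """
    index = {}
    for path in sorted(Path(split_dir).rglob("*.flac")):
        index[path.stem] = str(path.resolve())
    return index
```

With the flat example above, `index_flacs("ArA-DF-2026/data/track-1_test")` would contain keys such as `test_0000008` and `test_0000031`.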
Step 3: Generate DeepFense parquet files
Parquet files hold the metadata DeepFense reads at training/evaluation time.
Each row has: ID, path (absolute path to audio), label (1 = bonafide, 0 = spoof), dataset_name.
Label convention (HF metadata, parquets, and this model's training):
| Class | HF label | DeepFense label_map |
|---|---|---|
| bonafide | 1 | 1 (bonafide) |
| spoof | 0 | 0 (spoof) |
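For orientation, the row layout described above could be assembled roughly as follows. This is a sketch under assumptions: `build_rows` and `write_parquet` are hypothetical helpers, `flac_index` is an ID-to-path mapping you have built yourself, and `labels` is an ID-to-label mapping you have already read from the HF metadata (its column names are not shown here, so that step is left out).

```python
COLUMNS = ["ID", "path", "label", "dataset_name"]


def build_rows(flac_index, labels, dataset_name):
    """Build one row per utterance that has both an audio file and a label.

    flac_index: dict of utterance ID -> absolute audio path.
    labels: dict of utterance ID -> 1 (bonafide) or 0 (spoof).
    """
    rows = []
    for utt_id, path in sorted(flac_index.items()):
        if utt_id not in labels:
            continue  # no label available for this file
        label = int(labels[utt_id])
        if label not in (0, 1):
            raise ValueError(f"unexpected label {label!r} for {utt_id}")
        rows.append(
            {"ID": utt_id, "path": path, "label": label, "dataset_name": dataset_name}
        )
    return rows


def write_parquet(rows, out_path):
    """Write rows to a parquet file (requires pandas and pyarrow)."""
    import pandas as pd

    pd.DataFrame(rows, columns=COLUMNS).to_parquet(out_path, index=False)
```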
A standalone build_parquets.py script is included in this repo. It only needs pandas and pyarrow:
pip install pandas pyarrow
cd ArA-DF-2026/models/ArA-DF-Baseline
python build_parquets.py \
--data_root ../../data \
--meta_root ../../metadata \
--output_dir ../../parquets
Output:
ArA-DF-2026/parquets/
├── aradf_train.parquet
├── aradf_val.parquet
├── aradf_track-1_development_test.parquet   ← Track 1, CodaBench dev phase
├── aradf_track-1_test.parquet               ← Track 1, CodaBench eval phase
├── aradf_track-2_development_test.parquet   ← Track 2, CodaBench dev phase
└── aradf_track-2_test.parquet               ← Track 2, CodaBench eval phase
Test-only (skip train/dev):
python build_parquets.py \
--data_root ../../data \
--meta_root ../../metadata \
--output_dir ../../parquets \
--splits track-1_development_test track-1_test track-2_development_test track-2_test
Step 4: Fill in config.yaml
After downloading, all files live under ArA-DF-2026/models/ArA-DF-Baseline/:
ArA-DF-2026/models/ArA-DF-Baseline/
├── config.yaml
├── best_model.pth
└── xlsr2_300m.pt    ← Fairseq XLS-R 300M frontend (bundled)
Fairseq / XLS-R checkpoint (absolute path required)
Set ckpt_path to the absolute path of xlsr2_300m.pt on your machine:
model:
  frontend:
    args:
      source: fairseq
      ckpt_path: /home/you/ArA-DF-2026/models/ArA-DF-Baseline/xlsr2_300m.pt
Replace /home/you with your own home directory, and write the path out in full rather than using ~.
Do not use a relative path: train.py / test.py may fail depending on the working directory.
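If you are unsure what the absolute path is on your machine, a short standard-library snippet can print it for pasting into config.yaml. `absolute_ckpt_path` is a hypothetical helper for this example.

```python
from pathlib import Path


def absolute_ckpt_path(path):
    """Expand ~ and return an absolute path string for use as ckpt_path."""
    return str(Path(path).expanduser().resolve())


if __name__ == "__main__":
    print(absolute_ckpt_path("~/ArA-DF-2026/models/ArA-DF-Baseline/xlsr2_300m.pt"))
```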
Parquet paths (required for training / local evaluation)
Replace the placeholder parquet paths under data.train, data.val, and data.test:
data:
  train:
    parquet_files:
      - ArA-DF-2026/parquets/aradf_train.parquet
  val:
    parquet_files:
      - ArA-DF-2026/parquets/aradf_val.parquet
  test:
    dataset_names:
      - track-1_development_test
      - track-1_test
      - track-2_development_test
      - track-2_test
    parquet_files:
      - ArA-DF-2026/parquets/aradf_track-1_development_test.parquet
      - ArA-DF-2026/parquets/aradf_track-1_test.parquet
      - ArA-DF-2026/parquets/aradf_track-2_development_test.parquet
      - ArA-DF-2026/parquets/aradf_track-2_test.parquet
run_inference.py only needs --config and --checkpoint; it does not read the parquet paths.
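A quick sanity check of the edited config can catch path mistakes before a long training or evaluation run. The sketch below works on the config as a plain nested dict with the keys shown in this step; `check_config_paths` is a hypothetical helper and does not reflect how DeepFense itself validates configs.

```python
from pathlib import Path


def check_config_paths(cfg):
    """Return a list of human-readable problems found in the config dict."""
    problems = []

    # Frontend checkpoint: must be absolute and must exist.
    ckpt = Path(cfg["model"]["frontend"]["args"]["ckpt_path"])
    if not ckpt.is_absolute():
        problems.append(f"ckpt_path is not absolute: {ckpt}")
    elif not ckpt.is_file():
        problems.append(f"ckpt_path does not exist: {ckpt}")

    # Parquet files listed under data.train / data.val / data.test.
    for section in ("train", "val", "test"):
        for parquet in cfg.get("data", {}).get(section, {}).get("parquet_files", []):
            if not Path(parquet).is_file():
                problems.append(f"data.{section}: missing parquet file {parquet}")
    return problems
```

One way to obtain `cfg` is `yaml.safe_load(open("config.yaml"))` with PyYAML installed; an empty list means every referenced file was found.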
Competition Submission (no labels required)
For the ArA-DF 2026 shared task you submit a ZIP file containing one CSV per track. CodaBench expects a continuous bonafide score per utterance, not hard 0/1 labels.
| Track | CodaBench | Dev phase folder | Eval phase folder | CSV in ZIP |
|---|---|---|---|---|
| Track 1 | https://www.codabench.org/competitions/17138/ | data/track-1_development_test/ | data/track-1_test/ | track1_preds.csv |
| Track 2 | https://www.codabench.org/competitions/17139/ | data/track-2_development_test/ | data/track-2_test/ | track2_preds.csv |
Use the bundled run_inference.py script. DeepFense must be installed (see Step 0).
Point --audio_dir at the folder for the current CodaBench phase.
# Track 1: development phase
# Set MODEL to the absolute path of models/ArA-DF-Baseline on your machine
MODEL="/home/you/ArA-DF-2026/models/ArA-DF-Baseline"
cd ArA-DF-2026/models/ArA-DF-Baseline
python run_inference.py \
--audio_dir ../../data/track-1_development_test \
--config $MODEL/config.yaml \
--checkpoint $MODEL/best_model.pth \
--output track1_preds.csv
zip submission_track1.zip track1_preds.csv
# Track 1: evaluation phase (re-run with track-1_test/)
python run_inference.py \
--audio_dir ../../data/track-1_test \
--config $MODEL/config.yaml \
--checkpoint $MODEL/best_model.pth \
--output track1_preds.csv
zip submission_track1.zip track1_preds.csv
# Track 2: development phase
python run_inference.py \
--audio_dir ../../data/track-2_development_test \
--config $MODEL/config.yaml \
--checkpoint $MODEL/best_model.pth \
--output track2_preds.csv
zip submission_track2.zip track2_preds.csv
# Track 2: evaluation phase (re-run with track-2_test/)
python run_inference.py \
--audio_dir ../../data/track-2_test \
--config $MODEL/config.yaml \
--checkpoint $MODEL/best_model.pth \
--output track2_preds.csv
zip submission_track2.zip track2_preds.csv
Upload submission_track1.zip / submission_track2.zip on the My Submissions tab of the matching CodaBench competition.
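Where the zip command is not available, the archive can also be built with Python's standard zipfile module. The sketch below stores the CSV at the top level of the ZIP under its own filename; `make_submission` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def make_submission(csv_path, zip_path):
    """Create zip_path containing only csv_path, stored without directories."""
    csv_path = Path(csv_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname keeps just the filename, e.g. track1_preds.csv
        archive.write(csv_path, arcname=csv_path.name)
    return zip_path
```

For example, `make_submission("track1_preds.csv", "submission_track1.zip")` produces the same kind of archive as the zip commands above.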
Output format:
audio_id,logit
test_0000008,1.45364702
test_0000031,-1.95676708
- audio_id: utterance ID (filename stem, without .flac)
- logit: bonafide score from DeepFense (CrossEntropy.get_score, same as test.py / outputs["scores"]). Label map: bonafide=1, spoof=0 (matches HF metadata); higher = more likely bonafide
- The column is named logit to match the CodaBench submission format
- No thresholding: CodaBench computes EER from the raw scores
- run_inference.py searches recursively inside --audio_dir (works with the shard sub-folders)
Note: test.py (below) requires a label column in the parquet for local EER. Use run_inference.py for CodaBench submissions.
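Before uploading, it may be worth checking that a predictions file matches the output format described above. The following sketch checks the header, duplicate IDs, and that every score is a finite number; `validate_predictions` is a hypothetical helper, and CodaBench may apply additional checks of its own.

```python
import csv
import math


def validate_predictions(csv_path):
    """Return a list of problems found in a predictions CSV (empty if none)."""
    errors = []
    seen_ids = set()
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames != ["audio_id", "logit"]:
            return [f"unexpected header: {reader.fieldnames}"]
        # Line 1 is the header, so data rows start at line 2.
        for line_no, row in enumerate(reader, start=2):
            audio_id = row["audio_id"]
            if audio_id in seen_ids:
                errors.append(f"line {line_no}: duplicate audio_id {audio_id}")
            seen_ids.add(audio_id)
            try:
                score = float(row["logit"])
            except (TypeError, ValueError):
                errors.append(f"line {line_no}: logit is not a number")
                continue
            if not math.isfinite(score):
                errors.append(f"line {line_no}: logit is not finite")
    return errors
```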
Evaluate (with labels)
cd deepfense-framework
python test.py \
--config ArA-DF-2026/models/ArA-DF-Baseline/config.yaml \
--checkpoint ArA-DF-2026/models/ArA-DF-Baseline/best_model.pth
Metrics are printed and saved to results.json. Per-sample scores are written to results/predictions/<dataset_name>_predictions.txt.
Train from scratch
cd deepfense-framework
python train.py --config ArA-DF-2026/models/ArA-DF-Baseline/config.yaml
The best checkpoint is saved to outputs/<exp_name>_<timestamp>/best_model.pth. Evaluate with:
python test.py \
--config ArA-DF-2026/models/ArA-DF-Baseline/config.yaml \
--checkpoint outputs/<exp_name>_*/best_model.pth
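If several runs share the same experiment name, the wildcard in the command above can match more than one folder. A small sketch for picking the most recently written checkpoint is shown below; `latest_checkpoint` is a hypothetical helper and `exp_name` stands for whatever experiment name your config uses.

```python
from pathlib import Path


def latest_checkpoint(outputs_dir, exp_name):
    """Return the newest outputs/<exp_name>_*/best_model.pth, or None."""
    candidates = list(Path(outputs_dir).glob(f"{exp_name}_*/best_model.pth"))
    if not candidates:
        return None
    # Newest by file modification time.
    return max(candidates, key=lambda path: path.stat().st_mtime)
```

The returned path can then be passed to test.py via --checkpoint.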
Multi-GPU (PyTorch DDP):
torchrun --nproc_per_node=4 train.py --config /path/to/config.yaml
Citation
@inproceedings{jung2022aasist,
title={AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks},
author={Jung, Jee-weon and others},
booktitle={ICASSP},
year={2022}
}
Links
- Model: https://huggingface.co/ArabicSpeech/ArA-DF-Baseline
- Dataset: https://huggingface.co/datasets/ArabicSpeech/ArA-DF-2026
- Track 1 CodaBench: https://www.codabench.org/competitions/17137/
- Track 2 CodaBench: https://www.codabench.org/competitions/17138/
- Organization: https://huggingface.co/ArabicSpeech
- DeepFense: https://github.com/Yaselley/deepfense-framework