ArA-DF-Baseline

Arabic deepfake audio detector for the ArA-DF 2026 shared task, published under the ArabicSpeech organization.

Architecture: wav2vec2 XLS-R 300M frontend + AASIST backend.

Benchmark results

Split                      EER ↓ (%)   # utterances
track-1_development_test   14.65       16,023
track-1_test               14.54       144,210
track-2_development_test   27.21       14,193
track-2_test               27.11       127,746

track-1_development_test / track-1_test: Track 1 development / evaluation phases (CodaBench).
track-2_development_test / track-2_test: Track 2 development / evaluation phases (CodaBench).
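
For readers unfamiliar with the metric: EER is the operating point where the false-acceptance rate (spoof accepted as bonafide) equals the false-rejection rate (bonafide rejected). The sketch below is a minimal pure-Python illustration, assuming higher scores mean "more likely bonafide" and labels follow the 1 = bonafide, 0 = spoof convention used in this card. The function name compute_eer is hypothetical, and this is not the scoring code used by CodaBench or DeepFense.

```python
def compute_eer(scores, labels):
    """Approximate EER (as a fraction) from scores and 0/1 labels.

    Assumes higher score = more likely bonafide, label 1 = bonafide, 0 = spoof.
    """
    bona = [s for s, y in zip(scores, labels) if y == 1]
    spoof = [s for s, y in zip(scores, labels) if y == 0]
    if not bona or not spoof:
        raise ValueError("need at least one bonafide and one spoof score")

    best_gap, eer = None, None
    # Sweep every observed score as a threshold: accept when score >= threshold.
    for thr in sorted(set(scores)):
        far = sum(s >= thr for s in spoof) / len(spoof)  # spoof accepted
        frr = sum(s < thr for s in bona) / len(bona)     # bonafide rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    toy_scores = [0.9, 0.4, 0.6, 0.1]
    toy_labels = [1, 1, 0, 0]
    print(f"EER = {100 * compute_eer(toy_scores, toy_labels):.2f}%")
```

This sweep is quadratic in the number of utterances, so it suits small sanity checks; for the full test sets a sorted or vectorized implementation would be the more practical choice.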

Repository contents

File                Description
best_model.pth      Trained DeepFense checkpoint
xlsr2_300m.pt       Fairseq XLS-R 300M frontend weights
config.yaml         DeepFense config; fill in your parquet_files paths before use
build_parquets.py   Standalone parquet generation script (no DeepFense install needed)
results.json        Published metrics on the two test tracks above

Architecture

Raw waveform (16 kHz)
    │
    ▼
wav2vec2 XLS-R 300M  (xlsr2_300m.pt, fairseq)
    frame-level features  →  (T, 1024)
    │
    ▼
AASIST backend
    Linear(1024 → 128)
    RawNet2 encoder:     6 residual blocks on a 2-D (freq × time) map
    Spectral attention:  soft attention along frequency axis → spectral graph nodes
    Temporal attention:  soft attention along time axis      → temporal graph nodes
    GAT_S / GAT_T:       independent graph attention on each graph
    HtrgGAT ×4:          heterogeneous cross-graph attention with learnable master nodes
    Graph pooling + readout: max & mean over S, T, and master → 160-d embedding
    │
    ▼
CrossEntropy head  →  P(bonafide) / P(spoof)

The AASIST backend is from Jung et al., ICASSP 2022. This checkpoint uses mean pooling on XLS-R frame features before the backend.
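
The 160-d embedding size follows from concatenating five pooled vectors: max and mean over the temporal nodes, max and mean over the spectral nodes, and the master node. The sketch below assumes a final node dimension of 32 (so 5 × 32 = 160), which matches the reference AASIST configuration as far as I know but is not stated in this card. The node counts, node values, and the function name readout are illustrative only.

```python
def readout(spectral_nodes, temporal_nodes, master):
    """Concatenate max and mean over each graph's nodes with the master node."""
    def col_max(nodes):
        return [max(col) for col in zip(*nodes)]

    def col_mean(nodes):
        return [sum(col) / len(col) for col in zip(*nodes)]

    return (
        col_max(temporal_nodes) + col_mean(temporal_nodes)
        + col_max(spectral_nodes) + col_mean(spectral_nodes)
        + list(master)
    )


NODE_DIM = 32  # assumed node feature size; not stated in this model card
spectral = [[0.1 * i] * NODE_DIM for i in range(12)]  # 12 dummy spectral nodes
temporal = [[0.2 * i] * NODE_DIM for i in range(20)]  # 20 dummy temporal nodes
master = [0.0] * NODE_DIM                             # dummy master node

embedding = readout(spectral, temporal, master)
print(len(embedding))  # 160
```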


Step 0: Install DeepFense

git clone https://github.com/Yaselley/deepfense-framework.git
cd deepfense-framework
conda create -n deepfense python=3.10 -y
conda activate deepfense
pip install deepfense

Download this model (same root as the dataset; see Step 1):

huggingface-cli download ArabicSpeech/ArA-DF-Baseline --local-dir ArA-DF-2026/models/ArA-DF-Baseline

or

hf download ArabicSpeech/ArA-DF-Baseline --local-dir ArA-DF-2026/models/ArA-DF-Baseline

All tutorial paths assume ArA-DF-2026 sits under your home directory, so no root access is needed. On a cluster you can substitute another location, e.g. /data/ArA-DF-2026, if you have write permission there; keep the paths consistent across steps.


Step 1: Download the ArA-DF 2026 dataset

The audio is distributed as WebDataset TAR shards on Hugging Face.

huggingface-cli download ArabicSpeech/ArA-DF-2026 \
  --repo-type dataset \
  --local-dir ArA-DF-2026

or

hf download ArabicSpeech/ArA-DF-2026 \
  --repo-type dataset \
  --local-dir ArA-DF-2026

After the download, the layout on disk is:

ArA-DF-2026/
├── data/
│   ├── train/
│   ├── dev/
│   ├── track-1_development_test/
│   ├── track-1_test/
│   ├── track-2_development_test/
│   └── track-2_test/
├── metadata/
│   ├── train.parquet
│   ├── dev.parquet
│   ├── track-1_development_test.parquet
│   ├── track-1_test.parquet
│   ├── track-2_development_test.parquet
│   └── track-2_test.parquet
└── models/
    └── ArA-DF-Baseline/          ← from Step 0
        ├── config.yaml
        ├── best_model.pth
        └── xlsr2_300m.pt

Step 2: Extract the audio

Extract every TAR in-place (same command for all splits):

for split in train dev track-1_development_test track-1_test track-2_development_test track-2_test; do
  cd ArA-DF-2026/data/$split
  for tar in *.tar; do tar -xf "$tar"; done
  cd -
done
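
Where a shell loop is inconvenient (for example on Windows), the same in-place extraction can be sketched with Python's standard tarfile module. The function name extract_split and the relative data root are illustrative assumptions; the split names are the ones used in the loop above.

```python
import tarfile
from pathlib import Path

SPLITS = [
    "train", "dev",
    "track-1_development_test", "track-1_test",
    "track-2_development_test", "track-2_test",
]


def extract_split(split_dir):
    """Extract every *.tar in split_dir into split_dir; return the shard count."""
    split_dir = Path(split_dir)
    count = 0
    for tar_path in sorted(split_dir.glob("*.tar")):
        with tarfile.open(tar_path) as tar:
            tar.extractall(split_dir)
        count += 1
    return count


if __name__ == "__main__":
    data_root = Path("ArA-DF-2026/data")  # adjust to your own location
    for split in SPLITS:
        split_path = data_root / split
        if split_path.is_dir():
            print(split, extract_split(split_path), "shard(s) extracted")
```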

After extraction, FLACs may land flat (directly in the split folder) or in shard sub-folders. Both layouts work: build_parquets.py auto-detects either one.

Flat example:

ArA-DF-2026/data/track-1_test/
├── track-1_test-000000.tar
├── test_0000008.flac
├── test_0000031.flac
└── ...

Arguments passed to build_parquets.py in Step 3:

Argument      Value
--data_root   ArA-DF-2026/data
--meta_root   ArA-DF-2026/metadata
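
Handling both layouts comes down to a recursive search. The following is a minimal sketch of one way to index FLAC files by utterance ID. It is not the code in build_parquets.py, and index_flacs is a hypothetical helper name.

```python
from pathlib import Path


def index_flacs(split_dir):
    """Map utterance ID (filename stem) to absolute path, flat or sharded."""
    index = {}
    # rglob walks the split folder and any shard sub-folders alike.
    for flac in sorted(Path(split_dir).rglob("*.flac")):
        if flac.stem in index:
            raise ValueError(f"duplicate utterance ID: {flac.stem}")
        index[flac.stem] = str(flac.resolve())
    return index
```

Calling index_flacs("ArA-DF-2026/data/track-1_test") would return a dictionary keyed by IDs such as test_0000008, whichever layout the extraction produced.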

Step 3: Generate DeepFense parquet files

Parquet files hold the metadata DeepFense reads at training/evaluation time. Each row has: ID, path (absolute path to audio), label (1 = bonafide, 0 = spoof), dataset_name.

Label convention (HF metadata, parquets, and this model's training):

Class      HF label   DeepFense label_map
bonafide   1          1 (bonafide)
spoof      0          0 (spoof)
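
To make the row schema concrete, here is a small sketch that assembles and sanity-checks rows in the format described above. The input field names (audio_id, label) and the helper name make_rows are assumptions for illustration; the actual HF metadata columns and the logic in build_parquets.py may differ.

```python
import os


def make_rows(metadata, flac_index, dataset_name):
    """Build DeepFense-style rows: ID, path, label, dataset_name.

    metadata: iterable of dicts with hypothetical keys "audio_id" and "label".
    flac_index: dict mapping utterance ID to an absolute audio path.
    """
    rows = []
    for item in metadata:
        utt_id, label = item["audio_id"], int(item["label"])
        if label not in (0, 1):
            raise ValueError(f"{utt_id}: label must be 1 (bonafide) or 0 (spoof)")
        path = flac_index[utt_id]
        if not os.path.isabs(path):
            raise ValueError(f"{utt_id}: path must be absolute, got {path}")
        rows.append({"ID": utt_id, "path": path,
                     "label": label, "dataset_name": dataset_name})
    return rows
```

A list of such rows can then be written out with pandas, for example pandas.DataFrame(rows).to_parquet("aradf_track-1_test.parquet"), which relies on pyarrow being installed.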

A standalone build_parquets.py script is included in this repo. It only needs pandas and pyarrow:

pip install pandas pyarrow

cd ArA-DF-2026/models/ArA-DF-Baseline
python build_parquets.py \
  --data_root  ../../data \
  --meta_root  ../../metadata \
  --output_dir ../../parquets

Output:

ArA-DF-2026/parquets/
├── aradf_train.parquet
├── aradf_val.parquet
├── aradf_track-1_development_test.parquet   ← Track 1, CodaBench dev phase
├── aradf_track-1_test.parquet               ← Track 1, CodaBench eval phase
├── aradf_track-2_development_test.parquet   ← Track 2, CodaBench dev phase
└── aradf_track-2_test.parquet               ← Track 2, CodaBench eval phase

Test-only (skip train/dev):

python build_parquets.py \
  --data_root  ../../data \
  --meta_root  ../../metadata \
  --output_dir ../../parquets \
  --splits track-1_development_test track-1_test track-2_development_test track-2_test

Step 4: Fill in config.yaml

After downloading, all files live under ArA-DF-2026/models/ArA-DF-Baseline/:

ArA-DF-2026/models/ArA-DF-Baseline/
├── config.yaml
├── best_model.pth
└── xlsr2_300m.pt          ← Fairseq XLS-R 300M frontend (bundled)

Fairseq / XLS-R checkpoint (absolute path required)

Set ckpt_path to the absolute path of xlsr2_300m.pt on your machine:

model:
  frontend:
    args:
      source: fairseq
      ckpt_path: /home/you/ArA-DF-2026/models/ArA-DF-Baseline/xlsr2_300m.pt

Replace /home/you with your own home directory, i.e. write out what ~ expands to on your machine.

Do not use a relative path: train.py / test.py may fail depending on the working directory.
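
One way to get the exact string to paste into ckpt_path is to let Python resolve it. This is a small convenience sketch, not part of DeepFense; absolute_ckpt_path is a hypothetical helper name, and the path in the example assumes the layout from Step 1 under your home directory.

```python
from pathlib import Path


def absolute_ckpt_path(path):
    """Expand ~ and return an absolute path string suitable for config.yaml."""
    return str(Path(path).expanduser().resolve())


if __name__ == "__main__":
    ckpt = absolute_ckpt_path("~/ArA-DF-2026/models/ArA-DF-Baseline/xlsr2_300m.pt")
    print(ckpt)
    if not Path(ckpt).is_file():
        print("warning: file not found; check the download location")
```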

Parquet paths (required for training / local evaluation)

Replace the placeholder parquet paths under data.train, data.val, and data.test:

data:
  train:
    parquet_files:
      - ArA-DF-2026/parquets/aradf_train.parquet
  val:
    parquet_files:
      - ArA-DF-2026/parquets/aradf_val.parquet
  test:
    dataset_names:
      - track-1_development_test
      - track-1_test
      - track-2_development_test
      - track-2_test
    parquet_files:
      - ArA-DF-2026/parquets/aradf_track-1_development_test.parquet
      - ArA-DF-2026/parquets/aradf_track-1_test.parquet
      - ArA-DF-2026/parquets/aradf_track-2_development_test.parquet
      - ArA-DF-2026/parquets/aradf_track-2_test.parquet

run_inference.py only needs --config and --checkpoint; it does not read the parquet paths.


Competition Submission (no labels required)

For the ArA-DF 2026 shared task you submit one ZIP file per track, each containing that track's CSV. CodaBench expects a continuous bonafide score per utterance, not hard 0/1 labels.

Track     CodaBench                                       Dev phase folder                 Eval phase folder    CSV in ZIP
Track 1   https://www.codabench.org/competitions/17138/   data/track-1_development_test/   data/track-1_test/   track1_preds.csv
Track 2   https://www.codabench.org/competitions/17139/   data/track-2_development_test/   data/track-2_test/   track2_preds.csv

Use the bundled run_inference.py script. DeepFense must be installed (see Step 0). Point --audio_dir at the folder for the current CodaBench phase.


# Set MODEL to the absolute path of models/ArA-DF-Baseline on your machine
MODEL="/home/you/ArA-DF-2026/models/ArA-DF-Baseline"
cd "$MODEL"

# Track 1: development phase
python run_inference.py \
  --audio_dir  ../../data/track-1_development_test \
  --config     "$MODEL/config.yaml" \
  --checkpoint "$MODEL/best_model.pth" \
  --output     track1_preds.csv
zip submission_track1.zip track1_preds.csv

# Track 1: evaluation phase (re-run with track-1_test/)
python run_inference.py \
  --audio_dir  ../../data/track-1_test \
  --config     "$MODEL/config.yaml" \
  --checkpoint "$MODEL/best_model.pth" \
  --output     track1_preds.csv
zip submission_track1.zip track1_preds.csv

# Track 2: development phase
python run_inference.py \
  --audio_dir  ../../data/track-2_development_test \
  --config     "$MODEL/config.yaml" \
  --checkpoint "$MODEL/best_model.pth" \
  --output     track2_preds.csv
zip submission_track2.zip track2_preds.csv

# Track 2: evaluation phase (re-run with track-2_test/)
python run_inference.py \
  --audio_dir  ../../data/track-2_test \
  --config     "$MODEL/config.yaml" \
  --checkpoint "$MODEL/best_model.pth" \
  --output     track2_preds.csv
zip submission_track2.zip track2_preds.csv

Upload submission_track1.zip / submission_track2.zip on the My Submissions tab of the matching CodaBench competition.

Output format:

audio_id,logit
test_0000008,1.45364702
test_0000031,-1.95676708

  • audio_id: utterance ID (filename stem, without .flac)
  • logit: bonafide score from DeepFense (CrossEntropy.get_score, same as test.py / outputs["scores"]). Label map: bonafide=1, spoof=0 (matches HF metadata), so higher = more likely bonafide
  • The column is named logit to match the CodaBench submission format
  • No thresholding: CodaBench computes EER from the raw scores
  • run_inference.py searches recursively inside --audio_dir (works with the shard sub-folders)

Note: test.py (below) requires a label column in the parquet for local EER. Use run_inference.py for CodaBench submissions.
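
Before uploading, it can help to sanity-check the CSV against the output format described above. The sketch below uses only the standard library; check_submission and make_zip are hypothetical helper names, and the checks reflect the format described in this card rather than CodaBench's own validator.

```python
import csv
import math
import zipfile


def check_submission(csv_path):
    """Validate header, unique IDs, and finite float scores; return row count."""
    seen = set()
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != ["audio_id", "logit"]:
            raise ValueError(f"unexpected header: {reader.fieldnames}")
        for row in reader:
            audio_id, score = row["audio_id"], float(row["logit"])
            if audio_id in seen:
                raise ValueError(f"duplicate audio_id: {audio_id}")
            if not math.isfinite(score):
                raise ValueError(f"non-finite score for {audio_id}")
            seen.add(audio_id)
    return len(seen)


def make_zip(csv_path, zip_path, arcname):
    """Write csv_path into zip_path under the file name expected in the ZIP."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(csv_path, arcname=arcname)
```

For Track 1 this could be used as check_submission("track1_preds.csv") followed by make_zip("track1_preds.csv", "submission_track1.zip", "track1_preds.csv").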


Evaluate (with labels)

cd deepfense-framework
python test.py \
  --config ArA-DF-2026/models/ArA-DF-Baseline/config.yaml \
  --checkpoint ArA-DF-2026/models/ArA-DF-Baseline/best_model.pth

Metrics are printed and saved to results.json. Per-sample scores are written to results/predictions/<dataset_name>_predictions.txt.


Train from scratch

cd deepfense-framework
python train.py --config ArA-DF-2026/models/ArA-DF-Baseline/config.yaml

The best checkpoint is saved to outputs/<exp_name>_<timestamp>/best_model.pth. Evaluate with:

python test.py \
  --config ArA-DF-2026/models/ArA-DF-Baseline/config.yaml \
  --checkpoint outputs/<exp_name>_*/best_model.pth

Multi-GPU (PyTorch DDP):

torchrun --nproc_per_node=4 train.py --config /path/to/config.yaml

Citation

@inproceedings{jung2022aasist,
  title={AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks},
  author={Jung, Jee-weon and others},
  booktitle={ICASSP},
  year={2022}
}
