ArA-DF-Baseline

Arabic deepfake audio detector for the ArA-DF 2026 shared task, published under the ArabicSpeech organization.

Architecture: wav2vec2 XLS-R 300M frontend + AASIST backend.

Benchmark results

Split	EER ↓ (%)	# utterances
track-1_development_test	14.65	16,023
track-1_test	14.54	144,210
track-2_development_test	27.21	14,193
track-2_test	27.11	127,746

track-1_development_test / track-1_test — Track 1 development / evaluation phases (CodaBench). track-2_development_test / track-2_test — Track 2 development / evaluation phases (CodaBench).
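EER (equal error rate) is the operating point at which the share of spoofed utterances accepted as bonafide equals the share of bonafide utterances rejected; lower is better. The following is a minimal pure-Python sketch of that idea, assuming higher scores mean "more likely bonafide" and labels follow the 1 = bonafide, 0 = spoof convention used in this card. The function name `compute_eer` is hypothetical, and this is not the scoring code used by CodaBench or DeepFense.

```python
def compute_eer(scores, labels):
    """Approximate the equal error rate (as a fraction in [0, 1]).

    Assumes a higher score means "more likely bonafide" and that labels
    are 1 (bonafide) or 0 (spoof).
    """
    pairs = sorted(zip(scores, labels))
    n_bona = sum(1 for _, y in pairs if y == 1)
    n_spoof = len(pairs) - n_bona
    if n_bona == 0 or n_spoof == 0:
        raise ValueError("need at least one bonafide and one spoof utterance")

    # Start with the threshold below every score: nothing is rejected yet,
    # so FRR = 0 and FAR = 1.
    rejected_bona = rejected_spoof = 0
    best_gap, eer = 1.0, 0.5
    i = 0
    while i < len(pairs):
        # Raise the threshold just above the next group of tied scores.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                rejected_bona += 1
            else:
                rejected_spoof += 1
            j += 1
        frr = rejected_bona / n_bona          # bonafide wrongly rejected
        far = 1.0 - rejected_spoof / n_spoof  # spoof wrongly accepted
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
        i = j
    return eer
```

Multiply the result by 100 to compare it with the percentages in the table above. Evaluation toolkits often interpolate between thresholds, so small differences from this sketch are expected.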

Repository contents

File	Description
`best_model.pth`	Trained DeepFense checkpoint
`xlsr2_300m.pt`	Fairseq XLS-R 300M frontend weights
`config.yaml`	DeepFense config — fill in your `parquet_files` paths before use
`build_parquets.py`	Standalone parquet generation script (no DeepFense install needed)
`results.json`	Published metrics on the two test tracks above

Architecture

Raw waveform (16 kHz)
    │
    ▼
wav2vec2 XLS-R 300M  (xlsr2_300m.pt, fairseq)
    frame-level features  →  (T, 1024)
    │
    ▼
AASIST backend
    Linear(1024 → 128)
    RawNet2 encoder  — 6 residual blocks on a 2-D (freq × time) map
    Spectral attention  — soft attention along frequency axis → spectral graph nodes
    Temporal attention  — soft attention along time axis     → temporal graph nodes
    GAT_S / GAT_T       — independent graph attention on each graph
    HtrgGAT ×4          — heterogeneous cross-graph attention with learnable master nodes
    Graph pooling + readout: max & mean over S, T, and master → 160-d embedding
    │
    ▼
CrossEntropy head  →  P(bonafide) / P(spoof)

The AASIST backend is from Jung et al., ICASSP 2022. This checkpoint uses mean pooling on XLS-R frame features before the backend.
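The final readout step in the diagram can be illustrated with a small pure-Python sketch. It assumes each graph node carries a 32-d feature vector, so that five pooled vectors give 5 × 32 = 160 dimensions; that size, the concatenation order, and the node counts are assumptions made for illustration, and the helper names are hypothetical. The real backend operates on batched tensors inside DeepFense.

```python
import random

NODE_DIM = 32  # assumed per-node feature size, chosen so that 5 * 32 = 160


def max_and_mean(nodes):
    """Element-wise max and mean over a list of equal-length node vectors."""
    columns = list(zip(*nodes))
    return [max(c) for c in columns], [sum(c) / len(c) for c in columns]


def readout(spectral_nodes, temporal_nodes, master_node):
    """Concatenate max/mean of both graphs with the master node vector."""
    s_max, s_mean = max_and_mean(spectral_nodes)
    t_max, t_mean = max_and_mean(temporal_nodes)
    return t_max + t_mean + s_max + s_mean + list(master_node)


def random_nodes(count, dim=NODE_DIM):
    """Random stand-ins for graph node features (illustration only)."""
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


embedding = readout(random_nodes(12), random_nodes(20), random_nodes(1)[0])
print(len(embedding))  # 160
```

The embedding length depends only on the node feature size, not on how many spectral or temporal nodes the graphs contain.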

Step 0 — Install DeepFense

git clone https://github.com/Yaselley/deepfense-framework.git
cd deepfense-framework
conda create -n deepfense python=3.10 -y
conda activate deepfense
pip install deepfense

Download this model (same root as the dataset — see Step 1):

huggingface-cli download ArabicSpeech/ArA-DF-Baseline --local-dir ArA-DF-2026/models/ArA-DF-Baseline

Equivalent command with the hf CLI:

hf download ArabicSpeech/ArA-DF-Baseline --local-dir ArA-DF-2026/models/ArA-DF-Baseline

All tutorial paths use ArA-DF-2026 under your home directory (no root/write access needed). On a cluster you can substitute e.g. /data/ArA-DF-2026 if you have write permission there — keep paths consistent across steps.

Step 1 — Download the ArA-DF 2026 dataset

The audio is distributed as WebDataset TAR shards on Hugging Face.

huggingface-cli download ArabicSpeech/ArA-DF-2026 \
  --repo-type dataset \
  --local-dir ArA-DF-2026

Equivalent command with the hf CLI:

hf download ArabicSpeech/ArA-DF-2026 \
  --repo-type dataset \
  --local-dir ArA-DF-2026

After the download, the layout on disk is:

ArA-DF-2026/
├── data/
│   ├── train/
│   ├── dev/
│   ├── track-1_development_test/
│   ├── track-1_test/
│   ├── track-2_development_test/
│   └── track-2_test/
├── metadata/
│   ├── train.parquet
│   ├── dev.parquet
│   ├── track-1_development_test.parquet
│   ├── track-1_test.parquet
│   ├── track-2_development_test.parquet
│   └── track-2_test.parquet
└── models/
    └── ArA-DF-Baseline/          ← from Step 0
        ├── config.yaml
        ├── best_model.pth
        └── xlsr2_300m.pt
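Before moving on, it can help to confirm that the download produced the layout shown above. The following is a small sketch using only the standard library; the helper names `expected_entries` and `missing_entries` are hypothetical, and the list of expected paths is copied from the tree above rather than taken from any DeepFense tool.

```python
from pathlib import Path

SPLITS = [
    "train", "dev",
    "track-1_development_test", "track-1_test",
    "track-2_development_test", "track-2_test",
]
MODEL_FILES = ["config.yaml", "best_model.pth", "xlsr2_300m.pt"]


def expected_entries():
    """Relative paths that should exist under the ArA-DF-2026 root."""
    entries = [f"data/{split}" for split in SPLITS]
    entries += [f"metadata/{split}.parquet" for split in SPLITS]
    entries += [f"models/ArA-DF-Baseline/{name}" for name in MODEL_FILES]
    return entries


def missing_entries(root):
    """Return the expected relative paths that are absent under root."""
    root = Path(root).expanduser()
    return [rel for rel in expected_entries() if not (root / rel).exists()]
```

Calling `missing_entries("ArA-DF-2026")` from the directory that contains the download returns an empty list when everything is in place.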

Step 2 — Extract the audio

Extract every TAR in-place (same command for all splits):

for split in train dev track-1_development_test track-1_test track-2_development_test track-2_test; do
  dir="ArA-DF-2026/data/$split"
  # Unpack every shard into its own split folder
  for tar in "$dir"/*.tar; do tar -xf "$tar" -C "$dir"; done
done

After extraction, FLACs may land flat (directly in the split folder) or in shard sub-folders — both work. build_parquets.py auto-detects either layout.

Flat example:

ArA-DF-2026/data/track-1_test/
├── track-1_test-000000.tar
├── test_0000008.flac
├── test_0000031.flac
└── ...

Matching build_parquets.py arguments (used in Step 3):

Argument	Value
`--data_root`	`ArA-DF-2026/data`
`--meta_root`	`ArA-DF-2026/metadata`
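To see which layout an extraction produced, a recursive search for FLAC files is enough. The sketch below is an illustration with hypothetical helper names (`find_flacs`, `layout_of`); it is not the detection logic inside build_parquets.py, which this card does not show.

```python
from pathlib import Path


def find_flacs(split_dir):
    """All .flac files under split_dir, flat or inside shard sub-folders."""
    return sorted(Path(split_dir).rglob("*.flac"))


def layout_of(split_dir):
    """Describe where the FLACs sit: 'flat', 'sharded', 'mixed', or 'empty'."""
    split_dir = Path(split_dir)
    # Depth 1 means the file sits directly in the split folder.
    depths = {len(p.relative_to(split_dir).parts) for p in find_flacs(split_dir)}
    if not depths:
        return "empty"
    if depths == {1}:
        return "flat"
    if 1 not in depths:
        return "sharded"
    return "mixed"
```

For example, `layout_of("ArA-DF-2026/data/track-1_test")` reports `"flat"` for the example shown above, and `len(find_flacs(...))` can be compared with the utterance counts in the benchmark table.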

Step 3 — Generate DeepFense parquet files

Parquet files hold the metadata DeepFense reads at training/evaluation time. Each row has four fields: ID, path (absolute path to the audio file), label (1 for bonafide, 0 for spoof), and dataset_name.

Label convention (HF metadata, parquets, and this model's training):

Class	HF `label`	DeepFense `label_map`
bonafide	`1`	`1` (bonafide)
spoof	`0`	`0` (spoof)
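The row layout described above can be sketched in a few lines of standard-library Python. This is an illustration only, not the logic of build_parquets.py: the function name `build_rows` and the `labels_by_id` mapping are hypothetical, and it assumes the ID is the FLAC filename stem.

```python
from pathlib import Path


def build_rows(split_dir, labels_by_id, dataset_name):
    """Build one dict per labelled utterance with the four fields above.

    labels_by_id maps an utterance ID to 1 (bonafide) or 0 (spoof).
    """
    rows = []
    flacs = sorted(Path(split_dir).rglob("*.flac"), key=lambda p: p.stem)
    for flac in flacs:
        utt_id = flac.stem
        if utt_id not in labels_by_id:
            continue  # skip audio that has no metadata entry
        rows.append({
            "ID": utt_id,
            "path": str(flac.resolve()),  # absolute path to the audio
            "label": int(labels_by_id[utt_id]),
            "dataset_name": dataset_name,
        })
    return rows
```

A list of dicts in this shape can be turned into a parquet file with `pandas.DataFrame(rows).to_parquet(...)` once pandas and pyarrow are installed.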

A standalone build_parquets.py script is included in this repo. It only needs pandas and pyarrow:

pip install pandas pyarrow

cd ArA-DF-2026/models/ArA-DF-Baseline
python build_parquets.py \
  --data_root  ../../data \
  --meta_root  ../../metadata \
  --output_dir ../../parquets

Output:

ArA-DF-2026/parquets/
├── aradf_train.parquet
├── aradf_val.parquet
├── aradf_track-1_development_test.parquet   ← Track 1, CodaBench dev phase
├── aradf_track-1_test.parquet         ← Track 1, CodaBench eval phase
├── aradf_track-2_development_test.parquet   ← Track 2, CodaBench dev phase
└── aradf_track-2_test.parquet         ← Track 2, CodaBench eval phase

Test-only (skip train/dev):

python build_parquets.py \
  --data_root  ../../data \
  --meta_root  ../../metadata \
  --output_dir ../../parquets \
  --splits track-1_development_test track-1_test track-2_development_test track-2_test

Step 4 — Fill in `config.yaml`

After downloading, all files live under ArA-DF-2026/models/ArA-DF-Baseline/:

ArA-DF-2026/models/ArA-DF-Baseline/
├── config.yaml
├── best_model.pth
└── xlsr2_300m.pt          ← Fairseq XLS-R 300M frontend (bundled)

Fairseq / XLS-R checkpoint (absolute path required)

Set ckpt_path to the absolute path of xlsr2_300m.pt on your machine:

model:
  frontend:
    args:
      source: fairseq
      ckpt_path: /home/you/ArA-DF-2026/models/ArA-DF-Baseline/xlsr2_300m.pt

Replace /home/you with your own home directory, and write the path out in full rather than starting it with ~.

Do not use a relative path — train.py / test.py may fail depending on the working directory.
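If you are unsure what the absolute form of a path is on your machine, Python can print it. A minimal sketch, with the hypothetical helper name `to_absolute`:

```python
from pathlib import Path


def to_absolute(path_str):
    """Expand a leading ~ and return an absolute, normalised path string."""
    return str(Path(path_str).expanduser().resolve())
```

For example, `to_absolute("~/ArA-DF-2026/models/ArA-DF-Baseline/xlsr2_300m.pt")` returns the full path to paste into ckpt_path. Relative inputs are resolved against the current working directory, so run it from the directory you downloaded into.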

Parquet paths (required for training / local evaluation)

Replace the placeholder parquet paths under data.train, data.val, and data.test with the paths to the files generated in Step 3:

data:
  train:
    parquet_files:
      - ArA-DF-2026/parquets/aradf_train.parquet
  val:
    parquet_files:
      - ArA-DF-2026/parquets/aradf_val.parquet
  test:
    dataset_names:
      - track-1_development_test
      - track-1_test
      - track-2_development_test
      - track-2_test
    parquet_files:
      - ArA-DF-2026/parquets/aradf_track-1_development_test.parquet
      - ArA-DF-2026/parquets/aradf_track-1_test.parquet
      - ArA-DF-2026/parquets/aradf_track-2_development_test.parquet
      - ArA-DF-2026/parquets/aradf_track-2_test.parquet
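To avoid typing the six paths by hand, the block above can be generated with absolute paths. The sketch below only prints text in the same shape as the snippet above; the function name `data_section` is hypothetical, and the real config.yaml contains other keys that must be left in place when you paste the result in.

```python
from pathlib import Path

TEST_SPLITS = [
    "track-1_development_test", "track-1_test",
    "track-2_development_test", "track-2_test",
]


def data_section(parquet_dir):
    """Return the data: block as text, with absolute parquet paths."""
    root = Path(parquet_dir).expanduser().resolve()

    def item(name):
        # One YAML list entry at the indentation used in config.yaml above.
        return "      - " + str(root / name)

    lines = [
        "data:",
        "  train:",
        "    parquet_files:",
        item("aradf_train.parquet"),
        "  val:",
        "    parquet_files:",
        item("aradf_val.parquet"),
        "  test:",
        "    dataset_names:",
    ]
    lines += ["      - " + split for split in TEST_SPLITS]
    lines.append("    parquet_files:")
    lines += [item("aradf_" + split + ".parquet") for split in TEST_SPLITS]
    return "\n".join(lines)
```

Running `print(data_section("ArA-DF-2026/parquets"))` from the directory that contains ArA-DF-2026 prints the block ready to copy.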

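run_inference.py does not read the parquet paths. It takes the config and checkpoint via --config and --checkpoint and reads audio directly from the folder given by --audio_dir (see Competition Submission).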

Competition Submission (no labels required)

For the ArA-DF 2026 shared task you submit a ZIP file containing one CSV per track. CodaBench expects a continuous bonafide score per utterance — not hard 0/1 labels.

Track	CodaBench	Dev phase folder	Eval phase folder	CSV in ZIP
Track 1	https://www.codabench.org/competitions/17138/	`data/track-1_development_test/`	`data/track-1_test/`	`track1_preds.csv`
Track 2	https://www.codabench.org/competitions/17139/	`data/track-2_development_test/`	`data/track-2_test/`	`track2_preds.csv`

Use the bundled run_inference.py script. DeepFense must be installed (see Step 0). Point --audio_dir at the folder for the current CodaBench phase.


# Track 1 — development phase
# Set MODEL to the absolute path of models/ArA-DF-Baseline on your machine
MODEL="/home/you/ArA-DF-2026/models/ArA-DF-Baseline"
cd ArA-DF-2026/models/ArA-DF-Baseline

python run_inference.py \
  --audio_dir  ../../data/track-1_development_test \
  --config     $MODEL/config.yaml \
  --checkpoint $MODEL/best_model.pth \
  --output     track1_preds.csv
zip submission_track1.zip track1_preds.csv

# Track 1 — evaluation phase (re-run with track-1_test/)
python run_inference.py \
  --audio_dir  ../../data/track-1_test \
  --config     $MODEL/config.yaml \
  --checkpoint $MODEL/best_model.pth \
  --output     track1_preds.csv
zip submission_track1.zip track1_preds.csv

# Track 2 — development phase
python run_inference.py \
  --audio_dir  ../../data/track-2_development_test \
  --config     $MODEL/config.yaml \
  --checkpoint $MODEL/best_model.pth \
  --output     track2_preds.csv
zip submission_track2.zip track2_preds.csv

# Track 2 — evaluation phase (re-run with track-2_test/)
python run_inference.py \
  --audio_dir  ../../data/track-2_test \
  --config     $MODEL/config.yaml \
  --checkpoint $MODEL/best_model.pth \
  --output     track2_preds.csv
zip submission_track2.zip track2_preds.csv

Upload submission_track1.zip / submission_track2.zip on the My Submissions tab of the matching CodaBench competition.
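If the zip command is not available on your system, the archive can be built with Python's standard zipfile module. A minimal sketch, with the hypothetical helper name `make_submission`:

```python
import zipfile
from pathlib import Path


def make_submission(csv_path, zip_path):
    """Write a ZIP containing only the CSV, stored at the archive root."""
    csv_path = Path(csv_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname drops any directory prefix so the CSV sits at the top level.
        zf.write(csv_path, arcname=csv_path.name)
    return zip_path
```

For example, `make_submission("track1_preds.csv", "submission_track1.zip")`. Unlike the zip command, which updates an existing archive, this overwrites the ZIP each time.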

Output format:

audio_id,logit
test_0000008,1.45364702
test_0000031,-1.95676708

audio_id — utterance ID (filename stem, without .flac)
logit — bonafide score from DeepFense (CrossEntropy.get_score, same as test.py / outputs["scores"]). Label map: bonafide=1, spoof=0 (matches HF metadata) → higher = more likely bonafide
The column is named logit to match the CodaBench submission format
No thresholding — CodaBench computes EER from the raw scores
run_inference.py searches recursively inside --audio_dir (works with the shard sub-folders)
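A quick sanity check of the CSV before zipping can catch formatting slips. The sketch below checks only the properties listed above (header, ID without .flac, numeric score) plus duplicate IDs; the function name `check_predictions` is hypothetical, and CodaBench may apply additional checks that are not covered here.

```python
import csv
import math


def check_predictions(csv_path):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    seen = set()
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header != ["audio_id", "logit"]:
            return [f"unexpected header: {header!r}"]
        for line_no, row in enumerate(reader, start=2):
            if len(row) != 2:
                problems.append(f"line {line_no}: expected 2 columns, got {len(row)}")
                continue
            audio_id, score = row
            if audio_id.endswith(".flac"):
                problems.append(f"line {line_no}: audio_id should not include .flac")
            if audio_id in seen:
                problems.append(f"line {line_no}: duplicate audio_id {audio_id}")
            seen.add(audio_id)
            try:
                value = float(score)
            except ValueError:
                problems.append(f"line {line_no}: score {score!r} is not a number")
                continue
            if not math.isfinite(value):
                problems.append(f"line {line_no}: score is not finite")
    return problems
```

For example, `check_predictions("track1_preds.csv")` returns `[]` for a file shaped like the sample above.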

Note: test.py (below) requires a label column in the parquet for local EER. Use run_inference.py for CodaBench submissions.

Evaluate (with labels)

cd deepfense-framework
python test.py \
  --config ArA-DF-2026/models/ArA-DF-Baseline/config.yaml \
  --checkpoint ArA-DF-2026/models/ArA-DF-Baseline/best_model.pth

Metrics are printed and saved to results.json. Per-sample scores are written to results/predictions/<dataset_name>_predictions.txt.

Train from scratch

cd deepfense-framework
python train.py --config ArA-DF-2026/models/ArA-DF-Baseline/config.yaml

The best checkpoint is saved to outputs/<exp_name>_<timestamp>/best_model.pth. Evaluate with:

python test.py \
  --config ArA-DF-2026/models/ArA-DF-Baseline/config.yaml \
  --checkpoint outputs/<exp_name>_*/best_model.pth

Multi-GPU (PyTorch DDP):

torchrun --nproc_per_node=4 train.py --config /path/to/config.yaml

Citation

@inproceedings{jung2022aasist,
  title={AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks},
  author={Jung, Jee-weon and others},
  booktitle={ICASSP},
  year={2022}
}


Paper: AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks (arXiv 2110.01200, published Oct 4, 2021)
