RawNet2 — In-the-Wild Robustness Specialist (ITW)
This repository contains a fine-tuned RawNet2 checkpoint optimized for robustness to real-world audio degradation.
The model was trained using heavy augmentation pipelines targeting:
- codec artifacts
- platform compression
- noisy environments
- variable channel conditions
- social-media-distributed audio
This checkpoint serves as the robustness-specialized parent model within the MeGA-IA genetic weight-merging framework.
Model Details
| Property | Value |
|---|---|
| Architecture | RawNet2 |
| Domain Specialization | In-the-wild robustness |
| Training Dataset | Müller ITW Dataset |
| Input | Raw mono waveform |
| Sample Rate | 16 kHz |
| Framework | PyTorch |
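The input contract in the table (raw mono waveform at 16 kHz) can be checked before inference. The following is a minimal sketch using NumPy. The function name `prepare_waveform` and the `(channels, samples)` array layout are assumptions made for illustration and are not part of this repository.

```python
import numpy as np

EXPECTED_SAMPLE_RATE = 16_000  # from the Model Details table


def prepare_waveform(audio, sample_rate):
    """Return a 1-D float32 mono waveform, or raise if the sample rate is wrong.

    Assumes `audio` is shaped (samples,) or (channels, samples).
    """
    if sample_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sample_rate} Hz; "
            "resample before calling the model"
        )
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        # Downmix multi-channel audio by averaging the channels.
        audio = audio.mean(axis=0)
    elif audio.ndim != 1:
        raise ValueError(f"unsupported audio shape {audio.shape}")
    return audio
```

The sketch deliberately refuses to resample, so a mismatched sample rate fails loudly instead of being silently fed to the model.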
Intended Use
This checkpoint is intended for:
- real-world audio deepfake detection
- robustness research
- codec-resilient anti-spoofing
- noisy environment evaluation
- weight-merging experiments
Augmentation Pipeline
Training used aggressive augmentation strategies designed to simulate real-world distribution shift.
Augmentations
- Random offset crop/pad
- Gaussian channel noise
- MP3 compression simulation
- Telephone-band filtering
- Randomized channel degradation
These augmentations were specifically designed to improve robustness against:
- TikTok compression
- Instagram re-encoding
- YouTube Shorts transcoding
- mobile-recorded speech
- noisy reposted media
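Three of the listed augmentations (random offset crop/pad, Gaussian channel noise, telephone-band filtering) can be sketched in NumPy as shown below. This is an illustration under stated assumptions, not the training code from this repository: the target length, SNR range, application probabilities, and the 300 to 3400 Hz band edges are all hypothetical values, and the FFT mask is a crude stand-in for a proper filter. MP3 compression simulation is not covered here.

```python
import numpy as np

TARGET_LEN = 64_000  # hypothetical: 4 s at 16 kHz, not taken from this repo


def random_crop_or_pad(wave, target_len, rng):
    """Crop at a random offset if too long; zero-pad at a random offset if too short."""
    n = len(wave)
    if n >= target_len:
        start = rng.integers(0, n - target_len + 1)
        return wave[start:start + target_len]
    out = np.zeros(target_len, dtype=wave.dtype)
    start = rng.integers(0, target_len - n + 1)
    out[start:start + n] = wave
    return out


def add_gaussian_noise(wave, snr_db, rng):
    """Add white Gaussian noise scaled to the requested signal-to-noise ratio in dB."""
    signal_power = np.mean(wave ** 2) + 1e-12
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return (wave + noise).astype(wave.dtype)


def telephone_band(wave, sample_rate=16_000, low_hz=300.0, high_hz=3400.0):
    """Brick-wall band-pass via FFT masking (assumed band edges)."""
    spectrum = np.fft.rfft(wave)
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / sample_rate)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(wave)).astype(wave.dtype)


def augment(wave, rng):
    """Apply crop/pad always, then each degradation with probability 0.5 (assumed)."""
    wave = random_crop_or_pad(wave, TARGET_LEN, rng)
    if rng.random() < 0.5:
        wave = add_gaussian_noise(wave, snr_db=rng.uniform(10.0, 30.0), rng=rng)
    if rng.random() < 0.5:
        wave = telephone_band(wave)
    return wave
```

Passing an explicit `rng` (for example `np.random.default_rng(0)`) keeps the augmentation reproducible across runs.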
Validation Performance
| Metric | Value |
|---|---|
| Validation AUC | 0.9983 |
| Validation EER | 0.0107 |
| Deepfake-Evals Generalization AUC | 0.4826 |
The Deepfake-Evals AUC of 0.4826 is at chance level (0.5), far below the validation AUC of 0.9983. This gap highlights how severe the distribution-shift challenge is in real-world deepfake detection.
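For readers who want to reproduce the two metric types in the table from per-utterance scores, the sketch below computes AUC and EER in plain Python. It is a reference illustration, not the evaluation script used for the reported numbers. It assumes label `1` marks the positive class and that higher scores mean "more positive"; which class this repository treats as positive is not stated.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the number of samples, so this
    form is only meant for small illustrative inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def equal_error_rate(scores, labels):
    """Sweep thresholds and return the error rate where FAR and FRR are closest."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    best_gap, eer = float("inf"), 1.0
    for threshold in sorted(set(scores)):
        # A sample is predicted positive when its score is >= threshold.
        far = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold) / n_neg
        frr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold) / n_pos
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

An AUC of 0.5 corresponds to scores that carry no ranking information, which is why a value near 0.48 indicates chance-level behavior.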
Repository Contents
| File | Description |
|---|---|
| `best_auc.pth` | Best validation AUC checkpoint |
| `latest.pth` | Latest training checkpoint |
| `model.py` | RawNet2 architecture definition |
Usage
Load Checkpoint
import torch
from model import RawNet

model = RawNet()
checkpoint = torch.load(
    "best_auc.pth",
    map_location="cpu",
)
model.load_state_dict(checkpoint)
model.eval()
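The loading snippet assumes the file holds a bare `state_dict`. Training checkpoints such as `latest.pth` are often saved as a wrapper dictionary that also holds optimizer state, and weights saved from `torch.nn.DataParallel` carry a `module.` prefix on every parameter name. Whether either applies to the files in this repository is not stated, so the helper below is a hedged sketch. The wrapper keys it tries are common conventions and are hypothetical here.

```python
from collections import OrderedDict

# Hypothetical wrapper keys: common conventions, not confirmed for this repo.
CANDIDATE_KEYS = ("model_state_dict", "state_dict", "model")


def extract_state_dict(checkpoint):
    """Return the parameter mapping from a bare or wrapped checkpoint object."""
    if not isinstance(checkpoint, dict):
        raise TypeError("expected a dict-like checkpoint")
    state = checkpoint
    for key in CANDIDATE_KEYS:
        if isinstance(checkpoint.get(key), dict):
            state = checkpoint[key]
            break
    # Strip the "module." prefix that DataParallel adds to parameter names.
    prefix = "module."
    return OrderedDict(
        (name[len(prefix):] if name.startswith(prefix) else name, value)
        for name, value in state.items()
    )
```

With this helper, the load call would become `model.load_state_dict(extract_state_dict(checkpoint))`, which behaves the same as before when the file already contains a bare `state_dict`.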
Relationship to MeGA-IA
This model serves as the robustness-specialized parent model inside the MeGA-IA genetic weight-merging framework.
During genetic weight fusion, its role is to contribute:
- robustness priors
- codec-invariant features
- noise-tolerant representations
- distribution-shift resilience
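To make the idea of a parent model concrete, the sketch below merges two parameter sets with a per-tensor convex combination. This is a generic weight-interpolation illustration and should not be read as the MeGA-IA algorithm, whose fusion operator and fitness function are not described here. The names `merge_weights` and `random_genome` are hypothetical, and plain NumPy arrays stand in for PyTorch tensors.

```python
import numpy as np


def merge_weights(parent_a, parent_b, alpha):
    """Per-tensor convex combination: alpha * a + (1 - alpha) * b.

    `alpha` maps each parameter name to a coefficient in [0, 1]. In a
    genetic search, such a mapping could play the role of one candidate
    genome (an assumption made for illustration).
    """
    if parent_a.keys() != parent_b.keys():
        raise ValueError("parents must share the same parameter names")
    merged = {}
    for name, a in parent_a.items():
        b = parent_b[name]
        if a.shape != b.shape:
            raise ValueError(f"shape mismatch for {name}: {a.shape} vs {b.shape}")
        w = alpha[name]
        merged[name] = w * a + (1.0 - w) * b
    return merged


def random_genome(names, rng):
    """Draw one random coefficient per parameter name."""
    return {name: float(rng.uniform(0.0, 1.0)) for name in names}
```

A coefficient of 1.0 for a given tensor keeps that tensor entirely from `parent_a`, and 0.0 keeps it entirely from `parent_b`.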
Limitations
- Validation metrics overestimate real-world performance: AUC falls from 0.9983 on validation to 0.4826 on Deepfake-Evals
- Still vulnerable to unseen synthesis methods
- Performance remains sensitive to extreme domain shift
Citation
If you use these weights, please cite:
@inproceedings{ahmad2026megaia,
title = {MeGA-IA: Genetic Algorithm-Driven Weight Merging for In-the-Wild Deepfake Detection},
author = {Ahmad, Awwab Ext},
booktitle = {Proceedings of the 23rd International Bhurban Conference on Applied Sciences and Technology (IBCAST)},
year = {2026},
note = {Under Review}
}
Please also cite the ITW dataset and RawNet2 works:
@inproceedings{muller2022itw,
title = {Does Audio Deepfake Detection Generalize?},
author = {Müller, Nicolas and others},
booktitle = {Proceedings of Interspeech},
year = {2022}
}
@inproceedings{jung2020rawnet2,
title = {Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms},
author = {Jung, Jee-weon and Kim, Seung-bin and Shim, Hye-jin and Kim, Ju-ho and Yu, Ha-Jin},
booktitle = {Proceedings of Interspeech},
year = {2020}
}
License
This repository is released under [license name missing].
Weights are provided for research and benchmarking purposes.
Evaluation results
- AUC on In-the-Wild Audio Deepfake Dataset (self-reported): 0.998
- EER on In-the-Wild Audio Deepfake Dataset (self-reported): 0.011