Hyperion MT Deepfake Detector
This repository packages a Hyperion multitask-trained audio deepfake detector for local inference from a cloned Hugging Face repository. The needed Hyperion runtime code is included under hyperion/, so you do not need a separate Hyperion checkout for inference.
Model
- Backbone: facebook/wav2vec2-xls-r-300m
- Classifier: Hyperion wav2vec2 + ResNet1D x-vector head
- Training variant: multitask auxiliary head, use_grl=False
- Source checkpoint epoch: model_ep0002.pth
- Input: mono waveform, resampled to 16 kHz
- Output labels: bonafide, spoof
- Packaged checkpoint: model.pth
Setup
Create a Python 3.10 conda environment, enable Git LFS, clone the repository, and install the dependencies:
conda create -n hyperion_mt_infer python=3.10 -y
conda activate hyperion_mt_infer
git lfs install
git clone https://huggingface.co/RuiRuihigh/hyperion-mt-deepfake-detector
cd hyperion-mt-deepfake-detector
pip install -r requirements.txt
Confirm that the model checkpoint was downloaded by Git LFS:
ls -lh model.pth
If model.pth is only a few KB, Git LFS downloaded a small pointer file instead of the actual checkpoint. Fetch the checkpoint with:
git lfs pull
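Git LFS pointer files are small text files whose first line is "version https://git-lfs.github.com/spec/v1". As a rough sketch (the helper is_lfs_pointer is hypothetical and not part of this repository), the same check can be done from Python before loading the model:

```python
from pathlib import Path

# Every Git LFS pointer file starts with this spec line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: str) -> bool:
    """Return True if the file looks like a Git LFS pointer instead of real content.

    Hypothetical helper: it only compares the first bytes of the file.
    """
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


if __name__ == "__main__":
    checkpoint = Path("model.pth")
    if not checkpoint.exists():
        print("model.pth not found")
    elif is_lfs_pointer(str(checkpoint)):
        print("model.pth is an LFS pointer; run: git lfs pull")
    else:
        size_mb = checkpoint.stat().st_size / 1e6
        print(f"model.pth looks like a real checkpoint ({size_mb:.1f} MB)")
```

Reading only the first bytes keeps the check cheap even when the real checkpoint is large.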
Command Line Inference
Run inference from inside the cloned repository:
python inference.py example.wav --model-dir .
By default, the script runs on the GPU when CUDA is available and falls back to the CPU otherwise.
For multiple audio files:
python inference.py a.wav b.wav c.wav --model-dir . --batch-size 8
Force CPU inference:
python inference.py example.wav --model-dir . --device cpu
Force GPU inference:
python inference.py example.wav --model-dir . --device cuda
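The device selection described above can be summarized by the following sketch. It is an assumption about the behavior, not the code in inference.py; the function name resolve_device is hypothetical, and with PyTorch installed the cuda_available argument would come from torch.cuda.is_available().

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Map a --device choice (auto, cpu, cuda) to a concrete device string.

    Hypothetical sketch of the documented behavior.
    """
    if requested not in ("auto", "cpu", "cuda"):
        raise ValueError(f"unsupported device: {requested!r}")
    if requested == "auto":
        # auto: prefer the GPU, fall back to the CPU.
        return "cuda" if cuda_available else "cpu"
    # cpu and cuda are returned as given; this sketch does not check
    # whether CUDA is actually present when cuda is forced.
    return requested
```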
Command Line Arguments
- audio: one or more input audio files to classify. Example: example.wav or a.wav b.wav c.wav.
- --model-dir: directory containing config.json, model.pth, and the vendored hyperion/ code. Default: . (the current directory).
- --device: inference device. Choices: auto, cpu, cuda. Default: auto, which uses CUDA if available and otherwise uses CPU.
- --batch-size: number of audio files processed per batch when more than one input file is provided. Default: 8.
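For readers wrapping the tool in their own scripts, the documented arguments can be mirrored with argparse. This is a hypothetical sketch of an equivalent parser, not the actual source of inference.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented inference.py arguments."""
    parser = argparse.ArgumentParser(description="Classify audio as bonafide or spoof.")
    # One or more audio files, given as positional arguments.
    parser.add_argument("audio", nargs="+", help="input audio file(s)")
    parser.add_argument("--model-dir", default=".",
                        help="directory with config.json, model.pth and hyperion/")
    parser.add_argument("--device", choices=["auto", "cpu", "cuda"], default="auto",
                        help="auto uses CUDA if available, otherwise CPU")
    parser.add_argument("--batch-size", type=int, default=8,
                        help="number of audio files per batch")
    return parser
```

Note that argparse converts --model-dir and --batch-size to the attribute names model_dir and batch_size.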
Example output:
{
  "label": "bonafide",
  "score": 0.9998749494552612,
  "scores": {
    "bonafide": 0.9998749494552612,
    "spoof": 0.00012498960131779313
  }
}
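In the example output, label is the class with the highest entry in scores, score equals that entry, and the two scores sum to approximately 1. Assuming this holds for every output (the documentation does not state it explicitly), a downstream script could apply its own decision threshold to the spoof score. The helper is_spoof and the 0.5 threshold are hypothetical.

```python
import json

EXAMPLE_OUTPUT = """
{
  "label": "bonafide",
  "score": 0.9998749494552612,
  "scores": {"bonafide": 0.9998749494552612, "spoof": 0.00012498960131779313}
}
"""


def is_spoof(result: dict, threshold: float = 0.5) -> bool:
    """Flag a result as spoof when its spoof score reaches a (hypothetical) threshold."""
    return result["scores"]["spoof"] >= threshold


result = json.loads(EXAMPLE_OUTPUT)
# In the example, label is the highest-scoring class and score is its value.
top_label = max(result["scores"], key=result["scores"].get)
assert top_label == result["label"]
assert result["score"] == result["scores"][top_label]
print(result["label"], is_spoof(result))  # prints: bonafide False
```

A stricter threshold trades missed spoofs for fewer false alarms; the right value depends on the application and is not specified by this model card.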
Python API
If your Python script is inside the cloned repository, load the local model directory with from_local_dir:
from inference import DeepfakeDetector
detector = DeepfakeDetector.from_local_dir(".")
result = detector.predict("example.wav")
print(result)
Like the command line script, from_local_dir(".") selects the GPU automatically when CUDA is available. To force a device, pass device:
detector = DeepfakeDetector.from_local_dir(".", device="cpu")
# or
detector = DeepfakeDetector.from_local_dir(".", device="cuda")
Batch inference:
from inference import DeepfakeDetector
detector = DeepfakeDetector.from_local_dir(".")
results = detector.predict_batch(["a.wav", "b.wav", "c.wav"], batch_size=8)
for result in results:
    print(result)
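The batch_size argument controls how many files are grouped together for one forward pass. The grouping itself can be pictured with the sketch below; the helper iter_batches is hypothetical, and how inference.py pads or stacks waveforms of different lengths inside a batch is not documented here.

```python
from typing import Iterator, List, Sequence


def iter_batches(paths: Sequence[str], batch_size: int = 8) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size paths, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


# Three files with batch_size=2 give one full batch and one partial batch.
print(list(iter_batches(["a.wav", "b.wav", "c.wav"], batch_size=2)))
# prints: [['a.wav', 'b.wav'], ['c.wav']]
```

Larger batches usually improve GPU throughput but need more memory, especially with long recordings.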
If your Python script is outside the cloned repository, add the cloned repository to PYTHONPATH so that inference.py and the vendored hyperion/ code can be imported, and pass the full path of the model directory:
from inference import DeepfakeDetector
detector = DeepfakeDetector.from_local_dir("/path/to/hyperion-mt-deepfake-detector")
result = detector.predict("/path/to/example.wav")
print(result)
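As an alternative to setting PYTHONPATH, a script can prepend the cloned repository to sys.path at runtime before importing inference. The helpers add_repo_to_path and load_detector below are a hypothetical sketch; /path/to/hyperion-mt-deepfake-detector stands for wherever the repository was cloned.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir: str) -> str:
    """Prepend the cloned repository to sys.path so inference.py and hyperion/ import."""
    repo = str(Path(repo_dir).expanduser().resolve())
    if repo not in sys.path:
        sys.path.insert(0, repo)
    return repo


def load_detector(repo_dir: str, device=None):
    """Import DeepfakeDetector from the cloned repository and load the model from it."""
    repo = add_repo_to_path(repo_dir)
    from inference import DeepfakeDetector  # resolved through the path added above
    return DeepfakeDetector.from_local_dir(repo, device=device)
```

With the repository cloned, detector = load_detector("/path/to/hyperion-mt-deepfake-detector") followed by detector.predict("/path/to/example.wav") behaves like the example above.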
Python API Arguments
- DeepfakeDetector.from_local_dir(model_dir, device=None): loads config.json and model.pth from a local model directory. device=None means auto-select CUDA if available, otherwise CPU.
- DeepfakeDetector.from_pretrained(repo_id, device=None): downloads config.json and model.pth from a Hugging Face repo, then loads the model. This still requires inference.py and the vendored hyperion/ code to be importable locally.
- detector.predict(audio_path): runs inference for one audio file and returns a dictionary with label, score, and scores.
- detector.predict_batch(audio_paths, batch_size=8): runs batched inference for a list of audio files.
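For larger runs, predict_batch results can be collected into a CSV file. This sketch assumes predict_batch returns one dictionary per input path, in the same order as the input list, as the batch inference example suggests; the helper results_to_csv and its column names are hypothetical.

```python
import csv
import io
from typing import Dict, List, Sequence


def results_to_csv(paths: Sequence[str], results: List[Dict]) -> str:
    """Render one (path, label, bonafide score, spoof score) row per input as CSV text."""
    if len(paths) != len(results):
        raise ValueError("expected one result per input path")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["path", "label", "bonafide", "spoof"])
    for path, result in zip(paths, results):
        scores = result["scores"]
        writer.writerow([path, result["label"], scores["bonafide"], scores["spoof"]])
    return buf.getvalue()
```

With a loaded detector, pass the same list of paths to detector.predict_batch and to results_to_csv, then write the returned text to a file.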
Files
- inference.py: command line inference and Python API.
- config.json: model metadata and runtime settings.
- model.pth: checkpoint containing model_cfg and model_state_dict.
- hyperion/: vendored runtime modules needed to reconstruct the model.
- requirements.txt: Python dependencies.
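To check what the packaged checkpoint contains, a quick inspection is sketched below, assuming model.pth is a plain dictionary loadable with torch.load; the helper names are hypothetical. Recent PyTorch releases default to weights_only=True in torch.load, which may reject the file if model_cfg holds arbitrary Python objects; pass weights_only=False only for checkpoints you trust.

```python
from typing import Any, Dict


def summarize_checkpoint(ckpt: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize a checkpoint dict expected to hold model_cfg and model_state_dict."""
    missing = [k for k in ("model_cfg", "model_state_dict") if k not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing keys: {missing}")
    state = ckpt["model_state_dict"]
    return {
        "top_level_keys": sorted(ckpt.keys()),
        "num_entries": len(state),
        # The first few entry names give a hint of the module layout.
        "first_entry_names": list(state.keys())[:5],
    }


def load_checkpoint(path: str = "model.pth") -> Dict[str, Any]:
    """Load the checkpoint onto the CPU (requires PyTorch)."""
    import torch  # imported lazily so summarize_checkpoint works without PyTorch

    return torch.load(path, map_location="cpu")
```

From inside the cloned repository, summarize_checkpoint(load_checkpoint("model.pth")) prints nothing by itself but returns a small dictionary that is easy to inspect.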
Notes
- --model-dir . means the model files are in the current directory.
- CPU inference can be slow because the model uses wav2vec2-xls-r-300m.
- Audio is loaded with soundfile and resampled to 16 kHz when needed.