Hyperion MT Deepfake Detector

This repository packages a Hyperion multitask-trained audio deepfake detector for local inference from a cloned Hugging Face repository. The needed Hyperion runtime code is included under hyperion/, so you do not need a separate Hyperion checkout for inference.

Model

Backbone: facebook/wav2vec2-xls-r-300m
Classifier: Hyperion wav2vec2 + ResNet1D x-vector head
Training variant: multitask auxiliary head, use_grl=False
Source checkpoint: model_ep0002.pth (epoch 2)
Input: mono waveform, resampled to 16 kHz
Output labels: bonafide, spoof
Packaged checkpoint: model.pth

Setup

Create a Python 3.10 conda environment, clone the repository, and install dependencies:

conda create -n hyperion_mt_infer python=3.10 -y
conda activate hyperion_mt_infer

git lfs install
git clone https://huggingface.co/RuiRuihigh/hyperion-mt-deepfake-detector
cd hyperion-mt-deepfake-detector

pip install -r requirements.txt

Confirm that the model checkpoint was downloaded by Git LFS:

ls -lh model.pth

If model.pth is only a few KB or smaller, it is a Git LFS pointer rather than the actual checkpoint. Download the real file with:

git lfs pull
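
The same check can be done programmatically. The sketch below relies on the fact that a Git LFS pointer is a small text file starting with a fixed version line. The helper name is_lfs_pointer is illustrative and is not part of this repository.

```python
# Fixed first line of every Git LFS pointer file (per the Git LFS spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` looks like a Git LFS pointer.

    Hypothetical helper: it only inspects the first bytes of the file and
    does not validate the checkpoint itself.
    """
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

If is_lfs_pointer("model.pth") returns True, the checkpoint has not been downloaded yet and git lfs pull is still needed.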

Command Line Inference

Run inference from inside the cloned repository:

python inference.py example.wav --model-dir .

By default, the script runs on the GPU when CUDA is available and falls back to the CPU otherwise.

For multiple audio files:

python inference.py a.wav b.wav c.wav --model-dir . --batch-size 8

Force CPU inference:

python inference.py example.wav --model-dir . --device cpu

Force GPU inference:

python inference.py example.wav --model-dir . --device cuda
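
The device selection described above can be sketched as follows. This is an illustration of the documented behavior, not the code in inference.py; the function name resolve_device and its cuda_available parameter are assumptions added to make the logic easy to inspect without a GPU.

```python
def resolve_device(requested="auto", cuda_available=None):
    """Map a requested device ("auto", "cpu", "cuda", or None) to "cpu" or "cuda".

    Hypothetical helper. If `cuda_available` is None, it is probed through
    torch.cuda.is_available() when PyTorch is installed.
    """
    if cuda_available is None:
        try:
            import torch  # imported lazily so the helper also works without PyTorch
            cuda_available = torch.cuda.is_available()
        except ImportError:
            cuda_available = False

    # "auto" (CLI) and None (Python API) both mean: prefer CUDA, else CPU.
    if requested in (None, "auto"):
        return "cuda" if cuda_available else "cpu"
    if requested == "cuda" and not cuda_available:
        raise RuntimeError("CUDA was requested but is not available")
    if requested not in ("cpu", "cuda"):
        raise ValueError(f"unknown device: {requested!r}")
    return requested
```

Whether the real script raises an error or silently falls back when --device cuda is requested on a machine without CUDA is not stated here; the RuntimeError above is one reasonable choice.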

Command Line Arguments

audio: one or more input audio files to classify. Example: example.wav or a.wav b.wav c.wav.
--model-dir: directory containing config.json, model.pth, and the vendored hyperion/ code. Default: . (the current directory).
--device: inference device. Choices: auto, cpu, cuda. Default: auto, which uses CUDA if available and otherwise uses CPU.
--batch-size: number of audio files processed per batch when more than one input file is provided. Default: 8.
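
For orientation, the arguments above map onto an argparse definition like the one below. This is a hypothetical reconstruction from the descriptions in this README, not the actual contents of inference.py; the function name build_parser and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command line arguments (sketch)."""
    parser = argparse.ArgumentParser(description="Audio deepfake detection (sketch)")
    # One or more audio files to classify.
    parser.add_argument("audio", nargs="+", help="input audio file(s)")
    # Directory holding config.json, model.pth, and hyperion/.
    parser.add_argument("--model-dir", default=".", help="local model directory")
    # "auto" picks CUDA when available and CPU otherwise.
    parser.add_argument("--device", choices=["auto", "cpu", "cuda"], default="auto")
    # Number of files processed per batch.
    parser.add_argument("--batch-size", type=int, default=8)
    return parser
```

Calling build_parser().parse_args(["a.wav", "b.wav"]) yields audio=["a.wav", "b.wav"] with the documented defaults for the remaining options.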

Example output from inference.py:

{
  "label": "bonafide",
  "score": 0.9998749494552612,
  "scores": {
    "bonafide": 0.9998749494552612,
    "spoof": 0.00012498960131779313
  }
}
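
A downstream script may want to apply its own decision threshold to the spoof score instead of relying on the label field. The sketch below assumes the output is a single JSON object shaped like the example above; the helper names and the 0.5 default threshold are illustrative choices, not values taken from this repository.

```python
import json


def parse_result(text):
    """Parse one JSON result object as printed in the example output."""
    result = json.loads(text)
    # Fail early if the expected keys are missing.
    for key in ("label", "score", "scores"):
        if key not in result:
            raise KeyError(f"missing key in result: {key}")
    return result


def is_spoof(result, threshold=0.5):
    """Return True if the spoof score reaches `threshold` (hypothetical default)."""
    return result["scores"]["spoof"] >= threshold
```

Lowering the threshold flags more files as spoofed at the cost of more false alarms on bonafide audio; a suitable value depends on the application and should be chosen on held-out data.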

Python API

If your Python script is inside the cloned repository, load the local model directory with from_local_dir:

from inference import DeepfakeDetector

detector = DeepfakeDetector.from_local_dir(".")
result = detector.predict("example.wav")
print(result)

from_local_dir(".") also uses GPU automatically when CUDA is available. To force a device:

detector = DeepfakeDetector.from_local_dir(".", device="cpu")
# or
detector = DeepfakeDetector.from_local_dir(".", device="cuda")

Batch inference:

from inference import DeepfakeDetector

detector = DeepfakeDetector.from_local_dir(".")
results = detector.predict_batch(["a.wav", "b.wav", "c.wav"], batch_size=8)

for result in results:
    print(result)

If your Python script is outside the cloned repository, add the cloned repository to PYTHONPATH or run the script from this directory, and pass the full model directory path:

from inference import DeepfakeDetector

detector = DeepfakeDetector.from_local_dir("/path/to/hyperion-mt-deepfake-detector")
result = detector.predict("/path/to/example.wav")
print(result)
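
As an alternative to setting PYTHONPATH in the shell, the repository can be added to sys.path at the top of the external script. This is a minimal sketch; add_repo_to_path is a hypothetical helper, and the path shown in the usage note is a placeholder.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Make inference.py in `repo_dir` importable; returns the resolved path."""
    repo_dir = Path(repo_dir).resolve()
    # Guard against pointing at the wrong directory.
    if not (repo_dir / "inference.py").is_file():
        raise FileNotFoundError(f"inference.py not found in {repo_dir}")
    if str(repo_dir) not in sys.path:
        sys.path.insert(0, str(repo_dir))
    return repo_dir
```

After calling add_repo_to_path("/path/to/hyperion-mt-deepfake-detector"), the statement from inference import DeepfakeDetector should work from any working directory, and the returned path can be passed to from_local_dir.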

Python API Arguments

DeepfakeDetector.from_local_dir(model_dir, device=None): loads config.json and model.pth from a local model directory. device=None means auto-select CUDA if available, otherwise CPU.
DeepfakeDetector.from_pretrained(repo_id, device=None): downloads config.json and model.pth from a Hugging Face repo, then loads the model. This still requires inference.py and the vendored hyperion/ code to be importable locally.
detector.predict(audio_path): runs inference for one audio file and returns a dictionary with label, score, and scores.
detector.predict_batch(audio_paths, batch_size=8): runs batched inference for a list of audio files.
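
The batch_size parameter can be pictured as splitting the input list into consecutive chunks. The sketch below shows only that splitting; whether predict_batch chunks its inputs exactly this way internally is an assumption, and chunked is a hypothetical helper.

```python
def chunked(items, batch_size=8):
    """Yield consecutive slices of `items`, each with at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

With five files and batch_size=2, this produces batches of 2, 2, and 1 files. A smaller batch size lowers peak memory use, which can help on GPUs with limited memory or with long recordings.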

Files

inference.py: command line inference and Python API.
config.json: model metadata and runtime settings.
model.pth: checkpoint containing model_cfg and model_state_dict.
hyperion/: vendored runtime modules needed to reconstruct the model.
requirements.txt: Python dependencies.
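
Before loading the model, it can help to confirm that the files required at inference time are present. The sketch below checks for the three items that --model-dir is documented to contain; missing_model_files is a hypothetical helper, not part of inference.py.

```python
from pathlib import Path

# Items the model directory is documented to contain for inference.
REQUIRED_ITEMS = ("config.json", "model.pth", "hyperion")


def missing_model_files(model_dir="."):
    """Return the names of required items that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in REQUIRED_ITEMS if not (root / name).exists()]
```

An empty list means the directory layout looks complete. It does not confirm that model.pth holds real weights rather than a Git LFS pointer.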

Notes

--model-dir . means the model files are in the current directory.
CPU inference can be slow because the wav2vec2-xls-r-300m backbone has roughly 300 million parameters.
Audio is loaded with soundfile and resampled to 16 kHz when needed.
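
For readers preparing audio outside this repository, the preprocessing described above can be approximated as follows. This is a sketch under stated assumptions: only the use of soundfile and the 16 kHz target come from this README, while the channel averaging for the mono downmix and the use of scipy.signal.resample_poly are assumptions, and the resampler in inference.py may differ.

```python
from math import gcd

TARGET_SR = 16000  # sample rate expected by the model, per this README


def resample_factors(orig_sr, target_sr=TARGET_SR):
    """Return integer (up, down) factors for polyphase resampling."""
    g = gcd(orig_sr, target_sr)
    return target_sr // g, orig_sr // g


def load_mono_16k(path):
    """Load an audio file as a mono waveform at 16 kHz (hypothetical helper)."""
    import soundfile as sf                     # third-party: soundfile
    from scipy.signal import resample_poly     # third-party: SciPy (assumed resampler)

    audio, sr = sf.read(path, dtype="float32", always_2d=True)  # shape: (frames, channels)
    mono = audio.mean(axis=1)                  # assumed downmix: average the channels
    if sr != TARGET_SR:
        up, down = resample_factors(sr)
        mono = resample_poly(mono, up, down)
    return mono, TARGET_SR
```

For a 44.1 kHz file, resample_factors(44100) gives (160, 441), so the signal is upsampled by 160 and downsampled by 441 to reach 16 kHz.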
