Fine-Tuned SpeechBrain ECAPA-TDNN for Vietnamese Speaker Verification

This repository contains the fine-tuned checkpoint for the SpeechBrain ECAPA-TDNN speaker verification model, adapted specifically for Vietnamese speaker verification and Personal Voice Activity Detection (VAD).

Model Details

  • Base Model: speechbrain/spkrec-ecapa-voxceleb (ECAPA-TDNN)
  • Fine-Tuning Objective: a residual embedding adapter trained with Additive Margin Softmax (AM-Softmax) loss to adapt speaker embeddings to Vietnamese dialects.
  • Datasets: Adapted using Vietnamese speaker datasets (such as VoxVietnam and Vietnam-Celeb).
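For intuition, here is a minimal pure-Python sketch of the two pieces named in the fine-tuning objective. The adapter layout (a bottleneck `W_up · ReLU(W_down · e)` added back onto the embedding), the helper names, and the defaults `s = 30.0` and `m = 0.2` are assumptions for illustration only. This card does not document the adapter's architecture or the loss hyperparameters used for this checkpoint.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def matvec(w: Matrix, x: Vector) -> List[float]:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def residual_adapter(e: Vector, w_down: Matrix, w_up: Matrix) -> List[float]:
    """Hypothetical bottleneck adapter with a skip connection: e + W_up @ ReLU(W_down @ e)."""
    hidden = [max(0.0, h) for h in matvec(w_down, e)]
    delta = matvec(w_up, hidden)
    return [x + d for x, d in zip(e, delta)]


def l2_normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]


def am_softmax_loss(e: Vector, class_weights: Matrix, label: int,
                    s: float = 30.0, m: float = 0.2) -> float:
    """AM-Softmax: cross-entropy over s * cos(theta_j), with margin m subtracted
    from the true class's cosine. s and m here are illustrative assumptions."""
    e_hat = l2_normalize(e)
    cosines = [sum(a * b for a, b in zip(e_hat, l2_normalize(w))) for w in class_weights]
    logits = [s * (c - m) if j == label else s * c for j, c in enumerate(cosines)]
    top = max(logits)  # log-sum-exp trick for numerical stability
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]


# A zero up-projection makes the adapter the identity, so training starts
# from the base embedding.
emb = [0.6, 0.8]
adapted = residual_adapter(emb, w_down=[[0.0, 0.0]], w_up=[[0.0], [0.0]])
speakers = [[1.0, 0.0], [0.0, 1.0]]  # one weight vector per training speaker
print(am_softmax_loss(adapted, speakers, label=1))
```

The margin `m` is only subtracted from the true speaker's cosine, so the loss stays high until that cosine beats every other class by at least `m`. This is what pushes same-speaker embeddings closer together than plain softmax would.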

Files Included

  • best_checkpoint_rec98.pt: fine-tuned embedding model weights (a state_dict for model.mods.embedding_model).
  • cohort/vi_cohort_500.pt: cohort embeddings from 500 Vietnamese speakers, used for score normalization (Z-norm / S-norm).
  • cohort/vi_cohort_500_metadata.json: cohort metadata.
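The cohort supports score normalization. In this minimal sketch, Z-norm standardizes a raw trial score with the mean and standard deviation of the enrollment embedding's scores against the cohort. Symmetric S-norm averages that with the same statistic computed from the test side. The function names are hypothetical, and the sketch assumes the cohort holds one embedding per speaker and that raw scores are cosine similarities. The card does not say which exact variant was used, for example adaptive S-norm with a top-k cohort. Normalized scores live on a different scale from raw cosine similarity, so a threshold calibrated on one does not transfer to the other.

```python
import math
from statistics import mean, pstdev
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def z_norm(raw_score: float, emb: Vector, cohort: Sequence[Vector]) -> float:
    """Standardize raw_score with the mean/std of emb's scores against the cohort."""
    cohort_scores = [cosine(emb, c) for c in cohort]
    sigma = pstdev(cohort_scores) or 1.0  # guard against a degenerate cohort
    return (raw_score - mean(cohort_scores)) / sigma


def s_norm(raw_score: float, enroll: Vector, test: Vector,
           cohort: Sequence[Vector]) -> float:
    """Symmetric S-norm: average of the enrollment-side and test-side Z-norms."""
    return 0.5 * (z_norm(raw_score, enroll, cohort) + z_norm(raw_score, test, cohort))


# Assumed (undocumented) loading step for this repository's cohort file:
#   cohort = torch.load("cohort/vi_cohort_500.pt").tolist()
# Toy 2-D cohort for illustration:
cohort = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
enroll, test = [1.0, 0.1], [0.9, 0.2]
print(s_norm(cosine(enroll, test), enroll, test, cohort))
```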

Evaluation Results on 11 EVAL_* Cases

The fine-tuned model achieves a 3.43% equal error rate (EER) on the Vietnamese evaluation suite. The table below reports global results across all 11 cases at three target-recall operating points:

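| Target Recall | Threshold | Actual Recall | Target FR (false rejection) | Imposter Leak (false acceptance) |
|---|---|---|---|---|
| 95% | 0.3990 | 94.34% | 5.66% | 3.08% |
| 98% | 0.3960 | 96.23% | 3.77% | 3.08% |
| 99% | 0.2996 | 98.11% | 1.89% | 7.69% |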

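*At the 98% target-recall operating point (threshold = 0.3960), actual recall is 96.23% and imposter leakage stays at 3.08%. That is the same leakage as at the 95% target, while fewer target-speaker trials are rejected (3.77% vs. 5.66%), which makes it well suited to production Personal VAD gating.*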
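The table's columns are linked: Target FR is 100% minus actual recall, and Imposter Leak is the share of imposter trials scoring at or above the threshold. The sketch below shows one way to compute such an operating point and the EER from lists of target (genuine) and imposter scores. It is not the evaluation script behind the numbers above. The function names are hypothetical and the accept rule `score >= threshold` is an assumption. The EER is taken at the swept threshold where the two error rates are closest, which is a discrete approximation rather than an interpolated value.

```python
import math
from typing import Sequence, Tuple


def rates_at_threshold(genuine: Sequence[float], imposter: Sequence[float],
                       threshold: float) -> Tuple[float, float, float]:
    """Accept a trial when score >= threshold.

    Returns (actual recall %, target FR %, imposter leak %).
    """
    recall = 100.0 * sum(s >= threshold for s in genuine) / len(genuine)
    leak = 100.0 * sum(s >= threshold for s in imposter) / len(imposter)
    return recall, 100.0 - recall, leak


def threshold_for_recall(genuine: Sequence[float], target_recall_pct: float) -> float:
    """Highest threshold that still accepts at least target_recall_pct % of genuine trials."""
    ranked = sorted(genuine, reverse=True)
    # Number of genuine trials that must score at or above the threshold.
    k = math.ceil(target_recall_pct * len(ranked) / 100)
    k = min(max(k, 1), len(ranked))
    return ranked[k - 1]


def equal_error_rate(genuine: Sequence[float], imposter: Sequence[float]) -> float:
    """Sweep each observed score as a threshold and return the mean of FR and leak
    where the two are closest (a discrete EER approximation)."""
    best_gap, eer = math.inf, 100.0
    for t in sorted(set(genuine) | set(imposter)):
        _, fr, leak = rates_at_threshold(genuine, imposter, t)
        if abs(fr - leak) < best_gap:
            best_gap, eer = abs(fr - leak), (fr + leak) / 2.0
    return eer


# Toy scores, made up for illustration (not from the evaluation suite).
genuine = [0.9, 0.8, 0.7, 0.6, 0.5]
imposter = [0.1, 0.2, 0.3, 0.55, 0.65]
t = threshold_for_recall(genuine, 80)
print(t, rates_at_threshold(genuine, imposter, t), equal_error_rate(genuine, imposter))
```

For gating, the same rule applies per segment: a segment passes only if its score against the enrolled speaker is at least the chosen threshold.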

Usage

```python
from speechbrain.inference.speaker import SpeakerRecognition
from huggingface_hub import hf_hub_download
import torch

# Load the base ECAPA-TDNN model (SpeechBrain >= 1.0 import path)
model = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    run_opts={"device": "cpu"},
)

# Download the fine-tuned weights and load them into the embedding model
ckpt_path = hf_hub_download(
    repo_id="Nampfiev1995/pvad-speechbrain-ft",
    filename="best_checkpoint_rec98.pt",
)
state_dict = torch.load(ckpt_path, map_location="cpu")
model.mods.embedding_model.load_state_dict(state_dict, strict=True)
```