Fine-Tuned SpeechBrain ECAPA-TDNN for Vietnamese Speaker Verification

This repository contains a fine-tuned checkpoint of the SpeechBrain ECAPA-TDNN speaker verification model, adapted for Vietnamese speaker verification and Personal Voice Activity Detection (Personal VAD).

Model Details

Base Model: speechbrain/spkrec-ecapa-voxceleb (ECAPA-TDNN)
Fine-Tuning Objective: Residual Embedding Adapter trained under Additive Margin Softmax (AM-Softmax) loss to adapt voice features for Vietnamese dialects.
Datasets: Adapted using Vietnamese speaker datasets (such as VoxVietnam and Vietnam-Celeb).
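
For readers unfamiliar with AM-Softmax: the loss subtracts a fixed margin from the target-class cosine similarity, scales all cosines, and then applies softmax cross-entropy. The following is a minimal single-sample sketch in plain Python. The `scale` and `margin` defaults are illustrative assumptions, not values taken from this repository, and the adapter architecture itself is not shown.

```python
import math


def am_softmax_loss(cosines, label, scale=30.0, margin=0.2):
    """AM-Softmax loss for one sample.

    cosines: cosine similarity between the embedding and each class weight.
    label:   index of the true speaker class.
    scale, margin: hypothetical hyperparameters chosen for illustration.
    """
    # Subtract the margin from the target class only, then scale every logit.
    logits = [
        scale * (c - margin if j == label else c)
        for j, c in enumerate(cosines)
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    # Negative log-probability of the true class.
    return log_norm - logits[label]
```

With `margin=0` and `scale=1` this reduces to ordinary softmax cross-entropy over the cosines; a positive margin increases the loss for the same input, which pushes the target cosine further above the others during training.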

Files Included

best_checkpoint_rec98.pt — Fine-tuned embedding model weights (state_dict for model.mods.embedding_model).
cohort/vi_cohort_500.pt — Cohort embeddings (500 Vietnamese speakers) for Z-Score/S-Score Normalization.
cohort/vi_cohort_500_metadata.json — Cohort metadata.
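
The cohort embeddings support score normalization, commonly called Z-norm and S-norm: a raw trial score is standardized against the scores that the enrollment (and, for S-norm, the test) embedding obtains against the cohort. The sketch below shows the arithmetic in plain Python. It assumes the cohort is available as a plain list of embedding vectors; the internal layout of `vi_cohort_500.pt` is not documented here, so loading it is left out, and all function names are hypothetical.

```python
import math
import statistics


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cohort_stats(embedding, cohort):
    """Mean and standard deviation of an embedding's scores against the cohort."""
    scores = [cosine(embedding, c) for c in cohort]
    # Floor the deviation so a degenerate cohort cannot cause division by zero.
    return statistics.fmean(scores), max(statistics.pstdev(scores), 1e-8)


def z_norm(raw_score, enroll, cohort):
    """Z-norm: standardize with the enrollment side's cohort statistics."""
    mu, sigma = cohort_stats(enroll, cohort)
    return (raw_score - mu) / sigma


def s_norm(raw_score, enroll, test, cohort):
    """S-norm: average of the enrollment-side and test-side standardized scores."""
    mu_e, sigma_e = cohort_stats(enroll, cohort)
    mu_t, sigma_t = cohort_stats(test, cohort)
    return 0.5 * ((raw_score - mu_e) / sigma_e + (raw_score - mu_t) / sigma_t)
```

S-norm is symmetric in the enrollment and test embeddings, which is why it is often preferred when either side of a trial may be short or noisy.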

Evaluation Results on 11 `EVAL_*` Cases

The fine-tuned model achieves a 3.43% Equal Error Rate (EER) on the Vietnamese evaluation suite. The table below reports global performance at three target-recall operating points:

Target Recall	Threshold	Actual Recall	False Rejection %	Imposter Leak %
95%	`0.3990`	94.34%	5.66%	3.08%
98%	`0.3960`	96.23%	3.77%	3.08%
99%	`0.2996`	98.11%	1.89%	7.69%

*At the 98% target-recall operating point (threshold = 0.3960), actual recall is 96.23% and imposter leakage is 3.08%, which makes this threshold a practical choice for production Personal VAD gating.*
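
To make the table columns concrete, the sketch below shows how recall, false rejection, and imposter leak can be computed from trial scores at a given threshold. It is an illustration only: the function name is hypothetical, the scores in the usage example are made up, and the choice to accept a trial when `score >= threshold` is an assumption about how the evaluation was run.

```python
def operating_point(target_scores, imposter_scores, threshold):
    """Summarize verification performance at one decision threshold.

    target_scores:   scores of trials where the speaker is the enrolled one.
    imposter_scores: scores of trials where the speaker is someone else.
    """
    accepted_targets = sum(s >= threshold for s in target_scores)
    accepted_imposters = sum(s >= threshold for s in imposter_scores)
    recall = accepted_targets / len(target_scores)
    return {
        "recall": recall,                      # accepted target trials
        "false_rejection": 1.0 - recall,       # rejected target trials
        "imposter_leak": accepted_imposters / len(imposter_scores),
    }
```

For example, `operating_point([0.9, 0.5, 0.41, 0.2], [0.1, 0.45, 0.3, 0.05], 0.3960)` accepts three of four target trials and one of four imposter trials. Lowering the threshold raises recall at the cost of a higher imposter leak, which is the trade-off visible between the 98% and 99% rows.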

Usage

from speechbrain.inference.speaker import SpeakerRecognition
from huggingface_hub import hf_hub_download
import torch

# Load base model
model = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    run_opts={"device": "cpu"},
)

# Download and load fine-tuned weights
ckpt_path = hf_hub_download(repo_id="Nampfiev1995/pvad-speechbrain-ft", filename="best_checkpoint_rec98.pt")
state_dict = torch.load(ckpt_path, map_location="cpu")
model.mods.embedding_model.load_state_dict(state_dict, strict=True)
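
Once the weights are loaded, a trial can be scored and gated with one of the thresholds from the evaluation table. The sketch below is a hedged illustration: it relies on `SpeakerRecognition.verify_files`, which returns a score and a decision, and ignores the decision because it is based on SpeechBrain's built-in default threshold rather than the values above. The helper names and file paths are hypothetical, and it assumes the table thresholds apply to the raw cosine score rather than a normalized one.

```python
# Threshold for the 98% target-recall operating point in the evaluation table.
THRESHOLD = 0.3960


def score_files(model, enroll_wav, test_wav):
    """Return the similarity score between two audio files as a float.

    `model` is the SpeakerRecognition object created in the snippet above.
    """
    score, _default_decision = model.verify_files(enroll_wav, test_wav)
    return float(score)


def is_target_speaker(score, threshold=THRESHOLD):
    """Gate a segment: True when the score reaches the chosen threshold."""
    return score >= threshold
```

A call such as `is_target_speaker(score_files(model, "enroll.wav", "segment.wav"))` would then decide whether a segment is passed through the Personal VAD gate; the two file names are placeholders.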
