Fine-Tuned SpeechBrain ECAPA-TDNN for Vietnamese Speaker Verification
This repository contains a fine-tuned checkpoint of the SpeechBrain ECAPA-TDNN speaker verification model, adapted for Vietnamese speaker verification and Personal Voice Activity Detection (Personal VAD).
Model Details
- Base Model: `speechbrain/spkrec-ecapa-voxceleb` (ECAPA-TDNN)
- Fine-Tuning Objective: Residual Embedding Adapter trained under Additive Margin Softmax (AM-Softmax) loss to adapt voice features for Vietnamese dialects.
- Datasets: Adapted using Vietnamese speaker datasets (such as `VoxVietnam` and `Vietnam-Celeb`).
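For readers unfamiliar with the objective named above, the standard AM-Softmax loss can be written as follows. This is the textbook form, not a statement of this checkpoint's training configuration: the scale $s$ and margin $m$ used for fine-tuning are not given in this card, and $N$ (batch size), $y_i$ (speaker label of sample $i$), and $\theta_{j,i}$ (angle between the embedding of sample $i$ and the class weight of speaker $j$) are notation introduced here for illustration.

```latex
\mathcal{L}_{\text{AM-Softmax}}
  = -\frac{1}{N} \sum_{i=1}^{N}
    \log \frac{e^{\,s\,(\cos\theta_{y_i,i} - m)}}
              {e^{\,s\,(\cos\theta_{y_i,i} - m)} + \sum_{j \neq y_i} e^{\,s\,\cos\theta_{j,i}}}
```

Subtracting the margin $m$ from the target-class cosine forces same-speaker embeddings to be closer to their class center than plain softmax would require, which is what makes the resulting cosine scores usable with a fixed verification threshold.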
Files Included
- `best_checkpoint_rec98.pt`: Fine-tuned embedding model weights (state_dict for `model.mods.embedding_model`).
- `cohort/vi_cohort_500.pt`: Cohort embeddings (500 Vietnamese speakers) for Z-Score/S-Score Normalization.
- `cohort/vi_cohort_500_metadata.json`: Cohort metadata.
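The cohort file is intended for score normalization. A minimal sketch of symmetric normalization (S-norm) is shown below, using plain Python lists so it runs without extra dependencies. The function names `cosine` and `s_norm` are hypothetical, and the internal layout of `vi_cohort_500.pt` is not documented here, so the sketch simply assumes the cohort can be converted to a list of embedding vectors. This card also does not say whether the published thresholds apply to raw or normalized scores.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def s_norm(enroll, test, cohort, eps=1e-8):
    """Symmetric score normalization of the enroll/test cosine score.

    `cohort` is assumed to be a list of embedding vectors (hypothetical
    layout; the real cohort file format is not documented in this card).
    """
    raw = cosine(enroll, test)
    # Score each side of the trial against every cohort speaker.
    enroll_scores = [cosine(enroll, c) for c in cohort]
    test_scores = [cosine(test, c) for c in cohort]
    # Z-normalize the raw score with each side's cohort statistics.
    z_enroll = (raw - mean(enroll_scores)) / (pstdev(enroll_scores) + eps)
    z_test = (raw - mean(test_scores)) / (pstdev(test_scores) + eps)
    # S-norm is the average of the two normalized scores.
    return 0.5 * (z_enroll + z_test)
```

Using only the first term (`z_enroll`) gives Z-norm. With real embeddings, the same arithmetic would typically be done on tensors rather than lists.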
Evaluation Results on 11 EVAL_* Cases
The fine-tuned model achieves 3.43% Equal Error Rate (EER) on the Vietnamese evaluation suite. Below is the global performance matrix:
| Target Recall | Threshold | Actual Recall | Target FR % | Imposter Leak % |
|---|---|---|---|---|
| 95% | 0.3990 | 94.34% | 5.66% | 3.08% |
| 98% | 0.3960 | 96.23% | 3.77% | 3.08% |
| 99% | 0.2996 | 98.11% | 1.89% | 7.69% |
*At the 98% target-recall operating point (threshold = 0.3960), imposter leakage stays at 3.08% while actual recall reaches 96.23%, making this setting well suited to production Personal VAD gating.*
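To make the table columns concrete, the sketch below computes recall, target false rejection, and imposter leak for a given threshold from two lists of trial scores. The function names and the sample scores are hypothetical, and the accept rule `score >= threshold` is an assumption; this card does not state whether the comparison is strict.

```python
def rates_at_threshold(target_scores, imposter_scores, threshold):
    """Return recall, false-rejection, and imposter-leak rates as fractions."""
    # Target trials accepted at this threshold (true accepts).
    accepted_targets = sum(score >= threshold for score in target_scores)
    # Imposter trials wrongly accepted at this threshold (leaks).
    leaked_imposters = sum(score >= threshold for score in imposter_scores)
    recall = accepted_targets / len(target_scores)
    return {
        "recall": recall,
        "false_rejection": 1 - recall,
        "imposter_leak": leaked_imposters / len(imposter_scores),
    }


def is_target_speaker(score, threshold=0.3960):
    """Gate decision at the 98% target-recall threshold from the table."""
    return score >= threshold
```

Lowering the threshold trades fewer false rejections for more imposter leaks, which is the pattern visible between the 98% and 99% rows.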
Usage
```python
from speechbrain.inference.speaker import SpeakerRecognition
from huggingface_hub import hf_hub_download
import torch

# Load base model
model = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    run_opts={"device": "cpu"},
)

# Download and load fine-tuned weights
ckpt_path = hf_hub_download(repo_id="Nampfiev1995/pvad-speechbrain-ft", filename="best_checkpoint_rec98.pt")
state_dict = torch.load(ckpt_path, map_location="cpu")
model.mods.embedding_model.load_state_dict(state_dict, strict=True)
```