---
language: en
thumbnail: null
tags:
- speechbrain
- embeddings
- Speaker
- Verification
- Identification
- pytorch
- ECAPA-TDNN
license: apache-2.0
datasets:
- voxceleb
metrics:
- EER
- Accuracy
widget:
- example_title: VoxCeleb Speaker id10003
  src: https://cdn-media.huggingface.co/speech_samples/VoxCeleb1_00003.wav
- example_title: VoxCeleb Speaker id10004
  src: https://cdn-media.huggingface.co/speech_samples/VoxCeleb_00004.wav
---
# Speaker Identification with ECAPA-TDNN embeddings on VoxCeleb

This repository provides a pretrained ECAPA-TDNN model built with SpeechBrain. The system performs speaker identification and can also be used to extract speaker embeddings. It is trained on the VoxCeleb 2 development set only.
## Pipeline description

This system is composed of an ECAPA-TDNN model, a combination of convolutional and residual blocks. The embeddings are extracted with attentive statistical pooling, and the system is trained with the Additive Margin Softmax loss.
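For illustration, the sketch below shows how an Additive Margin Softmax loss is typically computed on top of speaker embeddings. The margin and scale values are assumptions for the example, not the exact hyperparameters used to train this model.

```python
import torch
import torch.nn.functional as F

def am_softmax_loss(embeddings, class_weights, labels, margin=0.2, scale=30.0):
    # Cosine similarity between L2-normalised embeddings and class weight vectors.
    cosine = F.linear(F.normalize(embeddings), F.normalize(class_weights))
    # Subtract the margin from the target-class similarity only, then scale.
    one_hot = F.one_hot(labels, num_classes=class_weights.size(0)).float()
    logits = scale * (cosine - margin * one_hot)
    return F.cross_entropy(logits, labels)
```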
## Compute the speaker embeddings

The system is trained with recordings sampled at 16 kHz (single channel).
```python
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Load the pretrained ECAPA-TDNN encoder from the Hugging Face Hub.
classifier = EncoderClassifier.from_hparams(
    source="yangwang825/ecapa-tdnn-vox2"
)

# Load a 16 kHz, single-channel waveform and extract its speaker embedding.
signal, fs = torchaudio.load('spk1_snt1.wav')
embeddings = classifier.encode_batch(signal)
```
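If your audio is not already 16 kHz mono, a resampling step can be added before encoding. The sketch below is an illustration using standard torchaudio transforms; the file name is a placeholder.

```python
import torchaudio

# Hypothetical input file; convert to the 16 kHz mono format the model expects.
signal, fs = torchaudio.load('my_recording.wav')
if signal.size(0) > 1:  # stereo -> mono by averaging channels
    signal = signal.mean(dim=0, keepdim=True)
if fs != 16000:
    signal = torchaudio.transforms.Resample(orig_freq=fs, new_freq=16000)(signal)

embeddings = classifier.encode_batch(signal)
```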
You can find our training results (models, logs, etc.) here.
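The embeddings can also be compared directly for speaker verification. Below is a minimal sketch that scores two utterances with cosine similarity and applies an assumed decision threshold; the second file name and the 0.25 threshold are placeholders, not tuned values for this model.

```python
import torch
import torchaudio

# Hypothetical enrolment and test utterances (16 kHz mono assumed).
signal1, _ = torchaudio.load('spk1_snt1.wav')
signal2, _ = torchaudio.load('spk2_snt1.wav')

emb1 = classifier.encode_batch(signal1).squeeze()
emb2 = classifier.encode_batch(signal2).squeeze()

# Cosine similarity between the two embeddings; higher means more likely the same speaker.
score = torch.nn.functional.cosine_similarity(emb1, emb2, dim=0)
same_speaker = score > 0.25  # placeholder threshold, tune on a development set
print(score.item(), bool(same_speaker))
```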