ECAPA2 Speaker Embedding Extractor

Link to paper: ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings.

ECAPA2 is a hybrid neural network architecture and training strategy for generating robust speaker embeddings. The provided pre-trained model has an easy-to-use API to extract speaker embeddings and other hierarchical features. More information can be found in our original ECAPA2 paper.

Usage Guide

Download model

You need to install the huggingface_hub package to download the ECAPA2 model:

pip install --upgrade huggingface_hub

Or with Conda:

conda install -c conda-forge huggingface_hub

Download model:

from huggingface_hub import hf_hub_download

# automatically checks for cached file, optionally set `cache_dir` location
model_file = hf_hub_download(repo_id='Jenthe/ECAPA2', filename='ecapa2.pt', cache_dir=None)

Speaker Embedding Extraction

Extracting speaker embeddings is easy and only requires a few lines of code:

import torch
import torchaudio

ecapa2 = torch.jit.load(model_file, map_location='cpu')
audio, sr = torchaudio.load('sample.wav') # sample rate of 16 kHz expected

embedding = ecapa2(audio)

For faster, 16-bit half-precision CUDA inference (recommended):

import torch
import torchaudio

ecapa2 = torch.jit.load(model_file, map_location='cuda')
ecapa2.half() # optional, but results in faster inference
audio, sr = torchaudio.load('sample.wav') # sample rate of 16 kHz expected

embedding = ecapa2(audio)

The initial calls to the JIT-model can in some cases take a very long time because of optimization attempts of the compiler. If you have issues, the JIT-optimizer can be disabled as following:

with torch.jit.optimized_execution(False):
  embedding = ecapa2(audio)

There is no need for ecapa2.eval() or torch.no_grad(), this is done automatically.

Citation

BibTeX:

@INPROCEEDINGS{ecapa2,
  author={Jenthe Thienpondt and Kris Demuynck},
  booktitle={2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)}, 
  title={ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings}, 
  year={2023},
  volume={},
  number={}
}

APA:

Jenthe Thienpondt, Kris Demuynck (2023). ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings. In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Contact

Name: Jenthe Thienpondt
E-mail: jenthe.thienpondt@ugent.be

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The model has no library tag.