How to use this model to just get audio embedding?

by mohitmayank - opened Jan 28

Jan 28

Is it possible to use this model for speaker identification? If so, how can I use it to extract just the audio embeddings?

taejinp

NVIDIA org Jan 28


edited Feb 1

This model does not use speaker representations (speaker embeddings such as the x-vectors used in speaker verification), so unfortunately you cannot use it for speaker recognition tasks. Try TitaNet (https://huggingface.co/nvidia/speakerverification_en_titanet_large) in the NeMo toolkit, which is a speaker embedding extractor. You can apply it on top of the Sortformer diarizer's output, for example by extracting embeddings only from the regions the diarizer marks as speech so that silence is filtered out.
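The suggested workflow (diarize with Sortformer, then extract speaker embeddings from the speech regions and compare them) can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not NVIDIA's implementation. The diarizer output is assumed to be a list of (start_sec, end_sec, speaker_label) tuples and the audio a flat list of samples. The enrolled reference embeddings, the 0.5 s minimum segment length and the 0.7 cosine threshold are illustrative choices, not values from this thread. `group_by_speaker`, `cosine`, `identify`, `load_titanet` and `titanet_embedding` are hypothetical helper names. The TitaNet part assumes NeMo's `EncDecSpeakerLabelModel.from_pretrained(...)` and `get_embedding(path)`, which takes a path to an audio file. In that case each speaker's grouped audio would first need to be written to a WAV file.

```python
import functools
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed diarizer output format: (start_sec, end_sec, speaker_label).
Segment = Tuple[float, float, str]


def group_by_speaker(samples: Sequence[float], sample_rate: int,
                     segments: Sequence[Segment],
                     min_dur: float = 0.5) -> Dict[str, List[float]]:
    """Concatenate the audio of each diarized speaker.

    Only regions labelled as speech are kept, so silence is dropped.
    Segments shorter than min_dur seconds are skipped because very short
    clips tend to give unreliable embeddings (min_dur is an assumption).
    """
    per_speaker: Dict[str, List[float]] = {}
    for start, end, label in segments:
        if end - start < min_dur:
            continue
        lo, hi = int(start * sample_rate), int(end * sample_rate)
        per_speaker.setdefault(label, []).extend(samples[lo:hi])
    return per_speaker


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(embeddings: Dict[str, Sequence[float]],
             enrolled: Dict[str, Sequence[float]],
             threshold: float = 0.7) -> Dict[str, Optional[str]]:
    """Map each diarized speaker label to the closest enrolled identity.

    A label whose best cosine score is below the (assumed) threshold maps to None.
    """
    result: Dict[str, Optional[str]] = {}
    for label, emb in embeddings.items():
        best_name, best_score = None, -1.0
        for name, ref in enrolled.items():
            score = cosine(emb, ref)
            if score > best_score:
                best_name, best_score = name, score
        result[label] = best_name if best_score >= threshold else None
    return result


@functools.lru_cache(maxsize=1)
def load_titanet():
    # Assumes NeMo is installed; imported lazily so the helpers above run without it.
    import nemo.collections.asr as nemo_asr
    return nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(
        "nvidia/speakerverification_en_titanet_large")


def titanet_embedding(wav_path: str) -> List[float]:
    # get_embedding takes an audio file path and returns a [1, D] tensor;
    # flatten it into a plain Python list for the helpers above.
    return load_titanet().get_embedding(wav_path).squeeze().cpu().tolist()
```

The enrolled embeddings would come from the same extractor applied to a known clip of each person. Scoring with cosine similarity against a threshold is a common choice in speaker verification, but the threshold should be tuned on your own data.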

mohitmayank changed discussion status to closed Jan 29
