How to use this model to just get audio embedding?

#4
by mohitmayank - opened

Is it possible to use this model for speaker identification? If so, how can I use it to get just the audio embeddings?

NVIDIA org • edited Feb 1

This model does not use speaker representations (the speaker embeddings used in speaker verification tasks, e.g. x-vectors), so unfortunately you cannot use it for speaker recognition tasks. Try TitaNet (https://huggingface.co/nvidia/speakerverification_en_titanet_large) from the NeMo toolkit, which is a speaker embedding extractor. You can extract speaker embeddings on top of the Sortformer diarizer's output (for example, after filtering out silence).
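For anyone landing here later, the TitaNet suggestion above could look roughly like the sketch below. It assumes NeMo is installed (`pip install "nemo_toolkit[asr]"`) and uses `EncDecSpeakerLabelModel.from_pretrained` and `get_embedding` as shown on the TitaNet model card. The helpers `embed_file`, `cosine_similarity`, and `identify_speaker`, the enrolled-speaker dictionary, and the 0.7 threshold are illustrative assumptions, not part of NeMo or of this model; tune the threshold on your own data.

```python
import math


def load_titanet():
    """Load the TitaNet speaker embedding model (requires NeMo)."""
    # Imported lazily so the pure-Python helpers below work without NeMo.
    import nemo.collections.asr as nemo_asr

    return nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(
        "nvidia/speakerverification_en_titanet_large"
    )


def embed_file(model, wav_path):
    """Hypothetical helper: return one embedding as a flat list of floats."""
    # get_embedding returns a torch tensor with a leading batch dimension.
    emb = model.get_embedding(wav_path)
    return emb.squeeze().cpu().tolist()


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(query, enrolled, threshold=0.7):
    """Return (name, score) of the closest enrolled speaker.

    `enrolled` maps a speaker name to a reference embedding.
    The name is None when the best score is below `threshold`
    (an assumed cutoff, to be tuned per dataset).
    """
    best_name, best_score = None, -1.0
    for name, reference in enrolled.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

A typical flow would be `model = load_titanet()`, then `enrolled = {"alice": embed_file(model, "alice.wav")}` (file names here are placeholders), then `identify_speaker(embed_file(model, "query.wav"), enrolled)`. To combine this with Sortformer, you would cut the audio into the speech segments the diarizer reports and embed each segment separately, so silence does not dilute the embedding.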

mohitmayank changed discussion status to closed