Pretrained ECAPA-TDNN with 256 as output?

#9
by Khumbaba - opened

Hello!

Many papers in the Personalized Speech Enhancement (PSE) literature use your pretrained ECAPA-TDNN model to generate embeddings from enrollment utterances (for instance, it's the go-to embedding model for Microsoft's 5th DNS Challenge; see its GitHub repository). The issue is that they always mention an embedding size of 256, which puzzles me, because your pretrained parameters clearly give an output size of 192 for the encoder. Am I missing something?

(btw, thank you for your valuable open source contributions!)
