Regarding working of model

#2
by casafurix - opened

Hello, I wanted to ask what the output matrix signifies. Is it outputting phonemes from the input text?

Hi,

I don't know what you mean by "output matrix". This is a phoneme-level BART that takes phonemes as input and produces phonemes as output. If you have further questions, please let me know :)

Jeff

Actually, I tried inputting a sentence and got a matrix as output. Sorry, I don't know much about how it works. I am looking for a text-to-speech model that outputs an audio file containing the speech, along with the phonemes (with timestamps) from the text. Will this model help me get the time-stamped phonemes as part of the output?

Hi,

Unfortunately, this is a text-only model. It is not designed to do any of the things you mentioned. You can try Seamless-M4T by Meta AI.

Jeff

Thank you for the information!
But I specifically need phonemes in my output, so I think your model is really close to what I need. Could you please explain what the output is here? Thank you again for your time.
[Attached image: image.png]

Hi,

I believe these are the encoder features of the BART model, which the decoder attends to. If you really want phonemes as output, you should call model.generate(). However, I must emphasize that the model was not trained on the text-to-phoneme task, so you shouldn't expect it to work, even if the outputs are phonemes. The model was trained on an unsupervised phoneme-to-phoneme task.

Jeff
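
For anyone landing here with the same question, here is a minimal sketch of the difference between the encoder features and the output of model.generate(). It does not use the checkpoint from this repo: it builds a tiny, randomly initialised BART from a made-up config (all sizes and token ids below are assumptions), so the generated ids are meaningless. It only illustrates that the encoder returns a float matrix while generate() returns integer token ids, which you would then decode with the tokenizer.

```python
import torch
from transformers import BartConfig, BartForConditionalGeneration

# Tiny random BART used as a stand-in (assumption: NOT the real checkpoint).
config = BartConfig(
    vocab_size=64,
    d_model=16,
    encoder_layers=1,
    decoder_layers=1,
    encoder_attention_heads=2,
    decoder_attention_heads=2,
    encoder_ffn_dim=32,
    decoder_ffn_dim=32,
    max_position_embeddings=32,
)
torch.manual_seed(0)
model = BartForConditionalGeneration(config).eval()

# Made-up token ids standing in for a tokenized phoneme sequence.
input_ids = torch.tensor([[0, 10, 11, 12, 13, 2]])
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    # Encoder only: one float vector per input token.
    # This is the kind of "matrix" the decoder attends to.
    encoder_out = model.get_encoder()(
        input_ids=input_ids, attention_mask=attention_mask
    )
    features = encoder_out.last_hidden_state

    # Full encoder-decoder generation: a sequence of integer token ids.
    generated = model.generate(
        input_ids,
        attention_mask=attention_mask,
        max_new_tokens=8,
        do_sample=False,
        num_beams=1,
    )

print(features.shape, features.dtype)    # (batch, input_length, d_model), float
print(generated.shape, generated.dtype)  # (batch, generated_length), integer
```

With a real checkpoint you would load the model and tokenizer with from_pretrained() and pass the generated ids to tokenizer.batch_decode() to get the phoneme strings back. As noted above, the input should itself be phonemes, since the model was not trained on plain text.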
