Regarding working of model

by casafurix - opened Mar 30

Discussion

casafurix

Mar 30

Hello, I wanted to ask what the output matrix signifies. Is the model outputting phonemes from the input text?

Splend1dchan

Owner Mar 30

Hi,

I don't know what you mean by "output matrix". This is a phoneme-level BART that takes phonemes as input and produces phonemes as output. If you have further questions, please let me know :)

Jeff

casafurix

Mar 30

Actually, I tried inputting a sentence and got a matrix as output. Sorry, I don't know much about how it works. I am searching for a text-to-speech model that outputs an audio file containing the speech, along with the phonemes (with timestamps) from the text. Will this model help me get the time-stamped phonemes as part of the output?

Splend1dchan

Owner Mar 30

Hi,

Unfortunately, this is a text-only model. It is not designed to do any of the things you mentioned. You can try Seamless-M4T by Meta AI.

Jeff

casafurix

Mar 30

Thank you for the information!
But I specifically need phonemes in my output, so I think your model is really close to what I need. Could you please explain what the output is here? Thank you again for your time.

Splend1dchan

Owner Mar 31

Hi,

I believe this is the encoder features of the BART model, which the decoder attends to. If you really want phonemes as output, you should call model.generate(). However, I must emphasize that the model was not trained on the text-to-phoneme task, so you shouldn't expect it to work, even if the outputs are phonemes. The model was trained on an unsupervised phoneme-to-phoneme task.

Jeff
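
For readers following along, here is a minimal sketch of the difference between the "matrix" from a plain forward pass and the token IDs returned by model.generate(). It assumes the Hugging Face `transformers` and `torch` packages. To stay self-contained it builds a tiny, randomly initialised BART with made-up sizes rather than loading the checkpoint discussed in this thread, and the input IDs are placeholders for a tokenised phoneme sequence, so the generated IDs carry no meaning.

```python
import torch
from transformers import BartConfig, BartForConditionalGeneration

# Tiny randomly initialised BART, for illustration only.
# Assumption: these sizes are invented and are NOT the real model's config.
config = BartConfig(
    vocab_size=50,
    d_model=16,
    encoder_layers=1,
    decoder_layers=1,
    encoder_attention_heads=2,
    decoder_attention_heads=2,
    encoder_ffn_dim=32,
    decoder_ffn_dim=32,
    max_position_embeddings=32,
)
model = BartForConditionalGeneration(config).eval()

# Placeholder IDs standing in for a tokenised phoneme sequence: <s> ... </s>
input_ids = torch.tensor([[0, 5, 6, 7, 8, 2]])

with torch.no_grad():
    # A plain forward pass returns float tensors ("matrices").
    out = model(input_ids=input_ids)
    encoder_features = out.encoder_last_hidden_state  # (batch, seq_len, d_model)
    logits = out.logits                               # (batch, seq_len, vocab_size)

    # generate() runs the decoder step by step and returns integer token IDs.
    generated = model.generate(
        input_ids, max_new_tokens=8, do_sample=False, num_beams=1
    )

print(encoder_features.shape)  # float features the decoder attends to
print(logits.shape)            # per-position scores over the vocabulary
print(generated)               # token IDs; decode them with the model's tokenizer
```

With a real checkpoint, the IDs from generate() would be turned back into phoneme strings with the matching tokenizer's `batch_decode`. As noted above, that still would not make the model a text-to-phoneme converter.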
