Model Dataset

#1
by Sadique5 - opened

I'd like to get the dataset used to train this model. It performs really badly, so I'm also thinking of fine-tuning it on my own dataset.

Hello @Sadique5 , section 3 of the paper is all about the dataset creation.

Thanks. While reading, the current model ignores several letters.

Hey @Sadique5! The model was trained on normalised text (i.e. excluding casing and punctuation), which is why the model does not respect these when you run inference. You can try using the hyphen character - to add pauses in the speech, since it is present in the vocabulary of the Arabic MMS TTS checkpoint and serves mostly this purpose. Otherwise, ensure all your text is lowercased and that every character is in the vocabulary of the model: https://huggingface.co/facebook/mms-tts-ara/blob/main/vocab.json
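
The vocabulary check suggested above can be sketched roughly as follows. This is only an illustration, not code from the model authors: it assumes the linked vocab.json is a flat JSON object mapping single characters to ids and that you have downloaded it locally. The helper names (`load_vocab`, `find_unsupported_chars`, `keep_supported_chars`) and the toy vocabulary are hypothetical.

```python
import json
from pathlib import Path


def load_vocab(path):
    """Load a vocab.json file (assumed: a flat mapping of character -> id)."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def find_unsupported_chars(text, vocab):
    """Return the distinct characters of `text` missing from `vocab`,
    in order of first appearance. The text is checked as given (no lowercasing)."""
    missing = []
    for ch in text:
        if ch not in vocab and ch not in missing:
            missing.append(ch)
    return missing


def keep_supported_chars(text, vocab):
    """Lowercase `text` and drop every character that is not in `vocab`."""
    return "".join(ch for ch in text.lower() if ch in vocab)


# Toy vocabulary for illustration only; the real one is in the linked vocab.json.
toy_vocab = {" ": 0, "-": 1, "a": 2, "b": 3, "c": 4}

print(find_unsupported_chars("a b, c!", toy_vocab))  # [',', '!']
print(keep_supported_chars("A b, c!", toy_vocab))    # prints: a b c
```

With the real file, you would call `load_vocab("vocab.json")` on your local copy and run your input through `find_unsupported_chars` first. Any character it reports is likely one the model will skip, which may explain the "ignored letters".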
