How to fine-tune MMS text-to-speech models?

#1
by allandclive - opened

Is there any way to fine-tune MMS-TTS models?

AI at Meta org

@Matthijs and @sanchit-gandhi are working on this. We hope to have it soon.

cc. @bowenshi

Any updates on this??

It's ongoing! The model addition is in the final review stages, then we can work on a fine-tuning script (cc @ylacombe)

Any updates on this, @sanchit-gandhi?

Following

any updates on this please?

Hi there, I'm currently working on finetuning VITS and MMS, stay tuned!

@ylacombe any updates on this?

Hey @arbianqx , it's still a WIP.

If you are interested, here are the two ongoing PRs I'm working on: https://github.com/huggingface/transformers/pull/27340 https://github.com/huggingface/transformers/pull/27244
Note that as long as the PRs are not merged, I can't really give you support on this.

On another note, what languages are you interested in? Finetuning MMS is an interesting task, and I'm trying to understand which languages are the most interesting to work on!

Hey @ylacombe, thanks for the quick reply.

Well, I'll wait patiently for this.

Indeed it is. I'm planning to finetune this for the Albanian language (the code for this in MMS is "sqi", if I'm not mistaken).

Hi @ylacombe any update on this?

Hey, I haven't made any official announcements yet, but you can already find what you want in the following library: https://github.com/ylacombe/finetune-hf-vits

Don't hesitate to give feedback and share your finetuned models if you can!

Hey, thank you very much @ylacombe and the team. Much appreciated.

Hi @ylacombe, hope you're doing well. Can you please help me? I want to finetune an MMS-TTS model (facebook/mms-tts-urd-script_arabic) for the Urdu language. Specifically, I want to finetune it on a specific speaker's audio. How can I create a speaker embedding for that speaker and finetune the model so that it produces audio in that speaker's voice? Also, please tell me how to do this if I want multiple speakers in the same model. Your help would be appreciated. Happy New Year!

@sanchit-gandhi Hi brother, any update on the finetuning of MMS-TTS (facebook/mms-tts-urd-script_arabic)?

Yeah, it's working great!!! thanks to @ylacombe

@syedmuhammad Thank you for the response. Could you please share the link? Thanks

Please refer to the repo: https://github.com/ylacombe/finetune-hf-vits

@syedmuhammad Thanks, I will check this.
Do you have your own training Colab notebook for the Urdu language using the following model?
facebook/mms-tts-urd-script_arabic

Yes

@syedmuhammad Would you be willing to share it?

You can email me at: syedmuhammad1111@gmail.com

During fine-tuning I got the following error:

    return tensor.to(device, non_blocking=non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: BatchEncoding.to() got an unexpected keyword argument 'non_blocking'

Is there a solution?

@charbossly Maybe this solution will help: https://github.com/ylacombe/finetune-hf-vits/issues/22
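
For anyone trying to understand the error itself, here is a minimal sketch of how this kind of TypeError can arise. It is not taken from the linked issue, and the fix recommended there may be different (for example, changing library versions). The classes `FakeTensor` and `FakeBatchEncoding` and the helper `library_style_move` are hypothetical stand-ins written for illustration only; they are not the real torch, transformers, or accelerate code. The assumption is that a helper calls `.to(device, non_blocking=...)` on any object exposing `.to()`, while the container's own `.to()` only accepts a device.

```python
from collections import UserDict


class FakeTensor:
    """Hypothetical stand-in for a tensor whose .to() accepts non_blocking."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device, non_blocking=False):
        return FakeTensor(device)


class FakeBatchEncoding(UserDict):
    """Hypothetical stand-in for a dict-like batch whose .to() only takes a device."""

    def to(self, device):
        return FakeBatchEncoding({k: v.to(device) for k, v in self.data.items()})


def library_style_move(obj, device, non_blocking=False):
    """Assumed behaviour: call .to() on anything that has it, else recurse into dicts."""
    if hasattr(obj, "to"):
        return obj.to(device, non_blocking=non_blocking)
    if isinstance(obj, dict):
        return {k: library_style_move(v, device, non_blocking) for k, v in obj.items()}
    return obj


batch = FakeBatchEncoding({"input_ids": FakeTensor()})

# Passing the container directly reproduces the same kind of TypeError.
try:
    library_style_move(batch, "cuda", non_blocking=True)
    raised = False
except TypeError:
    raised = True

# Converting to a plain dict first means each value is moved individually.
moved = library_style_move(dict(batch), "cuda", non_blocking=True)
```

Under these assumptions, one possible workaround is to hand the training loop a plain `dict` of tensors (for example from the data collator) rather than the dict-like container. Whether that applies to your setup depends on the library versions involved, so please check the linked issue first.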
