metadata

license: mit
datasets:
  - styletts2-community/multilingual-phonemes-10k-alpha
language:
  - fr
  - en
  - es
  - ca
  - de
  - el
  - fa
  - fi
  - pt
  - pl
  - ru
  - sv
  - uk
  - zh

Multilingual PL-BERT checkpoint

The checkpoint open-sourced here is trained by Papercup using the open-source PL-BERT model found here https://github.com/yl4579/PL-BERT. It is trained to be supported by StyleTTS2, which can be found here: https://github.com/yl4579/StyleTTS2. See below for the languages that it has been trained on (the languages correspond to the crowdsourced dataset found here https://huggingface.co/datasets/styletts2-community/multilingual-phonemes-10k-alpha).

Notable differences compared to the default PL-BERT checkpoint and config available here:

Because we are working with many languages, we are using a different tokenizer now: bert-base-multilingual-cased.
The PL-BERT model was trained on the data obtained from styletts2-community/multilingual-phonemes-10k-alpha for 1.1M iterations.
The token_maps.pkl file has changed (also open-sourced here).
We have changed the util.py file to deal with an error when loading new_state_dict["embeddings.position_ids"].

How do I train StyleTTS2 with this new PL-BERT checkpoint?

Simply create a new folder under Utils in your StyleTTS2 repository. Call it, for example, PLBERT-all-languages.
Copy paste into it config.yml, step_1100000.t7 and util.py.
Then, in your StyleTTS2 config file, change PLBERT_dir to Utils/PLBERT-all-languages.
Now, you need to create train and validation files. You will need to use espeak to create a file in the same format as the ones that exist in the Data folder of the StyleTTS2 repository. Careful! You will need to change the language argument to phonemise your text if it's not in English. You can find the correct language codes here. For example, Latin American Spanish is es-419

Voila, you can now train a multilingual StyleTTS2 model!

Thank you to Aaron (Yinghao) Li for these contributions.