# vakyansh-tts Text to Speech for Indic languages ## Pretrained Models The pretrained models are hosted in a seperate repository [here](https://github.com/Open-Speech-EkStep/vakyansh-models#tts-models-repo) ### 1. Installation and Setup for training Clone repo ``` git clone https://github.com/Open-Speech-EkStep/vakyansh-tts ``` Build conda virtual environment ``` cd ./vakyansh-tts conda create --name python=3.7 conda activate pip install -r requirement.txt ``` Install [apex](https://github.com/NVIDIA/apex); commit: 37cdaf4 for Mixed-precision training ``` cd .. git clone https://github.com/NVIDIA/apex cd apex git checkout 37cdaf4 pip install -v --disable-pip-version-check --no-cache-dir ./ cd ../vakyansh-tts ``` Build Monotonic Alignment Search Code (Cython) ``` bash install.sh ``` ### 1.1 Installation of tts_infer package In tts_infer package, we currently have two components: 1. Transliteration (AI4bharat's open sourced models) (Languages supported: {'hi', 'gu', 'mr', 'bn', 'te', 'ta', 'kn', 'pa', 'gom', 'mai', 'ml', 'sd', 'si', 'ur'} ) 2. Num to Word (Languages supported: {'en', 'hi', 'gu', 'mr', 'bn', 'te', 'ta', 'kn', 'or', 'pa'} ) ``` git clone https://github.com/Open-Speech-EkStep/vakyansh-tts cd vakyansh-tts bash install.sh python setup.py bdist_wheel pip install -e . cd tts_infer gsutil -m cp -r gs://vakyaansh-open-models/translit_models . ``` Usage: ``` from tts_infer.tts import TextToMel, MelToWav from tts_infer.transliterate import XlitEngine from tts_infer.num_to_word_on_sent import normalize_nums import re from scipy.io.wavfile import write text_to_mel = TextToMel(glow_model_dir='/path/to/glow-tts/checkpoint/dir', device='cuda') mel_to_wav = MelToWav(hifi_model_dir='/path/to/hifi/checkpoint/dir', device='cuda') def translit(text, lang): reg = re.compile(r'[a-zA-Z]') engine = XlitEngine(lang) words = [engine.translit_word(word, topk=1)[lang][0] if reg.match(word) else word for word in text.split()] updated_sent = ' '.join(words) return updated_sent def run_tts(text, lang): text = text.replace('ред', '.') # only for hindi models text_num_to_word = normalize_nums(text, lang) # converting numbers to words in lang text_num_to_word_and_transliterated = translit(text_num_to_word, lang) # transliterating english words to lang mel = text_to_mel.generate_mel(text_num_to_word_and_transliterated) audio, sr = mel_to_wav.generate_wav(mel) write(filename='temp.wav', rate=sr, data=audio) # for saving wav file, if needed return (sr, audio) ``` ### 2. Spectogram Training (glow-tts) ``` cd ./scripts bash train_glow.sh ``` ### 3. Genrate Mels ``` cd ./scripts bash generate_mels.sh ``` ### 4. Vocoder Training (hifi-gan) ``` cd ./scripts bash train_hifi.sh ``` ### 4. Inference ``` cd ./scripts bash infer.sh ```