utrobinmv/tts_ru_free_hf_vits_high_multispeaker

utrobinmv committed on May 25

Commit ee94471 • 1 Parent(s): ae30f55

feat add readme

Files changed (1)

README.md +78 -0

	@@ -0,0 +1,78 @@

+---
+language:
+- ru
+tags:
+- vits
+license: apache-2.0
+pipeline_tag: text-to-speech
+---
+# Text to Speech Russian free multispeaker model
+This is a multi-speaker text-to-speech model for the Russian language. It works on plain text with punctuation and does not require converting the text to phonemes first.
+The model has two voices: 0 - woman, 1 - man.
+The input text must be lowercase.
+The model is trained to place stress on its own. To improve generation quality, however, we recommend marking stress explicitly by putting a `+` directly before the stressed vowel.
+Usage example using PyTorch:
+```python
+from transformers import VitsModel, AutoTokenizer
+import torch
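+import scipy.io.wavfile  # a bare "import scipy" does not reliably load scipy.io.wavfile
+device = 'cuda' if torch.cuda.is_available() else 'cpu'  # fall back to CPU when no GPU is present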
+speaker = 1 # 0-woman, 1-man
+# load model
+model_name = "utrobinmv/tts_ru_free_hf_vits_high_multispeaker"
+model = VitsModel.from_pretrained(model_name).to(device)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model.eval()
+# text with stress marks: '+' precedes the stressed vowel
+text = """Ночью двадцать тр+етьего июня начал извергаться самый высокий
+действующий вулк+ан в Евразии - Кл+ючевской. Об этом сообщила руководитель
+Камчатской группы реагирования на вулканические извержения, ведущий
+научный сотрудник Института вулканологии и сейсмологии ДВО РАН +Ольга Гирина.
+«Зафиксированное ночью не просто свечение, а вершинное эксплозивное
+извержение стромболианского типа. Пока такое извержение никому не опасно:
+ни населению, ни авиации» пояснила ТАСС госпожа Гирина."""
+# the model expects lowercase input
+text = text.lower()
+inputs = tokenizer(text, return_tensors="pt")
+# run inference without tracking gradients
+with torch.no_grad():
+    output = model(**inputs.to(device), speaker_id=speaker).waveform
+    output = output.detach().cpu().numpy()
+# output has shape (batch, samples); save the first waveform as a WAV file
+scipy.io.wavfile.write("tts_audio.wav", rate=model.config.sampling_rate,
+                       data=output[0])
+```
+To play the audio in a Jupyter Notebook / Google Colab:
+```python
+from IPython.display import Audio
+Audio(output, rate=model.config.sampling_rate)
+```
+## Languages covered
+Russian (ru_RU)
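
The README asks for lowercase input and recommends marking stress with a `+` placed before the stressed vowel. Both conventions can be checked before the text reaches the tokenizer. The following is a minimal sketch under that reading of the README; `prepare_text` and `RU_VOWELS` are hypothetical names introduced here for illustration and are not part of the model, `transformers`, or `ruaccent`.

```python
import re

# Lowercase Russian vowel letters. Assumption: a '+' stress mark is only
# meaningful when it directly precedes one of these.
RU_VOWELS = "аеёиоуыэюя"


def prepare_text(text: str) -> str:
    """Lowercase the text and verify that every '+' precedes a Russian vowel.

    Raises ValueError for a '+' at the end of the text or before any
    character that is not a vowel.
    """
    text = text.lower()
    for match in re.finditer(r"\+(.?)", text, flags=re.DOTALL):
        following = match.group(1)
        if not following or following not in RU_VOWELS:
            raise ValueError(
                f"stress mark at position {match.start()} is not followed by a vowel"
            )
    return text


if __name__ == "__main__":
    print(prepare_text("Действующий вулк+ан в Евразии - Кл+ючевской."))
```

The returned string could then be passed to the tokenizer in place of the manual `text.lower()` call. Whether the model tolerates a misplaced `+` is not stated in the README, so raising an error here is a conservative design choice rather than a requirement.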