utrobinmv
/

tts_ru_free_hf_vits_high_multispeaker


	---
	language:
	- ru
	tags:
	- vits
	license: apache-2.0
	pipeline_tag: text-to-speech
	---

	# Free multi-speaker text-to-speech model for Russian

	This is a multi-speaker text-to-speech model for the Russian language. It takes plain text with punctuation as input and does not require converting the text to phonemes first.
	The model has two voices: 0 is a female voice, 1 is a male voice.

	The model expects lowercase input text.

	The model is trained to place word stress on its own. To improve generation quality, however, we recommend marking stress in the text with a `+` before the stressed vowel.
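
A minimal sketch of preparing a word by hand, assuming stress is written as a `+` immediately before the stressed vowel, as in the usage example in this card. The helper `mark_stress` and its `vowel_number` parameter are hypothetical names, not part of the model or of `transformers`; which vowel is stressed still has to come from you or from a separate stress dictionary.

```python
RUSSIAN_VOWELS = set("аеёиоуыэюя")


def mark_stress(word: str, vowel_number: int) -> str:
    """Return `word` in lowercase with '+' inserted before its n-th vowel (1-based)."""
    word = word.lower()
    seen = 0
    for i, ch in enumerate(word):
        if ch in RUSSIAN_VOWELS:
            seen += 1
            if seen == vowel_number:
                return word[:i] + "+" + word[i:]
    raise ValueError(f"{word!r} has fewer than {vowel_number} vowels")


print(mark_stress("Вулкан", 2))  # вулк+ан
print(mark_stress("Ольга", 1))   # +ольга
```

The helper also lowercases the word, so its output can be pasted directly into the text passed to the tokenizer.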



	Usage example using PyTorch:

	```python
	from transformers import VitsModel, AutoTokenizer
	import torch
	import scipy.io.wavfile

	device = 'cuda' if torch.cuda.is_available() else 'cpu'  # fall back to CPU when no GPU is available

	speaker = 1 # 0-woman, 1-man

	# load model
	model_name = "utrobinmv/tts_ru_free_hf_vits_high_multispeaker"

	model = VitsModel.from_pretrained(model_name).to(device)
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model.eval()

	# text with stress marks ('+' before the stressed vowel)
	text = """Ночью двадцать тр+етьего июня начал извергаться самый высокий
	действующий вулк+ан в Евразии - Кл+ючевской. Об этом сообщила руководитель
	Камчатской группы реагирования на вулканические извержения, ведущий
	научный сотрудник Института вулканологии и сейсмологии ДВО РАН +Ольга Гирина.
	«Зафиксированное ночью не просто свечение, а вершинное эксплозивное
	извержение стромболианского типа. Пока такое извержение никому не опасно:
	ни населению, ни авиации» пояснила ТАСС госпожа Гирина."""

	# the model expects lowercase text
	text = text.lower()

	inputs = tokenizer(text, return_tensors="pt")

	with torch.no_grad():
	    output = model(**inputs.to(device), speaker_id=speaker).waveform
	    output = output.detach().cpu().numpy()

	scipy.io.wavfile.write("tts_audio.wav", rate=model.config.sampling_rate,
	data=output[0])
	```
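
The example above passes floating-point samples to `scipy.io.wavfile.write`, which stores them as a 32-bit float WAV. Some players handle 16-bit PCM more reliably, so a conversion like the sketch below may help. It assumes the waveform is a NumPy float array with values roughly in [-1, 1]; `to_int16_pcm` is a hypothetical helper, not part of `transformers` or SciPy, and the sine tone only stands in for real model output.

```python
import numpy as np


def to_int16_pcm(waveform: np.ndarray) -> np.ndarray:
    """Convert float samples in [-1, 1] to 16-bit PCM, clipping values outside that range."""
    clipped = np.clip(waveform, -1.0, 1.0)
    return (clipped * 32767.0).astype(np.int16)


# Stand-in for `output[0]` from the example above: one second of a 440 Hz tone.
sampling_rate = 16000  # placeholder; use model.config.sampling_rate with the real model
t = np.arange(sampling_rate) / sampling_rate
demo = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

pcm = to_int16_pcm(demo)
print(pcm.dtype, pcm.shape)
```

With the real model, passing `data=to_int16_pcm(output[0])` to `scipy.io.wavfile.write` should produce a 16-bit file instead of a float one.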



	To play the audio in a Jupyter Notebook / Google Colab:

	```python
	from IPython.display import Audio

	Audio(output, rate=model.config.sampling_rate)
	```



	## Languages covered

	Russian (ru_RU)