|
---
datasets:
- librispeech_asr
language:
- en
metrics:
- wer
tags:
- hubert
- tts
---
|
# voidful/mhubert-unit-tts |
|
|
|
|
|
|
This repository provides a text-to-unit model that maps text to mHuBERT speech units, trained as a BART sequence-to-sequence model.

The model was trained on the LibriSpeech ASR dataset for English. At training epoch 13 it reached `WER: 30.41` and `CER: 20.22`.
|
|
|
|
|
## HuBERT Code TTS Example
|
```python
import asrp
import nlp2
import IPython.display as ipd
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Download the pretrained unit HiFi-GAN vocoder checkpoint (mHuBERT layer 11,
# km1000) released with fairseq's speech-to-speech work.
nlp2.download_file(
    'https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/g_00500000',
    './')

# Load the text-to-unit model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("voidful/mhubert-unit-tts")
model = AutoModelForSeq2SeqLM.from_pretrained("voidful/mhubert-unit-tts")
model.eval()

# Unit-to-waveform vocoder.
cs = asrp.Code2Speech(tts_checkpoint='./g_00500000', vocoder='hifigan')

# Generate unit tokens from text.
inputs = tokenizer(["The quick brown fox jumps over the lazy dog."], return_tensors="pt")
code = tokenizer.batch_decode(model.generate(**inputs, max_length=1024))[0]

# Strip special tokens and convert the "v_tok_*" tokens to integer unit IDs.
code = [int(i) for i in code.replace("</s>", "").replace("<s>", "").split("v_tok_")[1:]]
print(code)

# Synthesize the waveform and play it inline.
ipd.Audio(data=cs(code), autoplay=False, rate=cs.sample_rate)
```
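
To save the synthesized waveform to disk instead of playing it inline, you can write it out with the `soundfile` package (a minimal sketch; `soundfile` is an assumed extra dependency, and `cs` and `code` come from the example above):

```python
import soundfile as sf

# `cs` and `code` are defined in the example above; the output path is illustrative.
wav = cs(code)
sf.write("tts_output.wav", wav, cs.sample_rate)
```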
|
|
|
## Datasets
|
The model was trained on the LibriSpeech ASR dataset for the English language. |
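
For reference, LibriSpeech can be loaded through the Hugging Face `datasets` library (a minimal sketch; the exact configuration and split used for training are not documented here, so `"clean"`/`train.100` is only illustrative):

```python
from datasets import load_dataset

# Illustrative config/split; the card does not specify which were used for training.
librispeech = load_dataset("librispeech_asr", "clean", split="train.100")
print(librispeech[0]["text"])
```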
|
|
|
## Language
|
The model is trained for the English language. |
|
|
|
## Metrics
|
The model's performance is evaluated using Word Error Rate (WER). |
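
WER can be computed with the Hugging Face `evaluate` library (a minimal sketch with made-up reference/hypothesis strings; this is not the exact evaluation script behind the numbers above):

```python
import evaluate

wer_metric = evaluate.load("wer")
# Toy example: one substitution ("jumps" vs. "jumped") out of nine reference words.
score = wer_metric.compute(
    predictions=["the quick brown fox jumps over the lazy dog"],
    references=["the quick brown fox jumped over the lazy dog"],
)
print(f"WER: {score:.4f}")
```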
|
|
|
## Tags
|
The model is tagged with `hubert` and `tts`.
|
|