|
---
datasets:
- librispeech_asr
language:
- en
metrics:
- wer
tags:
- hubert
- tts
---
|
# voidful/mhubert-unit-tts |
|
|
|
|
|
|
This repository provides a text-to-unit model that maps text to mHuBERT speech units, trained as a BART sequence-to-sequence model.

The model was trained on the LibriSpeech ASR dataset for English. At training epoch 13 it reached `WER: 30.41` and `CER: 20.22`.
|
|
|
|
|
## HuBERT Code TTS Example
|
```python
import asrp
import nlp2
import IPython.display as ipd
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Download the pretrained unit HiFi-GAN vocoder checkpoint (mHuBERT layer 11,
# km1000) released with fairseq's speech-to-speech work.
nlp2.download_file(
    'https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/g_00500000',
    './')

# Load the text-to-unit model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("voidful/mhubert-unit-tts")
model = AutoModelForSeq2SeqLM.from_pretrained("voidful/mhubert-unit-tts")
model.eval()

# Unit-to-waveform vocoder.
cs = asrp.Code2Speech(tts_checkpoint='./g_00500000', vocoder='hifigan')

# Generate unit tokens from text.
inputs = tokenizer(["The quick brown fox jumps over the lazy dog."], return_tensors="pt")
code = tokenizer.batch_decode(model.generate(**inputs, max_length=1024))[0]

# Strip special tokens and convert the "v_tok_*" tokens to integer unit IDs.
code = [int(i) for i in code.replace("</s>", "").replace("<s>", "").split("v_tok_")[1:]]
print(code)

# Synthesize the waveform and play it inline.
ipd.Audio(data=cs(code), autoplay=False, rate=cs.sample_rate)
```
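
To save the synthesized waveform to disk instead of playing it inline, you can write it out with the `soundfile` package (a minimal sketch; `soundfile` is an assumed extra dependency, and `cs` and `code` come from the example above):

```python
import soundfile as sf

# `cs` and `code` are defined in the example above; the output path is illustrative.
wav = cs(code)
sf.write("tts_output.wav", wav, cs.sample_rate)
```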
|
|
|
## Datasets
|
The model was trained on the LibriSpeech ASR dataset for the English language. |
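
For reference, LibriSpeech can be loaded through the Hugging Face `datasets` library (a minimal sketch; the exact configuration and split used for training are not documented here, so `"clean"`/`train.100` is only illustrative):

```python
from datasets import load_dataset

# Illustrative config/split; the card does not specify which were used for training.
librispeech = load_dataset("librispeech_asr", "clean", split="train.100")
print(librispeech[0]["text"])
```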
|
|
|
## Language
|
The model is trained for the English language. |
|
|
|
## Metrics
|
The model's performance is evaluated using Word Error Rate (WER). |
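
WER can be computed with the Hugging Face `evaluate` library (a minimal sketch with made-up reference/hypothesis strings; this is not the exact evaluation script behind the numbers above):

```python
import evaluate

wer_metric = evaluate.load("wer")
# Toy example: one substitution ("jumps" vs. "jumped") out of nine reference words.
score = wer_metric.compute(
    predictions=["the quick brown fox jumps over the lazy dog"],
    references=["the quick brown fox jumped over the lazy dog"],
)
print(f"WER: {score:.4f}")
```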
|
|
|
## Tags
|
The model is tagged with `hubert` and `tts`.
|
|