---
datasets:
- librispeech_asr
language:
- en
metrics:
- wer
tags:
- hubert
- tts
---
# voidful/mhubert-unit-tts

This repository provides a text-to-unit model built on mHuBERT units and trained with a BART model.
The model was trained on the LibriSpeech ASR dataset for English.
At training epoch 13 it reached `WER: 30.41` and `CER: 20.22`.

## HuBERT Code TTS Example

```python
import asrp
import nlp2
import IPython.display as ipd
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Download the fairseq unit HiFi-GAN vocoder checkpoint used for unit-to-speech synthesis.
nlp2.download_file(
    'https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/g_00500000',
    './')

# Load the text-to-unit model and build the unit-to-speech vocoder.
tokenizer = AutoTokenizer.from_pretrained("voidful/mhubert-unit-tts")
model = AutoModelForSeq2SeqLM.from_pretrained("voidful/mhubert-unit-tts")
model.eval()
cs = asrp.Code2Speech(tts_checkpoint='./g_00500000', vocoder='hifigan')

# Generate unit tokens from text, strip the special tokens, and parse the
# "v_tok_<id>" pieces into integer HuBERT unit codes.
inputs = tokenizer(["The quick brown fox jumps over the lazy dog."], return_tensors="pt")
code = tokenizer.batch_decode(model.generate(**inputs, max_length=1024))[0]
code = [int(i) for i in code.replace("</s>", "").replace("<s>", "").split("v_tok_")[1:]]
print(code)

# Synthesize a waveform from the unit codes and play it in a notebook.
ipd.Audio(data=cs(code), autoplay=False, rate=cs.sample_rate)
```

## Datasets

The model was trained on the LibriSpeech ASR dataset for the English language.
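
A minimal sketch of loading this corpus with the Hugging Face `datasets` library is shown below; the `"clean"` configuration and `"train.100"` split are assumptions, since this card does not state which splits were used:

```python
from datasets import load_dataset

# Assumption: the "clean" configuration and "train.100" split; the card
# does not specify which LibriSpeech splits were used for training.
librispeech = load_dataset("librispeech_asr", "clean", split="train.100")

sample = librispeech[0]
print(sample["text"])                    # transcript (text-to-unit input)
print(sample["audio"]["sampling_rate"])  # 16 kHz source audio
```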

## Language

The model is trained for the English language.

## Metrics

The model's performance is evaluated using Word Error Rate (WER), with Character Error Rate (CER) reported alongside it above.
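
As a minimal sketch of how these metrics can be computed, using the third-party `jiwer` package (an assumption; this card does not prescribe an evaluation tool):

```python
from jiwer import cer, wer  # assumption: pip install jiwer

reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"

# WER: word-level edit distance divided by the number of reference words.
print(f"WER: {wer(reference, hypothesis):.2%}")
# CER: the same computation at the character level.
print(f"CER: {cer(reference, hypothesis):.2%}")
```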
53
+ Tags
54
+ The model can be tagged with "hubert" and "tts".