sravyapopuri388 committed
Commit caa5dc8
Parent: e4d7e74

Create README.md

Files changed (1)
  1. README.md +48 -0
README.md ADDED
---

---
# xm_transformer_600m-es_en-multi_domain

[W2V2-Transformer](https://aclanthology.org/2021.acl-long.68/) speech-to-text translation model from fairseq S2T ([paper](https://arxiv.org/abs/2010.05171)/[code](https://github.com/pytorch/fairseq/tree/main/examples/speech_to_text)):
- Spanish-English
- Trained on mTEDx, CoVoST 2, EuroParl-ST, VoxPopuli, Multilingual LibriSpeech, Common Voice v7 and CCMatrix
- Speech synthesis with [facebook/fastspeech2-en-ljspeech](https://huggingface.co/facebook/fastspeech2-en-ljspeech)

## Usage
```python
from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub
from fairseq.models.speech_to_text.hub_interface import S2THubInterface
from fairseq.models.text_to_speech.hub_interface import TTSHubInterface
import IPython.display as ipd
import torchaudio


models, cfg, task = load_model_ensemble_and_task_from_hf_hub(
    "facebook/xm_transformer_600m-es_en-multi_domain",
    arg_overrides={"config_yaml": "config.yaml"},
)
model = models[0]
generator = task.build_generator([model], cfg)

# requires 16000Hz mono channel audio
audio, _ = torchaudio.load("/path/to/an/audio/file")
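
# Illustrative preprocessing, not part of the original card: if the source file
# is not already 16 kHz mono, downmix and resample it first (assumes a recent
# torchaudio where torchaudio.info returns AudioMetaData and
# torchaudio.functional.resample is available).
if audio.size(0) > 1:                        # average channels down to mono
    audio = audio.mean(dim=0, keepdim=True)
native_sr = torchaudio.info("/path/to/an/audio/file").sample_rate
if native_sr != 16000:                       # resample to the expected 16 kHz
    audio = torchaudio.functional.resample(audio, native_sr, 16000)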

sample = S2THubInterface.get_model_input(task, audio)
text = S2THubInterface.get_prediction(task, model, generator, sample)

# speech synthesis
tts_models, tts_cfg, tts_task = load_model_ensemble_and_task_from_hf_hub(
    "facebook/fastspeech2-en-ljspeech",
    arg_overrides={"vocoder": "griffin_lim", "fp16": False},
)
tts_model = tts_models[0]
TTSHubInterface.update_cfg_with_data_cfg(tts_cfg, tts_task.data_cfg)
tts_generator = tts_task.build_generator([tts_model], tts_cfg)

tts_sample = TTSHubInterface.get_model_input(tts_task, text)
wav, sr = TTSHubInterface.get_prediction(
    tts_task, tts_model, tts_generator, tts_sample
)

ipd.Audio(wav, rate=sr)
```
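
`ipd.Audio` only plays the result inside a notebook. To keep the synthesized English speech as a file instead, one option (a minimal sketch, not part of the original card, assuming `wav` is the 1-D waveform tensor and `sr` the sample rate returned above) is to write it out with `torchaudio.save`, which expects a 2-D `[channels, frames]` tensor:

```python
import torchaudio

# add a channel dimension and move the waveform to CPU before writing it out
torchaudio.save("translated_speech.wav", wav.detach().cpu().unsqueeze(0), sr)
```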