lpw committed on
Commit
0d0f9d2
1 Parent(s): 7e94b53

Update README.md

Files changed (1)
  1. README.md +3 -20
README.md CHANGED
@@ -9,17 +9,14 @@ tags:
  - speech-to-speech-translation
 
  datasets:
- - mtedx
- - covost2
- - europarl_st
- - voxpopuli
+ - MuST-C
 
  ---
  ## xm_transformer_unity_en-hk
 
- Speech-to-speech translation model from fairseq S2UT ([paper](https://arxiv.org/abs/2204.02967)/[code](https://github.com/facebookresearch/fairseq/blob/main/examples/speech_to_speech/docs/enhanced_direct_s2st_discrete_units.md)):
+ Speech-to-speech translation model from fairseq:
  - English-Hokkien
- - Trained on mTEDx, CoVoST 2, Europarl-ST and VoxPopuli
+ - Trained on MuST-C
  - Speech synthesis with [facebook/unit_hifigan_mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj_dur](https://huggingface.co/facebook/unit_hifigan_mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj_dur)
 
  ## Usage
@@ -89,17 +86,3 @@ wav, sr = tts_model.get_prediction(tts_sample)
 
  ipd.Audio(wav, rate=sr)
  ```
-
- ## Citation
- ```bibtex
- @misc{https://doi.org/10.48550/arxiv.2204.02967,
- doi = {10.48550/ARXIV.2204.02967},
- url = {https://arxiv.org/abs/2204.02967},
- author = {Popuri, Sravya and Chen, Peng-Jen and Wang, Changhan and Pino, Juan and Adi, Yossi and Gu, Jiatao and Hsu, Wei-Ning and Lee, Ann},
- keywords = {Computation and Language (cs.CL), Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Computer and information sciences, FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering, FOS: Electrical engineering, electronic engineering, information engineering},
- title = {Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation},
- publisher = {arXiv},
- year = {2022},
- copyright = {arXiv.org perpetual, non-exclusive license}
- }
- ```