johntsi committed
Commit 372d33f
1 Parent(s): 30d1714

Update README.md

Files changed (1)
  1. README.md +20 -12
README.md CHANGED
@@ -229,8 +229,6 @@ language_details: >-
  license: mit
  metrics:
  - bleu
- datasets:
- - mozilla-foundation/common_voice_8_0
  pipeline_tag: automatic-speech-recognition
  tags:
  - zeroswot
@@ -265,7 +263,7 @@ The compression module is a light-weight transformer that takes as input the hid

  ## Version

- This version of ZeroSwot is trained with ASR data from CommonVoice, and adapted [wav2vec2.0-large](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self) to the [nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) model.
+ This version of ZeroSwot is trained with ASR data from MuST-C v1.0, and adapted [wav2vec2.0-large](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self) to the [nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) model.

  We have more versions available:

@@ -305,9 +303,9 @@ processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h-lv60
  tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")

  # Load ZeroSwot Encoder
- commit_hash = "eafabee295ea1c8b45483d1fd26bd747d9a7d937"
+ commit_hash = "30d17145fd8e040430bbfcf74a011070fa83debd"
  zeroswot_encoder = AutoModel.from_pretrained(
-     "johntsi/ZeroSwot-Medium_asr-cv_en-to-200", trust_remote_code=True, revision=commit_hash,
+     "johntsi/ZeroSwot-Medium_asr-mustc_en-to-200", trust_remote_code=True, revision=commit_hash,
  )
  zeroswot_encoder.eval()
  zeroswot_encoder.to("cuda")
@@ -335,14 +333,24 @@ print(translation)

  ## Results

- BLEU scores on CoVoST-2 test compared to supervised SOTA models [XLS-R-1B](https://huggingface.co/facebook/wav2vec2-xls-r-1b) and [SeamlessM4T-Medium](https://huggingface.co/facebook/seamless-m4t-medium). You can refer to Table 5 of the Results section in the paper for more details.
+ BLEU scores on MuST-C v1.0 tst-COMMON compared to _supervised_ SOTA models from the literature. You can refer to the Results section in the paper for more details.

- | Models | ZS | Size (B) | Ar | Ca | Cy | De | Et | Fa | Id | Ja | Lv | Mn | Sl | Sv | Ta | Tr | Zh | Average |
- |:--------------:|:----:|:----------:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:-------:|
- | [XLS-R-1B](https://huggingface.co/facebook/wav2vec2-xls-r-1b) | ✗ | 1.0 | 19.2 | 32.1 | **31.8** | 26.2 | 22.4 | 21.3 | 30.3 | 39.9 | 22.0 | 14.9 | 25.4 | 32.3 | 18.1 | 17.1 | 36.7 | 26.0 |
- | [SeamlessM4T-Medium](https://huggingface.co/facebook/seamless-m4t-medium) | | 1.2 | 20.8 | 37.3 | 29.9 | **31.4** | 23.3 | 17.2 | 34.8 | 37.5 | 19.5 | 12.9 | 29.0 | 37.3 | 18.9 | **19.8** | 30.0 | 26.6 |
- | [ZeroSwot-M_asr-cv](https://huggingface.co/johntsi/ZeroSwot-Medium_asr-cv_en-to-200) | ✓ | 0.35/0.95 | 17.6 | 32.5 | 18.0 | 29.9 | 20.4 | 16.3 | 32.4 | 32.0 | 13.3 | 10.0 | 25.2 | 34.4 | 17.8 | 15.6 | 30.5 | 23.1 |
- | [ZeroSwot-M_asr-cv_mt-covost2](https://huggingface.co/johntsi/ZeroSwot-Medium_asr-cv_mt-covost2_en-to-200) | ✓ | 0.35/0.95 | **24.4** | **38.7** | 28.8 | 31.2 | **26.2** | **26.0** | **36.0** | **46.0** | **24.8** | **19.0** | **31.6** | **37.8** | **24.4** | 18.6 | **39.0** | **30.2** |
+ | Models | ZS | Size (B) | De | Es | Fr | It | Nl | Pt | Ro | Ru | Average |
+ |:-----------------------:|:----:|:----------:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:-------:|
+ | Chimera (Han et al., 2021) | ✗ | 0.15 | 27.1 | 30.6 | 35.6 | 25.0 | 29.2 | 30.2 | 24.0 | 17.4 | 27.4 |
+ | STEMM (Fang et al., 2022) | ✗ | 0.15 | 28.7 | 31.0 | 37.4 | 25.8 | 30.5 | 31.7 | 24.5 | 17.8 | 28.4 |
+ | SpeechUT (Zhang et al., 2022) | ✗ | 0.15 | 30.1 | 33.6 | 41.4 | - | - | - | - | - | - |
+ | Siamese-PT (Le et al., 2023) | ✗ | 0.25 | 27.9 | 31.8 | 39.2 | 27.7 | 31.7 | 34.2 | 27.0 | 18.5 | 29.8 |
+ | CRESS (Fang and Feng, 2023) | ✗ | 0.15 | 29.4 | 33.2 | 40.1 | 27.6 | 32.2 | 33.6 | 26.4 | 19.7 | 30.3 |
+ | SimRegCR (Gao et al., 2023b) | ✗ | 0.15 | 29.2 | 33.0 | 40.0 | 28.2 | 32.7 | 34.2 | 26.7 | 20.1 | 30.5 |
+ | LST (LLaMA2-13B) (Zhang et al., 2023) | ✗ | 13 | 30.4 | 35.3 | **41.6** | - | - | - | - | - | - |
+ | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+ | [ZeroSwot-Medium_asr-cv](https://huggingface.co/johntsi/ZeroSwot-Medium_asr-cv_en-to-200) | ✓ | 0.35/0.95 | 24.8 | 30.0 | 32.6 | 24.1 | 28.6 | 28.8 | 22.9 | 16.4 | 26.0 |
+ | [ZeroSwot-Medium_asr-mustc](https://huggingface.co/johntsi/ZeroSwot-Medium_asr-mustc_en-to-200) | ✓ | 0.35/0.95 | 28.5 | 33.1 | 37.5 | 28.2 | 32.3 | 32.9 | 26.0 | 18.7 | 29.6 |
+ | [ZeroSwot-Medium_asr-mustc_mt-mustc](https://huggingface.co/johntsi/ZeroSwot-Medium_asr-mustc_mt-mustc_en-to-8) | ✓ | 0.35/0.95† | 30.5 | 34.9 | 39.4 | 30.6 | 35.0 | 37.1 | 27.8 | 20.3 | 31.9 |
+ | [ZeroSwot-Large_asr-cv](https://huggingface.co/johntsi/ZeroSwot-Large_asr-cv_en-to-200) | ✓ | 0.35/1.65 | 26.5 | 31.1 | 33.5 | 25.4 | 29.9 | 30.6 | 24.3 | 18.0 | 27.4 |
+ | [ZeroSwot-Large_asr-mustc](https://huggingface.co/johntsi/ZeroSwot-Large_asr-mustc_en-to-200) | ✓ | 0.35/1.65 | 30.1 | 34.8 | 38.9 | 29.8 | 34.4 | 35.3 | 27.6 | 20.4 | 31.4 |
+ | [ZeroSwot-Large_asr-mustc_mt-mustc](https://huggingface.co/johntsi/ZeroSwot-Large_asr-mustc_mt-mustc_en-to-8) | ✓ | 0.35/1.65† | **31.2** | **35.8** | 40.5 | **31.4** | **36.3** | **38.3** | **28.0** | **21.5** | **32.9** |

  ## Citation

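The third hunk changes only the two lines that select the checkpoint: the repository id and the pinned `revision`. For context, here is a minimal end-to-end sketch of how those lines slot into the README's full usage example. This is an illustration under stated assumptions, not the card's authoritative snippet: the audio path `example_en.wav` and the target-language code `deu_Latn` are placeholders, and it assumes the custom ZeroSwot encoder returns a pair of compressed speech embeddings and an attention mask that NLLB's `generate` can consume via `inputs_embeds`, as in the README example referenced by the hunk headers.

```python
import torch
import torchaudio
from transformers import (
    AutoModel,
    AutoModelForSeq2SeqLM,
    NllbTokenizer,
    Wav2Vec2Processor,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Speech processor and NLLB tokenizer, as in the README example
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")
tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")

# ZeroSwot encoder pinned to the revision introduced by this commit
commit_hash = "30d17145fd8e040430bbfcf74a011070fa83debd"
zeroswot_encoder = AutoModel.from_pretrained(
    "johntsi/ZeroSwot-Medium_asr-mustc_en-to-200",
    trust_remote_code=True,
    revision=commit_hash,
).eval().to(device)

# NLLB model whose decoder generates the translation
nllb_model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/nllb-200-distilled-600M"
).eval().to(device)

# Hypothetical input file; resample to the 16 kHz rate wav2vec2.0 expects
waveform, sample_rate = torchaudio.load("example_en.wav")
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(
    waveform.squeeze(0).numpy(), sampling_rate=16_000, return_tensors="pt"
).to(device)

with torch.no_grad():
    # Assumption: the encoder returns (compressed_embeds, attention_mask),
    # matching the full usage example in the model card.
    compressed_embeds, attention_mask = zeroswot_encoder(**inputs)
    generated_ids = nllb_model.generate(
        inputs_embeds=compressed_embeds,
        attention_mask=attention_mask,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"),
        num_beams=5,
    )

translation = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(translation)
```

Passing `revision=commit_hash` pins both the remote code and the weights to this commit, so the snippet stays reproducible even if the repository is updated later.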