oovword committed
Commit 4c7d944 · verified · 1 Parent(s): a741cff

Update README.md

Files changed (1)
  1. README.md +14 -28
README.md CHANGED
@@ -1,37 +1,20 @@
 ---
-base_model: openai/whisper-small
+license: apache-2.0
+datasets:
+- oovword/speech-translation-uk-en
 language:
 - uk
 - en
-datasets: oovword/speech-translation-uk-en
-pipeline_tag: speech-translation
-license: apache-2.0
 metrics:
 - bleu
 - chrf
+base_model:
+- openai/whisper-small
+pipeline_tag: translation
 inference: true
 library_name: transformers
-model-index:
-- name: uk2en-speech-translation
-  results:
-  - task:
-      type: speech-translation
-    dataset:
-      name: Half-Synthetic Speech Dataset for Ukrainian-to-English Translation
-      type: oovword/speech-translation-uk-en
-    metrics:
-    - type: bleu
-      value: 22.34
-      name: BLEU
-  - task:
-      type: translation, speech-translation
-    dataset:
-      name: Half-Synthetic Speech Dataset for Ukrainian-to-English Translation
-      type: oovword/speech-translation-uk-en
-    metrics:
-    - type: chrf
-      value: 48.1
-      name: ChrF++
+tags:
+- speech-translation
 ---
 
 # Model Card
@@ -59,13 +42,16 @@ The model accepts mono-channel audio files with the sampling rate of 16kHz.
 ```python
 import torchaudio
 
+from datasets import load_dataset
 from transformers import WhisperForConditionalGeneration, WhisperProcessor
 
 model = WhisperForConditionalGeneration.from_pretrained('whisper-uk2en-speech-translation')
 processor = WhisperProcessor.from_pretrained('whisper-uk2en-speech-translation')
 
 # Audio files in `datasets` format
-inputs = processor(sample['audio']['array'].squeeze(), sampling_rate=16000, return_tensors='pt', return_attention_mask=True)
+test_dataset = load_dataset('your-dataset-name-goes-here', split='test')
+sample = test_dataset[123]['audio']
+inputs = processor(sample['array'].squeeze(), sampling_rate=16000, return_tensors='pt', return_attention_mask=True)
 with torch.inference_mode():
     predictions = model.generate(**inputs)
 sample['translation'] = processor.batch_decode(predictions, skip_special_tokens=True)[0].strip()
@@ -88,7 +74,7 @@ The following datasets, all licensed under CC-BY-4.0, were used for the model fi
 
 The Fleurs dataset only contains authentic human speech and translations.
 For the `elevenlabs` dataset, the Ukrainian text was generated by ChatGPT and later voiced by the `elevenlabs` TTS model. The transcripts were machine-translated into English by Azure Translator.
-Speech and Ukrainian transcripts in the ML Spoken Words dataset are authentic human data; the English text is machine-translated from Ukrainian by Azure Translator.
+Ukrainian speech and transcripts in the ML Spoken Words dataset are the authentic human data; the English text is machine-translated from Ukrainian by Azure Translator.
 **NOTE:** English translations were not human-verified or proofread due to time limitations and, as such, may contain mistakes and inaccuracies.
 
 Total (train): 10390 samples
@@ -146,4 +132,4 @@ url= {https://github.com/skypro1111/pflowtts_pytorch_uk}
 author={Mazumder, Mark and Chitlangia, Sharad and Banbury, Colby and Kang, Yiping and Ciro, Juan Manuel and Achorn, Keith and Galvez, Daniel and Sabini, Mark and Mattson, Peter and Kanter, David and others},
 booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
 year={2021}
-}
+}