Update README.md
README.md
CHANGED
@@ -1,37 +1,20 @@
 ---
-
+license: apache-2.0
+datasets:
+- oovword/speech-translation-uk-en
 language:
 - uk
 - en
-datasets: oovword/speech-translation-uk-en
-pipeline_tag: speech-translation
-license: apache-2.0
 metrics:
 - bleu
 - chrf
+base_model:
+- openai/whisper-small
+pipeline_tag: translation
 inference: true
 library_name: transformers
-
--
-  results:
-  - task:
-      type: speech-translation
-    dataset:
-      name: Half-Synthetic Speech Dataset for Ukrainian-to-English Translation
-      type: oovword/speech-translation-uk-en
-    metrics:
-    - type: bleu
-      value: 22.34
-      name: BLEU
-  - task:
-      type: translation, speech-translation
-    dataset:
-      name: Half-Synthetic Speech Dataset for Ukrainian-to-English Translation
-      type: oovword/speech-translation-uk-en
-    metrics:
-    - type: chrf
-      value: 48.1
-      name: ChrF++
+tags:
+- speech-translation
 ---
 
 # Model Card
@@ -59,13 +42,16 @@ The model accepts mono-channel audio files with the sampling rate of 16kHz.
 ```python
 import torchaudio
 
+from datasets import load_dataset
 from transformers import WhisperForConditionalGeneration, WhisperProcessor
 
 model = WhisperForConditionalGeneration.from_pretrained('whisper-uk2en-speech-translation')
 processor = WhisperProcessor.from_pretrained('whisper-uk2en-speech-translation')
 
 # Audio files in `datasets` format
-
+test_dataset = load_dataset('your-dataset-name-goes-here', split='test')
+sample = test_dataset[123]['audio']
+inputs = processor(sample['array'].squeeze(), sampling_rate=16000, return_tensors='pt', return_attention_mask=True)
 with torch.inference_mode():
     predictions = model.generate(**inputs)
 sample['translation'] = processor.batch_decode(predictions, skip_special_tokens=True)[0].strip()
@@ -88,7 +74,7 @@ The following datasets, all licensed under CC-BY-4.0, were used for the model fi
 
 The Fleurs dataset only contains authentic human speech and translations.
 For the `elevenlabs` dataset, the Ukrainian text was generated by ChatGPT and later voiced by the `elevenlabs` TTS model. The transcripts were machine-translated into English by Azure Translator.
-
+Ukrainian speech and transcripts in the ML Spoken Words dataset are the authentic human data; the English text is machine-translated from Ukrainian by Azure Translator.
 **NOTE:** English translations were not human-verified or proofread due to time limitations and, as such, may contain mistakes and inaccuracies.
 
 Total (train): 10390 samples
@@ -146,4 +132,4 @@ url= {https://github.com/skypro1111/pflowtts_pytorch_uk}
 author={Mazumder, Mark and Chitlangia, Sharad and Banbury, Colby and Kang, Yiping and Ciro, Juan Manuel and Achorn, Keith and Galvez, Daniel and Sabini, Mark and Mattson, Peter and Kanter, David and others},
 booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
 year={2021}
-}
+}
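
The model card touched by this diff states that the model accepts mono-channel audio sampled at 16 kHz, while the usage snippet assumes the input already has that shape. As a rough illustration of what that preparation step involves, the pure-Python sketch below downmixes channels and resamples by linear interpolation. The helper names (`to_mono`, `resample_linear`, `prepare_audio`) are hypothetical and not part of this repository, and linear interpolation applies no anti-aliasing filter, so a dedicated resampler such as the one in `torchaudio` would normally be the better choice for real recordings.

```python
def to_mono(channels):
    """Average equal-length channels (lists of floats) into one mono channel."""
    if not channels:
        raise ValueError("expected at least one channel")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    return [sum(frame) / len(channels) for frame in zip(*channels)]


def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample by linear interpolation (assumption: no anti-aliasing needed)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = min(i * step, last)   # clamp to the final source sample
        left = int(pos)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


def prepare_audio(channels, src_rate, dst_rate=16000):
    """Downmix to mono, then resample to the rate the model card asks for."""
    return resample_linear(to_mono(channels), src_rate, dst_rate)
```

The resulting list of samples can then be converted to an array and passed to the processor with `sampling_rate=16000`, as in the usage snippet shown in the diff.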