oovword/whisper-uk2en-speech-translation

@@ -1,37 +1,20 @@
 ---
-base_model: openai/whisper-small
 language:
 - uk
 - en
-datasets: oovword/speech-translation-uk-en
-pipeline_tag: speech-translation
-license: apache-2.0
 metrics:
 - bleu
 - chrf
 inference: true
 library_name: transformers
-model-index:
-- name: uk2en-speech-translation
-  results:
-  - task:
-      type: speech-translation
-    dataset:
-      name: Half-Synthetic Speech Dataset for Ukrainian-to-English Translation
-      type: oovword/speech-translation-uk-en
-    metrics:
-    - type: bleu
-      value: 22.34
-      name: BLEU
-  - task:
-      type: translation, speech-translation
-    dataset:
-      name: Half-Synthetic Speech Dataset for Ukrainian-to-English Translation
-      type: oovword/speech-translation-uk-en
-    metrics:
-    - type: chrf
-      value: 48.1
-      name: ChrF++
 ---
 # Model Card
@@ -59,13 +42,16 @@ The model accepts mono-channel audio files with a sampling rate of 16 kHz.
 ```python
 import torch  # needed for torch.inference_mode() below
 import torchaudio
 from transformers import WhisperForConditionalGeneration, WhisperProcessor
 model = WhisperForConditionalGeneration.from_pretrained('whisper-uk2en-speech-translation')
 processor = WhisperProcessor.from_pretrained('whisper-uk2en-speech-translation')
 # Audio files in `datasets` format
-inputs = processor(sample['audio']['array'].squeeze(), sampling_rate=16000, return_tensors='pt', return_attention_mask=True)
 with torch.inference_mode():
     predictions = model.generate(**inputs)
 sample['translation'] = processor.batch_decode(predictions, skip_special_tokens=True)[0].strip()
@@ -88,7 +74,7 @@ The following datasets, all licensed under CC-BY-4.0, were used for the model fi
 The Fleurs dataset only contains authentic human speech and translations.
 For the `elevenlabs` dataset, the Ukrainian text was generated by ChatGPT and later voiced by the `elevenlabs` TTS model. The transcripts were machine-translated into English by Azure Translator.
-Speech and Ukrainian transcripts in the ML Spoken Words dataset are authentic human data; the English text is machine-translated from Ukrainian by Azure Translator.
 **NOTE:** English translations were not human-verified or proofread due to time limitations and, as such, may contain mistakes and inaccuracies.
 Total (train): 10390 samples
@@ -146,4 +132,4 @@ url= {https://github.com/skypro1111/pflowtts_pytorch_uk}
   author={Mazumder, Mark and Chitlangia, Sharad and Banbury, Colby and Kang, Yiping and Ciro, Juan Manuel and Achorn, Keith and Galvez, Daniel and Sabini, Mark and Mattson, Peter and Kanter, David and others},
   booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
   year={2021}
-}

 ---
+license: apache-2.0
+datasets:
+- oovword/speech-translation-uk-en
 language:
 - uk
 - en
 metrics:
 - bleu
 - chrf
+base_model:
+- openai/whisper-small
+pipeline_tag: translation
 inference: true
 library_name: transformers
+tags:
+- speech-translation
 ---
 # Model Card
 ```python
 import torch  # needed for torch.inference_mode() below
 import torchaudio
+from datasets import load_dataset
 from transformers import WhisperForConditionalGeneration, WhisperProcessor
 model = WhisperForConditionalGeneration.from_pretrained('whisper-uk2en-speech-translation')
 processor = WhisperProcessor.from_pretrained('whisper-uk2en-speech-translation')
 # Audio files in `datasets` format
+test_dataset = load_dataset('your-dataset-name-goes-here', split='test')
+sample = test_dataset[123]['audio']
+inputs = processor(sample['array'].squeeze(), sampling_rate=16000, return_tensors='pt', return_attention_mask=True)
 with torch.inference_mode():
     predictions = model.generate(**inputs)
 sample['translation'] = processor.batch_decode(predictions, skip_special_tokens=True)[0].strip()
 The Fleurs dataset only contains authentic human speech and translations.
 For the `elevenlabs` dataset, the Ukrainian text was generated by ChatGPT and later voiced by the `elevenlabs` TTS model. The transcripts were machine-translated into English by Azure Translator.
+Ukrainian speech and transcripts in the ML Spoken Words dataset are authentic human data; the English text is machine-translated from Ukrainian by Azure Translator.
 **NOTE:** English translations were not human-verified or proofread due to time limitations and, as such, may contain mistakes and inaccuracies.
 Total (train): 10390 samples
   author={Mazumder, Mark and Chitlangia, Sharad and Banbury, Colby and Kang, Yiping and Ciro, Juan Manuel and Achorn, Keith and Galvez, Daniel and Sabini, Mark and Mattson, Peter and Kanter, David and others},
   booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
   year={2021}
+}
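
The model card states that the model accepts mono-channel audio at a 16 kHz sampling rate, but the usage snippet does not show how to get there from other inputs. The following is a minimal, dependency-free sketch of that preparation step, not the author's preprocessing. The helper names `to_mono` and `resample_linear` are hypothetical, and linear interpolation is used only for illustration: it applies no anti-aliasing filter, so a proper resampler from an audio library is likely a better choice in practice.

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # sampling rate the model card says the model expects


def to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average all channels sample by sample to get one mono channel."""
    n_channels = len(channels)
    return [sum(frame) / n_channels for frame in zip(*channels)]


def resample_linear(
    signal: Sequence[float], orig_rate: int, target_rate: int = TARGET_RATE
) -> List[float]:
    """Resample by linear interpolation (illustrative, no anti-aliasing)."""
    if orig_rate == target_rate or not signal:
        return list(signal)
    out_len = int(round(len(signal) * target_rate / orig_rate))
    step = orig_rate / target_rate  # input samples advanced per output sample
    last = len(signal) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        lo = min(int(pos), last)   # left neighbour, clamped to the signal
        hi = min(lo + 1, last)     # right neighbour, clamped to the signal
        frac = pos - lo
        out.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return out


# Hypothetical input: one second of stereo audio recorded at 32 kHz.
stereo = [[1.0] * 32_000, [0.0] * 32_000]
mono_16k = resample_linear(to_mono(stereo), orig_rate=32_000)
```

The resulting list of floats can be passed to the processor in place of `sample['array']`, with `sampling_rate=16000` as in the snippet from the model card.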