---
language:
- uk
license: apache-2.0
datasets:
- mozilla-foundation/common_voice_11_0
model-index:
- name: ukrainian-data2vec-asr
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 11.0
      type: mozilla-foundation/common_voice_11_0
      config: uk
      split: test
      args: uk
    metrics:
    - name: Wer
      type: wer
      value: 17.042283338786351
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 11.0
      type: mozilla-foundation/common_voice_11_0
      config: uk
      split: validation
      args: uk
    metrics:
    - name: Wer
      type: wer
      value: 17.634350000973198
---

# Respeecher/ukrainian-data2vec-asr

This model is a fine-tuned version of [Respeecher/ukrainian-data2vec](https://huggingface.co/Respeecher/ukrainian-data2vec), trained on the [Ukrainian train split of the Common Voice 11.0 dataset](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/viewer/uk/train).

It achieves the following word error rates (WER) on Common Voice 11.0 Ukrainian:

- eval_wer (validation split): 17.634350000973198
- test_wer (test split): 17.042283338786351

## How to Get Started with the Model

```python
from transformers import AutoProcessor, Data2VecAudioForCTC
import torch
from datasets import load_dataset, Audio

# Load the Ukrainian test split of Common Voice 11.0
dataset = load_dataset("mozilla-foundation/common_voice_11_0", "uk", split="test")
# Resample the audio to 16 kHz
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

processor = AutoProcessor.from_pretrained("Respeecher/ukrainian-data2vec-asr")
model = Data2VecAudioForCTC.from_pretrained("Respeecher/ukrainian-data2vec-asr")
model.eval()

# Convert one audio sample into model inputs
sampling_rate = dataset.features["audio"].sampling_rate
inputs = processor(dataset[1]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: take the most likely token at each frame
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print(transcription[0])
```

## Training Details

Training code and instructions are available in [our GitHub repository](https://github.com/respeecher/ukrainian_asr).