---
language:
- sk
license: apache-2.0
tags:
- automatic-speech-recognition
- mozilla-foundation/common_voice_8_0
- robust-speech-event
- xlsr-fine-tuning-week
- hf-asr-leaderboard
datasets:
- common_voice
model-index:
- name: Slovak comodoro Wav2Vec2 XLSR 300M CV8
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 8
      type: mozilla-foundation/common_voice_8_0
      args: sk
    metrics:
    - name: Test WER
      type: wer
      value: 49.6
    - name: Test CER
      type: cer
      value: 13.3
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Robust Speech Event - Dev Data
      type: speech-recognition-community-v2/dev_data
      args: sk
    metrics:
    - name: Test WER
      type: wer
      value: 81.7
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Robust Speech Event - Test Data
      type: speech-recognition-community-v2/eval_data
      args: sk
    metrics:
    - name: Test WER
      type: wer
      value: 80.26
---


# wav2vec2-xls-r-300m-sk-cv8

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the Slovak Common Voice 8.0 dataset.

It achieves the following results on the evaluation set:

- WER: 0.4958
- CER: 0.1333

## Usage

The model can be used directly (without a language model) as follows:

```python
import torch
import torchaudio
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

test_dataset = load_dataset("mozilla-foundation/common_voice_8_0", "sk", split="test[:2%]")

processor = Wav2Vec2Processor.from_pretrained("comodoro/wav2vec2-xls-r-300m-sk-cv8")
model = Wav2Vec2ForCTC.from_pretrained("comodoro/wav2vec2-xls-r-300m-sk-cv8")

# Common Voice clips are 48 kHz; the model expects 16 kHz input
resampler = torchaudio.transforms.Resample(48_000, 16_000)

# Preprocessing the dataset: read the audio files as 16 kHz arrays
def speech_file_to_array_fn(batch):
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech_array).squeeze().numpy()
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)
inputs = processor(test_dataset[:2]["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

predicted_ids = torch.argmax(logits, dim=-1)

print("Prediction:", processor.batch_decode(predicted_ids))
print("Reference:", test_dataset[:2]["sentence"])
```

## Evaluation

The model can be evaluated using the attached `eval.py` script:

```bash
python eval.py --model_id comodoro/wav2vec2-xls-r-300m-sk-cv8 --dataset mozilla-foundation/common_voice_8_0 --split test --config sk
```
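
Alternatively, the headline metrics can be reproduced with the `evaluate` package. A minimal sketch, assuming `evaluate` and `jiwer` are installed; `predictions` and `references` are placeholders standing in for the output of a transcription loop like the Usage snippet above:

```python
# Minimal sketch: compute WER/CER with the `evaluate` package (backed by jiwer).
# `predictions` and `references` are placeholder lists of strings; in practice
# they would come from a transcription loop such as the Usage example above.
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

predictions = ["ahoj svet"]  # hypothetical model output
references = ["ahoj svet"]   # hypothetical ground truth

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```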



## Training and evaluation data

The Common Voice 8.0 `train` and `validation` splits were used for training.
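
For reference, a minimal sketch of loading the same splits with `datasets` (the dataset is gated on the Hugging Face Hub, so an authenticated login may be required):

```python
# Sketch: load the Slovak Common Voice 8.0 training data used for this model.
# The dataset is gated, so Hugging Face authentication may be required.
from datasets import load_dataset

train_dataset = load_dataset(
    "mozilla-foundation/common_voice_8_0", "sk", split="train+validation"
)
print(train_dataset)
```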



### Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after the list):

- learning_rate: 7e-4
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 20
- total_train_batch_size: 640
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 50
- mixed_precision_training: Native AMP
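
A minimal sketch of how these settings might map onto `transformers.TrainingArguments`; the output directory is a hypothetical name, and anything not in the list above is left at its default:

```python
# Sketch: the reported hyperparameters mapped onto TrainingArguments.
# output_dir is hypothetical; only the listed settings come from this card.
# Adam betas=(0.9, 0.999) and epsilon=1e-8 are the TrainingArguments defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-xls-r-300m-sk-cv8",  # hypothetical
    learning_rate=7e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=20,  # 32 * 20 = 640 total train batch size
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=50,
    fp16=True,  # native AMP mixed precision
)
```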



### Framework versions

- Transformers 4.16.0.dev0
- Pytorch 1.10.1+cu102
- Datasets 1.17.1.dev0
- Tokenizers 0.11.0