---
language:
- or
license: apache-2.0
tags:
- automatic-speech-recognition
- generated_from_trainer
- hf-asr-leaderboard
- mozilla-foundation/common_voice_7_0
- or
- robust-speech-event
datasets:
- mozilla-foundation/common_voice_7_0
model-index:
- name: XLS-R-300M - Odia
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 7
      type: mozilla-foundation/common_voice_7_0
      args: or
    metrics:
    - name: Test WER
      type: wer
      value: 97.91
    - name: Test CER
      type: cer
      value: 247.09
---

# wav2vec2-large-xls-r-300m-odia

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the Odia (`or`) subset of the [mozilla-foundation/common_voice_7_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0) dataset.
It achieves the following results on the evaluation set:

- WER: 1.0921052631578947
- CER: 2.5547945205479454

These numbers were produced with:

```bash
python eval.py --model_id ./ --dataset mozilla-foundation/common_voice_7_0 --config or --split test --log_outputs
```
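
A minimal inference sketch for loading a checkpoint like this one with `transformers`; the local path and audio file below are illustrative placeholders, not part of this card:

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Hypothetical local checkpoint directory (or a Hub model ID).
model_id = "./wav2vec2-large-xls-r-300m-odia"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Wav2Vec2 expects 16 kHz mono audio; "sample.wav" is a placeholder.
speech, sr = torchaudio.load("sample.wav")
speech = torchaudio.functional.resample(speech, sr, 16_000).squeeze().numpy()

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))
```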

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

The model was trained and evaluated on the Common Voice 7 Odia (`or`) splits. Training environment:

- Platform: Linux-5.11.0-37-generic-x86_64-with-glibc2.10
- CPU cores: 60
- Python version: 3.8.8
- PyTorch version: 1.10.1+cu102
- GPU is visible: True
- Transformers version: 4.16.0.dev0
- Datasets version: 1.17.1.dev0
- soundfile version: 0.10.3
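
A short sketch for reproducing the environment report above, assuming the listed packages are installed:

```python
import os
import platform

import datasets
import soundfile
import torch
import transformers

print("Platform:", platform.platform())
print("CPU cores:", os.cpu_count())
print("Python version:", platform.python_version())
print("PyTorch version:", torch.__version__)
print("GPU is visible:", torch.cuda.is_available())
print("Transformers version:", transformers.__version__)
print("Datasets version:", datasets.__version__)
print("soundfile version:", soundfile.__version__)
```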

Training script

```bash
python run_speech_recognition_ctc.py \
	--dataset_name="mozilla-foundation/common_voice_7_0" \
	--model_name_or_path="facebook/wav2vec2-xls-r-300m" \
	--dataset_config_name="or" \
	--output_dir="./wav2vec2-large-xls-r-300m-odia" \
	--overwrite_output_dir \
	--num_train_epochs="120" \
	--per_device_train_batch_size="16" \
	--per_device_eval_batch_size="16" \
	--gradient_accumulation_steps="2" \
	--learning_rate="7.5e-5" \
	--warmup_steps="500" \
	--length_column_name="input_length" \
	--evaluation_strategy="steps" \
	--text_column_name="sentence" \
	--chars_to_ignore , ? . ! \- \; \: \" “ % ‘ ” � — \’ … \– \' \
	--save_steps="500" \
	--eval_steps="500" \
	--logging_steps="100" \
	--layerdrop="0.0" \
	--activation_dropout="0.1" \
	--save_total_limit="3" \
	--freeze_feature_encoder \
	--feat_proj_dropout="0.0" \
	--mask_time_prob="0.75" \
	--mask_time_length="10" \
	--mask_feature_prob="0.25" \
	--mask_feature_length="64" \
	--gradient_checkpointing \
	--use_auth_token \
	--fp16 \
	--group_by_length \
	--do_train --do_eval \
	--push_to_hub
```
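
For reference, a minimal sketch of what the `--dataset_name` / `--dataset_config_name` flags above resolve to. Common Voice 7 is gated on the Hub, so an authenticated token is required (hence `--use_auth_token` above and `use_auth_token=True` here):

```python
from datasets import load_dataset

# Load the Odia ("or") test split of Common Voice 7.
common_voice_test = load_dataset(
    "mozilla-foundation/common_voice_7_0",
    "or",
    split="test",
    use_auth_token=True,
)
print(common_voice_test[0]["sentence"])
```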

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 7.5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 120.0
- mixed_precision_training: Native AMP
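
As an illustrative mapping, the list above corresponds roughly to the following `transformers.TrainingArguments`; the actual run used the CLI flags of `run_speech_recognition_ctc.py` shown earlier, not this exact object:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xls-r-300m-odia",
    num_train_epochs=120,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,  # effective train batch size: 16 * 2 = 32
    learning_rate=7.5e-5,
    lr_scheduler_type="linear",
    warmup_steps=500,
    evaluation_strategy="steps",
    eval_steps=500,
    save_steps=500,
    save_total_limit=3,
    logging_steps=100,
    length_column_name="input_length",
    group_by_length=True,
    gradient_checkpointing=True,
    fp16=True,  # "Native AMP" mixed precision
    seed=42,
)
```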

### Training results

|  # |   eval_loss |   eval_wer |   eval_runtime (s) |   eval_samples_per_second |   eval_steps_per_second |   epoch |
|---:|------------:|-----------:|-------------------:|--------------------------:|------------------------:|--------:|
|  0 |    3.35224  |   0.998972 |         5.0475 |                    22.189 |                   1.387 |   29.41 |
|  1 |    1.33679  |   0.938335 |         5.0633 |                    22.12  |                   1.382 |   58.82 |
|  2 |    0.737202 |   0.957862 |         5.0913 |                    21.998 |                   1.375 |   88.24 |
|  3 |    0.658212 |   0.96814  |         5.0953 |                    21.981 |                   1.374 |  117.65 |
|  4 |    0.658    |   0.9712   |         5.0953 |                    22.115 |                   1.382 |  120    |


### Framework versions

- Transformers 4.16.0.dev0
- Pytorch 1.10.1+cu102
- Datasets 1.17.1.dev0
- Tokenizers 0.11.0