---
language:
- sv
license: cc0-1.0
tags:
- automatic-speech-recognition
- mozilla-foundation/common_voice_9_0
- generated_from_trainer
- sv
datasets:
- mozilla-foundation/common_voice_9_0
model-index:
- name: XLS-R-300M - Swedish
  results:
  - task: 
      name: Automatic Speech Recognition 
      type: automatic-speech-recognition
    dataset:
      name: mozilla-foundation/common_voice_9_0
      type: mozilla-foundation/common_voice_9_0
      split: test
      args: sv-SE
    metrics:
       - name: Test WER
         type: wer
         value: 7.72
       - name: Test CER
         type: cer
         value: 2.61
  - task: 
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: speech-recognition-community-v2/dev_data
      type: speech-recognition-community-v2/dev_data
      split: validation
      args: sv
    metrics:
       - name: Test WER
         type: wer
         value: 16.23
       - name: Test CER
         type: cer
         value: 8.21
  - task: 
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: speech-recognition-community-v2/dev_data
      type: speech-recognition-community-v2/dev_data
      split: test
      args: sv
    metrics:
       - name: Test WER
         type: wer
         value: 15.08
       - name: Test CER
         type: cer
         value: 7.51
---
# XLS-R-300M - Swedish

This model is a fine-tuned version of [KBLab/wav2vec2-large-voxrex](https://huggingface.co/KBLab/wav2vec2-large-voxrex) on the Swedish (`sv-SE`) subset of the `mozilla-foundation/common_voice_9_0` dataset.
It achieves the following results on the evaluation set ("test" split, without a language model):
- Loss: 0.1318
- WER: 0.1121 (11.21%)

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 7.5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 100.0
- mixed_precision_training: Native AMP
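
How these values interact can be sketched in plain Python. The snippet below is an illustration, not the training script: it assumes a single device (consistent with 32 × 4 = 128), a total of 9,600 optimizer steps (inferred from the results table, where step 3000 falls at epoch 31.25, i.e. about 96 steps per epoch over 100 epochs), and a linear schedule that warms up from zero and then decays to zero. The function names are hypothetical.

```python
def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to one optimizer step."""
    return per_device * accumulation_steps * num_devices


def linear_lr(step: int,
              base_lr: float = 7.5e-5,
              total_steps: int = 9600,      # assumption: ~96 steps/epoch * 100 epochs
              warmup_ratio: float = 0.2) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    # Rounding is a simplification to avoid float artifacts in this sketch.
    warmup_steps = int(round(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(effective_batch_size(32, 4))
    for s in (0, 960, 1920, 5760, 9600):
        print(s, linear_lr(s))
```

Under these assumptions the learning rate peaks at 7.5e-05 around step 1920 and reaches zero at step 9600.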

### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 2.9099        | 10.42 | 1000 | 2.8369          | 1.0    |
| 1.0745        | 20.83 | 2000 | 0.1957          | 0.1673 |
| 0.934         | 31.25 | 3000 | 0.1579          | 0.1389 |
| 0.8691        | 41.66 | 4000 | 0.1457          | 0.1290 |
| 0.8328        | 52.08 | 5000 | 0.1435          | 0.1205 |
| 0.8068        | 62.5  | 6000 | 0.1350          | 0.1191 |
| 0.7822        | 72.91 | 7000 | 0.1347          | 0.1155 |
| 0.7769        | 83.33 | 8000 | 0.1321          | 0.1131 |
| 0.7678        | 93.75 | 9000 | 0.1321          | 0.1115 |
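
The WER column is a word error rate expressed as a fraction, so 0.1115 corresponds to 11.15%. As a reminder of what the metric measures, here is a minimal sketch of word-level edit distance divided by the number of reference words. It is not the evaluation code used for this model (which the card does not specify), it applies no text normalization, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(word_error_rate("jag heter anna", "jag heter hanna"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.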


### Framework versions

- Transformers 4.17.0.dev0
- PyTorch 1.10.2+cu102
- Datasets 2.2.2
- Tokenizers 0.11.0