---
license: apache-2.0
base_model: openai/whisper-small
tags:
- generated_from_trainer
metrics:
- wer
- cer
model-index:
- name: whisper-small
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 15
      type: artyomboyko/common_voice_15_0_RU
      args: ru
    metrics:
    - name: Test WER
      type: wer
      value: 12.675
    - name: Test CER
      type: cer
      value: 3.7305
language:
- ru
datasets:
- artyomboyko/common_voice_15_0_RU
---

# Whisper-small-ru-v2

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Russian part of the Common Voice 15 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1329
- WER: 12.6750
- CER: 3.7305
- Learning Rate: 0.0000 (rounded to four decimal places; the configured peak is 5e-08)

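As a reminder of what the two error metrics measure, here is a minimal sketch of WER and CER built on a plain Levenshtein distance. It assumes the reported values are percentages, splits words on whitespace, and applies no text normalization; the evaluation behind the numbers above may differ on all three points (for example, it may aggregate errors over the whole test set rather than per utterance). The function names are illustrative and not taken from the training code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent for one reference/hypothesis pair."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent for one reference/hypothesis pair."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


reference = "привет как дела"
hypothesis = "привет как дело"
print(f"WER: {wer(reference, hypothesis):.2f}  CER: {cer(reference, hypothesis):.2f}")
```

One wrong character in a three-word sentence costs a full word in WER but only one character in CER, which is why CER is typically much lower than WER on the same output.
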
## Model description

Same as [openai/whisper-small](https://huggingface.co/openai/whisper-small).

## Intended uses & limitations

Same as [openai/whisper-small](https://huggingface.co/openai/whisper-small).

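For transcription, a minimal sketch with the 🤗 Transformers `pipeline` API might look like the following. The repository id in `MODEL_ID` is a hypothetical placeholder (this card does not state the published id), and the helper name `transcribe` is illustrative. Decoding a file path also assumes `ffmpeg` is available on the system.

```python
# Hypothetical placeholder: replace with the actual repository id of this checkpoint.
MODEL_ID = "your-namespace/whisper-small-ru-v2"

# Decoding options passed through to generation: force Russian transcription
# instead of relying on Whisper's language detection.
GENERATE_KWARGS = {"language": "russian", "task": "transcribe"}


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe a single audio file and return the recognized text."""
    # Imported lazily so this module loads even where transformers is not installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # split long recordings into 30-second chunks
    )
    result = asr(audio_path, generate_kwargs=GENERATE_KWARGS)
    return result["text"]
```

Calling `transcribe("sample.wav")` (with `sample.wav` standing in for your own recording) downloads the checkpoint on first use and returns the transcript as a string.
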
## Training and evaluation data

Fine-tuned on the [Russian part of the Common Voice 15 dataset](https://huggingface.co/datasets/artyomboyko/common_voice_15_0_RU).

## Training procedure

Training followed the article ["Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers"](https://huggingface.co/blog/fine-tune-whisper).

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-08
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 250
- training_steps: 15000
- mixed_precision_training: Native AMP

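The effective batch size and the learning-rate curve implied by these settings can be checked by hand. The sketch below assumes a single training device and assumes that `lr_scheduler_type: linear` means linear warmup from zero followed by linear decay to zero at the last step; the function name `linear_schedule_lr` is illustrative and not part of the training code.

```python
def linear_schedule_lr(step, peak_lr=5e-08, warmup_steps=250, total_steps=15000):
    """Learning rate at a given optimizer step for linear warmup plus linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: ramp linearly from peak_lr down to 0 at total_steps.
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


train_batch_size = 16
gradient_accumulation_steps = 2
# Samples per optimizer update, assuming one device.
total_train_batch_size = train_batch_size * gradient_accumulation_steps

print("total_train_batch_size:", total_train_batch_size)
for step in (0, 125, 250, 7625, 15000):
    lr = linear_schedule_lr(step)
    # Show the value in scientific notation and rounded to four decimals.
    print(f"step {step:>5}: {lr:.3e} -> {lr:.4f}")
```

Because the peak value is 5e-08, every point on this curve rounds to 0.0000 at four decimal places, which would explain the zeros in the learning-rate column of the results table.
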
### Training results

| Training Loss | Epoch | Step  | Validation Loss | WER     | CER    | LR     |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:------:|:------:|
| 0.0661        | 0.09  | 500   | 0.1358          | 12.9097 | 3.8217 | 0.0000 |
| 0.0616        | 0.17  | 1000  | 0.1357          | 12.9620 | 3.8949 | 0.0000 |
| 0.0601        | 0.26  | 1500  | 0.1357          | 12.8795 | 3.8225 | 0.0000 |
| 0.0666        | 0.35  | 2000  | 0.1353          | 12.9481 | 3.8871 | 0.0000 |
| 0.0669        | 0.43  | 2500  | 0.1352          | 12.8284 | 3.8283 | 0.0000 |
| 0.0665        | 0.52  | 3000  | 0.1351          | 12.8203 | 3.7833 | 0.0000 |
| 0.0649        | 0.61  | 3500  | 0.1349          | 12.8098 | 3.7824 | 0.0000 |
| 0.0607        | 0.69  | 4000  | 0.1347          | 12.8110 | 3.8105 | 0.0000 |
| 0.0636        | 0.78  | 4500  | 0.1345          | 12.7994 | 3.7893 | 0.0000 |
| 0.063         | 0.87  | 5000  | 0.1342          | 12.8319 | 3.8084 | 0.0000 |
| 0.0589        | 0.95  | 5500  | 0.1341          | 12.8807 | 3.8551 | 0.0000 |
| 0.0734        | 1.04  | 6000  | 0.1341          | 12.7691 | 3.7604 | 0.0000 |
| 0.0577        | 1.13  | 6500  | 0.1340          | 12.7645 | 3.7602 | 0.0000 |
| 0.052         | 1.21  | 7000  | 0.1340          | 12.7610 | 3.7655 | 0.0000 |
| 0.0626        | 1.3   | 7500  | 0.1339          | 12.7657 | 3.7593 | 0.0000 |
| 0.0617        | 1.39  | 8000  | 0.1338          | 12.7912 | 3.8268 | 0.0000 |
| 0.063         | 1.47  | 8500  | 0.1337          | 12.7343 | 3.7573 | 0.0000 |
| 0.0668        | 1.56  | 9000  | 0.1336          | 12.7308 | 3.7198 | 0.0000 |
| 0.0634        | 1.65  | 9500  | 0.1335          | 12.7215 | 3.7400 | 0.0000 |
| 0.0604        | 1.73  | 10000 | 0.1333          | 12.7192 | 3.7515 | 0.0000 |
| 0.0707        | 1.82  | 10500 | 0.1333          | 12.7052 | 3.7568 | 0.0000 |
| 0.0639        | 1.91  | 11000 | 0.1332          | 12.6983 | 3.7617 | 0.0000 |
| 0.0617        | 1.99  | 11500 | 0.1331          | 12.6936 | 3.7402 | 0.0000 |
| 0.0601        | 2.08  | 12000 | 0.1330          | 12.6901 | 3.7586 | 0.0000 |
| 0.0632        | 2.17  | 12500 | 0.1330          | 12.6785 | 3.7279 | 0.0000 |
| 0.0626        | 2.25  | 13000 | 0.1330          | 12.6808 | 3.7333 | 0.0000 |
| 0.066         | 2.34  | 13500 | 0.1329          | 12.6704 | 3.7512 | 0.0000 |
| 0.0674        | 2.42  | 14000 | 0.1329          | 12.6599 | 3.7384 | 0.0000 |
| 0.0637        | 2.51  | 14500 | 0.1329          | 12.6797 | 3.7428 | 0.0000 |
| 0.0641        | 2.6   | 15000 | 0.1329          | 12.6750 | 3.7305 | 0.0000 |

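To read the table at a glance, the short script below copies the step, WER, and CER columns and reports the best checkpoint for each metric, along with the relative WER change between the first and last evaluations. It is only a convenience sketch over the numbers listed above; the variable names are illustrative.

```python
# (step, WER, CER) copied from the training results table.
rows = [
    (500, 12.9097, 3.8217), (1000, 12.9620, 3.8949), (1500, 12.8795, 3.8225),
    (2000, 12.9481, 3.8871), (2500, 12.8284, 3.8283), (3000, 12.8203, 3.7833),
    (3500, 12.8098, 3.7824), (4000, 12.8110, 3.8105), (4500, 12.7994, 3.7893),
    (5000, 12.8319, 3.8084), (5500, 12.8807, 3.8551), (6000, 12.7691, 3.7604),
    (6500, 12.7645, 3.7602), (7000, 12.7610, 3.7655), (7500, 12.7657, 3.7593),
    (8000, 12.7912, 3.8268), (8500, 12.7343, 3.7573), (9000, 12.7308, 3.7198),
    (9500, 12.7215, 3.7400), (10000, 12.7192, 3.7515), (10500, 12.7052, 3.7568),
    (11000, 12.6983, 3.7617), (11500, 12.6936, 3.7402), (12000, 12.6901, 3.7586),
    (12500, 12.6785, 3.7279), (13000, 12.6808, 3.7333), (13500, 12.6704, 3.7512),
    (14000, 12.6599, 3.7384), (14500, 12.6797, 3.7428), (15000, 12.6750, 3.7305),
]

best_wer = min(rows, key=lambda r: r[1])  # row with the lowest WER
best_cer = min(rows, key=lambda r: r[2])  # row with the lowest CER

first, last = rows[0], rows[-1]
# Relative WER reduction from the first to the last evaluation, in percent.
relative_wer_reduction = 100.0 * (first[1] - last[1]) / first[1]

print(f"lowest WER {best_wer[1]} at step {best_wer[0]}")
print(f"lowest CER {best_cer[2]} at step {best_cer[0]}")
print(f"relative WER reduction, step {first[0]} to {last[0]}: {relative_wer_reduction:.2f}%")
```

On these numbers the lowest WER occurs at step 14000 (12.6599) and the lowest CER at step 9000 (3.7198), while the summary at the top of this card reports the final checkpoint at step 15000.
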

### Framework versions

- Transformers 4.36.0.dev0
- Pytorch 2.1.1+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0