---
language:
- fi
license: apache-2.0
tags:
- whisper-event
- finnish
- speech-recognition
datasets:
- mozilla-foundation/common_voice_11_0
- google/fleurs
metrics:
- wer
- cer
model-index:
- name: Whisper Large V3 Finnish
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 11.0
      type: mozilla-foundation/common_voice_11_0
      config: fi
      split: test
      args: fi
    metrics:
    - name: Wer
      type: wer
      value: 8.23
    - name: Cer
      type: cer
      value: 1.43
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: fi_fi
      split: test
      args: fi_fi
    metrics:
    - name: Wer
      type: wer
      value: 8.21
    - name: Cer
      type: cer
      value: 3.23
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

<h3>This is our improved Finnish Whisper model, finetuned from OpenAI Whisper Large V3</h3>
<p>It improves on our previously finetuned Whisper V2 model (<a href="https://huggingface.co/Finnish-NLP/whisper-large-v2-finnish">https://huggingface.co/Finnish-NLP/whisper-large-v2-finnish</a>) as follows:</p>
<p>CV11 (Common Voice 11 test set) WER (word error rate): 10.42 → 8.23</p>
<p>Fleurs (a speech recognition test set by Google) WER (word error rate): 10.20 → 8.21</p>
<p>The model was trained on an Nvidia RTX 4080 for 32k steps with batch size 8 and gradient accumulation 2.</p>
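
As a rough illustration of what these training settings imply, the arithmetic can be sketched as below. This assumes a single GPU and that each of the 32k steps is one optimizer update; neither is stated explicitly here, so treat the result as an estimate.

```python
# Figures taken from the training note above.
per_device_batch_size = 8
gradient_accumulation_steps = 2
training_steps = 32_000

# Assumption: one GPU, and each "step" is one optimizer update.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps
examples_seen = effective_batch_size * training_steps

print(effective_batch_size)  # 16
print(examples_seen)         # 512000
```

Under those assumptions the model sees an effective batch of 16 examples per update.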

<br>

<h3>Original OpenAI Whisper Large V3</h3>

- CV11
  - WER: 14.81
  - WER NORMALIZED: 10.82
  - CER: 2.7
  - CER NORMALIZED: 2.07

- Fleurs
  - WER: 12.04
  - WER NORMALIZED: 9.63
  - CER: 2.48
  - CER NORMALIZED: 3.64
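
The difference between the raw and normalized scores above comes from normalizing the text before scoring. A minimal sketch of how WER and CER can be computed follows; the `normalize` function is a hypothetical, simplified normalizer (lowercasing and stripping ASCII punctuation) and is not necessarily the one used to produce the numbers in this card.

```python
import string

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def normalize(text):
    # Hypothetical normalizer: lowercase, drop ASCII punctuation, collapse spaces.
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())

def wer(reference, hypothesis):
    """Word error rate in percent."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate in percent."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)

reference = "Hyvää huomenta, Suomi!"
hypothesis = "hyvää huomenta suomi"

print(wer(reference, hypothesis))                        # raw WER
print(wer(normalize(reference), normalize(hypothesis)))  # normalized WER
```

In this toy example every word differs only in casing or punctuation, so the raw WER is 100 while the normalized WER is 0.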


<h3>After finetuning on Finnish data, our V3 model got these scores on the test sets:</h3>

  - @14000 finetuning steps
    - CV11
      - WER: 11.36
      - WER NORMALIZED: 8.31
      - CER: 1.93
      - CER NORMALIZED: 1.48

    - Fleurs
      - WER: 10.2
      - WER NORMALIZED: 8.56
      - CER: 2.26
      - CER NORMALIZED: 3.54

  - @32000 finetuning steps
    - CV11
      - WER: 11.47
      - WER NORMALIZED: 8.23
      - CER: 1.91
      - CER NORMALIZED: 1.43

    - Fleurs
      - WER: 10.1
      - WER NORMALIZED: 8.21
      - CER: 2.2
      - CER NORMALIZED: 3.23
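
To put the scores above in perspective, the relative reduction in normalized WER from the original Whisper Large V3 to the checkpoint at 32000 steps can be computed as in this small sketch. The dictionary and function names are illustrative only; the numbers are copied from the lists above.

```python
# Normalized WER values copied from the lists above.
baseline = {"CV11": 10.82, "Fleurs": 9.63}   # original Whisper Large V3
finetuned = {"CV11": 8.23, "Fleurs": 8.21}   # after 32000 finetuning steps

def relative_reduction(before, after):
    """Relative error reduction in percent."""
    return 100.0 * (before - after) / before

for name in baseline:
    reduction = relative_reduction(baseline[name], finetuned[name])
    print(f"{name}: {reduction:.1f}% relative WER reduction")
```

This works out to roughly a 23.9% relative reduction on CV11 and 14.7% on Fleurs.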