jonatasgrosman committed on
Commit
5f50cff
1 Parent(s): 466e44d

update README + add evaluation

README.md CHANGED
@@ -9,6 +9,7 @@ datasets:
9
  - mozilla-foundation/common_voice_11_0
10
  metrics:
11
  - wer
 
12
  model-index:
13
  - name: Whisper Large Chinese (Mandarin)
14
  results:
@@ -19,76 +20,81 @@ model-index:
19
  name: mozilla-foundation/common_voice_11_0 zh-CN
20
  type: mozilla-foundation/common_voice_11_0
21
  config: zh-CN
22
- split: validation[:1000]
23
  args: zh-CN
24
  metrics:
25
- - name: Wer
26
  type: wer
27
- value: 51.67420814479639
28
  ---
29
 
30
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
31
- should probably proofread and complete it, then remove this comment. -->
32
-
33
  # Whisper Large Chinese (Mandarin)
34
 
35
- This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on the mozilla-foundation/common_voice_11_0 zh-CN dataset.
36
- It achieves the following results on the evaluation set:
37
- - Loss: 0.2435
38
- - Wer: 51.6742
39
- - Cer: 8.5279
40
-
41
- ## Model description
42
 
43
- More information needed
44
 
45
- ## Intended uses & limitations
46
 
47
- More information needed
48
 
49
- ## Training and evaluation data
 
 
 
50
 
51
- More information needed
 
 
 
 
 
52
 
53
- ## Training procedure
54
 
55
- ### Training hyperparameters
56
 
57
- The following hyperparameters were used during training:
58
- - learning_rate: 5e-06
59
- - train_batch_size: 16
60
- - eval_batch_size: 8
61
- - seed: 42
62
- - gradient_accumulation_steps: 2
63
- - total_train_batch_size: 32
64
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
65
- - lr_scheduler_type: linear
66
- - lr_scheduler_warmup_steps: 2000
67
- - training_steps: 20000
68
- - mixed_precision_training: Native AMP
69
 
70
- ### Training results
71
 
72
- | Training Loss | Epoch | Step | Validation Loss | Wer | Cer |
73
- |:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|
74
- | 0.3314 | 0.83 | 1000 | 0.2110 | 65.7014 | 10.8047 |
75
- | 0.2747 | 1.66 | 2000 | 0.2005 | 58.1900 | 9.4191 |
76
- | 0.1989 | 2.49 | 3000 | 0.1983 | 56.1991 | 9.0939 |
77
- | 0.1142 | 3.31 | 4000 | 0.2076 | 55.0226 | 9.1589 |
78
- | 0.0747 | 4.14 | 5000 | 0.2131 | 56.3801 | 9.0483 |
79
- | 0.0709 | 4.97 | 6000 | 0.2165 | 54.6606 | 8.9768 |
80
- | 0.0432 | 5.8 | 7000 | 0.2222 | 54.0271 | 8.9508 |
81
- | 0.0261 | 6.63 | 8000 | 0.2299 | 54.4796 | 9.0353 |
82
- | 0.0152 | 7.46 | 9000 | 0.2290 | 52.7602 | 8.8076 |
83
- | 0.0054 | 8.28 | 10000 | 0.2435 | 51.6742 | 8.5279 |
84
- | 0.0028 | 9.11 | 11000 | 0.2421 | 53.0317 | 8.9833 |
85
- | 0.0045 | 9.94 | 12000 | 0.2462 | 52.9412 | 8.7751 |
86
- | 0.0016 | 10.77 | 13000 | 0.2501 | 52.3077 | 8.9573 |
87
 
88
 
89
- ### Framework versions
90
 
91
- - Transformers 4.26.0.dev0
92
- - Pytorch 1.13.1+cu117
93
- - Datasets 2.7.1.dev0
94
- - Tokenizers 0.13.2
 
 
 
 
 
 
9
  - mozilla-foundation/common_voice_11_0
10
  metrics:
11
  - wer
12
+ - cer
13
  model-index:
14
  - name: Whisper Large Chinese (Mandarin)
15
  results:
20
  name: mozilla-foundation/common_voice_11_0 zh-CN
21
  type: mozilla-foundation/common_voice_11_0
22
  config: zh-CN
23
+ split: test
24
  args: zh-CN
25
  metrics:
26
+ - name: WER
27
  type: wer
28
+ value: 55.02141421204441
29
+ - name: CER
30
+ type: cer
31
+ value: 9.550758567294045
32
+ - task:
33
+ name: Automatic Speech Recognition
34
+ type: automatic-speech-recognition
35
+ dataset:
36
+ name: google/fleurs cmn_hans_cn
37
+ type: google/fleurs
38
+ config: cmn_hans_cn
39
+ split: test
40
+ args: cmn_hans_cn
41
+ metrics:
42
+ - name: WER
43
+ type: wer
44
+ value: 70.62596203181118
45
+ - name: CER
46
+ type: cer
47
+ value: 11.761282471826888
48
  ---
49
 
 
 
 
50
  # Whisper Large Chinese (Mandarin)
51
 
52
+ This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on Chinese (Mandarin) using the train and validation splits of [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0). Not all of the validation split was used for training: I held out 1k samples from it for evaluation during fine-tuning. When using this model, make sure that your speech input is sampled at 16 kHz.
53
 
54
+ ## Usage
55
 
56
+ ```python
57
 
58
+ from transformers import pipeline
59
 
60
+ transcriber = pipeline(
61
+ "automatic-speech-recognition",
62
+ model="jonatasgrosman/whisper-large-zh-cv11"
63
+ )
64
 
65
+ transcriber.model.config.forced_decoder_ids = (
66
+ transcriber.tokenizer.get_decoder_prompt_ids(
67
+ language="zh",
68
+ task="transcribe"
69
+ )
70
+ )
71
 
72
+ transcription = transcriber("path/to/my_audio.wav")
73
 
74
+ ```
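The model card asks for speech sampled at 16 kHz. As a minimal sketch, assuming the input is an uncompressed WAV file that you load yourself, the sample rate can be checked with Python's standard `wave` module before transcribing. The helper names `wav_sample_rate` and `needs_resampling` are hypothetical and not part of the model repository.

```python
import wave

# Sample rate requested by the model card (16 kHz).
TARGET_SAMPLE_RATE = 16_000


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, target_rate=TARGET_SAMPLE_RATE):
    """Return True when the WAV file is not already at the target rate."""
    return wav_sample_rate(path) != target_rate
```

If `needs_resampling("path/to/my_audio.wav")` returns `True`, resample the audio with a tool of your choice before passing it to the model.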
75
 
76
+ ## Evaluation
77
 
78
+ I evaluated the model on the test split of two datasets: [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) (the same dataset used for fine-tuning) and [Fleurs](https://huggingface.co/datasets/google/fleurs) (not seen during fine-tuning). Because Whisper can transcribe casing and punctuation, I evaluated the model in two scenarios: one using the raw text and one using normalized text (lowercased, with punctuation removed). For Fleurs, I also evaluated the model on only the samples whose transcriptions contain no numerical values. Fleurs writes numerical values differently from Common Voice, the dataset used for fine-tuning, so this difference is expected to hurt the model's performance on such samples.
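The raw and normalized scenarios can be illustrated with a small, dependency-free sketch. It is not the evaluation script used for this model: the punctuation set, the whitespace-based word splitting, and the function names are all assumptions made for illustration. The normalization lowercases the text and replaces punctuation with spaces, which reproduces the normalized references shown in the evaluation JSON files.

```python
import string

# Assumed punctuation set: ASCII punctuation plus common Chinese marks.
PUNCTUATION = string.punctuation + ",。!?、;:“”‘’()《》"
_PUNCT_TABLE = str.maketrans({ch: " " for ch in PUNCTUATION})


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(text.lower().translate(_PUNCT_TABLE).split())


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate, assuming words are separated by whitespace."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate over every character, spaces included."""
    return edit_distance(reference, hypothesis) / len(reference)
```

Under the whitespace assumption, a Chinese sentence written without spaces counts as a single word, so one wrong character already makes that word wrong, while CER only counts the one character.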
79
 
80
+ ### Common Voice 11
81
 
82
+ | Model | CER (%) | WER (%) |
83
+ | --- | --- | --- |
84
+ | [jonatasgrosman/whisper-large-zh-cv11](https://huggingface.co/jonatasgrosman/whisper-large-zh-cv11) | 9.31 | 55.94 |
85
+ | [jonatasgrosman/whisper-large-zh-cv11](https://huggingface.co/jonatasgrosman/whisper-large-zh-cv11) + text normalization | 9.55 | 55.02 |
86
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 33.33 | 101.80 |
87
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization | 29.90 | 95.91 |
88
 
89
+ ### Fleurs
90
 
91
+ | Model | CER (%) | WER (%) |
92
+ | --- | --- | --- |
93
+ | [jonatasgrosman/whisper-large-zh-cv11](https://huggingface.co/jonatasgrosman/whisper-large-zh-cv11) | 15.00 | 93.45 |
94
+ | [jonatasgrosman/whisper-large-zh-cv11](https://huggingface.co/jonatasgrosman/whisper-large-zh-cv11) + text normalization | 11.76 | 70.63 |
95
+ | [jonatasgrosman/whisper-large-zh-cv11](https://huggingface.co/jonatasgrosman/whisper-large-zh-cv11) + keep only non-numeric samples | 10.95 | 87.91 |
96
+ | [jonatasgrosman/whisper-large-zh-cv11](https://huggingface.co/jonatasgrosman/whisper-large-zh-cv11) + text normalization + keep only non-numeric samples | 7.83 | 62.12 |
97
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 23.49 | 101.28 |
98
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization | 17.58 | 83.22 |
99
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + keep only non-numeric samples | 21.03 | 101.95 |
100
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization + keep only non-numeric samples | 15.22 | 79.28 |
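The percentages in the tables above correspond to the fractional `cer` and `wer` values stored in the evaluation JSON files, multiplied by 100 and rounded to two decimals. The sketch below shows that conversion for the Fleurs results of the fine-tuned model. The numbers are copied from `evaluation_fleurs_test.json` in this commit, while the dictionary layout and the helper names `as_percent` and `table_rows` are assumptions for illustration.

```python
# Metric values copied from evaluation_fleurs_test.json (fractions, not percent).
fleurs_results = {
    "raw": {
        "cer": 0.1500187149095446,
        "wer": 0.9344808439755691,
        "non_numeric_samples_cer": 0.10947220549869556,
        "non_numeric_samples_wer": 0.8790983606557377,
    },
    "normalized": {
        "cer": 0.11761282471826888,
        "wer": 0.7062596203181118,
        "non_numeric_samples_cer": 0.07828692280578076,
        "non_numeric_samples_wer": 0.6211941478845393,
    },
}


def as_percent(fraction):
    """Format a fractional error rate as a percentage with two decimals."""
    return f"{fraction * 100:.2f}"


def table_rows(results):
    """Build (scenario, CER, WER) rows for all samples and non-numeric ones."""
    rows = []
    for scenario, metrics in results.items():
        rows.append((scenario, as_percent(metrics["cer"]), as_percent(metrics["wer"])))
        rows.append((
            scenario + " + non-numeric only",
            as_percent(metrics["non_numeric_samples_cer"]),
            as_percent(metrics["non_numeric_samples_wer"]),
        ))
    return rows


for row in table_rows(fleurs_results):
    print(" | ".join(row))
```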
evaluation_cv11_test.json CHANGED
@@ -2,6 +2,8 @@
2
  "raw": {
3
  "cer": 0.09311360578930278,
4
  "wer": 0.5594405594405595,
 
 
5
  "references": [
6
  "否",
7
  "宋朝末年年间定居粉岭围。",
@@ -21172,6 +21174,8 @@
21172
  "normalized": {
21173
  "cer": 0.09550758567294045,
21174
  "wer": 0.5502141421204441,
 
 
21175
  "references": [
21176
  "否",
21177
  "宋朝末年年间定居粉岭围",
2
  "raw": {
3
  "cer": 0.09311360578930278,
4
  "wer": 0.5594405594405595,
5
+ "non_numeric_samples_cer": 0.09311360578930278,
6
+ "non_numeric_samples_wer": 0.5594405594405595,
7
  "references": [
8
  "否",
9
  "宋朝末年年间定居粉岭围。",
21174
  "normalized": {
21175
  "cer": 0.09550758567294045,
21176
  "wer": 0.5502141421204441,
21177
+ "non_numeric_samples_cer": 0.09550758567294045,
21178
+ "non_numeric_samples_wer": 0.5502141421204441,
21179
  "references": [
21180
  "否",
21181
  "宋朝末年年间定居粉岭围",
evaluation_fleurs_test.json CHANGED
@@ -2,6 +2,8 @@
2
  "raw": {
3
  "cer": 0.1500187149095446,
4
  "wer": 0.9344808439755691,
 
 
5
  "references": [
6
  "1940 年 8 月 15 日,盟军攻入法国南部,这次进攻被称为“龙骑兵行动”。",
7
  "该群岛位于南极半岛以北 120 公里处。最大的岛屿是乔治国王岛,这里是“繁星村(Villa Las Estrellas)”的定居点。",
@@ -1900,6 +1902,8 @@
1900
  "normalized": {
1901
  "cer": 0.11761282471826888,
1902
  "wer": 0.7062596203181118,
 
 
1903
  "references": [
1904
  "1940 年 8 月 15 日 盟军攻入法国南部 这次进攻被称为 龙骑兵行动",
1905
  "该群岛位于南极半岛以北 120 公里处 最大的岛屿是乔治国王岛 这里是 繁星村 villa las estrellas 的定居点",
2
  "raw": {
3
  "cer": 0.1500187149095446,
4
  "wer": 0.9344808439755691,
5
+ "non_numeric_samples_cer": 0.10947220549869556,
6
+ "non_numeric_samples_wer": 0.8790983606557377,
7
  "references": [
8
  "1940 年 8 月 15 日,盟军攻入法国南部,这次进攻被称为“龙骑兵行动”。",
9
  "该群岛位于南极半岛以北 120 公里处。最大的岛屿是乔治国王岛,这里是“繁星村(Villa Las Estrellas)”的定居点。",
1902
  "normalized": {
1903
  "cer": 0.11761282471826888,
1904
  "wer": 0.7062596203181118,
1905
+ "non_numeric_samples_cer": 0.07828692280578076,
1906
+ "non_numeric_samples_wer": 0.6211941478845393,
1907
  "references": [
1908
  "1940 年 8 月 15 日 盟军攻入法国南部 这次进攻被称为 龙骑兵行动",
1909
  "该群岛位于南极半岛以北 120 公里处 最大的岛屿是乔治国王岛 这里是 繁星村 villa las estrellas 的定居点",
evaluation_whisper-large-v2_cv11_test.json CHANGED
@@ -2,6 +2,8 @@
2
  "raw": {
3
  "cer": 0.33334879621468666,
4
  "wer": 1.0180495180495182,
 
 
5
  "references": [
6
  "否",
7
  "宋朝末年年间定居粉岭围。",
@@ -21172,6 +21174,8 @@
21172
  "normalized": {
21173
  "cer": 0.299039578225524,
21174
  "wer": 0.9590944847478368,
 
 
21175
  "references": [
21176
  "否",
21177
  "宋朝末年年间定居粉岭围",
2
  "raw": {
3
  "cer": 0.33334879621468666,
4
  "wer": 1.0180495180495182,
5
+ "non_numeric_samples_cer": 0.33334879621468666,
6
+ "non_numeric_samples_wer": 1.0180495180495182,
7
  "references": [
8
  "否",
9
  "宋朝末年年间定居粉岭围。",
21174
  "normalized": {
21175
  "cer": 0.299039578225524,
21176
  "wer": 0.9590944847478368,
21177
+ "non_numeric_samples_cer": 0.299039578225524,
21178
+ "non_numeric_samples_wer": 0.9590944847478368,
21179
  "references": [
21180
  "否",
21181
  "宋朝末年年间定居粉岭围",
evaluation_whisper-large-v2_fleurs_test.json CHANGED
@@ -2,6 +2,8 @@
2
  "raw": {
3
  "cer": 0.23488459139114162,
4
  "wer": 1.0127706829539145,
 
 
5
  "references": [
6
  "1940 年 8 月 15 日,盟军攻入法国南部,这次进攻被称为“龙骑兵行动”。",
7
  "该群岛位于南极半岛以北 120 公里处。最大的岛屿是乔治国王岛,这里是“繁星村(Villa Las Estrellas)”的定居点。",
@@ -1900,6 +1902,8 @@
1900
  "normalized": {
1901
  "cer": 0.17581080366118196,
1902
  "wer": 0.8322216521292971,
 
 
1903
  "references": [
1904
  "1940 年 8 月 15 日 盟军攻入法国南部 这次进攻被称为 龙骑兵行动",
1905
  "该群岛位于南极半岛以北 120 公里处 最大的岛屿是乔治国王岛 这里是 繁星村 villa las estrellas 的定居点",
2
  "raw": {
3
  "cer": 0.23488459139114162,
4
  "wer": 1.0127706829539145,
5
+ "non_numeric_samples_cer": 0.21031507124222357,
6
+ "non_numeric_samples_wer": 1.0194672131147542,
7
  "references": [
8
  "1940 年 8 月 15 日,盟军攻入法国南部,这次进攻被称为“龙骑兵行动”。",
9
  "该群岛位于南极半岛以北 120 公里处。最大的岛屿是乔治国王岛,这里是“繁星村(Villa Las Estrellas)”的定居点。",
1902
  "normalized": {
1903
  "cer": 0.17581080366118196,
1904
  "wer": 0.8322216521292971,
1905
+ "non_numeric_samples_cer": 0.15216778286922805,
1906
+ "non_numeric_samples_wer": 0.7928034796362199,
1907
  "references": [
1908
  "1940 年 8 月 15 日 盟军攻入法国南部 这次进攻被称为 龙骑兵行动",
1909
  "该群岛位于南极半岛以北 120 公里处 最大的岛屿是乔治国王岛 这里是 繁星村 villa las estrellas 的定居点",