marinone94 committed on
Commit cd9f127
1 Parent(s): af19f2e
Files changed (6)
  1. README.md +13 -20
  2. all_results.json +3 -3
  3. config.json +1 -1
  4. eval_results.json +3 -4
  5. run.sh +2 -1
  6. tokenizer_config.json +1 -1
README.md CHANGED
@@ -7,35 +7,36 @@ tags:
  - generated_from_trainer
  datasets:
  - mozilla-foundation/common_voice_11_0
- metrics:
- - wer
+ - babelbox/babelbox_voice
+ - google/fleurs
  model-index:
  - name: Whisper Medium Swedish
  results:
  - task:
- name: Automatic Speech Recognition
  type: automatic-speech-recognition
+ name: Automatic Speech Recognition
  dataset:
- name: mozilla-foundation/common_voice_11_0 sv-SE
+ name: mozilla-foundation/common_voice_11_0
  type: mozilla-foundation/common_voice_11_0
  config: sv-SE
  split: test
- args: sv-SE
  metrics:
  - name: Wer
  type: wer
- value: 11.37780883775938
+ value: 9.89
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

  # Whisper Medium Swedish

- This model is a fine-tuned version of [marinone94/whisper-medium-nordic](https://huggingface.co/marinone94/whisper-medium-nordic) on the mozilla-foundation/common_voice_11_0 sv-SE dataset.
+ This model is a fine-tuned version of [Whisper Medium Nordic](https://huggingface.co/marinone94/whisper-medium-nordic) on the [mozilla-foundation/common_voice_11_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) (train+validation), the [babelbox/babelbox_voice](https://huggingface.co/datasets/babelbox/babelbox_voice) (NST SV - train split) and the [google/fleurs](https://huggingface.co/datasets/google/fleurs) (sv_se - train+validation+test) datasets.
  It achieves the following results on the evaluation set:
- - Loss: 0.2970
- - Wer: 11.3778
+ - eval_loss: 0.2483
+ - eval_wer: 9.8914
+ - eval_runtime: 2924.8709
+ - eval_samples_per_second: 1.733
+ - eval_steps_per_second: 0.108
+ - step: 0

  ## Model description

@@ -61,17 +62,9 @@ The following hyperparameters were used during training:
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - lr_scheduler_warmup_steps: 250
- - training_steps: 2500
+ - training_steps: 5000
  - mixed_precision_training: Native AMP

- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Wer |
- |:-------------:|:-----:|:----:|:---------------:|:-------:|
- | 0.0146 | 3.02 | 1000 | 0.2546 | 11.9423 |
- | 0.0017 | 6.04 | 2000 | 0.2970 | 11.3778 |
-
-
  ### Framework versions

  - Transformers 4.26.0.dev0
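
For readers unfamiliar with the `wer` metric reported in the model card, the following is a minimal pure-Python sketch of word error rate: word-level edit distance divided by the number of reference words. It assumes the card's values (such as 9.89) are percentages, and the function name `word_error_rate` is hypothetical. This is an illustration only, not the code the training script runs, and it skips the text normalization implied by `--do_normalize_eval`.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length * 100."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

A corpus-level score is normally obtained by summing errors and reference words over all utterances before dividing, rather than by averaging per-utterance values.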
all_results.json CHANGED
@@ -1,9 +1,9 @@
  {
  "epoch": 1.0,
  "eval_loss": 0.24834245443344116,
- "eval_runtime": 2999.4256,
- "eval_samples_per_second": 1.69,
- "eval_steps_per_second": 0.106,
+ "eval_runtime": 2924.8709,
+ "eval_samples_per_second": 1.733,
+ "eval_steps_per_second": 0.108,
  "eval_wer": 9.891409525857435,
  "train_loss": 0.025400285175442697,
  "train_runtime": 51804.3597,
config.json CHANGED
@@ -1,5 +1,5 @@
  {
- "_name_or_path": "marinone94/whisper-medium-nordic",
+ "_name_or_path": ".",
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "architectures": [
eval_results.json CHANGED
@@ -1,8 +1,7 @@
  {
- "epoch": 1.0,
  "eval_loss": 0.24834245443344116,
- "eval_runtime": 2999.4256,
- "eval_samples_per_second": 1.69,
- "eval_steps_per_second": 0.106,
+ "eval_runtime": 2924.8709,
+ "eval_samples_per_second": 1.733,
+ "eval_steps_per_second": 0.108,
  "eval_wer": 9.891409525857435
  }
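
The throughput fields in `eval_results.json` can be sanity-checked against each other. The sketch below multiplies the runtime by the per-second rates to estimate how many samples and batches the evaluation covered. The helper name `implied_counts` is hypothetical, the values are copied from the new side of the diff, and the results are approximate because the rates are rounded to three decimals.

```python
import json

# Values copied from the new side of the eval_results.json diff.
eval_results = json.loads("""
{
  "eval_loss": 0.24834245443344116,
  "eval_runtime": 2924.8709,
  "eval_samples_per_second": 1.733,
  "eval_steps_per_second": 0.108,
  "eval_wer": 9.891409525857435
}
""")


def implied_counts(results: dict) -> tuple:
    """Estimate (samples, steps) evaluated from runtime and per-second rates."""
    runtime = results["eval_runtime"]
    samples = runtime * results["eval_samples_per_second"]
    steps = runtime * results["eval_steps_per_second"]
    return samples, steps


samples, steps = implied_counts(eval_results)
print(f"about {samples:.0f} samples in about {steps:.0f} evaluation steps")
```

Applying the same arithmetic to the removed values (2999.4256 s at 1.69 samples/s) gives roughly the same sample count, which is consistent with the evaluation having been re-run on the same test set with only the wall-clock time changing.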
run.sh CHANGED
@@ -1,5 +1,5 @@
  python run_speech_recognition_seq2seq_streaming.py \
- --model_name_or_path="marinone94/whisper-medium-nordic" \
+ --model_name_or_path="." \
  --dataset_train_name="mozilla-foundation/common_voice_11_0,babelbox/babelbox_voice,google/fleurs" \
  --dataset_train_config_name="sv-SE,nst,sv_se" \
  --language="swedish" \
@@ -30,6 +30,7 @@ python run_speech_recognition_seq2seq_streaming.py \
  --load_best_model_at_end \
  --gradient_checkpointing \
  --fp16 \
+ --do_eval \
  --predict_with_generate \
  --do_normalize_eval \
  --streaming \
tokenizer_config.json CHANGED
@@ -19,7 +19,7 @@
  },
  "errors": "replace",
  "model_max_length": 1024,
- "name_or_path": "marinone94/whisper-medium-nordic",
+ "name_or_path": ".",
  "pad_token": null,
  "processor_class": "WhisperProcessor",
  "return_attention_mask": false,