---
language:
- el
license: apache-2.0
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
metrics:
- wer
model-index:
- name: whisper-sm-el-xs
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: mozilla-foundation/common_voice_11_0 el
      type: mozilla-foundation/common_voice_11_0
      config: el
      split: test
      args: el
    metrics:
    - name: Wer
      type: wer
      value: 20.63521545319465
---

# Whisper-Small (el) for Transcription

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the mozilla-foundation/common_voice_11_0 el dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4805
- Wer: 20.6352
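
The WER above is the percentage word error rate on the Common Voice 11.0 Greek test split (the run normalizes text before scoring, per `--do_normalize_eval True` in the command below). As a point of reference, here is a minimal sketch of how such a score can be computed with the `evaluate` library, using made-up transcripts rather than the actual evaluation data:

```python
import evaluate

# Load the word error rate metric from the Hugging Face `evaluate` library.
wer_metric = evaluate.load("wer")

# Made-up Greek transcripts for illustration (not the actual evaluation data).
references = ["καλημέρα σε όλους", "ευχαριστώ πολύ"]
predictions = ["καλημέρα σε όλους", "ευχαριστώ πολλοί"]

# `compute` returns a fraction; multiply by 100 to match the percentage above.
wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}%")  # 1 substitution over 5 reference words -> 20.00%
```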

## Model description

This model is fine-tuned for transcription on the Greek (el) subset of mozilla-foundation/common_voice_11_0, with the train and validation splits interleaved into a single training stream.

## Intended uses & limitations

This model was fine-tuned as part of the Whisper Fine-Tuning Event (December 2022) and is intended for Greek speech transcription.
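
A minimal usage sketch with the Transformers `pipeline` is shown below; the repo id is a placeholder for wherever this checkpoint is hosted, and `sample_el.wav` stands for any Greek speech recording:

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline

# Placeholder repo id: replace with the actual path of this checkpoint on the Hub.
model_id = "<namespace>/whisper-sm-el-xs"

processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Force Greek transcription so the model neither auto-detects the language nor translates.
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(
    language="greek", task="transcribe"
)

asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,  # split long recordings into 30-second windows
)

print(asr("sample_el.wav")["text"])
```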

## Training and evaluation data

Training used the train and validation splits of the Greek Common Voice 11.0 data, interleaved into a single stream.
Evaluation was done on the test split.
All data was streamed from the Hugging Face Hub rather than downloaded in full.
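
For illustration, loading and interleaving the streamed splits with the `datasets` library might look like the sketch below (the actual run used the `run_speech_recognition_seq2seq_streaming.py` script; accessing this gated dataset may additionally require a Hub auth token):

```python
from datasets import load_dataset, interleave_datasets

# Stream the Greek configuration of Common Voice 11.0 directly from the Hub.
train = load_dataset("mozilla-foundation/common_voice_11_0", "el",
                     split="train", streaming=True)
validation = load_dataset("mozilla-foundation/common_voice_11_0", "el",
                          split="validation", streaming=True)
test = load_dataset("mozilla-foundation/common_voice_11_0", "el",
                    split="test", streaming=True)

# Interleave train and validation into one training stream,
# mirroring --train_split_name "train+validation" in the command below.
train_stream = interleave_datasets([train, validation])

# Each example carries an "audio" dict and a "sentence" transcript.
sample = next(iter(train_stream))
print(sample["sentence"])
```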

## Training procedure

The training script has been uploaded among the files of this repository.
The command used to run it was:
```
python ./run_speech_recognition_seq2seq_streaming.py \
                --model_name_or_path   "openai/whisper-small" \
                --model_revision       "main" \
                --do_train             True \
                --do_eval              True \
                --use_auth_token       False \
                --freeze_encoder       False \
                --model_index_name     "whisper-sm-el-xs" \
                --dataset_name         "mozilla-foundation/common_voice_11_0" \
                --dataset_config_name  "el" \
                --audio_column_name    "audio" \
                --text_column_name     "sentence" \
                --max_duration_in_seconds 30 \
                --train_split_name    "train+validation" \
                --eval_split_name      "test" \
                --do_lower_case         False \
                --do_remove_punctuation False \
                --do_normalize_eval     True \
                --language              "greek" \
                --task                  "transcribe" \
                --shuffle_buffer_size   500 \
                --output_dir             "./data/finetuningRuns/whisper-sm-el-xs" \
                --per_device_train_batch_size 16 \
                --gradient_accumulation_steps 4  \
                --learning_rate          1e-5 \
                --warmup_steps           500 \
                --max_steps              5000 \
                --gradient_checkpointing True \
                --fp16                   True \
                --evaluation_strategy    "steps" \
                --per_device_eval_batch_size 8 \
                --predict_with_generate  True \
                --generation_max_length  225 \
                --save_steps             1000 \
                --eval_steps             1000 \
                --logging_steps          25 \
                --report_to              "tensorboard" \
                --load_best_model_at_end True \
                --metric_for_best_model  "wer" \
                --greater_is_better      False \
                --push_to_hub            False \
                --overwrite_output_dir    True 
```
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 5000
- mixed_precision_training: Native AMP
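
For reference, these settings correspond roughly to the following `Seq2SeqTrainingArguments`; this is a sketch rather than the exact invocation, and the Adam optimizer and linear scheduler listed above are the Trainer defaults:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-sm-el-xs",
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # effective train batch size: 16 * 4 = 64
    warmup_steps=500,
    max_steps=5000,
    gradient_checkpointing=True,
    fp16=True,                       # "Native AMP" mixed precision
    evaluation_strategy="steps",
    eval_steps=1000,
    save_steps=1000,
    logging_steps=25,
    predict_with_generate=True,
    generation_max_length=225,
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,
    report_to=["tensorboard"],
    seed=42,
)
```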

### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer     |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| 0.0024        | 18.01 | 1000 | 0.4246          | 21.0438 |
| 0.0003        | 37.01 | 2000 | 0.4805          | 20.6352 |
| 0.0001        | 56.01 | 3000 | 0.5102          | 20.8395 |
| 0.0001        | 75.0  | 4000 | 0.5296          | 21.0717 |
| 0.0001        | 94.0  | 5000 | 0.5375          | 21.0253 |

Here is the summary from the run's log:

```
***** train metrics *****
  epoch                    =        94.0
  train_loss               =      0.0222
  train_runtime            = 23:06:13.19
  train_samples_per_second =       3.847
  train_steps_per_second   =        0.06
12/08/2022 11:20:17 - INFO - __main__ - *** Evaluate ***

***** eval metrics *****
  epoch                   =       94.0
  eval_loss               =     0.4805
  eval_runtime            = 0:23:03.68
  eval_samples_per_second =      1.226
  eval_steps_per_second   =      0.153
  eval_wer                =    20.6352
Thu 08 Dec 2022 11:43:22 AM EST
```

### Framework versions

- Transformers 4.26.0.dev0
- Pytorch 1.13.0
- Datasets 2.7.1.dev0
- Tokenizers 0.12.1