---
language:
- el
license: apache-2.0
tags:
- whisper-event
- generated_from_trainer
- whisper-large
- mozilla-foundation/common_voice_11_0
- greek
datasets:
- mozilla-foundation/common_voice_11_0
- google/fleurs
metrics:
- wer
model-index:
- name: whisper-lg-el-intlv-xs-2
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: mozilla-foundation/common_voice_11_0 el
      type: mozilla-foundation/common_voice_11_0
      config: el
      split: test
    metrics:
    - name: Wer
      type: wer
      value: 9.50037147102526
---

# whisper-lg-el-intlv-xs-2

This model is a fine-tuned version of [farsipal/whisper-lg-el-intlv-xs](https://huggingface.co/farsipal/whisper-lg-el-intlv-xs) on the interleaved mozilla-foundation/common_voice_11_0 (el) and google/fleurs (el_gr) datasets.
It achieves the following results on the evaluation set:
- Loss: 0.2872
- Wer: 9.5004

## Model description

The model was trained for Greek speech transcription on two interleaved datasets, Common Voice 11.0 (el) and Google FLEURS (el_gr).

## Intended uses & limitations

The model is intended for automatic speech transcription in the Greek language; a minimal usage sketch follows.
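
A minimal inference sketch (not part of the original card), assuming the checkpoint is published on the Hub under the repository name `farsipal/whisper-lg-el-intlv-xs-2` and a recent `transformers` release that accepts `language`/`task` in `generate_kwargs`:

```python
# Hypothetical usage example for this checkpoint with the transformers ASR pipeline.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="farsipal/whisper-lg-el-intlv-xs-2",  # assumed Hub id, matching the model_index_name
    chunk_length_s=30,                          # Whisper operates on 30-second windows
)

# Force Greek transcription rather than language auto-detection or translation.
result = asr(
    "sample_greek_audio.wav",  # hypothetical input file
    generate_kwargs={"language": "greek", "task": "transcribe"},
)
print(result["text"])
```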

## Training and evaluation data

Training was performed on the two interleaved datasets described above. Evaluation was performed on the Common Voice 11.0 (el) test split only.
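
As a rough illustration (not the actual training script), the two corpora can be interleaved with the `datasets` library along the lines of the `--dataset_name`, `--dataset_config_name`, and `--train_split_name` arguments listed under "Training procedure" below; note that Common Voice 11.0 may require accepting the dataset terms on the Hub:

```python
# Sketch of building the interleaved training set; column names and configs
# mirror the CLI arguments below, but this is an assumption-laden illustration.
from datasets import Audio, interleave_datasets, load_dataset

cv = load_dataset("mozilla-foundation/common_voice_11_0", "el", split="train+validation")
fleurs = load_dataset("google/fleurs", "el_gr", split="train+validation")

# Harmonize the text column name and audio sampling rate before interleaving.
cv = cv.rename_column("sentence", "transcription")
cv = cv.cast_column("audio", Audio(sampling_rate=16_000))
fleurs = fleurs.cast_column("audio", Audio(sampling_rate=16_000))

keep = ["audio", "transcription"]
cv = cv.remove_columns([c for c in cv.column_names if c not in keep])
fleurs = fleurs.remove_columns([c for c in fleurs.column_names if c not in keep])

train = interleave_datasets([cv, fleurs])
```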

## Training procedure

The run was configured with the following arguments:
```
                --model_name_or_path   'farsipal/whisper-lg-el-intlv-xs' \
                --model_revision   main \
                --do_train   True \
                --do_eval   True \
                --use_auth_token   False \
                --freeze_feature_encoder   False \
                --freeze_encoder   False \
                --model_index_name   'whisper-lg-el-intlv-xs-2' \
                --dataset_name 'mozilla-foundation/common_voice_11_0,google/fleurs' \
                --dataset_config_name 'el,el_gr' \
                --train_split_name  'train+validation,train+validation' \
                --eval_split_name   'test,-' \
                --text_column_name  'sentence,transcription' \
                --audio_column_name 'audio,audio' \
                --streaming   False \
                --max_duration_in_seconds   30 \
                --do_lower_case   False \
                --do_remove_punctuation   False \
                --do_normalize_eval   True \
                --language   greek \
                --task transcribe \
                --shuffle_buffer_size   500 \
                --output_dir   './data/finetuningRuns/whisper-lg-el-intlv-xs-2' \
                --overwrite_output_dir   True \
                --per_device_train_batch_size   8 \
                --gradient_accumulation_steps  4 \
                --learning_rate   3.5e-6 \
                --dropout         0.15 \
                --attention_dropout 0.05 \
                --warmup_steps   500 \
                --max_steps   5000 \
                --eval_steps   1000 \
                --gradient_checkpointing   True \
                --cache_dir   '~/.cache' \
                --fp16   True \
                --evaluation_strategy   steps \
                --per_device_eval_batch_size   8 \
                --predict_with_generate   True \
                --generation_max_length   225 \
                --save_steps   1000 \
                --logging_steps   25 \
                --report_to   tensorboard \
                --load_best_model_at_end   True \
                --metric_for_best_model   wer \
                --greater_is_better   False \
                --push_to_hub   False  \
                --dataloader_num_workers 6
```

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3.5e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 5000
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer     |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| 0.0813        | 2.49  | 1000 | 0.2147          | 10.8284 |
| 0.0379        | 4.98  | 2000 | 0.2439          | 10.0111 |
| 0.0195        | 7.46  | 3000 | 0.2767          | 9.8811  |
| 0.0126        | 9.95  | 4000 | 0.2872          | 9.5004  |
| 0.0103        | 12.44 | 5000 | 0.3021          | 9.6954  |


### Framework versions

- Transformers 4.26.0.dev0
- Pytorch 1.13.0+cu117
- Datasets 2.8.1.dev0
- Tokenizers 0.13.2