
Whisper Small GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-small on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. The training data is augmented in two ways: adding noise and truncating low-amplitude samples. The best checkpoint (this version), selected by ChrF, is at step 2800 (epoch 1.2259) and achieves the following results on the evaluation set:

  • Loss: 1.3547
  • BLEU: 32.57
  • ChrF: 47.04
  • WER: 62.0891

Model description

This model is Whisper Small fine-tuned for Irish-to-English (GA-EN) speech translation.

Intended uses & limitations

The model is intended for translating Irish (GA) speech into English (EN) text. Beyond the evaluation metrics above, no further information on limitations is provided.
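
As a usage illustration, the snippet below loads this checkpoint through the Hugging Face transformers pipeline and translates an Irish recording to English. This is a minimal sketch, not the authors' recommended recipe; the file name sample_ga.wav is a hypothetical placeholder.

```python
# pip install transformers torch
from transformers import pipeline

# Load the fine-tuned checkpoint from this card.
pipe = pipeline(
    "automatic-speech-recognition",
    model="ymoslem/whisper-small-ga2en-v5.2.1-r",
)

# task="translate" asks Whisper to emit English text rather than an
# Irish transcript. "sample_ga.wav" is a hypothetical input file.
result = pipe("sample_ga.wav", generate_kwargs={"task": "translate"})
print(result["text"])
```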

Training and evaluation data

The model is trained and evaluated on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia Irish-English datasets listed above. The training audio is augmented by adding noise and by truncating low-amplitude samples.
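
The card does not say how the augmentation was implemented. The sketch below shows one plausible reading using only NumPy; the SNR value, the amplitude threshold, and the function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def add_noise(audio: np.ndarray, snr_db: float = 20.0) -> np.ndarray:
    """Mix in Gaussian noise at a given signal-to-noise ratio (assumed value)."""
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise

def truncate_low_amplitude(audio: np.ndarray, threshold: float = 0.01) -> np.ndarray:
    """Drop samples whose absolute amplitude falls below a threshold (assumed value)."""
    return audio[np.abs(audio) >= threshold]
```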

Training procedure

Hardware

1 NVIDIA A100-SXM4-80GB

Training hyperparameters

The following hyperparameters were used during training (a matching configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 0
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • generation_max_length: 225
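
For reference, these values map directly onto a Seq2SeqTrainingArguments configuration, sketched below under the assumption that the standard Hugging Face Seq2SeqTrainer was used; output_dir is a placeholder, and the Adam betas and epsilon listed above are the optimizer defaults.

```python
from transformers import Seq2SeqTrainingArguments

# A reconstruction of the listed hyperparameters, not the authors' script.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-ga2en",  # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=0,
    max_steps=3000,
    fp16=True,  # "Native AMP" mixed-precision training
    predict_with_generate=True,
    generation_max_length=225,
)
```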

Training results

| Training Loss | Epoch  | Step | Validation Loss | BLEU  | ChrF  | WER      |
|:-------------:|:------:|:----:|:---------------:|:-----:|:-----:|:--------:|
| 2.3533        | 0.0438 | 100  | 1.7789          | 6.29  | 25.08 | 148.7618 |
| 1.9035        | 0.0876 | 200  | 1.5122          | 18.21 | 34.02 | 85.6821  |
| 1.5357        | 0.1313 | 300  | 1.3983          | 14.01 | 33.7  | 93.3363  |
| 1.3056        | 0.1751 | 400  | 1.3447          | 18.12 | 37.35 | 95.0023  |
| 1.1177        | 0.2189 | 500  | 1.3168          | 18.47 | 38.44 | 95.3624  |
| 0.984         | 0.2627 | 600  | 1.3202          | 26.82 | 41.23 | 67.3120  |
| 0.8945        | 0.3065 | 700  | 1.2947          | 26.73 | 42.53 | 67.1319  |
| 0.7508        | 0.3503 | 800  | 1.2476          | 25.67 | 42.06 | 74.2008  |
| 0.7127        | 0.3940 | 900  | 1.2630          | 22.59 | 41.05 | 75.7767  |
| 0.5944        | 0.4378 | 1000 | 1.2726          | 22.37 | 40.31 | 82.4854  |
| 0.4972        | 0.4816 | 1100 | 1.2898          | 22.88 | 42.52 | 82.5304  |
| 0.4517        | 0.5254 | 1200 | 1.2509          | 27.99 | 44.42 | 64.1603  |
| 0.3885        | 0.5692 | 1300 | 1.2887          | 29.58 | 44.8  | 63.1247  |
| 0.3337        | 0.6130 | 1400 | 1.2645          | 30.05 | 45.5  | 62.6294  |
| 0.2852        | 0.6567 | 1500 | 1.2972          | 28.2  | 43.57 | 68.6628  |
| 0.2583        | 0.7005 | 1600 | 1.2716          | 28.21 | 45.06 | 73.6155  |
| 0.2016        | 0.7443 | 1700 | 1.3346          | 27.55 | 43.21 | 74.3809  |
| 0.1883        | 0.7881 | 1800 | 1.3124          | 21.45 | 41.83 | 94.1018  |
| 0.1514        | 0.8319 | 1900 | 1.3178          | 28.2  | 44.09 | 63.7551  |
| 0.1311        | 0.8757 | 2000 | 1.3246          | 27.33 | 43.25 | 74.3359  |
| 0.1128        | 0.9194 | 2100 | 1.3464          | 25.21 | 42.93 | 83.2508  |
| 0.0994        | 0.9632 | 2200 | 1.3315          | 30.51 | 45.74 | 64.7456  |
| 0.0512        | 1.0070 | 2300 | 1.3377          | 30.89 | 46.44 | 63.3498  |
| 0.0447        | 1.0508 | 2400 | 1.3587          | 28.72 | 44.36 | 64.3404  |
| 0.0368        | 1.0946 | 2500 | 1.3619          | 31.53 | 46.56 | 61.9541  |
| 0.0281        | 1.1384 | 2600 | 1.3596          | 30.98 | 46.45 | 70.4638  |
| 0.0273        | 1.1821 | 2700 | 1.3656          | 32.09 | 46.85 | 62.1792  |
| 0.0287        | 1.2259 | 2800 | 1.3547          | 32.57 | 47.04 | 62.0891  |
| 0.025         | 1.2697 | 2900 | 1.3539          | 26.94 | 45.43 | 81.1796  |
| 0.0263        | 1.3135 | 3000 | 1.3512          | 30.11 | 46.73 | 71.4993  |
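
The BLEU, ChrF, and WER columns can be computed with the Hugging Face evaluate library. Below is a minimal sketch, assuming parallel lists of model outputs and reference translations; the card does not state the exact scoring setup used.

```python
# pip install evaluate sacrebleu jiwer
import evaluate

bleu = evaluate.load("sacrebleu")
chrf = evaluate.load("chrf")
wer = evaluate.load("wer")

predictions = ["the translated sentence"]   # model outputs (placeholder)
references = ["the reference translation"]  # gold translations (placeholder)

# sacrebleu and chrf take one list of references per prediction.
print(bleu.compute(predictions=predictions, references=[[r] for r in references])["score"])
print(chrf.compute(predictions=predictions, references=[[r] for r in references])["score"])
print(wer.compute(predictions=predictions, references=references))
```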

Framework versions

  • Transformers 4.40.2
  • PyTorch 2.2.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
