Whisper Medium GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-medium, trained on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets with data augmentation. The datasets are augmented in two ways: noise augmentation and truncation of low-amplitude samples. The best model checkpoint (this version), selected by ChrF, is at step 2900 (epoch 0.6349). It achieves the following results on the evaluation set:

Loss: 1.1883
Bleu: 32.88
Chrf: 51.52
Wer: 62.0441
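The card does not say which tool computed WER. Word error rate is conventionally defined as (substitutions + deletions + insertions) divided by the number of reference words, times 100. Because insertions are counted against the reference length, WER is not capped at 100. This is why some early checkpoints in the training results report values such as 117.2445 and 131.7875. The following pure-Python sketch implements that standard definition at corpus level. The function names are illustrative. It also applies no text normalization (casing, punctuation), which real evaluation scripts often do.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of word substitutions, deletions, and insertions
    needed to turn `ref` into `hyp` (Levenshtein distance over words)."""
    # prev[j] holds the distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute (or match if cost == 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level WER in percent: total edits / total reference words * 100."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    edits = sum(
        word_edit_distance(r.split(), h.split())
        for r, h in zip(references, hypotheses)
    )
    ref_words = sum(len(r.split()) for r in references)
    return 100.0 * edits / ref_words


# A hypothesis with extra words can push WER above 100:
print(corpus_wer(["yes"], ["no no no"]))  # 300.0 (1 substitution + 2 insertions)
```

For speech translation the hypothesis is a free English translation compared against a single reference, so WER is a strict measure here. ChrF and BLEU are more tolerant of rephrasing.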

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
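The introduction names two augmentations, noise augmentation and truncation of low-amplitude samples, but does not say how they were implemented. The NumPy sketch below shows one plausible reading. It mixes white Gaussian noise in at a chosen signal-to-noise ratio, and it trims leading and trailing samples whose absolute amplitude falls below a threshold. The function names, the SNR value, and the threshold are hypothetical and are not taken from the model's training code.

```python
import numpy as np


def add_white_noise(audio: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Mix Gaussian white noise into `audio` at the requested SNR (in dB).
    Hypothetical helper: the card does not specify the noise type or SNR."""
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise


def trim_low_amplitude(audio: np.ndarray, threshold: float) -> np.ndarray:
    """Drop leading and trailing samples with |amplitude| < threshold.
    Hypothetical helper: one possible reading of "truncating low-amplitude samples"."""
    loud = np.flatnonzero(np.abs(audio) >= threshold)
    if loud.size == 0:
        return audio[:0]  # nothing above the threshold: return an empty clip
    return audio[loud[0] : loud[-1] + 1]


rng = np.random.default_rng(42)
sr = 16_000  # Whisper models expect 16 kHz input
t = np.arange(sr) / sr
# One second of a 440 Hz tone padded with silence on both sides.
clip = np.concatenate([np.zeros(800), 0.5 * np.sin(2 * np.pi * 440 * t), np.zeros(800)])
trimmed = trim_low_amplitude(clip, threshold=1e-3)
noisy = add_white_noise(trimmed, snr_db=10.0, rng=rng)
print(len(clip), len(trimmed), len(noisy))
```

Trimming before adding noise keeps the noise level tied to the speech content rather than to padded silence. The actual pipeline may have used recorded noise or a different order.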

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 0.02
training_steps: 3000
mixed_precision_training: Native AMP
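Several quantities in this list can be checked by hand. The effective batch size is train_batch_size times gradient_accumulation_steps times the number of devices. This gives the reported 32 if training ran on a single device, which is an assumption because the card does not state the device count. The linear scheduler decays the learning rate from 0.0001 to 0 over the 3000 training steps, after any warmup. The sketch below follows the same formula as the linear-with-warmup schedule in Hugging Face Transformers (get_linear_schedule_with_warmup). The listed warmup value of 0.02 is ambiguous: it reads like a warmup ratio, which would mean 60 of 3000 steps. The sketch therefore takes warmup as an explicit parameter. It also estimates the number of training examples per epoch from the last row of the training results (epoch 0.6568 at step 3000). That estimate is only approximate because the epoch value is rounded.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    # Number of examples contributing to each optimizer update.
    return per_device * grad_accum * num_devices


def linear_lr(step: int, base_lr: float = 1e-4, warmup_steps: int = 0, total_steps: int = 3000) -> float:
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def approx_examples_per_epoch(step: int, batch_size: int, epoch: float) -> float:
    # Examples seen so far divided by the fraction of an epoch completed.
    return step * batch_size / epoch


batch = effective_batch_size(per_device=8, grad_accum=4)  # 32, assuming one device
print(batch)
print(linear_lr(1500))                 # halfway through the decay, no warmup
print(linear_lr(30, warmup_steps=60))  # halfway through a hypothetical 60-step warmup
print(round(approx_examples_per_epoch(3000, batch, 0.6568)))  # roughly 146k examples
```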

Training results

Training Loss	Epoch	Step	Validation Loss	Bleu	Chrf	Wer
2.4487	0.0219	100	1.9518	8.34	24.49	117.2445
2.11	0.0438	200	1.6630	15.32	32.12	84.0612
1.9757	0.0657	300	1.5366	10.86	33.42	131.7875
1.7964	0.0876	400	1.4825	19.81	36.71	81.9451
1.6422	0.1095	500	1.4432	18.83	40.4	84.0162
1.3839	0.1314	600	1.4169	24.91	40.87	69.0230
1.352	0.1533	700	1.4340	25.01	41.57	71.5894
1.2434	0.1752	800	1.3813	24.05	41.29	73.7506
1.2223	0.1970	900	1.3578	25.89	41.61	70.5988
1.0414	0.2189	1000	1.3075	27.45	44.17	68.2575
0.9199	0.2408	1100	1.3022	23.14	44.3	84.1513
0.8648	0.2627	1200	1.3050	23.36	43.37	72.4448
0.8469	0.2846	1300	1.2853	28.37	45.97	67.1319
0.7649	0.3065	1400	1.2755	28.56	46.76	66.0964
0.7321	0.3284	1500	1.2750	27.23	46.1	69.3381
0.6541	0.3503	1600	1.2557	30.02	48.06	65.6011
0.6107	0.3722	1700	1.2520	30.41	49.23	64.2954
0.5738	0.3941	1800	1.2435	32.45	50.27	63.4399
0.4983	0.4160	1900	1.2007	31.17	48.58	64.0702
0.4439	0.4379	2000	1.2140	32.29	50.37	60.6033
0.367	0.4598	2100	1.2230	29.54	49.14	67.7172
0.2807	0.4817	2200	1.2277	33.1	51.21	62.9446
0.2621	0.5036	2300	1.2441	30.59	49.49	64.8807
0.2965	0.5255	2400	1.1969	31.82	49.67	63.5299
0.236	0.5473	2500	1.2275	31.17	50.29	65.1959
0.229	0.5692	2600	1.2008	30.02	50.27	70.6439
0.164	0.5911	2700	1.2192	31.37	50.57	63.6200
0.1786	0.6130	2800	1.1965	31.81	50.13	62.8546
0.1987	0.6349	2900	1.1883	32.88	51.52	62.0441
0.1633	0.6568	3000	1.1903	32.01	50.38	62.7645
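The table shows that the choice of selection metric matters. ChrF peaks at step 2900, the checkpoint this card describes, and validation loss is also lowest there. BLEU, however, peaks at step 2200, and WER is lowest at step 2000. The short script below transcribes the validation columns from the table and picks the best step under each criterion: higher is better for BLEU and ChrF, lower is better for WER and validation loss. It only illustrates the comparison. It is not the selection logic that was used during training.

```python
# (step, validation_loss, bleu, chrf, wer) transcribed from the training results table
ROWS = [
    (100, 1.9518, 8.34, 24.49, 117.2445),
    (200, 1.6630, 15.32, 32.12, 84.0612),
    (300, 1.5366, 10.86, 33.42, 131.7875),
    (400, 1.4825, 19.81, 36.71, 81.9451),
    (500, 1.4432, 18.83, 40.4, 84.0162),
    (600, 1.4169, 24.91, 40.87, 69.0230),
    (700, 1.4340, 25.01, 41.57, 71.5894),
    (800, 1.3813, 24.05, 41.29, 73.7506),
    (900, 1.3578, 25.89, 41.61, 70.5988),
    (1000, 1.3075, 27.45, 44.17, 68.2575),
    (1100, 1.3022, 23.14, 44.3, 84.1513),
    (1200, 1.3050, 23.36, 43.37, 72.4448),
    (1300, 1.2853, 28.37, 45.97, 67.1319),
    (1400, 1.2755, 28.56, 46.76, 66.0964),
    (1500, 1.2750, 27.23, 46.1, 69.3381),
    (1600, 1.2557, 30.02, 48.06, 65.6011),
    (1700, 1.2520, 30.41, 49.23, 64.2954),
    (1800, 1.2435, 32.45, 50.27, 63.4399),
    (1900, 1.2007, 31.17, 48.58, 64.0702),
    (2000, 1.2140, 32.29, 50.37, 60.6033),
    (2100, 1.2230, 29.54, 49.14, 67.7172),
    (2200, 1.2277, 33.1, 51.21, 62.9446),
    (2300, 1.2441, 30.59, 49.49, 64.8807),
    (2400, 1.1969, 31.82, 49.67, 63.5299),
    (2500, 1.2275, 31.17, 50.29, 65.1959),
    (2600, 1.2008, 30.02, 50.27, 70.6439),
    (2700, 1.2192, 31.37, 50.57, 63.6200),
    (2800, 1.1965, 31.81, 50.13, 62.8546),
    (2900, 1.1883, 32.88, 51.52, 62.0441),
    (3000, 1.1903, 32.01, 50.38, 62.7645),
]


def best_step(column: int, higher_is_better: bool) -> int:
    """Step of the row with the best value in `column`."""
    pick = max if higher_is_better else min
    return pick(ROWS, key=lambda row: row[column])[0]


# column index and direction for each criterion
CRITERIA = {
    "Validation Loss": (1, False),
    "Bleu": (2, True),
    "Chrf": (3, True),
    "Wer": (4, False),
}

for name, (col, higher) in CRITERIA.items():
    print(f"{name}: best at step {best_step(col, higher)}")
# Validation Loss: 2900, Bleu: 2200, Chrf: 2900, Wer: 2000
```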

Framework versions

Transformers 4.40.0
Pytorch 2.0.1+cu118
Datasets 2.19.0
Tokenizers 0.19.1
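When reproducing training or evaluation, it can help to compare the installed package versions with the ones listed above. The sketch below uses only the standard library's importlib.metadata. It assumes the usual PyPI distribution names (transformers, torch, datasets, tokenizers), and the helper name is illustrative. Versions are compared as literal strings, so a CPU-only PyTorch build that reports "2.0.1" is flagged as differing from "2.0.1+cu118".

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed in this card, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.40.0",
    "torch": "2.0.1+cu118",
    "datasets": "2.19.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for each package whose installed
    version differs from the expected one; `installed` is None if it is missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in version_mismatches(EXPECTED).items():
    print(f"{package}: card lists {wanted}, installed {installed}")
```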

Model repository: ymoslem/whisper-medium-ga2en-v4

Collection: Speech Translation (Irish-English)