mms-1b-toigen-baseline-model

This model is a fine-tuned version of facebook/mms-1b-all on the TOIGEN-TOI dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2843
  • WER: 0.3621
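
The snippet below is a minimal inference sketch, not taken from this card: it assumes the checkpoint loads with the standard Wav2Vec2 CTC classes that MMS fine-tunes build on, that librosa is available for audio loading, and that "audio.wav" is a placeholder path to a 16 kHz mono recording.

```python
import torch
import librosa
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "csikasote/mms-1b-toigen-baseline-model"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# "audio.wav" is a placeholder path; MMS models expect 16 kHz mono input.
speech, _ = librosa.load("audio.wav", sr=16_000, mono=True)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: take the most likely token at each frame;
# batch_decode collapses repeats and blanks into text.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```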

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 30.0
  • mixed_precision_training: Native AMP
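
As a reproduction aid, the sketch below shows one plausible mapping of these hyperparameters onto transformers.TrainingArguments. This is not the author's training script; the output_dir value and the fp16 flag are assumptions (the card only states "Native AMP").

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mms-1b-toigen-baseline-model",  # assumed output path
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # effective train batch size: 4 * 2 = 8
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=30.0,
    fp16=True,  # assumed to correspond to "Native AMP" mixed precision
)
```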

Training results

| Training Loss | Epoch   | Step | Validation Loss | WER    |
|:-------------:|:-------:|:----:|:---------------:|:------:|
| 13.6402       | 0.8163  | 100  | 3.4942          | 1.0229 |
| 3.8366        | 1.6286  | 200  | 0.5119          | 0.5733 |
| 1.0697        | 2.4408  | 300  | 0.3992          | 0.5292 |
| 0.8653        | 3.2531  | 400  | 0.3590          | 0.4804 |
| 0.8315        | 4.0653  | 500  | 0.3377          | 0.4521 |
| 0.7544        | 4.8816  | 600  | 0.3275          | 0.4396 |
| 0.7324        | 5.6939  | 700  | 0.3208          | 0.4233 |
| 0.6181        | 6.5061  | 800  | 0.3130          | 0.4154 |
| 0.7027        | 7.3184  | 900  | 0.3070          | 0.4075 |
| 0.6316        | 8.1306  | 1000 | 0.3032          | 0.4046 |
| 0.6276        | 8.9469  | 1100 | 0.2943          | 0.4037 |
| 0.6312        | 9.7592  | 1200 | 0.2938          | 0.4042 |
| 0.5482        | 10.5714 | 1300 | 0.2933          | 0.4004 |
| 0.5582        | 11.3837 | 1400 | 0.2979          | 0.3946 |
| 0.5940        | 12.1959 | 1500 | 0.2907          | 0.3904 |
| 0.5565        | 13.0082 | 1600 | 0.2973          | 0.3912 |
| 0.5375        | 13.8245 | 1700 | 0.2907          | 0.3850 |
| 0.5488        | 14.6367 | 1800 | 0.2880          | 0.3842 |
| 0.5245        | 15.4490 | 1900 | 0.2902          | 0.3833 |
| 0.5240        | 16.2612 | 2000 | 0.2874          | 0.3692 |
| 0.5159        | 17.0735 | 2100 | 0.2849          | 0.3767 |
| 0.4802        | 17.8898 | 2200 | 0.2927          | 0.3738 |
| 0.4810        | 18.7020 | 2300 | 0.2876          | 0.3792 |
| 0.5105        | 19.5143 | 2400 | 0.2842          | 0.3754 |
| 0.5144        | 20.3265 | 2500 | 0.2845          | 0.3692 |
| 0.4674        | 21.1388 | 2600 | 0.2840          | 0.3658 |
| 0.4604        | 21.9551 | 2700 | 0.2855          | 0.3700 |
| 0.4823        | 22.7673 | 2800 | 0.2852          | 0.3717 |
| 0.4502        | 23.5796 | 2900 | 0.2833          | 0.3708 |
| 0.4545        | 24.3918 | 3000 | 0.2825          | 0.3717 |
| 0.4799        | 25.2041 | 3100 | 0.2825          | 0.3638 |
| 0.4552        | 26.0163 | 3200 | 0.2848          | 0.3679 |
| 0.4415        | 26.8327 | 3300 | 0.2843          | 0.3625 |
| 0.4331        | 27.6449 | 3400 | 0.2850          | 0.3613 |
| 0.4373        | 28.4571 | 3500 | 0.2843          | 0.3617 |
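
The WER column can be reproduced with the Hugging Face evaluate library, as in the sketch below. The hypothesis and reference strings are illustrative placeholders, not examples from the TOIGEN data.

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder strings standing in for model transcriptions and gold labels.
predictions = ["the transcribed hypothesis"]
references = ["the reference transcript"]

# WER = (substitutions + deletions + insertions) / reference word count.
print(wer_metric.compute(predictions=predictions, references=references))
```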

Framework versions

  • Transformers 4.47.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0