berel_finetuned_on_HB

This model is a fine-tuned version of dicta-il/BEREL. The fine-tuning dataset is not specified, and no summary evaluation metrics are reported; per-step validation loss is listed in the table at the end of this card.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
num_epochs: 5
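
The linear scheduler setting can be made concrete with a small worked calculation. The sketch below is an illustration, not the training code itself. It assumes zero warmup steps (no warmup setting is listed above) and a single device with no gradient accumulation. It infers the number of steps per epoch from the last logged row of the training log (step 11500 at epoch 4.9526). The function name `linear_lr` is hypothetical.

```python
# Hyperparameters taken from the list above.
LEARNING_RATE = 0.0005
TRAIN_BATCH_SIZE = 8
NUM_EPOCHS = 5

# Last logged row of the training log: step 11500 at epoch 4.9526.
LAST_STEP, LAST_EPOCH = 11500, 4.9526

# Optimizer steps per epoch, inferred from the logged step/epoch ratio.
steps_per_epoch = round(LAST_STEP / LAST_EPOCH)
total_steps = steps_per_epoch * NUM_EPOCHS

# Upper bound on the training-set size, ASSUMING one device and no
# gradient accumulation (the last batch of an epoch may be partial).
approx_train_examples = steps_per_epoch * TRAIN_BATCH_SIZE


def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Learning rate under linear decay to zero.

    warmup_steps=0 is an assumption; the card lists no warmup setting.
    """
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
    print(f"approximate training examples: {approx_train_examples}")
    for step in (0, 500, total_steps // 2, total_steps):
        print(f"step {step:>6}: lr = {linear_lr(step):.6f}")
```

Under these assumptions the learning rate is still close to its peak of 0.0005 when the first evaluation runs at step 500, and it reaches zero only at the final step.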

Training Loss	Epoch	Step	Validation Loss
6.0361	0.2153	500	8.6317
8.6649	0.4307	1000	8.6194
8.7148	0.6460	1500	8.6911
8.6618	0.8613	2000	8.6723
8.6132	1.0767	2500	8.6721
8.5388	1.2920	3000	8.5971
8.5592	1.5073	3500	8.6900
8.596	1.7227	4000	8.6105
8.5493	1.9380	4500	8.6614
8.5079	2.1533	5000	8.6734
8.5	2.3686	5500	8.5905
8.4708	2.5840	6000	8.6090
8.4942	2.7993	6500	8.5891
8.4416	3.0146	7000	8.6171
8.5195	3.2300	7500	8.5915
8.4364	3.4453	8000	8.5914
8.4295	3.6606	8500	8.5389
8.4079	3.8760	9000	8.5505
8.4404	4.0913	9500	8.5658
8.4014	4.3066	10000	8.5453
8.4392	4.5220	10500	8.5669
8.3726	4.7373	11000	8.5404
8.3884	4.9526	11500	8.5718
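
To read the log above programmatically, the following sketch finds the logged step with the lowest validation loss and the gap between training and validation loss at that step. The rows are copied from the table. The helper name `parse_log` is hypothetical. The perplexity conversion assumes the reported loss is a mean cross-entropy in nats, which the card does not state.

```python
import math

# Rows copied from the table above:
# training loss, epoch, step, validation loss.
LOG = """
6.0361 0.2153 500 8.6317
8.6649 0.4307 1000 8.6194
8.7148 0.6460 1500 8.6911
8.6618 0.8613 2000 8.6723
8.6132 1.0767 2500 8.6721
8.5388 1.2920 3000 8.5971
8.5592 1.5073 3500 8.6900
8.596 1.7227 4000 8.6105
8.5493 1.9380 4500 8.6614
8.5079 2.1533 5000 8.6734
8.5 2.3686 5500 8.5905
8.4708 2.5840 6000 8.6090
8.4942 2.7993 6500 8.5891
8.4416 3.0146 7000 8.6171
8.5195 3.2300 7500 8.5915
8.4364 3.4453 8000 8.5914
8.4295 3.6606 8500 8.5389
8.4079 3.8760 9000 8.5505
8.4404 4.0913 9500 8.5658
8.4014 4.3066 10000 8.5453
8.4392 4.5220 10500 8.5669
8.3726 4.7373 11000 8.5404
8.3884 4.9526 11500 8.5718
"""


def parse_log(text):
    """Parse whitespace-separated log rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return rows


rows = parse_log(LOG)

# Logged step with the lowest validation loss.
best = min(rows, key=lambda r: r["val_loss"])

# Validation minus training loss at that step.
gap = best["val_loss"] - best["train_loss"]

# Perplexity, ASSUMING the loss is mean cross-entropy in nats.
perplexity = math.exp(best["val_loss"])

if __name__ == "__main__":
    print(f"best step: {best['step']} (epoch {best['epoch']})")
    print(f"validation loss: {best['val_loss']:.4f}, gap: {gap:.4f}")
    print(f"perplexity under the stated assumption: {perplexity:.1f}")
```

On this log the minimum falls at step 8500 (validation loss 8.5389), not at the final logged step, so selecting a checkpoint by validation loss would not pick the last one.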