# berel_finetuned_on_HB_3_epochs
This model is a fine-tuned version of dicta-il/BEREL on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 8.5156
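For reference, assuming this is the standard cross-entropy loss reported by the Trainer, the corresponding perplexity would be exp(8.5156) ≈ 5.0 × 10³; this is an inference from the reported loss, not a separately measured figure.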
## Model description
More information needed
## Intended uses & limitations
More information needed
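A minimal usage sketch, assuming the checkpoint is a masked language model like its base (dicta-il/BEREL); the example sentence is illustrative only and the pipeline call is untested against this specific checkpoint:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint for masked-token prediction.
fill_mask = pipeline("fill-mask", model="martijn75/berel_finetuned_on_HB_3_epochs")

# Use the tokenizer's own mask token rather than hard-coding it.
mask = fill_mask.tokenizer.mask_token

# Illustrative Hebrew sentence (Genesis 1:1 with the final word masked).
for prediction in fill_mask(f"בראשית ברא אלהים את {mask}"):
    print(prediction["token_str"], prediction["score"])
```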
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.0005
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
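A minimal sketch of a `TrainingArguments` configuration matching the hyperparameters above; only the listed values come from this card, while the output directory name is an assumption:

```python
from transformers import TrainingArguments

# Sketch reproducing the reported hyperparameters.
training_args = TrainingArguments(
    output_dir="berel_finetuned_on_HB_3_epochs",  # assumed name
    learning_rate=5e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=3,
)
```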
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 7.139 | 0.2153 | 500 | 8.6571 |
| 8.6879 | 0.4307 | 1000 | 8.6309 |
| 8.6526 | 0.6460 | 1500 | 8.6215 |
| 8.631 | 0.8613 | 2000 | 8.6143 |
| 8.7418 | 1.0767 | 2500 | 8.6499 |
| 8.6406 | 1.2920 | 3000 | 8.6562 |
| 8.577 | 1.5073 | 3500 | 8.5696 |
| 8.6103 | 1.7227 | 4000 | 8.6186 |
| 8.5473 | 1.9380 | 4500 | 8.5872 |
| 8.4976 | 2.1533 | 5000 | 8.5632 |
| 8.4777 | 2.3686 | 5500 | 8.6354 |
| 8.4708 | 2.5840 | 6000 | 8.5897 |
| 8.4545 | 2.7993 | 6500 | 8.5156 |
### Framework versions
- Transformers 4.47.1
- Pytorch 2.5.1+cu118
- Datasets 3.2.0
- Tokenizers 0.21.0
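A quick sanity check that a local environment matches the versions above (the expected values in the comments are copied from this list):

```python
import datasets
import tokenizers
import torch
import transformers

# Print installed versions for comparison against the card.
print(transformers.__version__)  # expected 4.47.1
print(torch.__version__)         # expected 2.5.1+cu118
print(datasets.__version__)      # expected 3.2.0
print(tokenizers.__version__)    # expected 0.21.0
```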