20230822185044

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.3482
  • Accuracy: 0.4729
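Since the card does not document which SuperGLUE task or preprocessing was used, the snippet below is only a minimal sketch of loading the checkpoint for sentence-pair classification; the repository id dkqjrm/20230822185044 and the two-segment input are assumptions.

```python
# Minimal sketch: load the checkpoint for sentence-pair classification.
# Assumptions: the repo id "dkqjrm/20230822185044" and that the checkpoint
# carries a sequence-classification head (the SuperGLUE task is not
# documented in this card).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230822185044"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Encode a hypothetical premise/hypothesis pair and take the argmax label.
inputs = tokenizer("A premise sentence.", "A hypothesis sentence.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())
```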

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 0.003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
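For reference, these settings translate into Transformers TrainingArguments roughly as sketched below; the output directory is a placeholder, and anything not listed above is an assumption rather than a documented setting.

```python
# Sketch of the reported hyperparameters expressed as TrainingArguments.
# "output/20230822185044" is a hypothetical output directory.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output/20230822185044",  # placeholder, not from the card
    learning_rate=3e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
)
```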

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.3580          | 0.5379   |
| 0.5102        | 2.0   | 624   | 0.3670          | 0.5415   |
| 0.5102        | 3.0   | 936   | 0.4888          | 0.4765   |
| 0.4569        | 4.0   | 1248  | 0.3742          | 0.4982   |
| 0.4403        | 5.0   | 1560  | 0.3796          | 0.5379   |
| 0.4403        | 6.0   | 1872  | 0.3602          | 0.5776   |
| 0.4215        | 7.0   | 2184  | 0.4013          | 0.5415   |
| 0.4215        | 8.0   | 2496  | 0.3596          | 0.5884   |
| 0.4166        | 9.0   | 2808  | 0.3447          | 0.5487   |
| 0.3885        | 10.0  | 3120  | 0.3395          | 0.6101   |
| 0.3885        | 11.0  | 3432  | 0.3395          | 0.6354   |
| 0.3776        | 12.0  | 3744  | 0.3568          | 0.5343   |
| 0.4274        | 13.0  | 4056  | 0.5923          | 0.4729   |
| 0.4274        | 14.0  | 4368  | 0.3503          | 0.5668   |
| 0.4138        | 15.0  | 4680  | 0.3605          | 0.5523   |
| 0.4138        | 16.0  | 4992  | 0.3491          | 0.5451   |
| 0.4025        | 17.0  | 5304  | 0.3728          | 0.5379   |
| 0.394         | 18.0  | 5616  | 0.4029          | 0.4729   |
| 0.394         | 19.0  | 5928  | 0.3682          | 0.4729   |
| 0.3892        | 20.0  | 6240  | 0.3484          | 0.5054   |
| 0.3839        | 21.0  | 6552  | 0.3485          | 0.4765   |
| 0.3839        | 22.0  | 6864  | 0.3467          | 0.5343   |
| 0.3782        | 23.0  | 7176  | 0.3471          | 0.5307   |
| 0.3782        | 24.0  | 7488  | 0.3565          | 0.4693   |
| 0.3757        | 25.0  | 7800  | 0.3483          | 0.5343   |
| 0.3737        | 26.0  | 8112  | 0.3495          | 0.5271   |
| 0.3737        | 27.0  | 8424  | 0.3550          | 0.4729   |
| 0.3724        | 28.0  | 8736  | 0.3544          | 0.4729   |
| 0.3696        | 29.0  | 9048  | 0.3478          | 0.5307   |
| 0.3696        | 30.0  | 9360  | 0.3519          | 0.5271   |
| 0.3693        | 31.0  | 9672  | 0.3515          | 0.5271   |
| 0.3693        | 32.0  | 9984  | 0.3487          | 0.4729   |
| 0.3674        | 33.0  | 10296 | 0.3492          | 0.5379   |
| 0.3628        | 34.0  | 10608 | 0.3555          | 0.4729   |
| 0.3628        | 35.0  | 10920 | 0.3550          | 0.4729   |
| 0.3635        | 36.0  | 11232 | 0.3686          | 0.4729   |
| 0.3636        | 37.0  | 11544 | 0.3488          | 0.4801   |
| 0.3636        | 38.0  | 11856 | 0.3484          | 0.4874   |
| 0.3595        | 39.0  | 12168 | 0.3477          | 0.4910   |
| 0.3595        | 40.0  | 12480 | 0.3486          | 0.5307   |
| 0.3598        | 41.0  | 12792 | 0.3488          | 0.4801   |
| 0.3594        | 42.0  | 13104 | 0.3614          | 0.4729   |
| 0.3594        | 43.0  | 13416 | 0.3476          | 0.5199   |
| 0.3586        | 44.0  | 13728 | 0.3482          | 0.4729   |
| 0.3581        | 45.0  | 14040 | 0.3519          | 0.4729   |
| 0.3581        | 46.0  | 14352 | 0.3494          | 0.4729   |
| 0.3579        | 47.0  | 14664 | 0.3613          | 0.4729   |
| 0.3579        | 48.0  | 14976 | 0.3480          | 0.4729   |
| 0.3573        | 49.0  | 15288 | 0.3480          | 0.4729   |
| 0.3564        | 50.0  | 15600 | 0.3487          | 0.4729   |
| 0.3564        | 51.0  | 15912 | 0.3529          | 0.4729   |
| 0.3561        | 52.0  | 16224 | 0.3515          | 0.4729   |
| 0.3554        | 53.0  | 16536 | 0.3475          | 0.4946   |
| 0.3554        | 54.0  | 16848 | 0.3489          | 0.5271   |
| 0.3535        | 55.0  | 17160 | 0.3488          | 0.4729   |
| 0.3535        | 56.0  | 17472 | 0.3478          | 0.5018   |
| 0.3542        | 57.0  | 17784 | 0.3491          | 0.4729   |
| 0.354         | 58.0  | 18096 | 0.3485          | 0.4729   |
| 0.354         | 59.0  | 18408 | 0.3483          | 0.4729   |
| 0.3529        | 60.0  | 18720 | 0.3482          | 0.4729   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
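
For reproduction, a quick check of the local environment against the versions above might look like the sketch below; exact version pinning is optional.

```python
# Sketch: print installed versions to compare against those reported above.
import datasets
import tokenizers
import torch
import transformers

print("Transformers:", transformers.__version__)  # card reports 4.26.1
print("PyTorch:", torch.__version__)              # card reports 2.0.1+cu118
print("Datasets:", datasets.__version__)          # card reports 2.12.0
print("Tokenizers:", tokenizers.__version__)      # card reports 0.13.3
```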