20230822155557

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (see the loading sketch after this list):

  • Loss: 0.3488
  • Accuracy: 0.5307
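
The card does not say which SuperGLUE subtask this classifier targets, so the following is only a minimal loading sketch: it assumes the checkpoint is published as dkqjrm/20230822155557 with a standard sequence-classification head, and the sentence-pair input and label meaning are assumptions.

```python
# Minimal loading sketch, assuming the checkpoint dkqjrm/20230822155557
# exposes a standard sequence-classification head. The SuperGLUE subtask
# (and hence the label meaning) is not documented on this card.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "dkqjrm/20230822155557"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# Hypothetical sentence pair; replace with inputs for the actual subtask.
inputs = tokenizer("Example premise.", "Example hypothesis.", return_tensors="pt")
logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())
```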

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
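
A minimal sketch of how these values map onto TrainingArguments in transformers 4.26. The output directory and the per-epoch evaluation cadence (implied by the results table below) are assumptions; the rest of the training script (model, dataset columns, Trainer wiring) is not documented on this card.

```python
# Sketch only: mirrors the hyperparameters listed above.
# output_dir and evaluation_strategy are assumptions, not from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",             # assumption
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
    evaluation_strategy="epoch",  # the table below logs one eval per epoch
)
```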

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 312 | 0.3548 | 0.4729 |
| 0.3737 | 2.0 | 624 | 0.3480 | 0.5199 |
| 0.3737 | 3.0 | 936 | 0.3486 | 0.5162 |
| 0.3718 | 4.0 | 1248 | 0.3495 | 0.5235 |
| 0.3714 | 5.0 | 1560 | 0.3505 | 0.4729 |
| 0.3714 | 6.0 | 1872 | 0.3487 | 0.5235 |
| 0.3686 | 7.0 | 2184 | 0.3496 | 0.4729 |
| 0.3686 | 8.0 | 2496 | 0.3505 | 0.4729 |
| 0.3684 | 9.0 | 2808 | 0.3502 | 0.5235 |
| 0.3679 | 10.0 | 3120 | 0.3491 | 0.5054 |
| 0.3679 | 11.0 | 3432 | 0.3515 | 0.4729 |
| 0.3659 | 12.0 | 3744 | 0.3496 | 0.5162 |
| 0.3649 | 13.0 | 4056 | 0.3517 | 0.4729 |
| 0.3649 | 14.0 | 4368 | 0.3543 | 0.4729 |
| 0.3651 | 15.0 | 4680 | 0.3513 | 0.4729 |
| 0.3651 | 16.0 | 4992 | 0.3489 | 0.5235 |
| 0.363 | 17.0 | 5304 | 0.3537 | 0.5235 |
| 0.3613 | 18.0 | 5616 | 0.3487 | 0.5307 |
| 0.3613 | 19.0 | 5928 | 0.3495 | 0.5126 |
| 0.3645 | 20.0 | 6240 | 0.3530 | 0.5199 |
| 0.359 | 21.0 | 6552 | 0.3497 | 0.5235 |
| 0.359 | 22.0 | 6864 | 0.3487 | 0.5235 |
| 0.3614 | 23.0 | 7176 | 0.3511 | 0.5235 |
| 0.3614 | 24.0 | 7488 | 0.3491 | 0.5271 |
| 0.3617 | 25.0 | 7800 | 0.3493 | 0.5199 |
| 0.3611 | 26.0 | 8112 | 0.3491 | 0.5271 |
| 0.3611 | 27.0 | 8424 | 0.3581 | 0.4729 |
| 0.3583 | 28.0 | 8736 | 0.3496 | 0.5343 |
| 0.3583 | 29.0 | 9048 | 0.3492 | 0.5162 |
| 0.3583 | 30.0 | 9360 | 0.3493 | 0.4404 |
| 0.3564 | 31.0 | 9672 | 0.3494 | 0.5343 |
| 0.3564 | 32.0 | 9984 | 0.3489 | 0.5199 |
| 0.3567 | 33.0 | 10296 | 0.3490 | 0.5343 |
| 0.3561 | 34.0 | 10608 | 0.3486 | 0.5271 |
| 0.3561 | 35.0 | 10920 | 0.3492 | 0.5307 |
| 0.3556 | 36.0 | 11232 | 0.3503 | 0.4765 |
| 0.3556 | 37.0 | 11544 | 0.3497 | 0.5307 |
| 0.3556 | 38.0 | 11856 | 0.3494 | 0.5379 |
| 0.3561 | 39.0 | 12168 | 0.3488 | 0.5235 |
| 0.3561 | 40.0 | 12480 | 0.3503 | 0.5271 |
| 0.3558 | 41.0 | 12792 | 0.3489 | 0.5343 |
| 0.3579 | 42.0 | 13104 | 0.3508 | 0.4729 |
| 0.3579 | 43.0 | 13416 | 0.3505 | 0.5271 |
| 0.3547 | 44.0 | 13728 | 0.3493 | 0.5379 |
| 0.3567 | 45.0 | 14040 | 0.3519 | 0.4729 |
| 0.3567 | 46.0 | 14352 | 0.3497 | 0.4729 |
| 0.3548 | 47.0 | 14664 | 0.3499 | 0.4729 |
| 0.3548 | 48.0 | 14976 | 0.3492 | 0.5343 |
| 0.3563 | 49.0 | 15288 | 0.3491 | 0.5307 |
| 0.3552 | 50.0 | 15600 | 0.3489 | 0.5235 |
| 0.3552 | 51.0 | 15912 | 0.3487 | 0.5162 |
| 0.3557 | 52.0 | 16224 | 0.3496 | 0.4513 |
| 0.3555 | 53.0 | 16536 | 0.3488 | 0.5307 |
| 0.3555 | 54.0 | 16848 | 0.3489 | 0.5271 |
| 0.3542 | 55.0 | 17160 | 0.3488 | 0.5162 |
| 0.3542 | 56.0 | 17472 | 0.3488 | 0.5343 |
| 0.3545 | 57.0 | 17784 | 0.3494 | 0.5379 |
| 0.3543 | 58.0 | 18096 | 0.3489 | 0.5126 |
| 0.3543 | 59.0 | 18408 | 0.3489 | 0.5162 |
| 0.3553 | 60.0 | 18720 | 0.3488 | 0.5307 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
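
A small sanity-check sketch for reproducing this environment. The version strings come from the list above; torch.version.cuda printing "11.8" corresponds to the +cu118 build tag.

```python
# Environment sanity check against the framework versions listed above.
import datasets
import tokenizers
import torch
import transformers

assert transformers.__version__.startswith("4.26"), transformers.__version__
assert torch.__version__.startswith("2.0.1"), torch.__version__
assert datasets.__version__.startswith("2.12"), datasets.__version__
assert tokenizers.__version__.startswith("0.13"), tokenizers.__version__
print(torch.version.cuda)  # "11.8" for the +cu118 build reported above
```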