
dkqjrm/20230822124929

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3407
  • Accuracy: 0.6570
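
The card does not state which SuperGLUE subtask the checkpoint was fine-tuned on, so the following is only a minimal loading sketch, assuming the checkpoint carries a sequence-classification head and takes a sentence pair as input; the example texts are placeholders.

```python
# Minimal sketch: load the checkpoint for inference.
# Assumptions: a sequence-classification head and sentence-pair input
# (the card does not name the SuperGLUE subtask).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230822124929"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("An example premise.", "An example hypothesis.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities
```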

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch reproducing them follows the list):

  • learning_rate: 0.001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
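
As a rough sketch, these settings map onto a transformers TrainingArguments configuration as follows. The output directory and the per-epoch evaluation strategy are assumptions (the card only shows one evaluation per epoch), and the Adam betas/epsilon listed above match the Trainer defaults, set explicitly here for clarity.

```python
# Sketch of a Trainer configuration matching the hyperparameters above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230822124929",   # hypothetical output path
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,                # Trainer defaults, as listed in the card
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
    evaluation_strategy="epoch",   # assumption: matches the per-epoch log below
)
```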

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.3734          | 0.5307   |
| 0.4216        | 2.0   | 624   | 0.3802          | 0.4729   |
| 0.4216        | 3.0   | 936   | 0.4299          | 0.4765   |
| 0.3883        | 4.0   | 1248  | 0.3490          | 0.5451   |
| 0.3918        | 5.0   | 1560  | 0.3461          | 0.5884   |
| 0.3918        | 6.0   | 1872  | 0.3599          | 0.5523   |
| 0.3764        | 7.0   | 2184  | 0.3565          | 0.5451   |
| 0.3764        | 8.0   | 2496  | 0.3611          | 0.5018   |
| 0.3794        | 9.0   | 2808  | 0.4040          | 0.5415   |
| 0.3778        | 10.0  | 3120  | 0.3622          | 0.4729   |
| 0.3778        | 11.0  | 3432  | 0.4954          | 0.4693   |
| 0.3813        | 12.0  | 3744  | 0.3602          | 0.4765   |
| 0.3718        | 13.0  | 4056  | 0.3453          | 0.5415   |
| 0.3718        | 14.0  | 4368  | 0.3640          | 0.5343   |
| 0.3701        | 15.0  | 4680  | 0.3589          | 0.4838   |
| 0.3701        | 16.0  | 4992  | 0.3700          | 0.5632   |
| 0.371         | 17.0  | 5304  | 0.4147          | 0.5343   |
| 0.3644        | 18.0  | 5616  | 0.3505          | 0.5740   |
| 0.3644        | 19.0  | 5928  | 0.3736          | 0.4874   |
| 0.3667        | 20.0  | 6240  | 0.3637          | 0.5704   |
| 0.3629        | 21.0  | 6552  | 0.3412          | 0.6209   |
| 0.3629        | 22.0  | 6864  | 0.3451          | 0.6282   |
| 0.3574        | 23.0  | 7176  | 0.3626          | 0.6065   |
| 0.3574        | 24.0  | 7488  | 0.3732          | 0.4874   |
| 0.3565        | 25.0  | 7800  | 0.3427          | 0.6173   |
| 0.3525        | 26.0  | 8112  | 0.3855          | 0.5812   |
| 0.3525        | 27.0  | 8424  | 0.3384          | 0.6498   |
| 0.3523        | 28.0  | 8736  | 0.3408          | 0.6282   |
| 0.3505        | 29.0  | 9048  | 0.3548          | 0.6101   |
| 0.3505        | 30.0  | 9360  | 0.3861          | 0.5921   |
| 0.3509        | 31.0  | 9672  | 0.3710          | 0.5993   |
| 0.3509        | 32.0  | 9984  | 0.3897          | 0.5993   |
| 0.3494        | 33.0  | 10296 | 0.3535          | 0.6354   |
| 0.3459        | 34.0  | 10608 | 0.3389          | 0.6282   |
| 0.3459        | 35.0  | 10920 | 0.3397          | 0.6209   |
| 0.3429        | 36.0  | 11232 | 0.3450          | 0.6101   |
| 0.3432        | 37.0  | 11544 | 0.3925          | 0.6065   |
| 0.3432        | 38.0  | 11856 | 0.3294          | 0.6715   |
| 0.341         | 39.0  | 12168 | 0.3442          | 0.6390   |
| 0.341         | 40.0  | 12480 | 0.3421          | 0.6462   |
| 0.3392        | 41.0  | 12792 | 0.3371          | 0.6390   |
| 0.3392        | 42.0  | 13104 | 0.3326          | 0.6534   |
| 0.3392        | 43.0  | 13416 | 0.3714          | 0.6282   |
| 0.337         | 44.0  | 13728 | 0.3535          | 0.6245   |
| 0.3352        | 45.0  | 14040 | 0.3548          | 0.6245   |
| 0.3352        | 46.0  | 14352 | 0.3361          | 0.6570   |
| 0.3335        | 47.0  | 14664 | 0.3329          | 0.6859   |
| 0.3335        | 48.0  | 14976 | 0.3423          | 0.6462   |
| 0.3329        | 49.0  | 15288 | 0.3356          | 0.6534   |
| 0.3308        | 50.0  | 15600 | 0.3398          | 0.6643   |
| 0.3308        | 51.0  | 15912 | 0.3374          | 0.6679   |
| 0.3291        | 52.0  | 16224 | 0.3315          | 0.6787   |
| 0.3284        | 53.0  | 16536 | 0.3650          | 0.6318   |
| 0.3284        | 54.0  | 16848 | 0.3537          | 0.6282   |
| 0.3257        | 55.0  | 17160 | 0.3480          | 0.6426   |
| 0.3257        | 56.0  | 17472 | 0.3424          | 0.6570   |
| 0.3274        | 57.0  | 17784 | 0.3413          | 0.6679   |
| 0.3265        | 58.0  | 18096 | 0.3442          | 0.6390   |
| 0.3265        | 59.0  | 18408 | 0.3417          | 0.6534   |
| 0.326         | 60.0  | 18720 | 0.3407          | 0.6570   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
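
To reproduce results, it may help to confirm a local environment matches these versions; the expected strings below are taken directly from the list above.

```python
# Quick environment check against the versions listed in this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    print(f"{name}: installed {installed[name]}, card lists {want}")
```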