20230822135401

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set after the final epoch (60):

Loss: 0.3478
Accuracy: 0.6065

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 8
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
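
As a rough illustration of the linear schedule listed above, the sketch below computes the learning rate at a given optimizer step. It assumes zero warmup steps and a decay to zero at the final step, neither of which this card states. `linear_lr` is a hypothetical helper written for this example, not a Transformers function. The figure of 312 steps per epoch is taken from the training results table.

```python
PEAK_LR = 0.0005        # learning_rate from the card
STEPS_PER_EPOCH = 312   # step count at epoch 1.0 in the results table
NUM_EPOCHS = 60         # num_epochs from the card
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 18720, matching the last table row


def linear_lr(step, peak_lr=PEAK_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate after `step` optimizer steps.

    Linear warmup to `peak_lr`, then linear decay to zero.
    warmup_steps=0 is an assumption; the card does not report it.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Learning rate at the start, midpoint, and end of training.
    for epoch in (0, 30, 60):
        print(epoch, linear_lr(epoch * STEPS_PER_EPOCH))
```

Under these assumptions the rate falls to half its peak value at epoch 30 and reaches zero at step 18720. In an actual Transformers run, a schedule of this shape is typically built by `get_linear_schedule_with_warmup`; whether this run used warmup is not recorded here.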

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	312	0.3502	0.5451
0.3914	2.0	624	0.3937	0.4729
0.3914	3.0	936	0.3710	0.4729
0.3806	4.0	1248	0.3529	0.4693
0.3775	5.0	1560	0.3489	0.5487
0.3775	6.0	1872	0.3466	0.5451
0.3668	7.0	2184	0.4554	0.5379
0.3668	8.0	2496	0.3811	0.5451
0.3698	9.0	2808	0.3497	0.5271
0.3659	10.0	3120	0.3462	0.5199
0.3659	11.0	3432	0.4239	0.4729
0.3675	12.0	3744	0.3535	0.5126
0.3617	13.0	4056	0.3470	0.5090
0.3617	14.0	4368	0.3630	0.5054
0.3624	15.0	4680	0.3506	0.5235
0.3624	16.0	4992	0.3747	0.5487
0.359	17.0	5304	0.3704	0.5487
0.3576	18.0	5616	0.3538	0.5343
0.3576	19.0	5928	0.3597	0.5415
0.3612	20.0	6240	0.3637	0.5596
0.359	21.0	6552	0.3487	0.5704
0.359	22.0	6864	0.3591	0.5415
0.3566	23.0	7176	0.3946	0.5523
0.3566	24.0	7488	0.3627	0.5018
0.3551	25.0	7800	0.3540	0.5523
0.353	26.0	8112	0.3461	0.5343
0.353	27.0	8424	0.3469	0.5596
0.3517	28.0	8736	0.3471	0.5993
0.3549	29.0	9048	0.3504	0.5632
0.3549	30.0	9360	0.3559	0.5812
0.3523	31.0	9672	0.3769	0.5560
0.3523	32.0	9984	0.3473	0.5704
0.3514	33.0	10296	0.3632	0.5704
0.3513	34.0	10608	0.3503	0.5848
0.3513	35.0	10920	0.3464	0.5560
0.3512	36.0	11232	0.3493	0.5740
0.3494	37.0	11544	0.3479	0.6101
0.3494	38.0	11856	0.3464	0.6029
0.3478	39.0	12168	0.3495	0.6101
0.3478	40.0	12480	0.3462	0.6065
0.3479	41.0	12792	0.3519	0.6065
0.3472	42.0	13104	0.3420	0.5704
0.3472	43.0	13416	0.3555	0.5740
0.3456	44.0	13728	0.3471	0.5957
0.3448	45.0	14040	0.3434	0.5776
0.3448	46.0	14352	0.3401	0.6209
0.3439	47.0	14664	0.3439	0.5776
0.3439	48.0	14976	0.3523	0.5921
0.3442	49.0	15288	0.3466	0.6137
0.3437	50.0	15600	0.3549	0.5776
0.3437	51.0	15912	0.3417	0.6173
0.3413	52.0	16224	0.3409	0.6209
0.3416	53.0	16536	0.3607	0.5884
0.3416	54.0	16848	0.3574	0.5848
0.3401	55.0	17160	0.3494	0.5812
0.3401	56.0	17472	0.3480	0.6137
0.3395	57.0	17784	0.3434	0.6029
0.3399	58.0	18096	0.3454	0.5993
0.3399	59.0	18408	0.3477	0.5957
0.3398	60.0	18720	0.3478	0.6065
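
The table can be scanned for its best epochs with a few lines of code. The sketch below is an illustration only: it embeds a handful of rows copied from the table above (not the full log), and `parse_rows` is a hypothetical helper. It does not reproduce any checkpoint-selection logic from the actual run, which this card does not describe.

```python
import csv
import io

# A few rows copied from the results table (tab-separated); not the full log.
ROWS = """\
Training Loss\tEpoch\tStep\tValidation Loss\tAccuracy
No log\t1.0\t312\t0.3502\t0.5451
0.3914\t2.0\t624\t0.3937\t0.4729
0.3494\t37.0\t11544\t0.3479\t0.6101
0.3448\t46.0\t14352\t0.3401\t0.6209
0.3413\t52.0\t16224\t0.3409\t0.6209
0.3398\t60.0\t18720\t0.3478\t0.6065
"""


def parse_rows(text):
    """Parse the tab-separated table into a list of dicts with numeric fields."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    parsed = []
    for row in reader:
        parsed.append({
            "epoch": float(row["Epoch"]),
            "step": int(row["Step"]),
            "val_loss": float(row["Validation Loss"]),
            "accuracy": float(row["Accuracy"]),
        })
    return parsed


rows = parse_rows(ROWS)

# Highest accuracy, with ties broken by lower validation loss.
best_by_accuracy = max(rows, key=lambda r: (r["accuracy"], -r["val_loss"]))
# Lowest validation loss.
best_by_loss = min(rows, key=lambda r: r["val_loss"])
# Last row, i.e. the epoch whose metrics are reported at the top of the card.
final = rows[-1]

if __name__ == "__main__":
    print("best accuracy:", best_by_accuracy)
    print("lowest val loss:", best_by_loss)
    print("final epoch:", final)
```

Over the full table, the highest accuracy (0.6209) occurs at epochs 46 and 52 and the lowest validation loss (0.3401) at epoch 46, whereas the headline figures on this card correspond to epoch 60.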

Framework versions

Transformers 4.26.1
PyTorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
