20230822125408

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3480
  • Accuracy: 0.5271
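
The card does not state which SuperGLUE subtask the model was evaluated on. As a minimal inference sketch, assuming the published checkpoint at dkqjrm/20230822125408 carries a binary sequence-classification head (the two-class accuracy above suggests this), it can be loaded and queried as follows; the input strings are purely illustrative:

```python
# Minimal inference sketch. Assumes the checkpoint at dkqjrm/20230822125408
# exposes a sequence-classification head; the SuperGLUE subtask (and hence
# the exact input format) is not stated in this card.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230822125408"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Most SuperGLUE tasks take a text pair; the strings here are placeholders.
inputs = tokenizer("first segment", "second segment", return_tensors="pt")
predicted_class = model(**inputs).logits.argmax(dim=-1).item()
print(predicted_class)
```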

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

  • learning_rate: 0.005
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
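
A minimal reproduction sketch with the Transformers Trainer, mapping the hyperparameters above onto TrainingArguments. The SuperGLUE subtask is not stated in this card, so the "boolq" config and its question/passage fields are assumptions:

```python
# Reproduction sketch for the hyperparameters listed above.
# Assumption: the SuperGLUE subtask is "boolq"; the card does not say.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-large-cased")
model = AutoModelForSequenceClassification.from_pretrained("bert-large-cased")

dataset = load_dataset("super_glue", "boolq")  # subtask is an assumption
encoded = dataset.map(
    lambda batch: tokenizer(batch["question"], batch["passage"], truncation=True),
    batched=True,
)

args = TrainingArguments(
    output_dir="out",
    learning_rate=5e-3,  # 0.005
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # default collator pads batches dynamically
)
trainer.train()
```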

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.3715          | 0.4729   |
| 0.6635        | 2.0   | 624   | 0.4551          | 0.5271   |
| 0.6635        | 3.0   | 936   | 0.4100          | 0.4729   |
| 0.6659        | 4.0   | 1248  | 0.5179          | 0.4729   |
| 0.5379        | 5.0   | 1560  | 0.4588          | 0.5271   |
| 0.5379        | 6.0   | 1872  | 0.3934          | 0.5271   |
| 0.4954        | 7.0   | 2184  | 0.4644          | 0.5271   |
| 0.4954        | 8.0   | 2496  | 0.6469          | 0.5271   |
| 0.4707        | 9.0   | 2808  | 0.3908          | 0.5271   |
| 0.4825        | 10.0  | 3120  | 0.4247          | 0.4729   |
| 0.4825        | 11.0  | 3432  | 0.3479          | 0.5271   |
| 0.4683        | 12.0  | 3744  | 0.3917          | 0.4729   |
| 0.4456        | 13.0  | 4056  | 0.3580          | 0.5271   |
| 0.4456        | 14.0  | 4368  | 0.3641          | 0.4729   |
| 0.4571        | 15.0  | 4680  | 0.3922          | 0.4729   |
| 0.4571        | 16.0  | 4992  | 0.3587          | 0.5271   |
| 0.434         | 17.0  | 5304  | 0.3769          | 0.4729   |
| 0.4707        | 18.0  | 5616  | 0.3520          | 0.5271   |
| 0.4707        | 19.0  | 5928  | 0.3489          | 0.5271   |
| 0.4863        | 20.0  | 6240  | 0.3593          | 0.5271   |
| 0.4673        | 21.0  | 6552  | 0.8486          | 0.5271   |
| 0.4673        | 22.0  | 6864  | 0.3714          | 0.5271   |
| 0.4746        | 23.0  | 7176  | 0.3496          | 0.5271   |
| 0.4746        | 24.0  | 7488  | 0.3694          | 0.4729   |
| 0.4365        | 25.0  | 7800  | 0.3542          | 0.5271   |
| 0.4254        | 26.0  | 8112  | 0.4693          | 0.5271   |
| 0.4254        | 27.0  | 8424  | 0.3827          | 0.4729   |
| 0.4293        | 28.0  | 8736  | 0.3866          | 0.4729   |
| 0.4221        | 29.0  | 9048  | 0.3484          | 0.5271   |
| 0.4221        | 30.0  | 9360  | 0.4155          | 0.5271   |
| 0.4128        | 31.0  | 9672  | 0.3497          | 0.5271   |
| 0.4128        | 32.0  | 9984  | 0.3560          | 0.4729   |
| 0.4064        | 33.0  | 10296 | 0.4237          | 0.5271   |
| 0.4039        | 34.0  | 10608 | 0.3890          | 0.4729   |
| 0.4039        | 35.0  | 10920 | 0.3478          | 0.5271   |
| 0.4026        | 36.0  | 11232 | 0.3497          | 0.5271   |
| 0.4037        | 37.0  | 11544 | 0.3748          | 0.5271   |
| 0.4037        | 38.0  | 11856 | 0.3533          | 0.5271   |
| 0.3933        | 39.0  | 12168 | 0.3547          | 0.4729   |
| 0.3933        | 40.0  | 12480 | 0.3565          | 0.4729   |
| 0.3935        | 41.0  | 12792 | 0.3601          | 0.4729   |
| 0.3896        | 42.0  | 13104 | 0.3571          | 0.4729   |
| 0.3896        | 43.0  | 13416 | 0.3490          | 0.5271   |
| 0.3841        | 44.0  | 13728 | 0.3499          | 0.5271   |
| 0.3836        | 45.0  | 14040 | 0.3624          | 0.5271   |
| 0.3836        | 46.0  | 14352 | 0.3484          | 0.5271   |
| 0.3785        | 47.0  | 14664 | 0.3582          | 0.4729   |
| 0.3785        | 48.0  | 14976 | 0.3541          | 0.4729   |
| 0.3775        | 49.0  | 15288 | 0.3500          | 0.5271   |
| 0.3727        | 50.0  | 15600 | 0.3544          | 0.4729   |
| 0.3727        | 51.0  | 15912 | 0.3481          | 0.5271   |
| 0.3713        | 52.0  | 16224 | 0.3600          | 0.4729   |
| 0.3694        | 53.0  | 16536 | 0.3494          | 0.5271   |
| 0.3694        | 54.0  | 16848 | 0.3502          | 0.5271   |
| 0.3664        | 55.0  | 17160 | 0.3482          | 0.5271   |
| 0.3664        | 56.0  | 17472 | 0.3482          | 0.5271   |
| 0.3636        | 57.0  | 17784 | 0.3480          | 0.5271   |
| 0.3612        | 58.0  | 18096 | 0.3478          | 0.5271   |
| 0.3612        | 59.0  | 18408 | 0.3480          | 0.5271   |
| 0.3589        | 60.0  | 18720 | 0.3480          | 0.5271   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
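
To match this environment, the pinned versions above can be verified at runtime; a minimal check, assuming the packages are importable under these names:

```python
# Minimal environment check against the versions pinned above.
import datasets
import tokenizers
import torch
import transformers

assert transformers.__version__ == "4.26.1"
assert torch.__version__.startswith("2.0.1")  # "+cu118" suffix on CUDA builds
assert datasets.__version__ == "2.12.0"
assert tokenizers.__version__ == "0.13.3"
```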