
dkqjrm/20230822105327

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3487
  • Accuracy: 0.4729
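For reference, below is a minimal inference sketch using the Transformers API. The repository id comes from this card; the specific SuperGLUE subtask and its label mapping are not documented here, so the example inputs are placeholders.

```python
# Minimal inference sketch. Assumptions: the checkpoint is hosted as
# "dkqjrm/20230822105327" with a sequence-classification head; the input
# format depends on the (undocumented) SuperGLUE subtask, so the sentence
# pair below is only a placeholder.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230822105327"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

inputs = tokenizer(
    "The cat sat on the mat.",      # placeholder first sequence
    "There is a cat on the mat.",   # placeholder second sequence
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```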

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.01
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
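As a sketch, these values map onto the Trainer API of Transformers 4.26 as follows; the output directory and evaluation strategy are assumptions (the per-epoch rows in the results table suggest epoch-level evaluation), and dataset loading and preprocessing for the SuperGLUE subtask are omitted.

```python
# Sketch: the hyperparameters above expressed as TrainingArguments
# (Transformers 4.26 API). The Adam betas/epsilon listed above match the
# library defaults, so they need no explicit setting here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230822105327",   # assumed, not stated in the card
    learning_rate=0.01,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
    evaluation_strategy="epoch",   # assumed from the per-epoch results below
)
```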

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.6780          | 0.5307   |
| 0.8761        | 2.0   | 624   | 0.3516          | 0.4982   |
| 0.8761        | 3.0   | 936   | 0.4775          | 0.4874   |
| 0.685         | 4.0   | 1248  | 0.3842          | 0.5162   |
| 0.4946        | 5.0   | 1560  | 0.7400          | 0.5271   |
| 0.4946        | 6.0   | 1872  | 0.3490          | 0.5307   |
| 0.5112        | 7.0   | 2184  | 0.4549          | 0.5271   |
| 0.5112        | 8.0   | 2496  | 0.4590          | 0.4729   |
| 0.4328        | 9.0   | 2808  | 0.4122          | 0.4729   |
| 0.5336        | 10.0  | 3120  | 0.3692          | 0.4729   |
| 0.5336        | 11.0  | 3432  | 0.3493          | 0.5271   |
| 0.4659        | 12.0  | 3744  | 0.4285          | 0.4729   |
| 0.4383        | 13.0  | 4056  | 0.3805          | 0.4729   |
| 0.4383        | 14.0  | 4368  | 0.3634          | 0.5271   |
| 0.4394        | 15.0  | 4680  | 0.3485          | 0.5271   |
| 0.4394        | 16.0  | 4992  | 0.4393          | 0.4729   |
| 0.4432        | 17.0  | 5304  | 0.3694          | 0.5271   |
| 0.4138        | 18.0  | 5616  | 0.3503          | 0.4874   |
| 0.4138        | 19.0  | 5928  | 0.3916          | 0.4729   |
| 0.4213        | 20.0  | 6240  | 0.3495          | 0.4693   |
| 0.4042        | 21.0  | 6552  | 0.3493          | 0.5090   |
| 0.4042        | 22.0  | 6864  | 0.3556          | 0.5307   |
| 0.4177        | 23.0  | 7176  | 0.3697          | 0.4729   |
| 0.4177        | 24.0  | 7488  | 0.3484          | 0.4765   |
| 0.3925        | 25.0  | 7800  | 0.3665          | 0.5271   |
| 0.4006        | 26.0  | 8112  | 0.3669          | 0.5271   |
| 0.4006        | 27.0  | 8424  | 0.3556          | 0.4729   |
| 0.397         | 28.0  | 8736  | 0.3529          | 0.4729   |
| 0.3926        | 29.0  | 9048  | 0.3477          | 0.4729   |
| 0.3926        | 30.0  | 9360  | 0.5391          | 0.5271   |
| 0.39          | 31.0  | 9672  | 0.3504          | 0.4729   |
| 0.39          | 32.0  | 9984  | 0.3494          | 0.5271   |
| 0.3902        | 33.0  | 10296 | 0.3549          | 0.5271   |
| 0.3824        | 34.0  | 10608 | 0.3707          | 0.4729   |
| 0.3824        | 35.0  | 10920 | 0.3559          | 0.4729   |
| 0.3805        | 36.0  | 11232 | 0.3578          | 0.4729   |
| 0.38          | 37.0  | 11544 | 0.3612          | 0.5271   |
| 0.38          | 38.0  | 11856 | 0.3517          | 0.4729   |
| 0.3784        | 39.0  | 12168 | 0.3487          | 0.4910   |
| 0.3784        | 40.0  | 12480 | 0.3606          | 0.4729   |
| 0.3751        | 41.0  | 12792 | 0.3520          | 0.5271   |
| 0.3718        | 42.0  | 13104 | 0.3477          | 0.5199   |
| 0.3718        | 43.0  | 13416 | 0.3498          | 0.4729   |
| 0.371         | 44.0  | 13728 | 0.3729          | 0.4729   |
| 0.3723        | 45.0  | 14040 | 0.3592          | 0.5271   |
| 0.3723        | 46.0  | 14352 | 0.3502          | 0.4621   |
| 0.3688        | 47.0  | 14664 | 0.3516          | 0.4729   |
| 0.3688        | 48.0  | 14976 | 0.3505          | 0.4729   |
| 0.3641        | 49.0  | 15288 | 0.3526          | 0.4729   |
| 0.3645        | 50.0  | 15600 | 0.3488          | 0.4729   |
| 0.3645        | 51.0  | 15912 | 0.3482          | 0.4729   |
| 0.3636        | 52.0  | 16224 | 0.3557          | 0.4729   |
| 0.3621        | 53.0  | 16536 | 0.3484          | 0.4729   |
| 0.3621        | 54.0  | 16848 | 0.3509          | 0.5271   |
| 0.3581        | 55.0  | 17160 | 0.3519          | 0.4729   |
| 0.3581        | 56.0  | 17472 | 0.3479          | 0.5090   |
| 0.3573        | 57.0  | 17784 | 0.3480          | 0.4729   |
| 0.3553        | 58.0  | 18096 | 0.3489          | 0.4729   |
| 0.3553        | 59.0  | 18408 | 0.3479          | 0.4729   |
| 0.3545        | 60.0  | 18720 | 0.3487          | 0.4729   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
