
dkqjrm/20230822145721

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (an inference sketch follows the list):

  • Loss: 0.3478
  • Accuracy: 0.5271
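The card does not state which SuperGLUE task or input format was used; the two observed accuracy values (0.5271 and 0.4729, which sum to 1) are consistent with a two-class task. Below is a minimal inference sketch under those assumptions, using a sequence-classification head and a hypothetical sentence-pair input:

```python
# Minimal inference sketch. Assumptions: the checkpoint carries a two-class
# sequence-classification head and the task takes a sentence pair; the card
# does not state the actual SuperGLUE task or input format.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "dkqjrm/20230822145721"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

inputs = tokenizer("First sentence.", "Second sentence.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class id
```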

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a matching TrainingArguments sketch follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
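A TrainingArguments sketch mirroring the values above; `output_dir` is a placeholder, and the Adam betas/epsilon and linear scheduler match the settings listed:

```python
# Sketch of TrainingArguments matching the hyperparameters listed above.
# output_dir is a placeholder; everything else mirrors the list.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                 # placeholder
    learning_rate=5e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
)
```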

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.3504          | 0.5235   |
| 0.3893        | 2.0   | 624   | 0.3582          | 0.4729   |
| 0.3893        | 3.0   | 936   | 0.3531          | 0.5271   |
| 0.3878        | 4.0   | 1248  | 0.3627          | 0.4729   |
| 0.3764        | 5.0   | 1560  | 0.3488          | 0.5271   |
| 0.3764        | 6.0   | 1872  | 0.3529          | 0.5271   |
| 0.3735        | 7.0   | 2184  | 0.3598          | 0.5271   |
| 0.3735        | 8.0   | 2496  | 0.3609          | 0.5271   |
| 0.3703        | 9.0   | 2808  | 0.3605          | 0.4729   |
| 0.3684        | 10.0  | 3120  | 0.3562          | 0.5271   |
| 0.3684        | 11.0  | 3432  | 0.4032          | 0.4729   |
| 0.3687        | 12.0  | 3744  | 0.3752          | 0.4729   |
| 0.3667        | 13.0  | 4056  | 0.3566          | 0.4729   |
| 0.3667        | 14.0  | 4368  | 0.3499          | 0.5271   |
| 0.3689        | 15.0  | 4680  | 0.3503          | 0.5271   |
| 0.3689        | 16.0  | 4992  | 0.3539          | 0.5271   |
| 0.3663        | 17.0  | 5304  | 0.3485          | 0.5271   |
| 0.3677        | 18.0  | 5616  | 0.3617          | 0.5271   |
| 0.3677        | 19.0  | 5928  | 0.3666          | 0.4729   |
| 0.3716        | 20.0  | 6240  | 0.3562          | 0.5271   |
| 0.3671        | 21.0  | 6552  | 0.3573          | 0.5271   |
| 0.3671        | 22.0  | 6864  | 0.3900          | 0.5271   |
| 0.3642        | 23.0  | 7176  | 0.3554          | 0.5271   |
| 0.3642        | 24.0  | 7488  | 0.3594          | 0.4729   |
| 0.3649        | 25.0  | 7800  | 0.3498          | 0.5271   |
| 0.3639        | 26.0  | 8112  | 0.3646          | 0.4729   |
| 0.3639        | 27.0  | 8424  | 0.3498          | 0.5271   |
| 0.3615        | 28.0  | 8736  | 0.3504          | 0.5271   |
| 0.3606        | 29.0  | 9048  | 0.3485          | 0.5271   |
| 0.3606        | 30.0  | 9360  | 0.3479          | 0.5271   |
| 0.3623        | 31.0  | 9672  | 0.3498          | 0.5271   |
| 0.3623        | 32.0  | 9984  | 0.3478          | 0.5271   |
| 0.3623        | 33.0  | 10296 | 0.3545          | 0.5271   |
| 0.3603        | 34.0  | 10608 | 0.3483          | 0.5271   |
| 0.3603        | 35.0  | 10920 | 0.3481          | 0.5271   |
| 0.3604        | 36.0  | 11232 | 0.3495          | 0.5271   |
| 0.3586        | 37.0  | 11544 | 0.3507          | 0.5271   |
| 0.3586        | 38.0  | 11856 | 0.3486          | 0.5271   |
| 0.3593        | 39.0  | 12168 | 0.3492          | 0.5271   |
| 0.3593        | 40.0  | 12480 | 0.3492          | 0.5271   |
| 0.359         | 41.0  | 12792 | 0.3485          | 0.5271   |
| 0.3584        | 42.0  | 13104 | 0.3579          | 0.4729   |
| 0.3584        | 43.0  | 13416 | 0.3480          | 0.5271   |
| 0.3606        | 44.0  | 13728 | 0.3479          | 0.5271   |
| 0.3568        | 45.0  | 14040 | 0.3530          | 0.5271   |
| 0.3568        | 46.0  | 14352 | 0.3499          | 0.5271   |
| 0.3589        | 47.0  | 14664 | 0.3547          | 0.4729   |
| 0.3589        | 48.0  | 14976 | 0.3499          | 0.5271   |
| 0.3589        | 49.0  | 15288 | 0.3478          | 0.5271   |
| 0.3573        | 50.0  | 15600 | 0.3481          | 0.5271   |
| 0.3573        | 51.0  | 15912 | 0.3487          | 0.5271   |
| 0.3569        | 52.0  | 16224 | 0.3481          | 0.5271   |
| 0.3572        | 53.0  | 16536 | 0.3480          | 0.5271   |
| 0.3572        | 54.0  | 16848 | 0.3481          | 0.5271   |
| 0.3558        | 55.0  | 17160 | 0.3478          | 0.5271   |
| 0.3558        | 56.0  | 17472 | 0.3479          | 0.5271   |
| 0.3557        | 57.0  | 17784 | 0.3484          | 0.5271   |
| 0.3558        | 58.0  | 18096 | 0.3478          | 0.5271   |
| 0.3558        | 59.0  | 18408 | 0.3478          | 0.5271   |
| 0.3548        | 60.0  | 18720 | 0.3478          | 0.5271   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
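
To check that a local environment matches these versions (assuming the packages are installed):

```python
# Print installed versions to compare against the list above.
import datasets
import tokenizers
import torch
import transformers

print("Transformers:", transformers.__version__)  # expected 4.26.1
print("PyTorch:", torch.__version__)              # expected 2.0.1+cu118
print("Datasets:", datasets.__version__)          # expected 2.12.0
print("Tokenizers:", tokenizers.__version__)      # expected 0.13.3
```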
