20230822124255

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):

  • Loss: 0.3479
  • Accuracy: 0.5271
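
The card does not say which SuperGLUE task was used, but the validation accuracy alternating between 0.5271 and 0.4729 (which sum to 1) is consistent with a binary classification task where the model settles on one class or the other. Below is a minimal loading sketch; the two-label sequence-classification head and the sentence-pair input are assumptions, not confirmed by the card:

```python
# Minimal usage sketch. Assumptions: the checkpoint exposes a two-label
# sequence-classification head and expects a sentence pair (e.g. an
# entailment-style SuperGLUE task); neither is confirmed by the card.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230822124255"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

inputs = tokenizer(
    "The cat sat on the mat.",      # first sequence
    "There is a cat on the mat.",   # second sequence
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_labels)
print(logits.argmax(dim=-1).item())
```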

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a Trainer configuration sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
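
For reference, the listed values map onto Hugging Face TrainingArguments as sketched below. The output_dir and the per-epoch evaluation strategy are assumptions (the one-row-per-epoch results table suggests the latter). Note also that a learning rate of 1e-3 is well above the 2e-5 to 5e-5 range typical for BERT fine-tuning, which may explain the near-chance accuracy.

```python
# Sketch only: reproduces the listed hyperparameters with the Trainer API
# (Transformers 4.26.x). Dataset and model wiring are omitted because the
# card does not name the SuperGLUE subtask.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230822124255",  # placeholder, not from the card
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
    evaluation_strategy="epoch",  # assumption: eval ran once per epoch
)
```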

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.4745          | 0.5271   |
| 0.4082        | 2.0   | 624   | 0.3528          | 0.5307   |
| 0.4082        | 3.0   | 936   | 0.4075          | 0.4729   |
| 0.3905        | 4.0   | 1248  | 0.3634          | 0.4729   |
| 0.3831        | 5.0   | 1560  | 0.3585          | 0.5271   |
| 0.3831        | 6.0   | 1872  | 0.3679          | 0.5271   |
| 0.3797        | 7.0   | 2184  | 0.3550          | 0.5271   |
| 0.3797        | 8.0   | 2496  | 0.4011          | 0.5271   |
| 0.3796        | 9.0   | 2808  | 0.3515          | 0.5271   |
| 0.3836        | 10.0  | 3120  | 0.3478          | 0.5271   |
| 0.3836        | 11.0  | 3432  | 0.3494          | 0.5271   |
| 0.3815        | 12.0  | 3744  | 0.3707          | 0.4729   |
| 0.3769        | 13.0  | 4056  | 0.3625          | 0.4729   |
| 0.3769        | 14.0  | 4368  | 0.3498          | 0.5271   |
| 0.3761        | 15.0  | 4680  | 0.3550          | 0.4729   |
| 0.3761        | 16.0  | 4992  | 0.4420          | 0.5271   |
| 0.3776        | 17.0  | 5304  | 0.3529          | 0.5271   |
| 0.3704        | 18.0  | 5616  | 0.3486          | 0.5271   |
| 0.3704        | 19.0  | 5928  | 0.3670          | 0.4729   |
| 0.3765        | 20.0  | 6240  | 0.3586          | 0.5271   |
| 0.3721        | 21.0  | 6552  | 0.3490          | 0.5271   |
| 0.3721        | 22.0  | 6864  | 0.3729          | 0.5271   |
| 0.3689        | 23.0  | 7176  | 0.3798          | 0.5271   |
| 0.3689        | 24.0  | 7488  | 0.3861          | 0.4729   |
| 0.3698        | 25.0  | 7800  | 0.3498          | 0.5271   |
| 0.369         | 26.0  | 8112  | 0.3698          | 0.4729   |
| 0.369         | 27.0  | 8424  | 0.3507          | 0.5271   |
| 0.3658        | 28.0  | 8736  | 0.3494          | 0.5271   |
| 0.3662        | 29.0  | 9048  | 0.3479          | 0.5271   |
| 0.3662        | 30.0  | 9360  | 0.3504          | 0.5271   |
| 0.3666        | 31.0  | 9672  | 0.3577          | 0.5271   |
| 0.3666        | 32.0  | 9984  | 0.3509          | 0.5271   |
| 0.3637        | 33.0  | 10296 | 0.3483          | 0.5271   |
| 0.3647        | 34.0  | 10608 | 0.3493          | 0.5271   |
| 0.3647        | 35.0  | 10920 | 0.3482          | 0.5271   |
| 0.364         | 36.0  | 11232 | 0.3490          | 0.5271   |
| 0.3635        | 37.0  | 11544 | 0.3478          | 0.5271   |
| 0.3635        | 38.0  | 11856 | 0.3479          | 0.5271   |
| 0.3634        | 39.0  | 12168 | 0.3501          | 0.5271   |
| 0.3634        | 40.0  | 12480 | 0.3478          | 0.5271   |
| 0.3643        | 41.0  | 12792 | 0.3479          | 0.5271   |
| 0.3645        | 42.0  | 13104 | 0.3655          | 0.4729   |
| 0.3645        | 43.0  | 13416 | 0.3512          | 0.5271   |
| 0.363         | 44.0  | 13728 | 0.3491          | 0.5271   |
| 0.3602        | 45.0  | 14040 | 0.3569          | 0.4729   |
| 0.3602        | 46.0  | 14352 | 0.3571          | 0.4729   |
| 0.3616        | 47.0  | 14664 | 0.3522          | 0.5307   |
| 0.3616        | 48.0  | 14976 | 0.3485          | 0.5271   |
| 0.3601        | 49.0  | 15288 | 0.3485          | 0.5271   |
| 0.3606        | 50.0  | 15600 | 0.3481          | 0.5271   |
| 0.3606        | 51.0  | 15912 | 0.3484          | 0.5271   |
| 0.3592        | 52.0  | 16224 | 0.3478          | 0.5271   |
| 0.3587        | 53.0  | 16536 | 0.3485          | 0.5271   |
| 0.3587        | 54.0  | 16848 | 0.3483          | 0.5271   |
| 0.3583        | 55.0  | 17160 | 0.3480          | 0.5271   |
| 0.3583        | 56.0  | 17472 | 0.3478          | 0.5271   |
| 0.358         | 57.0  | 17784 | 0.3485          | 0.5271   |
| 0.3574        | 58.0  | 18096 | 0.3478          | 0.5271   |
| 0.3574        | 59.0  | 18408 | 0.3479          | 0.5271   |
| 0.3567        | 60.0  | 18720 | 0.3479          | 0.5271   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3