
20230822120608

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 19.9899
  • Accuracy: 0.5271
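
The snippet below is a minimal inference sketch, not part of the original card: it assumes the checkpoint uses a standard binary sequence-classification head with sentence-pair inputs, since the card does not state which SuperGLUE task it was trained on.

```python
# Minimal inference sketch. Assumptions: binary classification head and
# sentence-pair input; the exact SuperGLUE task is not stated in this card.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "Onutoa/20230822120608"  # repo id as listed on the model page
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

inputs = tokenizer("A premise sentence.", "A hypothesis sentence.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```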

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.01
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
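
As a hedged sketch (not part of the original card), the hyperparameters above map onto a `transformers` `TrainingArguments` configuration roughly as follows; the output directory and the per-epoch evaluation strategy are assumptions inferred from the results table below:

```python
# Sketch of the Trainer configuration implied by the hyperparameters above.
# output_dir and evaluation_strategy="epoch" are assumptions, not stated facts.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230822120608",
    learning_rate=0.01,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
    evaluation_strategy="epoch",
)
```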

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 27.0766         | 0.5271   |
| 29.447        | 2.0   | 624   | 24.0887         | 0.4729   |
| 29.447        | 3.0   | 936   | 23.9640         | 0.5271   |
| 27.7172       | 4.0   | 1248  | 22.0260         | 0.4729   |
| 26.4345       | 5.0   | 1560  | 22.0502         | 0.4729   |
| 26.4345       | 6.0   | 1872  | 22.9337         | 0.5271   |
| 27.0832       | 7.0   | 2184  | 21.2859         | 0.5271   |
| 27.0832       | 8.0   | 2496  | 21.4709         | 0.4729   |
| 25.6523       | 9.0   | 2808  | 20.3539         | 0.5271   |
| 25.5288       | 10.0  | 3120  | 21.2982         | 0.5271   |
| 25.5288       | 11.0  | 3432  | 22.0599         | 0.5271   |
| 25.9846       | 12.0  | 3744  | 22.1000         | 0.5271   |
| 26.609        | 13.0  | 4056  | 24.1133         | 0.4729   |
| 26.609        | 14.0  | 4368  | 22.4392         | 0.4729   |
| 26.7751       | 15.0  | 4680  | 22.0514         | 0.4729   |
| 26.7751       | 16.0  | 4992  | 21.4413         | 0.5271   |
| 25.8484       | 17.0  | 5304  | 21.6759         | 0.5271   |
| 25.7937       | 18.0  | 5616  | 21.2726         | 0.5271   |
| 25.7937       | 19.0  | 5928  | 21.2489         | 0.5271   |
| 25.6479       | 20.0  | 6240  | 21.1881         | 0.5271   |
| 25.6144       | 21.0  | 6552  | 21.0354         | 0.5271   |
| 25.6144       | 22.0  | 6864  | 21.0688         | 0.4729   |
| 25.4368       | 23.0  | 7176  | 21.2154         | 0.4729   |
| 25.4368       | 24.0  | 7488  | 21.2348         | 0.4729   |
| 25.5564       | 25.0  | 7800  | 21.1510         | 0.5271   |
| 25.5495       | 26.0  | 8112  | 21.3992         | 0.5271   |
| 25.5495       | 27.0  | 8424  | 21.4035         | 0.4729   |
| 25.4536       | 28.0  | 8736  | 20.9643         | 0.5271   |
| 25.3641       | 29.0  | 9048  | 20.7780         | 0.4729   |
| 25.3641       | 30.0  | 9360  | 21.4761         | 0.5271   |
| 25.4089       | 31.0  | 9672  | 21.1053         | 0.4729   |
| 25.4089       | 32.0  | 9984  | 21.1557         | 0.5271   |
| 25.6056       | 33.0  | 10296 | 21.0180         | 0.5271   |
| 25.5078       | 34.0  | 10608 | 21.1026         | 0.4729   |
| 25.5078       | 35.0  | 10920 | 21.3723         | 0.4729   |
| 25.6607       | 36.0  | 11232 | 21.4309         | 0.4729   |
| 25.9641       | 37.0  | 11544 | 21.4083         | 0.5271   |
| 25.9641       | 38.0  | 11856 | 21.2875         | 0.5271   |
| 25.6756       | 39.0  | 12168 | 21.4538         | 0.5271   |
| 25.6756       | 40.0  | 12480 | 21.1870         | 0.4729   |
| 25.4709       | 41.0  | 12792 | 21.0796         | 0.5271   |
| 25.2913       | 42.0  | 13104 | 20.9412         | 0.5271   |
| 25.2913       | 43.0  | 13416 | 20.8932         | 0.5271   |
| 25.1541       | 44.0  | 13728 | 20.9172         | 0.4729   |
| 25.0679       | 45.0  | 14040 | 20.6787         | 0.5271   |
| 25.0679       | 46.0  | 14352 | 20.6308         | 0.4729   |
| 24.965        | 47.0  | 14664 | 20.5240         | 0.5271   |
| 24.965        | 48.0  | 14976 | 20.6378         | 0.4729   |
| 24.8969       | 49.0  | 15288 | 20.5030         | 0.4729   |
| 24.8319       | 50.0  | 15600 | 20.3257         | 0.5271   |
| 24.8319       | 51.0  | 15912 | 20.2990         | 0.5271   |
| 24.7301       | 52.0  | 16224 | 20.3661         | 0.4729   |
| 24.6644       | 53.0  | 16536 | 20.2088         | 0.5271   |
| 24.6644       | 54.0  | 16848 | 20.1543         | 0.5271   |
| 24.5917       | 55.0  | 17160 | 20.0860         | 0.4729   |
| 24.5917       | 56.0  | 17472 | 20.0672         | 0.5271   |
| 24.5505       | 57.0  | 17784 | 20.0518         | 0.5271   |
| 24.5065       | 58.0  | 18096 | 20.0036         | 0.5271   |
| 24.5065       | 59.0  | 18408 | 19.9939         | 0.5271   |
| 24.4773       | 60.0  | 18720 | 19.9899         | 0.5271   |
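
Note that the validation accuracy only ever takes the values 0.5271 and 0.4729, which sum to 1.0; this is consistent with the classifier predicting a single class throughout training rather than learning the task. The sketch below shows how the final accuracy could be re-checked. The SuperGLUE subset (`"rte"`, suggested by the 312 steps per epoch at batch size 8, matching RTE's 2,490 training examples) and its column names are assumptions; the card does not name the task.

```python
# Hedged re-evaluation sketch. The subset "rte" and its column names
# ("premise", "hypothesis", "label") are assumptions, not stated in the card.
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "Onutoa/20230822120608"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

dataset = load_dataset("super_glue", "rte", split="validation")
correct = 0
for example in dataset:
    inputs = tokenizer(example["premise"], example["hypothesis"],
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(dim=-1).item()
    correct += int(pred == example["label"])
print(f"Accuracy: {correct / len(dataset):.4f}")
```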

Framework versions

  • Transformers 4.30.0
  • Pytorch 2.0.1+cu117
  • Datasets 2.14.4
  • Tokenizers 0.13.3