20230824103950

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. After the final training epoch (epoch 60, step 18720), it achieves the following results on the evaluation set:

Loss: 0.6377
Accuracy: 0.7401

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.003
train_batch_size: 8
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
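
As a rough illustration of how these settings interact, the sketch below computes the total number of optimizer steps and the learning rate at a given step under a linear schedule. The steps-per-epoch figure (312) is taken from the training results table. The card lists no warmup setting, so zero warmup is an assumption. The function name `linear_lr` is hypothetical and is not part of any library. The example count is only an upper bound, and it assumes a single device and no gradient accumulation.

```python
import math

# Values from the hyperparameter list and the training results table.
LEARNING_RATE = 0.003
TRAIN_BATCH_SIZE = 8
NUM_EPOCHS = 60
STEPS_PER_EPOCH = 312  # step count logged at epoch 1.0

TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS

# Upper bound on examples seen per epoch (the last batch may be partial).
# Assumes one device and no gradient accumulation; the card states neither.
MAX_EXAMPLES_PER_EPOCH = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE


def linear_lr(step, base_lr=LEARNING_RATE, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate at `step` for a linear schedule that decays to zero.

    warmup_steps=0 is an assumption; the card does not list a warmup value.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / warmup_steps
    # Linear decay from base_lr down to 0 at total_steps, clamped at 0 afterwards.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print("total steps:", TOTAL_STEPS)
    for epoch in (0, 15, 30, 45, 60):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:2d}  step {step:5d}  lr {linear_lr(step):.6f}")
```

Under these assumptions the learning rate starts at 0.003, is halved by the midpoint of training (step 9360), and reaches zero at step 18720, which matches the final step in the training results table.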

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	312	0.9784	0.5307
0.905	2.0	624	0.6756	0.5126
0.905	3.0	936	0.7039	0.5379
0.7844	4.0	1248	0.6938	0.5090
0.7863	5.0	1560	0.7988	0.5487
0.7863	6.0	1872	0.7152	0.5993
0.7505	7.0	2184	0.7856	0.6173
0.7505	8.0	2496	0.6053	0.6606
0.7043	9.0	2808	0.6424	0.5957
0.7083	10.0	3120	0.7874	0.6354
0.7083	11.0	3432	0.6513	0.6390
0.6321	12.0	3744	0.5910	0.7148
0.6204	13.0	4056	0.5993	0.7112
0.6204	14.0	4368	0.5440	0.7292
0.5835	15.0	4680	0.5542	0.7184
0.5835	16.0	4992	0.6144	0.7329
0.5634	17.0	5304	0.5821	0.6968
0.5461	18.0	5616	0.6826	0.5776
0.5461	19.0	5928	0.5617	0.7148
0.5275	20.0	6240	0.7824	0.6643
0.4726	21.0	6552	0.6157	0.7437
0.4726	22.0	6864	0.6498	0.7076
0.465	23.0	7176	0.6576	0.7292
0.465	24.0	7488	0.5731	0.7184
0.4375	25.0	7800	0.7370	0.7220
0.4182	26.0	8112	0.5957	0.7148
0.4182	27.0	8424	0.6041	0.7256
0.4008	28.0	8736	0.5790	0.7184
0.392	29.0	9048	0.6321	0.7329
0.392	30.0	9360	0.6253	0.7148
0.3691	31.0	9672	0.6031	0.7329
0.3691	32.0	9984	0.5903	0.7148
0.3659	33.0	10296	0.6663	0.7329
0.3375	34.0	10608	0.6000	0.7292
0.3375	35.0	10920	0.5734	0.7256
0.3372	36.0	11232	0.6547	0.7329
0.3242	37.0	11544	0.6508	0.7401
0.3242	38.0	11856	0.6472	0.7365
0.3199	39.0	12168	0.6785	0.7365
0.3199	40.0	12480	0.6019	0.7365
0.3014	41.0	12792	0.5783	0.7329
0.3011	42.0	13104	0.6245	0.7329
0.3011	43.0	13416	0.6497	0.7292
0.2909	44.0	13728	0.6170	0.7365
0.2725	45.0	14040	0.6515	0.7437
0.2725	46.0	14352	0.6511	0.7365
0.286	47.0	14664	0.6303	0.7292
0.286	48.0	14976	0.6408	0.7365
0.2713	49.0	15288	0.7056	0.7292
0.2574	50.0	15600	0.6540	0.7365
0.2574	51.0	15912	0.5996	0.7256
0.2735	52.0	16224	0.6616	0.7329
0.2646	53.0	16536	0.6601	0.7365
0.2646	54.0	16848	0.6284	0.7329
0.2494	55.0	17160	0.6420	0.7329
0.2494	56.0	17472	0.6434	0.7401
0.2512	57.0	17784	0.6324	0.7437
0.2452	58.0	18096	0.6028	0.7365
0.2452	59.0	18408	0.6412	0.7401
0.2491	60.0	18720	0.6377	0.7401
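
The reported results come from the last row of this table, not from the best one. Across the full log, the lowest validation loss is 0.5440 (epoch 14) and the highest accuracy is 0.7437 (epochs 21, 45, and 57). The sketch below shows one way to parse rows in this tab-separated layout and pick out such checkpoints. It embeds only a four-row excerpt of the table. The names `EvalRow` and `parse_row` are hypothetical helpers, not part of any library.

```python
from typing import NamedTuple, Optional


class EvalRow(NamedTuple):
    train_loss: Optional[float]  # None when the log shows "No log"
    epoch: float
    step: int
    val_loss: float
    accuracy: float


def parse_row(line: str) -> EvalRow:
    """Parse one tab-separated row of the training results table."""
    train_loss, epoch, step, val_loss, accuracy = line.rstrip("\n").split("\t")
    return EvalRow(
        None if train_loss == "No log" else float(train_loss),
        float(epoch),
        int(step),
        float(val_loss),
        float(accuracy),
    )


# Excerpt of the table above (epochs 1, 14, 21, and 60), not the full log.
EXCERPT = "\n".join([
    "No log\t1.0\t312\t0.9784\t0.5307",
    "0.6204\t14.0\t4368\t0.5440\t0.7292",
    "0.4726\t21.0\t6552\t0.6157\t0.7437",
    "0.2491\t60.0\t18720\t0.6377\t0.7401",
])

rows = [parse_row(line) for line in EXCERPT.splitlines()]

# Select rows by the two evaluation metrics in the table.
best_by_loss = min(rows, key=lambda r: r.val_loss)
best_by_accuracy = max(rows, key=lambda r: r.accuracy)
final = rows[-1]

if __name__ == "__main__":
    print("lowest validation loss:", best_by_loss)
    print("highest accuracy:", best_by_accuracy)
    print("final epoch:", final)
```

Note that `max` returns the first row with the top accuracy, so ties such as epochs 21, 45, and 57 in the full table resolve to the earliest epoch.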

Framework versions

Transformers 4.26.1
PyTorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3

dkqjrm/20230824103950
