20230824164051

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. After the final training epoch (epoch 80), it achieves the following results on the evaluation set:

Loss: 0.8794
Accuracy: 0.7437

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.005
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
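
The settings above can be collected into a small configuration sketch. The block below is illustrative rather than the original training script: `HPARAMS`, `linear_lr` and `make_training_arguments` are hypothetical names, the absence of warmup is an assumption (the list gives no warmup setting), and reading `train_batch_size` as a per-device value assumes a single device. The figure of 156 steps per epoch is taken from the training results table.

```python
# Values copied from the hyperparameter list above.
HPARAMS = {
    "learning_rate": 0.005,
    "train_batch_size": 16,
    "eval_batch_size": 8,
    "seed": 11,
    "adam_betas": (0.9, 0.999),
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_epochs": 80.0,
}

# The results table advances by 156 optimizer steps per epoch.
STEPS_PER_EPOCH = 156
TOTAL_STEPS = int(STEPS_PER_EPOCH * HPARAMS["num_epochs"])  # 12480


def linear_lr(step, base_lr=HPARAMS["learning_rate"],
              total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate at `step` under a linear schedule that decays to zero.

    warmup_steps=0 is an assumption: the card does not list a warmup value.
    """
    if warmup_steps and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def make_training_arguments(output_dir="out"):
    """Hypothetical mapping onto transformers.TrainingArguments.

    The import is local so the rest of this sketch runs without transformers.
    evaluation_strategy="epoch" is an assumption based on the per-epoch rows
    in the results table; the argument name matches Transformers 4.26.
    """
    from transformers import TrainingArguments

    beta1, beta2 = HPARAMS["adam_betas"]
    return TrainingArguments(
        output_dir=output_dir,
        learning_rate=HPARAMS["learning_rate"],
        per_device_train_batch_size=HPARAMS["train_batch_size"],
        per_device_eval_batch_size=HPARAMS["eval_batch_size"],
        seed=HPARAMS["seed"],
        adam_beta1=beta1,
        adam_beta2=beta2,
        adam_epsilon=HPARAMS["adam_epsilon"],
        lr_scheduler_type=HPARAMS["lr_scheduler_type"],
        num_train_epochs=HPARAMS["num_epochs"],
        evaluation_strategy="epoch",
    )
```

Under these assumptions the learning rate starts at 0.005, falls to half that value at step 6240 (the end of epoch 40), and reaches zero at step 12480.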

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	156	0.6969	0.5307
No log	2.0	312	1.5361	0.4693
No log	3.0	468	0.6771	0.5235
0.9695	4.0	624	0.6518	0.5776
0.9695	5.0	780	0.6275	0.5921
0.9695	6.0	936	0.6502	0.5668
0.8132	7.0	1092	0.8188	0.6137
0.8132	8.0	1248	0.6405	0.6570
0.8132	9.0	1404	0.5421	0.7076
0.7231	10.0	1560	0.7011	0.6751
0.7231	11.0	1716	0.6935	0.5993
0.7231	12.0	1872	0.5169	0.7365
0.6369	13.0	2028	0.5523	0.7329
0.6369	14.0	2184	0.5481	0.7292
0.6369	15.0	2340	0.7431	0.6606
0.6369	16.0	2496	0.6122	0.6787
0.5638	17.0	2652	0.5637	0.6931
0.5638	18.0	2808	0.5423	0.7437
0.5638	19.0	2964	0.5347	0.7401
0.5228	20.0	3120	1.6782	0.6354
0.5228	21.0	3276	0.7799	0.6715
0.5228	22.0	3432	0.6873	0.7581
0.4829	23.0	3588	0.6712	0.7329
0.4829	24.0	3744	0.7390	0.7329
0.4829	25.0	3900	0.6802	0.7509
0.4251	26.0	4056	0.5530	0.7076
0.4251	27.0	4212	0.6421	0.7112
0.4251	28.0	4368	0.9956	0.6859
0.3950	29.0	4524	0.6741	0.7545
0.3950	30.0	4680	0.8871	0.7437
0.3950	31.0	4836	0.9265	0.7040
0.3950	32.0	4992	0.7189	0.7401
0.3336	33.0	5148	1.1324	0.7040
0.3336	34.0	5304	0.8782	0.7437
0.3336	35.0	5460	0.7878	0.7329
0.3015	36.0	5616	1.1890	0.7040
0.3015	37.0	5772	1.2719	0.7112
0.3015	38.0	5928	1.3208	0.6931
0.2669	39.0	6084	0.9818	0.7437
0.2669	40.0	6240	0.8321	0.7292
0.2669	41.0	6396	0.8419	0.7292
0.2429	42.0	6552	0.9276	0.7365
0.2429	43.0	6708	0.9748	0.7401
0.2429	44.0	6864	0.8934	0.7473
0.2131	45.0	7020	0.9008	0.7473
0.2131	46.0	7176	1.0459	0.7437
0.2131	47.0	7332	1.0222	0.7256
0.2131	48.0	7488	0.9317	0.7545
0.1962	49.0	7644	0.8401	0.7473
0.1962	50.0	7800	0.9513	0.7401
0.1962	51.0	7956	0.9327	0.7401
0.1794	52.0	8112	1.0218	0.7509
0.1794	53.0	8268	1.1332	0.7473
0.1794	54.0	8424	0.8851	0.7365
0.1566	55.0	8580	0.8323	0.7473
0.1566	56.0	8736	0.8375	0.7437
0.1566	57.0	8892	0.8490	0.7509
0.1500	58.0	9048	0.9740	0.7509
0.1500	59.0	9204	1.1271	0.7473
0.1500	60.0	9360	1.1190	0.7437
0.1377	61.0	9516	1.0394	0.7509
0.1377	62.0	9672	0.9735	0.7509
0.1377	63.0	9828	0.9987	0.7437
0.1377	64.0	9984	0.9496	0.7473
0.1283	65.0	10140	1.0721	0.7365
0.1283	66.0	10296	0.8997	0.7617
0.1283	67.0	10452	1.0014	0.7581
0.1212	68.0	10608	1.0382	0.7509
0.1212	69.0	10764	0.9417	0.7437
0.1212	70.0	10920	0.9328	0.7437
0.1101	71.0	11076	0.9084	0.7509
0.1101	72.0	11232	0.9051	0.7545
0.1101	73.0	11388	0.8080	0.7581
0.1147	74.0	11544	0.9505	0.7437
0.1147	75.0	11700	0.8757	0.7437
0.1147	76.0	11856	0.9067	0.7509
0.1095	77.0	12012	0.8988	0.7473
0.1095	78.0	12168	0.8956	0.7473
0.1095	79.0	12324	0.8622	0.7473
0.1095	80.0	12480	0.8794	0.7437
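
The final-epoch row is not the best one in the table. A short sketch of how the rows could be parsed to find the strongest checkpoints follows. `parse_results` is a hypothetical helper, and `EXCERPT` holds only a handful of rows copied from the table, not the full log; it assumes the columns are tab-separated as shown above.

```python
def parse_results(text):
    """Parse rows of: training loss, epoch, step, validation loss, accuracy."""
    rows = []
    for line in text.strip().splitlines():
        loss, epoch, step, val_loss, acc = line.split("\t")
        rows.append({
            # Early rows show "No log" before the first training loss is logged.
            "train_loss": None if loss == "No log" else float(loss),
            "epoch": int(float(epoch)),
            "step": int(step),
            "val_loss": float(val_loss),
            "accuracy": float(acc),
        })
    return rows


# Excerpt: a few rows copied from the table above.
EXCERPT = "\n".join([
    "No log\t1.0\t156\t0.6969\t0.5307",
    "0.7231\t12.0\t1872\t0.5169\t0.7365",
    "0.5228\t22.0\t3432\t0.6873\t0.7581",
    "0.1283\t66.0\t10296\t0.8997\t0.7617",
    "0.1095\t80.0\t12480\t0.8794\t0.7437",
])

rows = parse_results(EXCERPT)
best_acc = max(rows, key=lambda r: r["accuracy"])
best_loss = min(rows, key=lambda r: r["val_loss"])

print(best_acc["epoch"], best_acc["accuracy"])   # 66 0.7617
print(best_loss["epoch"], best_loss["val_loss"])  # 12 0.5169
```

The same two rows come out on top across the full table: accuracy peaks at 0.7617 in epoch 66, while validation loss is lowest (0.5169) in epoch 12 and trends upward afterwards as training loss keeps falling.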

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
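
To reproduce the run, it may help to check the local environment against the versions listed above. The sketch below uses only the standard library. `PINNED` and `version_mismatches` are hypothetical names, and the PyPI distribution names (for example `torch` for Pytorch) are assumptions.

```python
from importlib import metadata

# Versions from the list above, keyed by assumed PyPI distribution name.
PINNED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def version_mismatches(pinned=PINNED, lookup=metadata.version):
    """Return {name: (wanted, found)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in version_mismatches().items():
        print(f"{name}: card lists {wanted}, found {found}")
```

The `lookup` parameter exists so the check can be exercised without installing anything, for instance by passing a function that reads from a dictionary.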
