20230825024049

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. At the end of training (epoch 80) it achieves the following results on the evaluation set:

Loss: 0.6750
Accuracy: 0.7617

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.005
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
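
As a rough illustration of how these settings fit together, the sketch below collects them in a dictionary and computes the learning rate that a linear schedule would produce at a given step. Several things here are assumptions rather than facts from this card: the key names follow the argument names of `transformers.TrainingArguments`, the batch sizes are treated as per-device values on a single device, and the schedule is assumed to use no warmup because none is listed. `TRAINING_KWARGS`, `TOTAL_STEPS`, and `linear_lr` are hypothetical names introduced for this example.

```python
# Hyperparameters copied from the list above. Key names follow
# transformers.TrainingArguments; that mapping is an assumption.
TRAINING_KWARGS = {
    "learning_rate": 0.005,
    "per_device_train_batch_size": 16,  # assumes a single device
    "per_device_eval_batch_size": 8,    # assumes a single device
    "seed": 11,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 80.0,
}

# Last step in the training results table: 156 steps per epoch x 80 epochs.
TOTAL_STEPS = 156 * 80


def linear_lr(step, base_lr=TRAINING_KWARGS["learning_rate"],
              total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    warmup_steps=0 is an assumption; the card does not list a warmup setting.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Learning rate at the start, midpoint, and end of training.
    for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate starts at the configured 0.005, is halved by the midpoint of training, and reaches zero at the final step.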

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	156	1.0743	0.4729
No log	2.0	312	0.6963	0.5271
No log	3.0	468	0.6584	0.5379
0.9697	4.0	624	0.8075	0.5379
0.9697	5.0	780	0.6045	0.6173
0.9697	6.0	936	0.5635	0.6462
0.8296	7.0	1092	0.8051	0.6354
0.8296	8.0	1248	0.5028	0.6787
0.8296	9.0	1404	0.5830	0.6570
0.7235	10.0	1560	0.5798	0.7004
0.7235	11.0	1716	0.8434	0.5054
0.7235	12.0	1872	0.7164	0.6570
0.6566	13.0	2028	0.5957	0.7112
0.6566	14.0	2184	0.4893	0.7617
0.6566	15.0	2340	0.5230	0.6751
0.6566	16.0	2496	0.7581	0.6282
0.6156	17.0	2652	0.5233	0.7437
0.6156	18.0	2808	0.8169	0.5993
0.6156	19.0	2964	0.5691	0.7581
0.5597	20.0	3120	0.5216	0.6895
0.5597	21.0	3276	0.5625	0.7256
0.5597	22.0	3432	0.6847	0.6895
0.518	23.0	3588	0.4864	0.7473
0.518	24.0	3744	0.5535	0.7617
0.518	25.0	3900	0.7351	0.6931
0.4661	26.0	4056	0.5020	0.7545
0.4661	27.0	4212	0.5132	0.7581
0.4661	28.0	4368	0.7423	0.7040
0.396	29.0	4524	0.4947	0.7545
0.396	30.0	4680	0.6220	0.7437
0.396	31.0	4836	0.6123	0.7437
0.396	32.0	4992	0.5141	0.7617
0.3842	33.0	5148	0.6979	0.7220
0.3842	34.0	5304	0.5813	0.7653
0.3842	35.0	5460	0.5639	0.7545
0.3473	36.0	5616	0.6147	0.7401
0.3473	37.0	5772	0.7640	0.7184
0.3473	38.0	5928	0.7093	0.7509
0.3189	39.0	6084	0.5635	0.7509
0.3189	40.0	6240	0.6134	0.7473
0.3189	41.0	6396	0.6238	0.7437
0.2882	42.0	6552	0.6768	0.7653
0.2882	43.0	6708	0.6504	0.7581
0.2882	44.0	6864	0.6762	0.7401
0.2758	45.0	7020	0.7442	0.7726
0.2758	46.0	7176	0.7323	0.7292
0.2758	47.0	7332	0.6010	0.7509
0.2758	48.0	7488	0.6571	0.7437
0.2347	49.0	7644	0.6066	0.7617
0.2347	50.0	7800	0.6876	0.7473
0.2347	51.0	7956	0.5945	0.7762
0.2343	52.0	8112	0.7166	0.7653
0.2343	53.0	8268	0.7535	0.7509
0.2343	54.0	8424	0.6777	0.7690
0.2107	55.0	8580	0.5962	0.7545
0.2107	56.0	8736	0.6697	0.7509
0.2107	57.0	8892	0.6426	0.7545
0.2081	58.0	9048	0.6783	0.7365
0.2081	59.0	9204	0.9118	0.7401
0.2081	60.0	9360	0.6387	0.7653
0.1895	61.0	9516	0.7557	0.7509
0.1895	62.0	9672	0.7595	0.7401
0.1895	63.0	9828	0.6978	0.7437
0.1895	64.0	9984	0.6016	0.7617
0.1873	65.0	10140	0.6893	0.7401
0.1873	66.0	10296	0.7575	0.7256
0.1873	67.0	10452	0.6249	0.7617
0.177	68.0	10608	0.6406	0.7509
0.177	69.0	10764	0.6802	0.7617
0.177	70.0	10920	0.7479	0.7329
0.1645	71.0	11076	0.7513	0.7437
0.1645	72.0	11232	0.6490	0.7762
0.1645	73.0	11388	0.7052	0.7256
0.1584	74.0	11544	0.6589	0.7726
0.1584	75.0	11700	0.6695	0.7473
0.1584	76.0	11856	0.6239	0.7690
0.1554	77.0	12012	0.6807	0.7473
0.1554	78.0	12168	0.6740	0.7509
0.1554	79.0	12324	0.6912	0.7473
0.1554	80.0	12480	0.6750	0.7617
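
Validation accuracy fluctuates from epoch to epoch, so the final row is not necessarily the best checkpoint. The following is a minimal sketch of how the tab-separated log above could be parsed to find the best rows. `parse_results` and `SAMPLE` are hypothetical names for this example, and `SAMPLE` holds only five rows copied from the table, so its output describes those five rows only.

```python
def parse_results(table_text):
    """Parse tab-separated rows: training loss, epoch, step, validation loss, accuracy."""
    rows = []
    for line in table_text.strip().splitlines():
        train_loss, epoch, step, val_loss, accuracy = line.split("\t")
        rows.append({
            # "No log" means no training loss had been logged yet.
            "train_loss": None if train_loss == "No log" else float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "accuracy": float(accuracy),
        })
    return rows


# Five rows copied from the table above (not the full log).
SAMPLE = "\n".join([
    "No log\t1.0\t156\t1.0743\t0.4729",
    "0.518\t23.0\t3588\t0.4864\t0.7473",
    "0.2347\t51.0\t7956\t0.5945\t0.7762",
    "0.1645\t72.0\t11232\t0.6490\t0.7762",
    "0.1554\t80.0\t12480\t0.6750\t0.7617",
])

rows = parse_results(SAMPLE)
# max() and min() return the first matching row when there is a tie.
best_acc = max(rows, key=lambda r: r["accuracy"])
best_loss = min(rows, key=lambda r: r["val_loss"])

if __name__ == "__main__":
    print("highest accuracy:", best_acc["epoch"], best_acc["accuracy"])
    print("lowest validation loss:", best_loss["epoch"], best_loss["val_loss"])
```

Passing the full table body (without the header row) to `parse_results` works the same way, provided the columns remain tab-separated.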

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
