20230825024137

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. After the final training epoch (epoch 80), it achieves the following results on the evaluation set:

Loss: 0.5934
Accuracy: 0.7545

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.005
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
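
As a rough illustration of what `lr_scheduler_type: linear` implies for these settings, the sketch below computes the learning rate after a given number of optimizer steps. It assumes no warmup steps (the card lists none) and takes the 156 steps per epoch from the training results table. `linear_lr` is a hypothetical helper written for this example, not a Transformers API.

```python
# Hyperparameters copied from the list above.
PEAK_LR = 0.005
NUM_EPOCHS = 80
# From the training results table: 156 optimizer steps per epoch.
STEPS_PER_EPOCH = 156
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH  # 12480


def linear_lr(step, peak_lr=PEAK_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate after `step` optimizer steps under linear decay.

    Assumption: warmup_steps defaults to 0 because the card lists no warmup.
    """
    if warmup_steps and step < warmup_steps:
        # Linear ramp from 0 up to peak_lr during warmup.
        return peak_lr * step / warmup_steps
    # Linear decay from peak_lr down to 0 at total_steps, clamped at 0.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for epoch in (0, 20, 40, 80):
        print(epoch, linear_lr(epoch * STEPS_PER_EPOCH))
```

Under these assumptions the rate starts at 0.005, is halved by the end of epoch 40, and reaches zero at step 12480.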

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	156	0.7579	0.5307
No log	2.0	312	0.8311	0.4838
No log	3.0	468	0.8071	0.4838
0.9372	4.0	624	0.6483	0.5632
0.9372	5.0	780	0.6240	0.5740
0.9372	6.0	936	0.6779	0.5343
0.9135	7.0	1092	0.8693	0.5632
0.9135	8.0	1248	0.6308	0.6245
0.9135	9.0	1404	0.6566	0.6462
0.7837	10.0	1560	0.5220	0.6787
0.7837	11.0	1716	0.6467	0.6390
0.7837	12.0	1872	0.5238	0.7220
0.6625	13.0	2028	0.5079	0.7040
0.6625	14.0	2184	0.5625	0.7148
0.6625	15.0	2340	0.4786	0.7148
0.6625	16.0	2496	0.7720	0.6426
0.6308	17.0	2652	0.4866	0.7004
0.6308	18.0	2808	0.4569	0.7329
0.6308	19.0	2964	0.4564	0.7329
0.6613	20.0	3120	0.6097	0.6823
0.6613	21.0	3276	0.5519	0.7112
0.6613	22.0	3432	0.6481	0.6679
0.5641	23.0	3588	0.5730	0.7040
0.5641	24.0	3744	0.5306	0.7076
0.5641	25.0	3900	0.9908	0.6606
0.5287	26.0	4056	0.4475	0.7545
0.5287	27.0	4212	0.4697	0.7473
0.5287	28.0	4368	0.5206	0.7040
0.5013	29.0	4524	0.4780	0.7401
0.5013	30.0	4680	0.6273	0.6787
0.5013	31.0	4836	0.6055	0.7076
0.5013	32.0	4992	0.4938	0.7401
0.4549	33.0	5148	0.5795	0.6931
0.4549	34.0	5304	0.5363	0.7473
0.4549	35.0	5460	0.6151	0.7473
0.4277	36.0	5616	0.6209	0.7184
0.4277	37.0	5772	0.6833	0.7365
0.4277	38.0	5928	0.5973	0.7220
0.4108	39.0	6084	0.5932	0.7581
0.4108	40.0	6240	0.4805	0.7437
0.4108	41.0	6396	0.5420	0.7401
0.3987	42.0	6552	0.5820	0.7617
0.3987	43.0	6708	0.5805	0.7292
0.3987	44.0	6864	0.6143	0.7473
0.3785	45.0	7020	0.5329	0.7292
0.3785	46.0	7176	0.7527	0.7329
0.3785	47.0	7332	0.7544	0.7256
0.3785	48.0	7488	0.6422	0.7292
0.3435	49.0	7644	0.7194	0.7401
0.3435	50.0	7800	0.5689	0.7401
0.3435	51.0	7956	0.5635	0.7329
0.3287	52.0	8112	0.6496	0.7473
0.3287	53.0	8268	0.6724	0.7220
0.3287	54.0	8424	0.7439	0.7220
0.3222	55.0	8580	0.5962	0.7365
0.3222	56.0	8736	0.5646	0.7437
0.3222	57.0	8892	0.6753	0.7401
0.2983	58.0	9048	0.5726	0.7401
0.2983	59.0	9204	0.7394	0.7256
0.2983	60.0	9360	0.5553	0.7473
0.2927	61.0	9516	0.6227	0.7256
0.2927	62.0	9672	0.6228	0.7365
0.2927	63.0	9828	0.7299	0.7365
0.2927	64.0	9984	0.6317	0.7329
0.2846	65.0	10140	0.5696	0.7401
0.2846	66.0	10296	0.6101	0.7509
0.2846	67.0	10452	0.5972	0.7437
0.266	68.0	10608	0.5472	0.7401
0.266	69.0	10764	0.6013	0.7437
0.266	70.0	10920	0.6242	0.7256
0.257	71.0	11076	0.5784	0.7509
0.257	72.0	11232	0.6293	0.7581
0.257	73.0	11388	0.6099	0.7509
0.2453	74.0	11544	0.6221	0.7401
0.2453	75.0	11700	0.6113	0.7437
0.2453	76.0	11856	0.5898	0.7401
0.2477	77.0	12012	0.5996	0.7545
0.2477	78.0	12168	0.6357	0.7509
0.2477	79.0	12324	0.5859	0.7509
0.2477	80.0	12480	0.5934	0.7545
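
The final-epoch figures are not the best ones in the table. A minimal sketch, assuming the rows are available as tab-separated text in the layout shown above, picks out the epoch with the highest accuracy and the one with the lowest validation loss. Only a handful of rows are embedded for brevity, and `parse_rows` is an illustrative name.

```python
# A few rows copied from the table above (tab-separated):
# training loss, epoch, step, validation loss, accuracy.
ROWS = """\
No log\t1.0\t156\t0.7579\t0.5307
0.5287\t26.0\t4056\t0.4475\t0.7545
0.4108\t39.0\t6084\t0.5932\t0.7581
0.3987\t42.0\t6552\t0.5820\t0.7617
0.2477\t80.0\t12480\t0.5934\t0.7545
"""


def parse_rows(text):
    """Turn tab-separated table rows into a list of dicts."""
    records = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, acc = line.split("\t")
        records.append({
            # "No log" means no training loss had been logged yet.
            "train_loss": None if train_loss == "No log" else float(train_loss),
            "epoch": int(float(epoch)),
            "step": int(step),
            "val_loss": float(val_loss),
            "accuracy": float(acc),
        })
    return records


records = parse_rows(ROWS)
best_acc = max(records, key=lambda r: r["accuracy"])
best_loss = min(records, key=lambda r: r["val_loss"])
print(best_acc["epoch"], best_acc["accuracy"])    # 42 0.7617
print(best_loss["epoch"], best_loss["val_loss"])  # 26 0.4475
```

Run over the full table, the result is the same: accuracy peaks at 0.7617 in epoch 42 and validation loss is lowest at 0.4475 in epoch 26, while the training loss keeps falling to 0.2477 by epoch 80.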

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
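
To compare a local environment against the versions listed above, something like the following could be used. It is a sketch that relies only on the standard library. The mapping from the card's display names to PyPI distribution names (for example "Pytorch" to `torch`) is an assumption, and `check_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed in the card. The keys are assumed PyPI distribution names.
PINNED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def check_versions(pinned=PINNED):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            installed = None
        report[name] = (expected, installed, installed == expected)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_versions().items():
        print(f"{name}: expected {expected}, found {installed}, match={ok}")
```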

Dataset used to train dkqjrm/20230825024137: super_glue