20230825183857

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows this list):

  • Loss: 0.5542
  • Accuracy: 0.7545
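
Pending a fuller model description, here is a minimal inference sketch. It assumes the checkpoint exposes a sequence-classification head over sentence pairs (consistent with the accuracy metric above); the model id comes from this repository, while the example inputs are placeholders.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230825183857"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Example sentence pair; replace with inputs from the actual task.
inputs = tokenizer("A premise sentence.", "A hypothesis sentence.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class id
```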

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows this list):

  • learning_rate: 0.005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
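
For reference, the values above map onto transformers.TrainingArguments roughly as follows. This is a minimal sketch, not the author's actual script: the output directory is a placeholder, and a single-GPU run is assumed (so train_batch_size corresponds to per_device_train_batch_size).

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-large-cased-super-glue",  # hypothetical path
    learning_rate=5e-3,
    per_device_train_batch_size=16,  # assumes a single device
    per_device_eval_batch_size=8,
    seed=11,
    num_train_epochs=80.0,
    lr_scheduler_type="linear",
    evaluation_strategy="epoch",  # the results table logs validation metrics once per epoch
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the Trainer
    # defaults, so no explicit optimizer arguments are needed.
)
```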

Training results

Training Loss  Epoch  Step   Validation Loss  Accuracy
No log         1.0    156    1.3372           0.5307
No log         2.0    312    0.6864           0.5162
No log         3.0    468    0.6919           0.4874
0.9682         4.0    624    0.6674           0.5451
0.9682         5.0    780    0.6774           0.5415
0.9682         6.0    936    0.5435           0.6498
0.8254         7.0    1092   0.7442           0.5235
0.8254         8.0    1248   0.4993           0.6679
0.8254         9.0    1404   0.5592           0.6570
0.741          10.0   1560   0.6748           0.6498
0.741          11.0   1716   0.9543           0.4729
0.741          12.0   1872   0.5518           0.7004
0.6941         13.0   2028   0.4643           0.7040
0.6941         14.0   2184   0.5154           0.7220
0.6941         15.0   2340   0.5493           0.6570
0.6941         16.0   2496   0.5450           0.6570
0.6291         17.0   2652   0.5940           0.7040
0.6291         18.0   2808   0.4530           0.6931
0.6291         19.0   2964   0.5100           0.7581
0.5831         20.0   3120   0.4821           0.6751
0.5831         21.0   3276   0.7629           0.6354
0.5831         22.0   3432   0.4882           0.7437
0.5334         23.0   3588   0.4779           0.7040
0.5334         24.0   3744   0.5483           0.7365
0.5334         25.0   3900   0.4978           0.7112
0.465          26.0   4056   0.4617           0.7220
0.465          27.0   4212   0.4768           0.7545
0.465          28.0   4368   0.5384           0.7545
0.4116         29.0   4524   0.4739           0.7401
0.4116         30.0   4680   0.7430           0.6895
0.4116         31.0   4836   0.7631           0.6426
0.4116         32.0   4992   0.4750           0.7365
0.3972         33.0   5148   0.5293           0.7509
0.3972         34.0   5304   0.5111           0.7545
0.3972         35.0   5460   0.4787           0.7617
0.3632         36.0   5616   0.5954           0.7617
0.3632         37.0   5772   0.6243           0.7509
0.3632         38.0   5928   0.6147           0.7256
0.334          39.0   6084   0.4867           0.7581
0.334          40.0   6240   0.5077           0.7545
0.334          41.0   6396   0.6957           0.7112
0.2964         42.0   6552   0.5827           0.7690
0.2964         43.0   6708   0.4632           0.7617
0.2964         44.0   6864   0.5142           0.7545
0.291          45.0   7020   0.5525           0.7617
0.291          46.0   7176   0.4876           0.7581
0.291          47.0   7332   0.5730           0.7617
0.291          48.0   7488   0.5040           0.7653
0.2478         49.0   7644   0.5468           0.7545
0.2478         50.0   7800   0.5621           0.7653
0.2478         51.0   7956   0.5678           0.7545
0.2549         52.0   8112   0.5960           0.7509
0.2549         53.0   8268   0.5923           0.7437
0.2549         54.0   8424   0.5902           0.7653
0.2303         55.0   8580   0.4664           0.7617
0.2303         56.0   8736   0.5903           0.7617
0.2303         57.0   8892   0.6671           0.7329
0.2122         58.0   9048   0.5309           0.7473
0.2122         59.0   9204   0.6262           0.7581
0.2122         60.0   9360   0.5361           0.7545
0.2039         61.0   9516   0.6225           0.7545
0.2039         62.0   9672   0.6425           0.7509
0.2039         63.0   9828   0.6376           0.7365
0.2039         64.0   9984   0.6124           0.7473
0.1952         65.0   10140  0.5522           0.7401
0.1952         66.0   10296  0.6943           0.7509
0.1952         67.0   10452  0.5358           0.7653
0.1855         68.0   10608  0.5289           0.7581
0.1855         69.0   10764  0.5713           0.7545
0.1855         70.0   10920  0.5293           0.7617
0.1792         71.0   11076  0.6354           0.7617
0.1792         72.0   11232  0.5219           0.7653
0.1792         73.0   11388  0.5897           0.7581
0.1683         74.0   11544  0.5471           0.7653
0.1683         75.0   11700  0.5273           0.7653
0.1683         76.0   11856  0.5517           0.7581
0.1711         77.0   12012  0.5440           0.7653
0.1711         78.0   12168  0.5506           0.7545
0.1711         79.0   12324  0.5671           0.7581
0.1711         80.0   12480  0.5542           0.7545
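
Validation accuracy peaked at 0.7690 at epoch 42; the headline figures at the top of this card come from the final epoch (80). The card does not name the SuperGLUE task, but 156 optimizer steps per epoch at batch size 16 is consistent with RTE's 2,490 training examples, so the evaluation sketch below assumes the rte configuration; treat that task choice as an educated guess rather than documented fact.

```python
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230825183857"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

# super_glue/rte validation split: 277 premise/hypothesis pairs (assumed task).
dataset = load_dataset("super_glue", "rte", split="validation")

correct = 0
for example in dataset:
    inputs = tokenizer(example["premise"], example["hypothesis"],
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(dim=-1).item()
    correct += int(pred == example["label"])

print(f"accuracy: {correct / len(dataset):.4f}")  # ~0.7545 if the task guess is right
```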

Framework versions

  • Transformers 4.26.1
  • PyTorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3