
20230824210941

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8686
  • Accuracy: 0.7256
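
A minimal loading sketch for this checkpoint follows. It assumes the repo id dkqjrm/20230824210941 (shown at the bottom of this card) and a standard sequence-classification head; since the card does not name the super_glue subtask, a generic sentence-pair input is used.

```python
# Minimal inference sketch. Assumptions: the checkpoint is published as
# "dkqjrm/20230824210941" and exposes a standard sequence-classification
# head; the super_glue subtask (and hence the exact input format) is not
# stated on this card, so a generic sentence pair is used here.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230824210941"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

inputs = tokenizer("A first sentence.", "A second sentence.",
                   return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # per-class probabilities
```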

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
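
As a reproduction aid, the settings above map onto transformers.TrainingArguments as sketched below; the actual training script is not included in this card, so output_dir and the evaluation strategy are assumptions.

```python
# Sketch mapping the listed hyperparameters onto TrainingArguments.
# output_dir and evaluation_strategy are assumptions; the per-epoch rows
# in the results table below suggest evaluation once per epoch.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                  # assumed
    learning_rate=5e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",       # assumed from the per-epoch results
)
# Adam with betas=(0.9, 0.999) and epsilon=1e-8 matches the Trainer's
# default optimizer settings, so no explicit optimizer argument is needed.
```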

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 156   | 0.8228          | 0.5307   |
| No log        | 2.0   | 312   | 0.7014          | 0.5271   |
| No log        | 3.0   | 468   | 0.9320          | 0.4657   |
| 0.9247        | 4.0   | 624   | 0.8551          | 0.5307   |
| 0.9247        | 5.0   | 780   | 0.8862          | 0.5235   |
| 0.9247        | 6.0   | 936   | 0.6306          | 0.6282   |
| 0.8754        | 7.0   | 1092  | 0.9270          | 0.5957   |
| 0.8754        | 8.0   | 1248  | 0.6627          | 0.6354   |
| 0.8754        | 9.0   | 1404  | 0.7200          | 0.6137   |
| 0.745         | 10.0  | 1560  | 0.5993          | 0.6751   |
| 0.745         | 11.0  | 1716  | 0.7300          | 0.6318   |
| 0.745         | 12.0  | 1872  | 0.7463          | 0.6823   |
| 0.6869        | 13.0  | 2028  | 0.8378          | 0.6029   |
| 0.6869        | 14.0  | 2184  | 0.6182          | 0.7076   |
| 0.6869        | 15.0  | 2340  | 0.9895          | 0.6209   |
| 0.6869        | 16.0  | 2496  | 0.7414          | 0.6859   |
| 0.6526        | 17.0  | 2652  | 0.6260          | 0.6931   |
| 0.6526        | 18.0  | 2808  | 0.5832          | 0.7365   |
| 0.6526        | 19.0  | 2964  | 0.6509          | 0.6968   |
| 0.5884        | 20.0  | 3120  | 0.7808          | 0.6751   |
| 0.5884        | 21.0  | 3276  | 0.6212          | 0.7437   |
| 0.5884        | 22.0  | 3432  | 0.8835          | 0.6354   |
| 0.5748        | 23.0  | 3588  | 0.8832          | 0.6570   |
| 0.5748        | 24.0  | 3744  | 0.8348          | 0.6679   |
| 0.5748        | 25.0  | 3900  | 0.8357          | 0.6859   |
| 0.5519        | 26.0  | 4056  | 0.5958          | 0.7256   |
| 0.5519        | 27.0  | 4212  | 0.5952          | 0.7365   |
| 0.5519        | 28.0  | 4368  | 0.6118          | 0.7256   |
| 0.5239        | 29.0  | 4524  | 0.8448          | 0.6823   |
| 0.5239        | 30.0  | 4680  | 0.6541          | 0.7112   |
| 0.5239        | 31.0  | 4836  | 0.9677          | 0.6390   |
| 0.5239        | 32.0  | 4992  | 0.7328          | 0.7076   |
| 0.4732        | 33.0  | 5148  | 0.8215          | 0.6643   |
| 0.4732        | 34.0  | 5304  | 0.7120          | 0.7112   |
| 0.4732        | 35.0  | 5460  | 0.7292          | 0.7437   |
| 0.4314        | 36.0  | 5616  | 0.7357          | 0.7220   |
| 0.4314        | 37.0  | 5772  | 1.0189          | 0.6606   |
| 0.4314        | 38.0  | 5928  | 0.7766          | 0.6787   |
| 0.4113        | 39.0  | 6084  | 0.9918          | 0.6679   |
| 0.4113        | 40.0  | 6240  | 0.8170          | 0.7329   |
| 0.4113        | 41.0  | 6396  | 0.7732          | 0.7184   |
| 0.3872        | 42.0  | 6552  | 0.7271          | 0.7653   |
| 0.3872        | 43.0  | 6708  | 0.8372          | 0.7365   |
| 0.3872        | 44.0  | 6864  | 0.8637          | 0.7148   |
| 0.3747        | 45.0  | 7020  | 0.8895          | 0.7220   |
| 0.3747        | 46.0  | 7176  | 1.3025          | 0.6931   |
| 0.3747        | 47.0  | 7332  | 0.8508          | 0.7437   |
| 0.3747        | 48.0  | 7488  | 0.9201          | 0.7220   |
| 0.3401        | 49.0  | 7644  | 1.0286          | 0.7184   |
| 0.3401        | 50.0  | 7800  | 0.8711          | 0.7365   |
| 0.3401        | 51.0  | 7956  | 1.0386          | 0.7256   |
| 0.3162        | 52.0  | 8112  | 0.8634          | 0.7401   |
| 0.3162        | 53.0  | 8268  | 0.9121          | 0.7184   |
| 0.3162        | 54.0  | 8424  | 0.8510          | 0.7292   |
| 0.3146        | 55.0  | 8580  | 0.8323          | 0.7329   |
| 0.3146        | 56.0  | 8736  | 1.1691          | 0.6968   |
| 0.3146        | 57.0  | 8892  | 0.9995          | 0.7292   |
| 0.3049        | 58.0  | 9048  | 0.8166          | 0.7184   |
| 0.3049        | 59.0  | 9204  | 1.0304          | 0.7184   |
| 0.3049        | 60.0  | 9360  | 0.8338          | 0.7184   |
| 0.2932        | 61.0  | 9516  | 0.8818          | 0.7220   |
| 0.2932        | 62.0  | 9672  | 1.0405          | 0.7184   |
| 0.2932        | 63.0  | 9828  | 0.9091          | 0.7112   |
| 0.2932        | 64.0  | 9984  | 0.9134          | 0.7256   |
| 0.2786        | 65.0  | 10140 | 0.8553          | 0.7329   |
| 0.2786        | 66.0  | 10296 | 0.9198          | 0.7365   |
| 0.2786        | 67.0  | 10452 | 0.8613          | 0.7329   |
| 0.2616        | 68.0  | 10608 | 0.8299          | 0.7292   |
| 0.2616        | 69.0  | 10764 | 0.9801          | 0.7148   |
| 0.2616        | 70.0  | 10920 | 0.8634          | 0.7256   |
| 0.2573        | 71.0  | 11076 | 0.8447          | 0.7509   |
| 0.2573        | 72.0  | 11232 | 0.8127          | 0.7437   |
| 0.2573        | 73.0  | 11388 | 0.8869          | 0.7256   |
| 0.248         | 74.0  | 11544 | 0.8170          | 0.7256   |
| 0.248         | 75.0  | 11700 | 0.9370          | 0.7220   |
| 0.248         | 76.0  | 11856 | 0.8273          | 0.7220   |
| 0.2513        | 77.0  | 12012 | 0.8745          | 0.7220   |
| 0.2513        | 78.0  | 12168 | 0.8785          | 0.7292   |
| 0.2513        | 79.0  | 12324 | 0.8585          | 0.7256   |
| 0.2513        | 80.0  | 12480 | 0.8686          | 0.7256   |
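
The epoch-80 row reproduces the loss (0.8686) and accuracy (0.7256) quoted at the top of the card. A hedged sketch for re-running that evaluation follows; the subtask name "rte" is purely an assumption (156 steps per epoch at batch size 16 implies roughly 2,496 training examples, close to rte's 2,490, but the card does not say), so adjust the subtask and field names if they differ.

```python
# Evaluation sketch. Assumptions: the subtask is "rte" (the card does not
# name it; 156 steps/epoch × batch size 16 ≈ 2,496 training examples,
# close to rte's 2,490), and the repo id is "dkqjrm/20230824210941".
import numpy as np
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer

model_id = "dkqjrm/20230824210941"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

val = load_dataset("super_glue", "rte", split="validation")
val = val.map(lambda ex: tokenizer(ex["premise"], ex["hypothesis"],
                                   truncation=True), batched=True)

preds = Trainer(model=model, tokenizer=tokenizer).predict(val)
accuracy = (preds.predictions.argmax(-1) == np.array(val["label"])).mean()
print(f"validation accuracy: {accuracy:.4f}")
```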

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
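
A quick environment check against the versions listed above (newer releases will usually load the checkpoint too, but exact reproduction of the numbers on this card assumes these pins):

```python
# Print installed versions to compare against the pins listed above.
import datasets, tokenizers, torch, transformers

for name, module in [("Transformers", transformers), ("Pytorch", torch),
                     ("Datasets", datasets), ("Tokenizers", tokenizers)]:
    print(f"{name}: {module.__version__}")
```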

Dataset used to train dkqjrm/20230824210941

  • super_glue