20230824164037

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7104
  • Accuracy: 0.7617
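
As a minimal inference sketch: the snippet below assumes the checkpoint is published under the repository id dkqjrm/20230824164037 and carries a sequence-classification head. Because the SuperGLUE subset is not documented here, the sentence pair is purely illustrative input.

```python
# Hedged example: model id taken from this card; the classification head
# and the sentence-pair input format are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230824164037"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Most SuperGLUE tasks are sentence-pair problems; this pair is illustrative.
inputs = tokenizer(
    "The cat sat on the mat.",
    "A cat is on a mat.",
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities
```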

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
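
As a sketch only: the list above matches what the transformers Trainer records, so the run could plausibly be reproduced with TrainingArguments along these lines. The output path is hypothetical, and evaluation_strategy="epoch" is inferred from the one-eval-row-per-epoch results below.

```python
# Hedged reconstruction of the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230824164037",      # hypothetical output path
    learning_rate=5e-3,               # card lists 0.005
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",      # inferred: one eval row per epoch
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the Trainer defaults,
    # so no optimizer arguments need to be overridden.
)
```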

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 156 | 0.8540 | 0.5307 |
| No log | 2.0 | 312 | 0.6894 | 0.4838 |
| No log | 3.0 | 468 | 1.2065 | 0.4729 |
| 1.0004 | 4.0 | 624 | 0.6386 | 0.5487 |
| 1.0004 | 5.0 | 780 | 0.6979 | 0.5199 |
| 1.0004 | 6.0 | 936 | 0.6102 | 0.6173 |
| 0.8189 | 7.0 | 1092 | 0.9162 | 0.5848 |
| 0.8189 | 8.0 | 1248 | 0.7055 | 0.6282 |
| 0.8189 | 9.0 | 1404 | 0.5689 | 0.7004 |
| 0.7207 | 10.0 | 1560 | 1.0166 | 0.6282 |
| 0.7207 | 11.0 | 1716 | 0.8185 | 0.4946 |
| 0.7207 | 12.0 | 1872 | 0.5053 | 0.7148 |
| 0.6822 | 13.0 | 2028 | 0.5296 | 0.7184 |
| 0.6822 | 14.0 | 2184 | 0.6259 | 0.7040 |
| 0.6822 | 15.0 | 2340 | 0.9773 | 0.6426 |
| 0.6822 | 16.0 | 2496 | 0.7401 | 0.6462 |
| 0.6238 | 17.0 | 2652 | 0.4929 | 0.7148 |
| 0.6238 | 18.0 | 2808 | 0.5547 | 0.7256 |
| 0.6238 | 19.0 | 2964 | 0.5692 | 0.7220 |
| 0.5327 | 20.0 | 3120 | 0.9119 | 0.6498 |
| 0.5327 | 21.0 | 3276 | 0.6083 | 0.7004 |
| 0.5327 | 22.0 | 3432 | 0.5836 | 0.7112 |
| 0.4818 | 23.0 | 3588 | 0.5820 | 0.7292 |
| 0.4818 | 24.0 | 3744 | 0.5506 | 0.7292 |
| 0.4818 | 25.0 | 3900 | 0.6027 | 0.7256 |
| 0.4199 | 26.0 | 4056 | 0.5265 | 0.7437 |
| 0.4199 | 27.0 | 4212 | 0.6094 | 0.7076 |
| 0.4199 | 28.0 | 4368 | 0.6170 | 0.7220 |
| 0.4001 | 29.0 | 4524 | 0.5932 | 0.7329 |
| 0.4001 | 30.0 | 4680 | 0.6954 | 0.7220 |
| 0.4001 | 31.0 | 4836 | 0.6963 | 0.7437 |
| 0.4001 | 32.0 | 4992 | 0.6431 | 0.7545 |
| 0.3272 | 33.0 | 5148 | 0.9597 | 0.7040 |
| 0.3272 | 34.0 | 5304 | 0.6982 | 0.7365 |
| 0.3272 | 35.0 | 5460 | 0.6270 | 0.7437 |
| 0.2947 | 36.0 | 5616 | 1.0674 | 0.7004 |
| 0.2947 | 37.0 | 5772 | 0.8835 | 0.7256 |
| 0.2947 | 38.0 | 5928 | 0.9769 | 0.6859 |
| 0.266 | 39.0 | 6084 | 0.6855 | 0.7581 |
| 0.266 | 40.0 | 6240 | 0.7246 | 0.7509 |
| 0.266 | 41.0 | 6396 | 0.6901 | 0.7690 |
| 0.2254 | 42.0 | 6552 | 0.7170 | 0.7509 |
| 0.2254 | 43.0 | 6708 | 0.7532 | 0.7473 |
| 0.2254 | 44.0 | 6864 | 0.7347 | 0.7617 |
| 0.2188 | 45.0 | 7020 | 0.6478 | 0.7509 |
| 0.2188 | 46.0 | 7176 | 0.7903 | 0.7545 |
| 0.2188 | 47.0 | 7332 | 0.9367 | 0.7220 |
| 0.2188 | 48.0 | 7488 | 0.8417 | 0.7690 |
| 0.2166 | 49.0 | 7644 | 0.8226 | 0.7617 |
| 0.2166 | 50.0 | 7800 | 0.6278 | 0.7545 |
| 0.2166 | 51.0 | 7956 | 0.7471 | 0.7473 |
| 0.1828 | 52.0 | 8112 | 0.7728 | 0.7617 |
| 0.1828 | 53.0 | 8268 | 0.7733 | 0.7690 |
| 0.1828 | 54.0 | 8424 | 0.7554 | 0.7581 |
| 0.163 | 55.0 | 8580 | 0.8025 | 0.7653 |
| 0.163 | 56.0 | 8736 | 0.8769 | 0.7617 |
| 0.163 | 57.0 | 8892 | 0.6569 | 0.7473 |
| 0.1563 | 58.0 | 9048 | 0.7166 | 0.7653 |
| 0.1563 | 59.0 | 9204 | 0.8688 | 0.7617 |
| 0.1563 | 60.0 | 9360 | 0.7254 | 0.7617 |
| 0.1423 | 61.0 | 9516 | 0.8286 | 0.7545 |
| 0.1423 | 62.0 | 9672 | 0.7656 | 0.7545 |
| 0.1423 | 63.0 | 9828 | 0.8362 | 0.7617 |
| 0.1423 | 64.0 | 9984 | 0.7287 | 0.7617 |
| 0.1355 | 65.0 | 10140 | 0.8451 | 0.7581 |
| 0.1355 | 66.0 | 10296 | 0.6854 | 0.7617 |
| 0.1355 | 67.0 | 10452 | 0.7272 | 0.7581 |
| 0.1321 | 68.0 | 10608 | 0.6530 | 0.7617 |
| 0.1321 | 69.0 | 10764 | 0.8535 | 0.7653 |
| 0.1321 | 70.0 | 10920 | 0.7803 | 0.7653 |
| 0.1217 | 71.0 | 11076 | 0.7409 | 0.7617 |
| 0.1217 | 72.0 | 11232 | 0.7044 | 0.7617 |
| 0.1217 | 73.0 | 11388 | 0.6501 | 0.7653 |
| 0.1224 | 74.0 | 11544 | 0.7102 | 0.7617 |
| 0.1224 | 75.0 | 11700 | 0.7050 | 0.7617 |
| 0.1224 | 76.0 | 11856 | 0.7103 | 0.7617 |
| 0.1173 | 77.0 | 12012 | 0.6821 | 0.7617 |
| 0.1173 | 78.0 | 12168 | 0.7196 | 0.7617 |
| 0.1173 | 79.0 | 12324 | 0.7048 | 0.7617 |
| 0.1173 | 80.0 | 12480 | 0.7104 | 0.7617 |

Framework versions

  • Transformers 4.26.1
  • PyTorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3