
20230825091928

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a usage sketch follows the metrics):

  • Loss: 0.1543
  • Accuracy: 0.7437
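
As a minimal usage sketch, the checkpoint can be loaded with the standard transformers API. This assumes the model head is a sentence-pair classifier, which the SuperGLUE setup suggests; the premise/hypothesis pair below is purely illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Repo id as shown on this page.
model_id = "dkqjrm/20230825091928"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Encode an illustrative sentence pair and take the argmax over the labels.
inputs = tokenizer("The cat sat on the mat.", "A cat is sitting.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())
```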

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (expressed as a TrainingArguments sketch after the list):

  • learning_rate: 0.005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
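
The same settings can be written as a TrainingArguments sketch. The output_dir is illustrative, and the model/dataset/Trainer wiring is omitted since the card does not specify it.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230825091928",     # illustrative, not from the card
    learning_rate=5e-3,              # learning_rate: 0.005
    per_device_train_batch_size=16,  # train_batch_size: 16
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    seed=11,                         # seed: 11
    lr_scheduler_type="linear",      # lr_scheduler_type: linear
    num_train_epochs=80.0,           # num_epochs: 80.0
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # and epsilon=1e-08
)
```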

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 156 | 0.6113 | 0.5307 |
| No log | 2.0 | 312 | 0.9432 | 0.4693 |
| No log | 3.0 | 468 | 0.9610 | 0.4729 |
| 0.8937 | 4.0 | 624 | 0.5415 | 0.5487 |
| 0.8937 | 5.0 | 780 | 0.4722 | 0.6209 |
| 0.8937 | 6.0 | 936 | 0.4314 | 0.6390 |
| 0.7579 | 7.0 | 1092 | 0.7937 | 0.5704 |
| 0.7579 | 8.0 | 1248 | 0.4160 | 0.6282 |
| 0.7579 | 9.0 | 1404 | 0.3071 | 0.6787 |
| 0.7059 | 10.0 | 1560 | 0.4325 | 0.6498 |
| 0.7059 | 11.0 | 1716 | 0.7958 | 0.5090 |
| 0.7059 | 12.0 | 1872 | 0.3046 | 0.6823 |
| 0.654 | 13.0 | 2028 | 0.3405 | 0.7220 |
| 0.654 | 14.0 | 2184 | 0.2875 | 0.6751 |
| 0.654 | 15.0 | 2340 | 0.4266 | 0.6426 |
| 0.654 | 16.0 | 2496 | 0.5710 | 0.5957 |
| 0.6649 | 17.0 | 2652 | 0.3009 | 0.7256 |
| 0.6649 | 18.0 | 2808 | 0.7588 | 0.6534 |
| 0.6649 | 19.0 | 2964 | 0.2785 | 0.7292 |
| 0.5523 | 20.0 | 3120 | 0.2400 | 0.6895 |
| 0.5523 | 21.0 | 3276 | 0.2582 | 0.6859 |
| 0.5523 | 22.0 | 3432 | 0.3514 | 0.6462 |
| 0.511 | 23.0 | 3588 | 0.2163 | 0.7112 |
| 0.511 | 24.0 | 3744 | 0.2226 | 0.7076 |
| 0.511 | 25.0 | 3900 | 0.2138 | 0.7148 |
| 0.4948 | 26.0 | 4056 | 0.2851 | 0.7437 |
| 0.4948 | 27.0 | 4212 | 0.2584 | 0.7220 |
| 0.4948 | 28.0 | 4368 | 0.2217 | 0.7401 |
| 0.4342 | 29.0 | 4524 | 0.2014 | 0.7076 |
| 0.4342 | 30.0 | 4680 | 0.1907 | 0.7184 |
| 0.4342 | 31.0 | 4836 | 0.2176 | 0.7076 |
| 0.4342 | 32.0 | 4992 | 0.1863 | 0.7184 |
| 0.4098 | 33.0 | 5148 | 0.1862 | 0.7292 |
| 0.4098 | 34.0 | 5304 | 0.2253 | 0.7292 |
| 0.4098 | 35.0 | 5460 | 0.1960 | 0.7256 |
| 0.3743 | 36.0 | 5616 | 0.2416 | 0.7401 |
| 0.3743 | 37.0 | 5772 | 0.1988 | 0.7292 |
| 0.3743 | 38.0 | 5928 | 0.2031 | 0.7076 |
| 0.3477 | 39.0 | 6084 | 0.1847 | 0.7292 |
| 0.3477 | 40.0 | 6240 | 0.2001 | 0.7220 |
| 0.3477 | 41.0 | 6396 | 0.1955 | 0.7401 |
| 0.3221 | 42.0 | 6552 | 0.2075 | 0.7329 |
| 0.3221 | 43.0 | 6708 | 0.1751 | 0.7365 |
| 0.3221 | 44.0 | 6864 | 0.2256 | 0.7148 |
| 0.3034 | 45.0 | 7020 | 0.1913 | 0.7329 |
| 0.3034 | 46.0 | 7176 | 0.1867 | 0.7437 |
| 0.3034 | 47.0 | 7332 | 0.1842 | 0.7292 |
| 0.3034 | 48.0 | 7488 | 0.1719 | 0.7365 |
| 0.2656 | 49.0 | 7644 | 0.1810 | 0.7617 |
| 0.2656 | 50.0 | 7800 | 0.2172 | 0.7256 |
| 0.2656 | 51.0 | 7956 | 0.2065 | 0.7545 |
| 0.2676 | 52.0 | 8112 | 0.1682 | 0.7473 |
| 0.2676 | 53.0 | 8268 | 0.1819 | 0.7329 |
| 0.2676 | 54.0 | 8424 | 0.1703 | 0.7509 |
| 0.2396 | 55.0 | 8580 | 0.1971 | 0.7509 |
| 0.2396 | 56.0 | 8736 | 0.1889 | 0.7365 |
| 0.2396 | 57.0 | 8892 | 0.2933 | 0.6968 |
| 0.2355 | 58.0 | 9048 | 0.1650 | 0.7509 |
| 0.2355 | 59.0 | 9204 | 0.1760 | 0.7473 |
| 0.2355 | 60.0 | 9360 | 0.1553 | 0.7581 |
| 0.2196 | 61.0 | 9516 | 0.1707 | 0.7437 |
| 0.2196 | 62.0 | 9672 | 0.1933 | 0.7401 |
| 0.2196 | 63.0 | 9828 | 0.1726 | 0.7401 |
| 0.2196 | 64.0 | 9984 | 0.1654 | 0.7509 |
| 0.2114 | 65.0 | 10140 | 0.1783 | 0.7401 |
| 0.2114 | 66.0 | 10296 | 0.1724 | 0.7473 |
| 0.2114 | 67.0 | 10452 | 0.1647 | 0.7473 |
| 0.208 | 68.0 | 10608 | 0.1734 | 0.7437 |
| 0.208 | 69.0 | 10764 | 0.1640 | 0.7365 |
| 0.208 | 70.0 | 10920 | 0.1953 | 0.7329 |
| 0.2014 | 71.0 | 11076 | 0.1550 | 0.7509 |
| 0.2014 | 72.0 | 11232 | 0.1781 | 0.7509 |
| 0.2014 | 73.0 | 11388 | 0.1687 | 0.7365 |
| 0.1906 | 74.0 | 11544 | 0.1695 | 0.7473 |
| 0.1906 | 75.0 | 11700 | 0.1560 | 0.7509 |
| 0.1906 | 76.0 | 11856 | 0.1532 | 0.7509 |
| 0.1864 | 77.0 | 12012 | 0.1524 | 0.7401 |
| 0.1864 | 78.0 | 12168 | 0.1537 | 0.7545 |
| 0.1864 | 79.0 | 12324 | 0.1531 | 0.7509 |
| 0.1864 | 80.0 | 12480 | 0.1543 | 0.7437 |

Framework versions

  • Transformers 4.26.1
  • PyTorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
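
A quick way to confirm a matching environment is to compare the installed versions against the list above; this check is illustrative and not part of the original card.

```python
# Print installed versions to compare against the card's pinned versions.
import transformers, torch, datasets, tokenizers

print(transformers.__version__)  # expected: 4.26.1
print(torch.__version__)         # expected: 2.0.1+cu118
print(datasets.__version__)      # expected: 2.12.0
print(tokenizers.__version__)    # expected: 0.13.3
```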