20230822202110

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

Loss: 0.1679
Accuracy: 0.7148

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.003
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
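With lr_scheduler_type: linear, the learning rate falls linearly from its peak to zero by the end of training. The snippet below is a minimal sketch of that schedule. It makes two assumptions. First, warmup is zero, since the card lists no warmup steps. Second, the run has 9,360 optimizer steps in total, the final step in the training results table. The function name `linear_lr` is hypothetical. It reproduces the shape of a linear warmup-then-decay schedule and is not the library's own code.

```python
# Minimal sketch of a linear learning-rate schedule with optional warmup.
# Assumption: warmup_steps = 0, because no warmup is listed in this card.

PEAK_LR = 0.003      # learning_rate from the hyperparameter list
TOTAL_STEPS = 9360   # last step in the training results table (60 epochs x 156 steps)


def linear_lr(step: int, peak_lr: float = PEAK_LR,
              total_steps: int = TOTAL_STEPS, warmup_steps: int = 0) -> float:
    """Return the learning rate at a given optimizer step (hypothetical helper)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # Linear decay from peak_lr down to 0 at total_steps, clamped at 0 afterwards.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 1560, 4680, 9360):  # start, epoch 10, epoch 30, epoch 60
        print(f"step {step:>5}: lr = {linear_lr(step):.6f}")
```

Under these assumptions the rate is halved to 0.0015 at step 4680 (epoch 30) and reaches 0 at step 9360.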

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	156	0.4220	0.5271
No log	2.0	312	0.2767	0.4729
No log	3.0	468	0.4345	0.4729
0.2507	4.0	624	0.2006	0.5343
0.2507	5.0	780	0.1797	0.4729
0.2507	6.0	936	0.2180	0.5271
0.2023	7.0	1092	0.1726	0.5054
0.2023	8.0	1248	0.1811	0.4729
0.2023	9.0	1404	0.1828	0.5451
0.2077	10.0	1560	0.1921	0.5343
0.2077	11.0	1716	0.1772	0.4838
0.2077	12.0	1872	0.1724	0.6462
0.189	13.0	2028	0.1718	0.5379
0.189	14.0	2184	0.1728	0.5126
0.189	15.0	2340	0.1775	0.5126
0.189	16.0	2496	0.1813	0.5596
0.1803	17.0	2652	0.1739	0.6318
0.1803	18.0	2808	0.1718	0.6137
0.1803	19.0	2964	0.1711	0.6390
0.1791	20.0	3120	0.1797	0.5957
0.1791	21.0	3276	0.1710	0.6859
0.1791	22.0	3432	0.1729	0.6643
0.1781	23.0	3588	0.1701	0.6823
0.1781	24.0	3744	0.1706	0.6390
0.1781	25.0	3900	0.1708	0.6859
0.1765	26.0	4056	0.1697	0.6643
0.1765	27.0	4212	0.1698	0.6715
0.1765	28.0	4368	0.1710	0.6426
0.176	29.0	4524	0.1710	0.6931
0.176	30.0	4680	0.1703	0.6968
0.176	31.0	4836	0.1725	0.6570
0.176	32.0	4992	0.1699	0.6715
0.1749	33.0	5148	0.1710	0.6895
0.1749	34.0	5304	0.1694	0.7220
0.1749	35.0	5460	0.1700	0.6534
0.1739	36.0	5616	0.1690	0.7112
0.1739	37.0	5772	0.1685	0.7220
0.1739	38.0	5928	0.1696	0.7040
0.1738	39.0	6084	0.1688	0.7148
0.1738	40.0	6240	0.1692	0.7220
0.1738	41.0	6396	0.1683	0.7365
0.1726	42.0	6552	0.1690	0.6679
0.1726	43.0	6708	0.1679	0.7076
0.1726	44.0	6864	0.1691	0.7184
0.1719	45.0	7020	0.1692	0.7292
0.1719	46.0	7176	0.1685	0.7329
0.1719	47.0	7332	0.1684	0.7184
0.1719	48.0	7488	0.1690	0.7112
0.1712	49.0	7644	0.1690	0.7292
0.1712	50.0	7800	0.1685	0.6931
0.1712	51.0	7956	0.1680	0.7256
0.1705	52.0	8112	0.1687	0.7076
0.1705	53.0	8268	0.1685	0.7184
0.1705	54.0	8424	0.1689	0.7365
0.1705	55.0	8580	0.1677	0.7148
0.1705	56.0	8736	0.1694	0.7220
0.1705	57.0	8892	0.1682	0.7256
0.1692	58.0	9048	0.1684	0.7148
0.1692	59.0	9204	0.1679	0.7148
0.1692	60.0	9360	0.1679	0.7148
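Rows marked "No log" come before the first logged training loss. After that, the logged value changes roughly every 500 steps. This pattern is consistent with the Trainer's default logging_steps of 500, although the card does not state that setting. In the full table, the lowest validation loss is 0.1677 at epoch 55 and the highest accuracy is 0.7365 at epochs 41 and 54. The headline figures (Loss 0.1679, Accuracy 0.7148) match the epoch-60 row.

The sketch below shows how such checks can be done mechanically. `ROWS` copies only a few rows verbatim from the table, not all 60. The snippet does three things:

- It confirms that the step count is 156 per epoch.
- It picks the rows with the lowest validation loss and the highest accuracy.
- It bounds the training-set size implied by 156 steps per epoch at train_batch_size 16.

That bound assumes a single device, no gradient accumulation, and a kept partial final batch, none of which the card states. All helper names are hypothetical.

```python
# Subset of rows copied verbatim from the table above: (epoch, step, val_loss, accuracy).
ROWS = [
    (1, 156, 0.4220, 0.5271),
    (12, 1872, 0.1724, 0.6462),
    (41, 6396, 0.1683, 0.7365),
    (54, 8424, 0.1689, 0.7365),
    (55, 8580, 0.1677, 0.7148),
    (60, 9360, 0.1679, 0.7148),
]
TRAIN_BATCH_SIZE = 16  # train_batch_size from the hyperparameter list


def steps_per_epoch(rows):
    """Return the constant number of steps per epoch, or raise if rows disagree."""
    spe = rows[0][1] // rows[0][0]
    for epoch, step, _, _ in rows:
        if step != epoch * spe:
            raise ValueError(f"epoch {epoch}: step {step} is not {spe} x epoch")
    return spe


def implied_train_size(spe, batch_size):
    """Bounds on the example count n when spe = ceil(n / batch_size).

    Assumes one device, no gradient accumulation, and the last partial batch kept.
    """
    return (spe - 1) * batch_size + 1, spe * batch_size


def best_by_val_loss(rows):
    """Row with the lowest validation loss."""
    return min(rows, key=lambda r: r[2])


def best_accuracy_epochs(rows):
    """Highest accuracy and every epoch that reaches it."""
    acc = max(r[3] for r in rows)
    return acc, [r[0] for r in rows if r[3] == acc]


if __name__ == "__main__":
    spe = steps_per_epoch(ROWS)
    print("steps per epoch:", spe)
    print("implied train size range:", implied_train_size(spe, TRAIN_BATCH_SIZE))
    print("lowest val loss row:", best_by_val_loss(ROWS))
    print("best accuracy and epochs:", best_accuracy_epochs(ROWS))
```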

Framework versions

Transformers 4.26.1
PyTorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
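To reproduce the training environment, you can compare the installed package versions with the list above. The snippet below is a minimal sketch that uses the standard library's `importlib.metadata` (Python 3.8+). The helper name `find_mismatches` is hypothetical. The check requires exact matches. A PyTorch build for a different CUDA variant, such as a CPU-only wheel, reports a different local version suffix and is flagged as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" in this card.
# "torch" is the PyPI distribution name for PyTorch.
PINNED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def find_mismatches(pinned, get_version=version):
    """Return {package: (expected, installed_or_None)} for each package that differs."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    if not problems:
        print("Environment matches the card's framework versions.")
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

The `get_version` parameter exists so that you can pass in a different lookup function, for example a stub in tests.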

Repository: dkqjrm/20230822202110