20230822202056

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1724
  • Accuracy: 0.7112
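
The checkpoint is published as dkqjrm/20230822202056 and can be loaded with the standard Transformers auto classes. A minimal loading sketch follows; the card does not document which SuperGLUE subtask or head type was used, so AutoModelForSequenceClassification and the paired-sentence input below are assumptions, not the documented usage:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230822202056"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Placeholder sentence pair; the actual input format depends on the
# (undocumented) SuperGLUE subtask this model was fine-tuned on.
inputs = tokenizer("first sentence", "second sentence", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())
```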

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
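
As a hedged reconstruction, these settings map onto `transformers.TrainingArguments` roughly as follows; `output_dir` and the per-epoch evaluation cadence are assumptions inferred from the results table, not stated in the card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230822202056",   # assumed, not documented
    learning_rate=3e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    num_train_epochs=60.0,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="epoch",   # inferred from the per-epoch results below
)
```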

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.1785          | 0.5307   |
| 0.2552        | 2.0   | 624   | 0.1826          | 0.5054   |
| 0.2552        | 3.0   | 936   | 0.3328          | 0.4729   |
| 0.24          | 4.0   | 1248  | 0.2050          | 0.4729   |
| 0.2369        | 5.0   | 1560  | 0.1750          | 0.6065   |
| 0.2369        | 6.0   | 1872  | 0.1752          | 0.4765   |
| 0.2199        | 7.0   | 2184  | 0.1799          | 0.5921   |
| 0.2199        | 8.0   | 2496  | 0.1896          | 0.4729   |
| 0.1955        | 9.0   | 2808  | 0.1727          | 0.6245   |
| 0.185         | 10.0  | 3120  | 0.1734          | 0.5668   |
| 0.185         | 11.0  | 3432  | 0.1781          | 0.5812   |
| 0.184         | 12.0  | 3744  | 0.1711          | 0.6318   |
| 0.1819        | 13.0  | 4056  | 0.1783          | 0.4910   |
| 0.1819        | 14.0  | 4368  | 0.1703          | 0.6534   |
| 0.1793        | 15.0  | 4680  | 0.1697          | 0.6931   |
| 0.1793        | 16.0  | 4992  | 0.1710          | 0.6643   |
| 0.179         | 17.0  | 5304  | 0.1728          | 0.6534   |
| 0.1784        | 18.0  | 5616  | 0.1712          | 0.6498   |
| 0.1784        | 19.0  | 5928  | 0.1726          | 0.6065   |
| 0.1778        | 20.0  | 6240  | 0.1720          | 0.6679   |
| 0.1761        | 21.0  | 6552  | 0.1724          | 0.6606   |
| 0.1761        | 22.0  | 6864  | 0.1792          | 0.6534   |
| 0.1761        | 23.0  | 7176  | 0.1700          | 0.6715   |
| 0.1761        | 24.0  | 7488  | 0.1698          | 0.6679   |
| 0.1748        | 25.0  | 7800  | 0.1697          | 0.6968   |
| 0.1744        | 26.0  | 8112  | 0.1729          | 0.6859   |
| 0.1744        | 27.0  | 8424  | 0.1702          | 0.6570   |
| 0.1736        | 28.0  | 8736  | 0.1708          | 0.6931   |
| 0.1723        | 29.0  | 9048  | 0.1698          | 0.6787   |
| 0.1723        | 30.0  | 9360  | 0.1799          | 0.6462   |
| 0.1735        | 31.0  | 9672  | 0.1727          | 0.6751   |
| 0.1735        | 32.0  | 9984  | 0.1732          | 0.6498   |
| 0.1722        | 33.0  | 10296 | 0.1702          | 0.6751   |
| 0.1709        | 34.0  | 10608 | 0.1707          | 0.6968   |
| 0.1709        | 35.0  | 10920 | 0.1714          | 0.6968   |
| 0.1697        | 36.0  | 11232 | 0.1712          | 0.6751   |
| 0.1696        | 37.0  | 11544 | 0.1788          | 0.6570   |
| 0.1696        | 38.0  | 11856 | 0.1703          | 0.6787   |
| 0.1697        | 39.0  | 12168 | 0.1735          | 0.6751   |
| 0.1697        | 40.0  | 12480 | 0.1740          | 0.6787   |
| 0.1683        | 41.0  | 12792 | 0.1710          | 0.6895   |
| 0.1688        | 42.0  | 13104 | 0.1724          | 0.7076   |
| 0.1688        | 43.0  | 13416 | 0.1718          | 0.7004   |
| 0.1679        | 44.0  | 13728 | 0.1736          | 0.7040   |
| 0.1681        | 45.0  | 14040 | 0.1720          | 0.7040   |
| 0.1681        | 46.0  | 14352 | 0.1717          | 0.7076   |
| 0.1664        | 47.0  | 14664 | 0.1710          | 0.6895   |
| 0.1664        | 48.0  | 14976 | 0.1766          | 0.6895   |
| 0.1662        | 49.0  | 15288 | 0.1729          | 0.7040   |
| 0.1655        | 50.0  | 15600 | 0.1704          | 0.7076   |
| 0.1655        | 51.0  | 15912 | 0.1711          | 0.7184   |
| 0.1665        | 52.0  | 16224 | 0.1709          | 0.7040   |
| 0.1651        | 53.0  | 16536 | 0.1711          | 0.6931   |
| 0.1651        | 54.0  | 16848 | 0.1736          | 0.7040   |
| 0.1646        | 55.0  | 17160 | 0.1712          | 0.7112   |
| 0.1646        | 56.0  | 17472 | 0.1740          | 0.7076   |
| 0.1647        | 57.0  | 17784 | 0.1723          | 0.7076   |
| 0.1642        | 58.0  | 18096 | 0.1715          | 0.7004   |
| 0.1642        | 59.0  | 18408 | 0.1727          | 0.7076   |
| 0.1643        | 60.0  | 18720 | 0.1724          | 0.7112   |
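
Validation accuracy peaks at 0.7184 at epoch 51; the loss and accuracy reported at the top of the card correspond to the final epoch. Below is a sketch of a `compute_metrics` function that would produce the accuracy column when passed to a `Trainer`; the metric code actually used for this run is not documented, so this is an assumption:

```python
import numpy as np
import evaluate

# Assumed metric function; the run's actual compute_metrics is not documented.
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```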

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3