20230824043649

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. After the final training epoch (epoch 60, step 37380), it achieves the following results on the evaluation set:

Loss: 0.0771
Accuracy: 0.7365

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.003
train_batch_size: 4
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
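
The values above can be written as a small configuration sketch. The dictionary keys are the `transformers.TrainingArguments` argument names that these settings most plausibly map to; that mapping is an assumption, as are single-device training, no gradient accumulation, and zero warmup steps (the card lists none of these). `linear_lr` is a hypothetical helper that illustrates what a linear schedule does to the learning rate over the 37380 steps reported in the training results.

```python
# Hyperparameters copied from the list above. The keys are the argument names
# of transformers.TrainingArguments that these values most likely correspond to
# (an assumption: the card only gives the short names).
training_kwargs = {
    "learning_rate": 3e-3,
    "per_device_train_batch_size": 4,  # card: train_batch_size (single device assumed)
    "per_device_eval_batch_size": 8,   # card: eval_batch_size
    "seed": 11,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 60.0,
}

TOTAL_STEPS = 37380                  # final step in the training results table
STEPS_PER_EPOCH = TOTAL_STEPS // 60  # 623 optimizer steps per epoch


def linear_lr(step, base_lr=training_kwargs["learning_rate"],
              total_steps=TOTAL_STEPS, warmup_steps=0):
    """Linear warmup followed by linear decay to zero.

    warmup_steps=0 is an assumption; the card does not report a warmup setting.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for epoch in (0, 15, 30, 45, 60):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:2d}  step {step:5d}  lr {linear_lr(step):.6f}")
    # Upper bound on training examples seen per epoch under the assumptions above.
    batch = training_kwargs["per_device_train_batch_size"]
    print("examples per epoch <=", STEPS_PER_EPOCH * batch)
```

If the mapping holds, the dictionary could be passed as `TrainingArguments(output_dir=..., **training_kwargs)`, where the output directory is left to the reader.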

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
0.4513	1.0	623	0.4036	0.4729
0.321	2.0	1246	0.3454	0.4729
0.339	3.0	1869	0.1727	0.5271
0.3594	4.0	2492	0.4321	0.4729
0.3103	5.0	3115	0.2311	0.5415
0.3042	6.0	3738	0.1428	0.6679
0.2996	7.0	4361	0.2423	0.5668
0.274	8.0	4984	0.1331	0.6895
0.2824	9.0	5607	0.1173	0.6931
0.2458	10.0	6230	0.1350	0.6968
0.2005	11.0	6853	0.1456	0.5884
0.1689	12.0	7476	0.1289	0.6787
0.1644	13.0	8099	0.1109	0.6931
0.1578	14.0	8722	0.1143	0.7040
0.1502	15.0	9345	0.1178	0.6968
0.141	16.0	9968	0.0974	0.6968
0.1365	17.0	10591	0.0980	0.6787
0.1327	18.0	11214	0.1128	0.6931
0.1352	19.0	11837	0.1543	0.6390
0.1324	20.0	12460	0.0938	0.7184
0.1274	21.0	13083	0.0907	0.7112
0.1244	22.0	13706	0.1093	0.7112
0.1227	23.0	14329	0.1061	0.7076
0.1142	24.0	14952	0.0972	0.7112
0.1094	25.0	15575	0.0872	0.7184
0.1099	26.0	16198	0.0904	0.7292
0.1086	27.0	16821	0.0912	0.7040
0.1083	28.0	17444	0.0850	0.7148
0.1061	29.0	18067	0.0832	0.7184
0.1008	30.0	18690	0.0951	0.7292
0.1036	31.0	19313	0.0879	0.7220
0.1024	32.0	19936	0.0850	0.7220
0.0945	33.0	20559	0.0828	0.7220
0.0961	34.0	21182	0.0838	0.7329
0.0935	35.0	21805	0.0814	0.7256
0.097	36.0	22428	0.0812	0.7329
0.0925	37.0	23051	0.0810	0.7292
0.0911	38.0	23674	0.0826	0.7256
0.0855	39.0	24297	0.0815	0.7329
0.0895	40.0	24920	0.0826	0.7329
0.0847	41.0	25543	0.0821	0.7292
0.0864	42.0	26166	0.0797	0.7292
0.0848	43.0	26789	0.0823	0.7256
0.0817	44.0	27412	0.0791	0.7329
0.0829	45.0	28035	0.0795	0.7220
0.0826	46.0	28658	0.0789	0.7365
0.0816	47.0	29281	0.0783	0.7220
0.0821	48.0	29904	0.0796	0.7437
0.0798	49.0	30527	0.0800	0.7220
0.0782	50.0	31150	0.0784	0.7437
0.079	51.0	31773	0.0784	0.7401
0.0797	52.0	32396	0.0795	0.7329
0.0804	53.0	33019	0.0784	0.7365
0.0762	54.0	33642	0.0770	0.7329
0.0727	55.0	34265	0.0777	0.7365
0.0749	56.0	34888	0.0786	0.7329
0.0737	57.0	35511	0.0773	0.7292
0.0734	58.0	36134	0.0776	0.7292
0.0737	59.0	36757	0.0777	0.7365
0.0736	60.0	37380	0.0771	0.7365
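
A log like this is easier to read once the best rows are picked out. The sketch below parses the last 15 rows of the table (epochs 46 to 60, copied verbatim) and reports the rows with the highest accuracy and the lowest validation loss. The names `ROWS`, `FIELDS`, and `parse_rows` are illustrative, not part of any library. On these rows, the highest accuracy (0.7437, epochs 48 and 50) is above the final-epoch value of 0.7365 reported at the top of the card.

```python
# Last 15 rows of the training results table (epochs 46 to 60), copied verbatim.
# Columns: training loss, epoch, step, validation loss, accuracy.
ROWS = """
0.0826 46.0 28658 0.0789 0.7365
0.0816 47.0 29281 0.0783 0.7220
0.0821 48.0 29904 0.0796 0.7437
0.0798 49.0 30527 0.0800 0.7220
0.0782 50.0 31150 0.0784 0.7437
0.079 51.0 31773 0.0784 0.7401
0.0797 52.0 32396 0.0795 0.7329
0.0804 53.0 33019 0.0784 0.7365
0.0762 54.0 33642 0.0770 0.7329
0.0727 55.0 34265 0.0777 0.7365
0.0749 56.0 34888 0.0786 0.7329
0.0737 57.0 35511 0.0773 0.7292
0.0734 58.0 36134 0.0776 0.7292
0.0737 59.0 36757 0.0777 0.7365
0.0736 60.0 37380 0.0771 0.7365
"""

FIELDS = ("train_loss", "epoch", "step", "val_loss", "accuracy")


def parse_rows(text):
    """Turn whitespace-separated rows into a list of dicts of floats."""
    records = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split()]
        records.append(dict(zip(FIELDS, values)))
    return records


records = parse_rows(ROWS)
# max() and min() return the first matching row on ties, i.e. the earliest epoch.
best_acc = max(records, key=lambda r: r["accuracy"])
best_loss = min(records, key=lambda r: r["val_loss"])

if __name__ == "__main__":
    print(f"best accuracy {best_acc['accuracy']:.4f} at epoch {best_acc['epoch']:.0f}")
    print(f"lowest validation loss {best_loss['val_loss']:.4f} at epoch {best_loss['epoch']:.0f}")
    print(f"final accuracy {records[-1]['accuracy']:.4f} at epoch {records[-1]['epoch']:.0f}")
```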

Framework versions

Transformers 4.26.1
PyTorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
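
To check whether a local environment matches these versions, a small standard-library script is enough. This is a sketch: the dictionary keys are assumed to be the PyPI distribution names of the four packages, and `compare_versions` is a hypothetical helper. The comparison ignores the build tag after `+`, so a CPU or differently built `torch` 2.0.1 wheel still counts as a match.

```python
from importlib import metadata

# Versions listed in the card. The keys are assumed PyPI distribution names;
# "+cu118" is the CUDA build tag of the torch wheel.
EXPECTED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def compare_versions(expected=EXPECTED):
    """Return {package: (expected, installed or None, matches)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        # Compare only the part before "+" so a different build tag still matches.
        same = (installed is not None
                and installed.split("+")[0] == wanted.split("+")[0])
        report[name] = (wanted, installed, same)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, same) in compare_versions().items():
        status = "ok" if same else "differs"
        print(f"{name:12s} expected {wanted:12s} installed {installed}  [{status}]")
```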

Model repository: dkqjrm/20230824043649