20230823184639

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set at the end of training (epoch 60):

Loss: 0.1007
Accuracy: 0.7184

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 8
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
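
As a rough sketch, the values above can be written as keyword arguments using the parameter names of transformers.TrainingArguments, together with a plain-Python version of a linear learning-rate decay. Several things here are assumptions rather than facts from this card: that train_batch_size and eval_batch_size are per-device values on a single device, that no warmup was used (warmup_steps=0), and that the total step count is the final step shown in the training results (18720). The helper name linear_lr is hypothetical.

```python
# Hyperparameters reported in this card, keyed by the matching
# transformers.TrainingArguments parameter names.
training_kwargs = {
    "learning_rate": 1e-3,
    "per_device_train_batch_size": 8,  # assumption: single device
    "per_device_eval_batch_size": 8,   # assumption: single device
    "seed": 11,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 60.0,
}

# Assumption: total optimizer steps equal the last step in the results
# table (60 epochs at 312 steps per epoch).
TOTAL_STEPS = 18720


def linear_lr(step, base_lr=1e-3, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate at `step` under linear warmup followed by linear decay to 0.

    warmup_steps=0 is an assumption; the card does not report a warmup setting.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# With transformers installed, the dict could be passed on like this
# (output_dir is a hypothetical path):
#   from transformers import TrainingArguments
#   args = TrainingArguments(output_dir="out", **training_kwargs)
```

Under these assumptions the learning rate starts at 0.001, is halved by step 9360 (epoch 30), and reaches 0 at step 18720.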

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	312	0.2166	0.5307
0.2471	2.0	624	0.1849	0.5199
0.2471	3.0	936	0.2081	0.4729
0.2218	4.0	1248	0.1789	0.4910
0.221	5.0	1560	0.2006	0.4946
0.221	6.0	1872	0.1834	0.5632
0.2009	7.0	2184	0.1840	0.5523
0.2009	8.0	2496	0.1722	0.5415
0.1974	9.0	2808	0.1734	0.5668
0.1963	10.0	3120	0.1574	0.6245
0.1963	11.0	3432	0.2281	0.4982
0.1897	12.0	3744	0.1829	0.4982
0.1851	13.0	4056	0.1629	0.5379
0.1851	14.0	4368	0.1433	0.6498
0.1835	15.0	4680	0.1490	0.6426
0.1835	16.0	4992	0.1646	0.5812
0.1745	17.0	5304	0.1594	0.6390
0.1679	18.0	5616	0.1566	0.6462
0.1679	19.0	5928	0.1295	0.6895
0.1727	20.0	6240	0.1444	0.6354
0.1636	21.0	6552	0.1444	0.6282
0.1636	22.0	6864	0.1249	0.6823
0.1611	23.0	7176	0.1404	0.6606
0.1611	24.0	7488	0.1167	0.6859
0.1533	25.0	7800	0.1138	0.6895
0.1565	26.0	8112	0.1148	0.7148
0.1565	27.0	8424	0.1320	0.6462
0.1477	28.0	8736	0.1445	0.6643
0.152	29.0	9048	0.1106	0.6823
0.152	30.0	9360	0.1403	0.6823
0.1478	31.0	9672	0.1240	0.7076
0.1478	32.0	9984	0.1246	0.6823
0.1419	33.0	10296	0.1076	0.7184
0.1434	34.0	10608	0.1068	0.6931
0.1434	35.0	10920	0.1166	0.6968
0.1381	36.0	11232	0.1059	0.7004
0.1371	37.0	11544	0.1225	0.7040
0.1371	38.0	11856	0.1140	0.7076
0.1354	39.0	12168	0.1131	0.7256
0.1354	40.0	12480	0.1074	0.7148
0.1341	41.0	12792	0.1068	0.7329
0.1316	42.0	13104	0.1084	0.7004
0.1316	43.0	13416	0.1018	0.7148
0.1318	44.0	13728	0.1160	0.7292
0.1295	45.0	14040	0.1051	0.7148
0.1295	46.0	14352	0.1078	0.7076
0.128	47.0	14664	0.1059	0.7004
0.128	48.0	14976	0.1035	0.7256
0.1268	49.0	15288	0.1030	0.7004
0.1264	50.0	15600	0.1016	0.7148
0.1264	51.0	15912	0.1022	0.7004
0.1266	52.0	16224	0.1027	0.7040
0.1235	53.0	16536	0.1037	0.7112
0.1235	54.0	16848	0.1083	0.7184
0.121	55.0	17160	0.1008	0.7076
0.121	56.0	17472	0.1017	0.7184
0.1215	57.0	17784	0.1001	0.7148
0.1239	58.0	18096	0.1004	0.7148
0.1239	59.0	18408	0.1005	0.7184
0.1193	60.0	18720	0.1007	0.7184
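
The table above can be scanned programmatically to find the best checkpoint by either metric. The following is a minimal sketch, assuming the rows are available as tab-separated text; only four rows are copied here for brevity, and the function name parse_results is hypothetical. Over the full table the outcome is the same: the highest accuracy (0.7329) is at epoch 41 and the lowest validation loss (0.1001) is at epoch 57, neither of which is the final epoch.

```python
import csv
import io

# A subset of the rows above, tab-separated as in the table.
ROWS = (
    "Training Loss\tEpoch\tStep\tValidation Loss\tAccuracy\n"
    "No log\t1.0\t312\t0.2166\t0.5307\n"
    "0.1341\t41.0\t12792\t0.1068\t0.7329\n"
    "0.1215\t57.0\t17784\t0.1001\t0.7148\n"
    "0.1193\t60.0\t18720\t0.1007\t0.7184\n"
)


def parse_results(text):
    """Parse tab-separated training results into a list of dicts."""
    rows = []
    for r in csv.DictReader(io.StringIO(text), delimiter="\t"):
        train_loss = r["Training Loss"]
        rows.append({
            # "No log" means no training loss had been logged yet.
            "train_loss": None if train_loss == "No log" else float(train_loss),
            "epoch": float(r["Epoch"]),
            "step": int(r["Step"]),
            "val_loss": float(r["Validation Loss"]),
            "accuracy": float(r["Accuracy"]),
        })
    return rows


rows = parse_results(ROWS)
best_acc = max(rows, key=lambda r: r["accuracy"])
best_loss = min(rows, key=lambda r: r["val_loss"])
print("best accuracy:", best_acc["accuracy"], "at epoch", best_acc["epoch"])
print("lowest validation loss:", best_loss["val_loss"], "at epoch", best_loss["epoch"])
```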

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3

Model ID: dkqjrm/20230823184639
