20230824064444

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0709
  • Accuracy: 0.7329

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
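With lr_scheduler_type: linear, the learning rate decays linearly from 0.003 to 0 over the run's 18,720 optimizer steps (60 epochs × 312 steps per epoch, per the table below). A minimal sketch of that schedule, assuming no warmup steps (the card does not list any):

```python
def linear_lr(step: int, base_lr: float = 3e-3, total_steps: int = 18720) -> float:
    """Linear decay from base_lr at step 0 to 0 at total_steps (no warmup assumed)."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

print(linear_lr(0))      # 0.003  (start of training)
print(linear_lr(9360))   # 0.0015 (halfway, end of epoch 30)
print(linear_lr(18720))  # 0.0    (end of training)
```

Note that 0.003 is a fairly aggressive learning rate for fine-tuning a BERT-large model; whether warmup or gradient clipping was used is not recorded here.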

Training results

Training Loss  Epoch  Step   Validation Loss  Accuracy
No log         1.0    312    0.4733           0.5307
0.3538         2.0    624    0.1917           0.5126
0.3538         3.0    936    0.1696           0.5560
0.2775         4.0    1248   0.1700           0.5271
0.2538         5.0    1560   0.3497           0.5343
0.2538         6.0    1872   0.2183           0.5632
0.259          7.0    2184   0.1783           0.5018
0.259          8.0    2496   0.2321           0.5848
0.2587         9.0    2808   0.2081           0.6101
0.2211         10.0   3120   0.1194           0.6715
0.2211         11.0   3432   0.1505           0.6390
0.198          12.0   3744   0.1130           0.7004
0.1939         13.0   4056   0.1187           0.6679
0.1939         14.0   4368   0.1175           0.6787
0.1687         15.0   4680   0.1092           0.7040
0.1687         16.0   4992   0.0984           0.7076
0.1511         17.0   5304   0.1032           0.7076
0.1448         18.0   5616   0.1024           0.7401
0.1448         19.0   5928   0.0902           0.7112
0.1392         20.0   6240   0.0972           0.7112
0.1283         21.0   6552   0.0880           0.7184
0.1283         22.0   6864   0.0892           0.7329
0.1257         23.0   7176   0.1156           0.7401
0.1257         24.0   7488   0.0940           0.7329
0.1215         25.0   7800   0.0876           0.7401
0.1184         26.0   8112   0.1289           0.7437
0.1184         27.0   8424   0.0808           0.7256
0.1112         28.0   8736   0.0823           0.7401
0.1139         29.0   9048   0.0838           0.7256
0.1139         30.0   9360   0.0855           0.7220
0.1095         31.0   9672   0.0813           0.7256
0.1095         32.0   9984   0.0765           0.7256
0.106          33.0   10296  0.0847           0.7365
0.1034         34.0   10608  0.0844           0.7509
0.1034         35.0   10920  0.0811           0.7184
0.0991         36.0   11232  0.0811           0.7292
0.0938         37.0   11544  0.0847           0.7365
0.0938         38.0   11856  0.0824           0.7256
0.0973         39.0   12168  0.0760           0.7292
0.0973         40.0   12480  0.0786           0.7220
0.0908         41.0   12792  0.0732           0.7473
0.0894         42.0   13104  0.0763           0.7401
0.0894         43.0   13416  0.0811           0.7365
0.0896         44.0   13728  0.0734           0.7473
0.0882         45.0   14040  0.0747           0.7329
0.0882         46.0   14352  0.0729           0.7401
0.0847         47.0   14664  0.0723           0.7329
0.0847         48.0   14976  0.0748           0.7401
0.0854         49.0   15288  0.0755           0.7292
0.0813         50.0   15600  0.0715           0.7329
0.0813         51.0   15912  0.0719           0.7292
0.0845         52.0   16224  0.0721           0.7401
0.0821         53.0   16536  0.0711           0.7292
0.0821         54.0   16848  0.0714           0.7437
0.0802         55.0   17160  0.0711           0.7401
0.0802         56.0   17472  0.0718           0.7329
0.0798         57.0   17784  0.0708           0.7220
0.0796         58.0   18096  0.0715           0.7365
0.0796         59.0   18408  0.0712           0.7329
0.0806         60.0   18720  0.0709           0.7329
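The headline numbers at the top of the card are simply those of the final epoch; they are not the best checkpoint by either metric. In the table above, validation loss bottoms out at epoch 57 (0.0708) while accuracy peaks at epoch 34 (0.7509). A small sketch of how checkpoint selection depends on the criterion, using a few rows from the table:

```python
# (epoch, validation_loss, accuracy) for selected rows of the table above
rows = [
    (34, 0.0844, 0.7509),
    (41, 0.0732, 0.7473),
    (57, 0.0708, 0.7220),
    (60, 0.0709, 0.7329),  # final epoch: the figures reported at the top of the card
]

best_by_loss = min(rows, key=lambda r: r[1])      # lowest validation loss
best_by_accuracy = max(rows, key=lambda r: r[2])  # highest accuracy

print(best_by_loss[0])      # 57
print(best_by_accuracy[0])  # 34
```

Which criterion matters depends on the downstream use; with the large gap between the low-loss and high-accuracy epochs here, reporting only the final epoch understates the model's best observed accuracy by about 1.8 points.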

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3