
dkqjrm/20230824104100

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0729
  • Accuracy: 0.7473
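
For reference, the sketch below shows one way to load this checkpoint for inference with the Transformers library. The repository id is taken from this card; the specific SuperGLUE task (and therefore the correct input format and label mapping) is not stated, so the sentence-pair input is an assumption.

```python
# Minimal inference sketch, assuming the checkpoint was saved as a
# sequence-classification model. The sentence-pair input format is an
# assumption; the card does not say which SuperGLUE task was used.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230824104100"  # repository id from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("A premise sentence.", "A hypothesis sentence.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # per-class probabilities
```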

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
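
The card leaves this section blank; the header only names the super_glue dataset. The training table below shows 312 optimization steps per epoch at batch size 8, i.e. roughly 2,500 training examples, which is consistent with the RTE configuration (2,490 training examples) — but that is an inference, not something the card states. Treating "rte" as an unconfirmed placeholder, the data could be loaded as follows:

```python
# Sketch of loading a SuperGLUE configuration with the Datasets library.
# "rte" is an unconfirmed guess inferred from the step counts in the
# training results table, not something this card states.
from datasets import load_dataset

dataset = load_dataset("super_glue", "rte")
print(dataset)                   # train/validation/test splits
print(dataset["validation"][0])  # one example: premise, hypothesis, label
```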

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
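
Expressed with the Transformers Trainer API, these settings would look roughly like the sketch below. The actual training script is not part of the card, so this is an illustration rather than the authors' code; output_dir is a placeholder.

```python
# Configuration sketch mirroring the hyperparameters listed above.
# Trainer in Transformers 4.26 uses an AdamW-style optimizer by default,
# which matches the reported betas and epsilon.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230824104100",   # placeholder
    learning_rate=3e-3,            # 0.003, as listed above
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
)
```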

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.2294          | 0.5307   |
| 0.3686        | 2.0   | 624   | 0.5346          | 0.4729   |
| 0.3686        | 3.0   | 936   | 0.2223          | 0.5235   |
| 0.2907        | 4.0   | 1248  | 0.1895          | 0.4729   |
| 0.2686        | 5.0   | 1560  | 0.1783          | 0.5018   |
| 0.2686        | 6.0   | 1872  | 0.1995          | 0.5884   |
| 0.2686        | 7.0   | 2184  | 0.3037          | 0.5740   |
| 0.2686        | 8.0   | 2496  | 0.1386          | 0.6715   |
| 0.266         | 9.0   | 2808  | 0.1311          | 0.7076   |
| 0.2363        | 10.0  | 3120  | 0.1403          | 0.6968   |
| 0.2363        | 11.0  | 3432  | 0.2988          | 0.5957   |
| 0.215         | 12.0  | 3744  | 0.1119          | 0.6968   |
| 0.198         | 13.0  | 4056  | 0.1238          | 0.6859   |
| 0.198         | 14.0  | 4368  | 0.1107          | 0.7040   |
| 0.1845        | 15.0  | 4680  | 0.1604          | 0.6570   |
| 0.1845        | 16.0  | 4992  | 0.1143          | 0.7004   |
| 0.1664        | 17.0  | 5304  | 0.1197          | 0.7148   |
| 0.159         | 18.0  | 5616  | 0.1122          | 0.7329   |
| 0.159         | 19.0  | 5928  | 0.1038          | 0.7184   |
| 0.145         | 20.0  | 6240  | 0.0973          | 0.7040   |
| 0.1304        | 21.0  | 6552  | 0.0996          | 0.7292   |
| 0.1304        | 22.0  | 6864  | 0.0938          | 0.7473   |
| 0.1264        | 23.0  | 7176  | 0.1212          | 0.7437   |
| 0.1264        | 24.0  | 7488  | 0.0953          | 0.7256   |
| 0.1212        | 25.0  | 7800  | 0.0899          | 0.7329   |
| 0.1172        | 26.0  | 8112  | 0.1037          | 0.7365   |
| 0.1172        | 27.0  | 8424  | 0.0844          | 0.7292   |
| 0.1122        | 28.0  | 8736  | 0.0850          | 0.7365   |
| 0.1131        | 29.0  | 9048  | 0.0875          | 0.7220   |
| 0.1131        | 30.0  | 9360  | 0.0904          | 0.7437   |
| 0.1082        | 31.0  | 9672  | 0.0883          | 0.7184   |
| 0.1082        | 32.0  | 9984  | 0.0800          | 0.7509   |
| 0.1086        | 33.0  | 10296 | 0.0897          | 0.7509   |
| 0.1015        | 34.0  | 10608 | 0.0837          | 0.7473   |
| 0.1015        | 35.0  | 10920 | 0.0820          | 0.7329   |
| 0.099         | 36.0  | 11232 | 0.0819          | 0.7365   |
| 0.0942        | 37.0  | 11544 | 0.0858          | 0.7509   |
| 0.0942        | 38.0  | 11856 | 0.0793          | 0.7437   |
| 0.0956        | 39.0  | 12168 | 0.0823          | 0.7581   |
| 0.0956        | 40.0  | 12480 | 0.0860          | 0.7256   |
| 0.0921        | 41.0  | 12792 | 0.0753          | 0.7545   |
| 0.0911        | 42.0  | 13104 | 0.0838          | 0.7473   |
| 0.0911        | 43.0  | 13416 | 0.0763          | 0.7545   |
| 0.0894        | 44.0  | 13728 | 0.0761          | 0.7473   |
| 0.0886        | 45.0  | 14040 | 0.0752          | 0.7581   |
| 0.0886        | 46.0  | 14352 | 0.0743          | 0.7437   |
| 0.0855        | 47.0  | 14664 | 0.0759          | 0.7581   |
| 0.0855        | 48.0  | 14976 | 0.0801          | 0.7437   |
| 0.0837        | 49.0  | 15288 | 0.0797          | 0.7473   |
| 0.083         | 50.0  | 15600 | 0.0734          | 0.7509   |
| 0.083         | 51.0  | 15912 | 0.0756          | 0.7545   |
| 0.0845        | 52.0  | 16224 | 0.0744          | 0.7401   |
| 0.084         | 53.0  | 16536 | 0.0731          | 0.7545   |
| 0.084         | 54.0  | 16848 | 0.0736          | 0.7473   |
| 0.0797        | 55.0  | 17160 | 0.0734          | 0.7653   |
| 0.0797        | 56.0  | 17472 | 0.0735          | 0.7545   |
| 0.0803        | 57.0  | 17784 | 0.0737          | 0.7545   |
| 0.0792        | 58.0  | 18096 | 0.0735          | 0.7581   |
| 0.0792        | 59.0  | 18408 | 0.0732          | 0.7581   |
| 0.0815        | 60.0  | 18720 | 0.0729          | 0.7473   |

Note that the best validation accuracy (0.7653) was reached at epoch 55; the headline figures at the top of this card correspond to the final epoch-60 checkpoint.

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
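
To confirm that a local environment matches these versions, a quick check:

```python
# Print installed versions to compare against the list above.
import datasets
import tokenizers
import torch
import transformers

print(transformers.__version__)  # expected: 4.26.1
print(torch.__version__)         # expected: 2.0.1+cu118
print(datasets.__version__)      # expected: 2.12.0
print(tokenizers.__version__)    # expected: 0.13.3
```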