
20230824023615

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0725
  • Accuracy: 0.7365

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
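With `lr_scheduler_type: linear`, the learning rate decays linearly from `learning_rate` toward zero over the full run. A minimal sketch of that schedule, assuming no warmup (the card does not state a warmup setting) and using the 312 optimizer steps per epoch visible in the results table below:

```python
def linear_lr(step, base_lr=0.003, total_steps=60 * 312):
    """Learning rate under a linear decay schedule with no warmup (assumption).

    total_steps = num_epochs * steps_per_epoch = 60 * 312 = 18720,
    matching the final step count in the training-results table.
    """
    return base_lr * max(0.0, 1.0 - step / total_steps)

print(linear_lr(0))      # 0.003 at the start of training
print(linear_lr(9360))   # 0.0015 at the halfway point (end of epoch 30)
print(linear_lr(18720))  # 0.0 at the final step
```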

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.6124          | 0.5271   |
| 0.3459        | 2.0   | 624   | 0.2937          | 0.4729   |
| 0.3459        | 3.0   | 936   | 0.4930          | 0.4693   |
| 0.2482        | 4.0   | 1248  | 0.1965          | 0.4693   |
| 0.2242        | 5.0   | 1560  | 0.2537          | 0.4693   |
| 0.2242        | 6.0   | 1872  | 0.1661          | 0.5632   |
| 0.2359        | 7.0   | 2184  | 0.1414          | 0.6570   |
| 0.2359        | 8.0   | 2496  | 0.1893          | 0.5018   |
| 0.2404        | 9.0   | 2808  | 0.1265          | 0.6173   |
| 0.2198        | 10.0  | 3120  | 0.1214          | 0.6679   |
| 0.2198        | 11.0  | 3432  | 0.1352          | 0.6029   |
| 0.1657        | 12.0  | 3744  | 0.1030          | 0.7040   |
| 0.1472        | 13.0  | 4056  | 0.1043          | 0.6931   |
| 0.1472        | 14.0  | 4368  | 0.1011          | 0.7004   |
| 0.1408        | 15.0  | 4680  | 0.1111          | 0.7148   |
| 0.1408        | 16.0  | 4992  | 0.1046          | 0.6931   |
| 0.1321        | 17.0  | 5304  | 0.0964          | 0.7004   |
| 0.1285        | 18.0  | 5616  | 0.1019          | 0.7220   |
| 0.1285        | 19.0  | 5928  | 0.0927          | 0.7256   |
| 0.1244        | 20.0  | 6240  | 0.0972          | 0.7004   |
| 0.1191        | 21.0  | 6552  | 0.0947          | 0.7076   |
| 0.1191        | 22.0  | 6864  | 0.0983          | 0.7184   |
| 0.1129        | 23.0  | 7176  | 0.1029          | 0.7040   |
| 0.1129        | 24.0  | 7488  | 0.0993          | 0.7112   |
| 0.1115        | 25.0  | 7800  | 0.0933          | 0.7076   |
| 0.1079        | 26.0  | 8112  | 0.1092          | 0.6931   |
| 0.1079        | 27.0  | 8424  | 0.0837          | 0.7437   |
| 0.105         | 28.0  | 8736  | 0.0825          | 0.7256   |
| 0.1049        | 29.0  | 9048  | 0.0809          | 0.7148   |
| 0.1049        | 30.0  | 9360  | 0.0924          | 0.7256   |
| 0.1021        | 31.0  | 9672  | 0.0820          | 0.7292   |
| 0.1021        | 32.0  | 9984  | 0.0793          | 0.7256   |
| 0.099         | 33.0  | 10296 | 0.0820          | 0.7365   |
| 0.0966        | 34.0  | 10608 | 0.0831          | 0.7184   |
| 0.0966        | 35.0  | 10920 | 0.0796          | 0.7256   |
| 0.0928        | 36.0  | 11232 | 0.0790          | 0.7292   |
| 0.0888        | 37.0  | 11544 | 0.0953          | 0.7256   |
| 0.0888        | 38.0  | 11856 | 0.0791          | 0.7437   |
| 0.0905        | 39.0  | 12168 | 0.0849          | 0.7473   |
| 0.0905        | 40.0  | 12480 | 0.0782          | 0.7401   |
| 0.0872        | 41.0  | 12792 | 0.0754          | 0.7292   |
| 0.0853        | 42.0  | 13104 | 0.0770          | 0.7365   |
| 0.0853        | 43.0  | 13416 | 0.0742          | 0.7473   |
| 0.0843        | 44.0  | 13728 | 0.0764          | 0.7220   |
| 0.0826        | 45.0  | 14040 | 0.0765          | 0.7256   |
| 0.0826        | 46.0  | 14352 | 0.0746          | 0.7365   |
| 0.0811        | 47.0  | 14664 | 0.0736          | 0.7292   |
| 0.0811        | 48.0  | 14976 | 0.0824          | 0.7292   |
| 0.079         | 49.0  | 15288 | 0.0749          | 0.7401   |
| 0.0783        | 50.0  | 15600 | 0.0734          | 0.7401   |
| 0.0783        | 51.0  | 15912 | 0.0740          | 0.7401   |
| 0.0806        | 52.0  | 16224 | 0.0749          | 0.7365   |
| 0.078         | 53.0  | 16536 | 0.0729          | 0.7365   |
| 0.078         | 54.0  | 16848 | 0.0728          | 0.7401   |
| 0.0764        | 55.0  | 17160 | 0.0722          | 0.7437   |
| 0.0764        | 56.0  | 17472 | 0.0745          | 0.7365   |
| 0.0766        | 57.0  | 17784 | 0.0730          | 0.7329   |
| 0.0751        | 58.0  | 18096 | 0.0725          | 0.7401   |
| 0.0751        | 59.0  | 18408 | 0.0730          | 0.7365   |
| 0.0765        | 60.0  | 18720 | 0.0725          | 0.7365   |
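The step counts in the table are internally consistent: each epoch adds 312 optimizer steps, which at `train_batch_size: 8` implies roughly 2496 training examples (an upper bound, since the last batch of an epoch may be partial). A quick sanity check:

```python
steps_per_epoch = 312        # from the table: step 312 at epoch 1.0
train_batch_size = 8         # from the hyperparameters above
num_epochs = 60

approx_train_examples = steps_per_epoch * train_batch_size
total_steps = steps_per_epoch * num_epochs

print(approx_train_examples)  # 2496
print(total_steps)            # 18720, matching the final row of the table
```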

Framework versions

  • Transformers 4.26.1
  • PyTorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3