20230824063515

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. After the final training epoch (epoch 60), it achieves the following results on the evaluation set:

Loss: 0.2971
Accuracy: 0.7437

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.003
train_batch_size: 8
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
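
To make the linear schedule concrete, the sketch below computes the learning rate at a given optimizer step. It assumes no warmup (the card lists no warmup setting) and takes 312 steps per epoch from the results table; the function name `linear_lr` and its parameters are illustrative, not part of the original training script.

```python
# Values taken from the card: learning_rate, num_epochs, and the
# step count after epoch 1.0 in the results table.
BASE_LR = 0.003
STEPS_PER_EPOCH = 312
NUM_EPOCHS = 60
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 18720, the final step in the table


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate under a linear decay to zero.

    warmup_steps=0 is an assumption: the card does not report a warmup value.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for epoch in (0, 15, 30, 45, 60):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:>2}  step {step:>5}  lr {linear_lr(step):.6f}")
```

Under these assumptions the rate starts at 0.003, is halved by the end of epoch 30, and reaches zero at step 18720.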

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	312	0.5375	0.5307
0.6046	2.0	624	0.6540	0.4729
0.6046	3.0	936	0.4055	0.5415
0.5378	4.0	1248	0.3920	0.5957
0.5028	5.0	1560	0.4366	0.5921
0.5028	6.0	1872	0.3927	0.6498
0.4686	7.0	2184	0.4005	0.6715
0.4686	8.0	2496	0.3381	0.6643
0.434	9.0	2808	0.3351	0.6679
0.4165	10.0	3120	0.4170	0.6282
0.4165	11.0	3432	0.4045	0.6462
0.4099	12.0	3744	0.4218	0.6895
0.3978	13.0	4056	0.3215	0.7184
0.3978	14.0	4368	0.3361	0.7256
0.3771	15.0	4680	0.4252	0.6426
0.3771	16.0	4992	0.3370	0.7148
0.3682	17.0	5304	0.7211	0.6498
0.3718	18.0	5616	0.3221	0.7004
0.3718	19.0	5928	0.3008	0.7220
0.3568	20.0	6240	0.3129	0.7256
0.325	21.0	6552	0.5513	0.6895
0.325	22.0	6864	0.3316	0.7040
0.3157	23.0	7176	0.4315	0.6968
0.3157	24.0	7488	0.3027	0.7545
0.2914	25.0	7800	0.3060	0.7545
0.2811	26.0	8112	0.3481	0.7365
0.2811	27.0	8424	0.3148	0.7401
0.2657	28.0	8736	0.3024	0.7401
0.265	29.0	9048	0.3254	0.7509
0.265	30.0	9360	0.3451	0.7437
0.2535	31.0	9672	0.3132	0.7545
0.2535	32.0	9984	0.2981	0.7365
0.2507	33.0	10296	0.3338	0.7617
0.2397	34.0	10608	0.3275	0.7365
0.2397	35.0	10920	0.3021	0.7401
0.2379	36.0	11232	0.3322	0.7401
0.2247	37.0	11544	0.3617	0.7329
0.2247	38.0	11856	0.3050	0.7437
0.2291	39.0	12168	0.3189	0.7401
0.2291	40.0	12480	0.2946	0.7473
0.2187	41.0	12792	0.2927	0.7365
0.2175	42.0	13104	0.3130	0.7401
0.2175	43.0	13416	0.2942	0.7365
0.2161	44.0	13728	0.3026	0.7437
0.2072	45.0	14040	0.3566	0.7329
0.2072	46.0	14352	0.2972	0.7437
0.2086	47.0	14664	0.2904	0.7365
0.2086	48.0	14976	0.2961	0.7473
0.2037	49.0	15288	0.3246	0.7473
0.1989	50.0	15600	0.2906	0.7473
0.1989	51.0	15912	0.2876	0.7401
0.2034	52.0	16224	0.3103	0.7437
0.2003	53.0	16536	0.3022	0.7617
0.2003	54.0	16848	0.3022	0.7437
0.1962	55.0	17160	0.2962	0.7365
0.1962	56.0	17472	0.2996	0.7473
0.195	57.0	17784	0.3006	0.7437
0.191	58.0	18096	0.2879	0.7401
0.191	59.0	18408	0.2972	0.7473
0.1946	60.0	18720	0.2971	0.7437
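
The table above can be scanned programmatically to find the strongest checkpoints. The following is a minimal sketch that parses tab-separated rows in the same layout and picks the epoch with the highest accuracy and the one with the lowest validation loss. Only a handful of rows are copied in for brevity; the names `EpochResult`, `parse_rows`, and `SAMPLE` are illustrative.

```python
from typing import List, NamedTuple, Optional


class EpochResult(NamedTuple):
    train_loss: Optional[float]  # None where the table says "No log"
    epoch: float
    step: int
    val_loss: float
    accuracy: float


def parse_rows(text: str) -> List[EpochResult]:
    """Parse tab-separated rows laid out like the training results table."""
    results = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, accuracy = line.split("\t")
        results.append(
            EpochResult(
                None if train_loss == "No log" else float(train_loss),
                float(epoch),
                int(step),
                float(val_loss),
                float(accuracy),
            )
        )
    return results


# A few rows copied from the table (epochs 1, 24, 33, 51, 53, 60).
SAMPLE = (
    "No log\t1.0\t312\t0.5375\t0.5307\n"
    "0.3157\t24.0\t7488\t0.3027\t0.7545\n"
    "0.2507\t33.0\t10296\t0.3338\t0.7617\n"
    "0.1989\t51.0\t15912\t0.2876\t0.7401\n"
    "0.2003\t53.0\t16536\t0.3022\t0.7617\n"
    "0.1946\t60.0\t18720\t0.2971\t0.7437\n"
)

results = parse_rows(SAMPLE)
# max()/min() return the first match on ties, so the earlier epoch wins.
best_acc = max(results, key=lambda r: r.accuracy)
best_loss = min(results, key=lambda r: r.val_loss)

if __name__ == "__main__":
    print("best accuracy:", best_acc.accuracy, "at epoch", best_acc.epoch)
    print("lowest validation loss:", best_loss.val_loss, "at epoch", best_loss.epoch)
```

Applied to the full table, the same selection gives an accuracy of 0.7617 at epoch 33 (matched again at epoch 53) and a validation loss of 0.2876 at epoch 51, whereas the headline figures correspond to the final epoch.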

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
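
When trying to reproduce these results, it can help to compare the local environment against the versions listed above. The sketch below does this with the standard library only; it assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`, and the helper name `check_versions` is illustrative.

```python
from importlib import metadata

# Versions as listed in the card; "torch" is the PyPI name for Pytorch.
EXPECTED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def check_versions(expected=EXPECTED):
    """Return {package: (wanted, installed_or_None, matches)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: wanted {wanted}, found {installed} ({status})")
```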

Dataset used to train dkqjrm/20230824063515: super_glue