20230822185237

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set after the final epoch (epoch 60 in the training results):

Loss: 0.3335
Accuracy: 0.6498

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.002
train_batch_size: 8
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
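
As a rough illustration of what a linear schedule does with the values above, the sketch below computes the learning rate at a given optimizer step. It assumes no warmup steps and a decay to zero over the whole run; neither detail is stated in this card. The total of 18720 steps and the 312 steps per epoch are taken from the training results table, and the name `linear_lr` is hypothetical.

```python
BASE_LR = 0.002        # learning_rate from the hyperparameter list
TOTAL_STEPS = 18720    # last step in the training results (60 epochs x 312 steps)
STEPS_PER_EPOCH = 312  # step count after epoch 1 in the training results


def linear_lr(step: int, base_lr: float = BASE_LR, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at `step` under linear decay to zero, with no warmup (assumed)."""
    remaining = max(0, total_steps - step)  # clamp so steps past the end give 0
    return base_lr * remaining / total_steps


if __name__ == "__main__":
    # Print the scheduled learning rate at a few epoch boundaries.
    for epoch in (0, 15, 30, 45, 60):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:2d}  step {step:5d}  lr {linear_lr(step):.6f}")
```

Under these assumptions the rate starts at 0.002, is halved by the midpoint of training (step 9360), and reaches zero at the final step.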

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	312	0.3589	0.5415
0.4381	2.0	624	0.3585	0.5560
0.4381	3.0	936	0.4824	0.4729
0.4251	4.0	1248	0.3497	0.5740
0.4013	5.0	1560	0.5515	0.5307
0.4013	6.0	1872	0.5300	0.5343
0.4064	7.0	2184	0.3515	0.4982
0.4064	8.0	2496	0.3456	0.5704
0.4121	9.0	2808	0.3522	0.5632
0.4048	10.0	3120	0.3437	0.5632
0.4048	11.0	3432	0.3483	0.5668
0.4035	12.0	3744	0.3952	0.4657
0.3797	13.0	4056	0.3535	0.4801
0.3797	14.0	4368	0.3443	0.5993
0.3657	15.0	4680	0.3431	0.5379
0.3657	16.0	4992	0.3478	0.5993
0.3615	17.0	5304	0.3475	0.6173
0.3573	18.0	5616	0.3539	0.6101
0.3573	19.0	5928	0.3384	0.6101
0.3552	20.0	6240	0.3483	0.6245
0.3545	21.0	6552	0.3359	0.6173
0.3545	22.0	6864	0.3844	0.5740
0.349	23.0	7176	0.3436	0.6498
0.349	24.0	7488	0.3422	0.6209
0.351	25.0	7800	0.3495	0.6318
0.3471	26.0	8112	0.3498	0.6101
0.3471	27.0	8424	0.3316	0.6462
0.3468	28.0	8736	0.3322	0.6751
0.3459	29.0	9048	0.3354	0.6390
0.3459	30.0	9360	0.3353	0.6390
0.344	31.0	9672	0.3383	0.6354
0.344	32.0	9984	0.3329	0.6245
0.3435	33.0	10296	0.3411	0.6390
0.3408	34.0	10608	0.3414	0.6354
0.3408	35.0	10920	0.3319	0.6534
0.3401	36.0	11232	0.3347	0.6282
0.3406	37.0	11544	0.3382	0.6137
0.3406	38.0	11856	0.3355	0.6245
0.3378	39.0	12168	0.3416	0.6245
0.3378	40.0	12480	0.3422	0.6209
0.3386	41.0	12792	0.3388	0.6390
0.3362	42.0	13104	0.3330	0.6390
0.3362	43.0	13416	0.3393	0.6282
0.3373	44.0	13728	0.3340	0.6282
0.3337	45.0	14040	0.3318	0.6390
0.3337	46.0	14352	0.3323	0.6354
0.3332	47.0	14664	0.3301	0.6643
0.3332	48.0	14976	0.3422	0.6282
0.3315	49.0	15288	0.3348	0.6570
0.33	50.0	15600	0.3366	0.6462
0.33	51.0	15912	0.3308	0.6570
0.331	52.0	16224	0.3298	0.6606
0.3295	53.0	16536	0.3377	0.6498
0.3295	54.0	16848	0.3439	0.6462
0.3282	55.0	17160	0.3326	0.6570
0.3282	56.0	17472	0.3356	0.6498
0.3291	57.0	17784	0.3309	0.6570
0.3278	58.0	18096	0.3333	0.6498
0.3278	59.0	18408	0.3324	0.6498
0.3292	60.0	18720	0.3335	0.6498
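
The headline results at the top of this card match the final row (epoch 60), which is not the best row in the table: the highest accuracy is 0.6751 at epoch 28 and the lowest validation loss is 0.3298 at epoch 52. The sketch below shows one way to pick such rows programmatically. It embeds only a hand-copied subset of the table, and the helper names `best_by_accuracy` and `best_by_loss` are hypothetical.

```python
# (epoch, step, validation_loss, accuracy), copied from selected rows of the table above.
ROWS = [
    (1, 312, 0.3589, 0.5415),
    (23, 7176, 0.3436, 0.6498),
    (28, 8736, 0.3322, 0.6751),
    (47, 14664, 0.3301, 0.6643),
    (52, 16224, 0.3298, 0.6606),
    (60, 18720, 0.3335, 0.6498),
]


def best_by_accuracy(rows):
    """Return the row with the highest accuracy."""
    return max(rows, key=lambda row: row[3])


def best_by_loss(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


if __name__ == "__main__":
    for label, row in (
        ("best accuracy", best_by_accuracy(ROWS)),
        ("lowest validation loss", best_by_loss(ROWS)),
        ("final epoch", ROWS[-1]),
    ):
        epoch, step, val_loss, accuracy = row
        print(f"{label}: epoch {epoch}, step {step}, loss {val_loss}, accuracy {accuracy}")
```

Whether to prefer accuracy or validation loss as the selection criterion is a judgment call; the card does not say which checkpoint, if any, was kept besides the final one.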

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3

Model: dkqjrm/20230822185237