
20230822144236

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3486
  • Accuracy: 0.5235
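
The card does not yet include a usage snippet. Below is a minimal loading sketch, assuming the checkpoint carries a sequence-classification head; since the specific SuperGLUE subtask is not stated, the paired-sentence input is purely illustrative.

```python
# Minimal inference sketch. Assumptions: the checkpoint has a
# sequence-classification head and the task takes sentence pairs;
# the example strings below are illustrative only.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230822144236"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

inputs = tokenizer("first sentence", "second sentence", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```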

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
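
These values map directly onto transformers.TrainingArguments. A minimal sketch of the corresponding configuration follows; output_dir is a hypothetical name and the per-epoch evaluation_strategy is inferred from the per-epoch rows in the results table below, while every other value mirrors the list above.

```python
# Sketch of a TrainingArguments configuration mirroring the listed
# hyperparameters. output_dir is hypothetical; evaluation_strategy is
# inferred from the per-epoch evaluation rows in the results table.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230822144236",
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
    evaluation_strategy="epoch",
)
```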

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 312 | 0.3705 | 0.4729 |
| 0.3743 | 2.0 | 624 | 0.3484 | 0.5162 |
| 0.3743 | 3.0 | 936 | 0.3504 | 0.5162 |
| 0.3726 | 4.0 | 1248 | 0.3527 | 0.5235 |
| 0.3712 | 5.0 | 1560 | 0.3552 | 0.4729 |
| 0.3712 | 6.0 | 1872 | 0.3480 | 0.5199 |
| 0.3669 | 7.0 | 2184 | 0.3501 | 0.4729 |
| 0.3669 | 8.0 | 2496 | 0.3503 | 0.4368 |
| 0.3658 | 9.0 | 2808 | 0.3503 | 0.5343 |
| 0.3656 | 10.0 | 3120 | 0.3483 | 0.5199 |
| 0.3656 | 11.0 | 3432 | 0.3510 | 0.4729 |
| 0.3634 | 12.0 | 3744 | 0.3557 | 0.4729 |
| 0.3613 | 13.0 | 4056 | 0.3537 | 0.4729 |
| 0.3613 | 14.0 | 4368 | 0.3505 | 0.5199 |
| 0.3609 | 15.0 | 4680 | 0.3493 | 0.5199 |
| 0.3609 | 16.0 | 4992 | 0.3488 | 0.5307 |
| 0.3591 | 17.0 | 5304 | 0.3568 | 0.5235 |
| 0.3574 | 18.0 | 5616 | 0.3486 | 0.5235 |
| 0.3574 | 19.0 | 5928 | 0.3552 | 0.4729 |
| 0.3599 | 20.0 | 6240 | 0.3553 | 0.5271 |
| 0.3556 | 21.0 | 6552 | 0.3502 | 0.5307 |
| 0.3556 | 22.0 | 6864 | 0.3525 | 0.5271 |
| 0.3573 | 23.0 | 7176 | 0.3553 | 0.5199 |
| 0.3573 | 24.0 | 7488 | 0.3492 | 0.5162 |
| 0.3574 | 25.0 | 7800 | 0.3492 | 0.5235 |
| 0.3559 | 26.0 | 8112 | 0.3531 | 0.4729 |
| 0.3559 | 27.0 | 8424 | 0.3602 | 0.4729 |
| 0.3544 | 28.0 | 8736 | 0.3501 | 0.5379 |
| 0.3539 | 29.0 | 9048 | 0.3490 | 0.5018 |
| 0.3539 | 30.0 | 9360 | 0.3491 | 0.5090 |
| 0.3529 | 31.0 | 9672 | 0.3518 | 0.5271 |
| 0.3529 | 32.0 | 9984 | 0.3489 | 0.5199 |
| 0.3531 | 33.0 | 10296 | 0.3484 | 0.5307 |
| 0.3527 | 34.0 | 10608 | 0.3487 | 0.5271 |
| 0.3527 | 35.0 | 10920 | 0.3491 | 0.5307 |
| 0.3521 | 36.0 | 11232 | 0.3498 | 0.5343 |
| 0.3513 | 37.0 | 11544 | 0.3500 | 0.5235 |
| 0.3513 | 38.0 | 11856 | 0.3487 | 0.5235 |
| 0.3526 | 39.0 | 12168 | 0.3494 | 0.5415 |
| 0.3526 | 40.0 | 12480 | 0.3495 | 0.5451 |
| 0.3520 | 41.0 | 12792 | 0.3489 | 0.5343 |
| 0.3530 | 42.0 | 13104 | 0.3530 | 0.4729 |
| 0.3530 | 43.0 | 13416 | 0.3492 | 0.5271 |
| 0.3509 | 44.0 | 13728 | 0.3501 | 0.4693 |
| 0.3523 | 45.0 | 14040 | 0.3525 | 0.4729 |
| 0.3523 | 46.0 | 14352 | 0.3491 | 0.5054 |
| 0.3506 | 47.0 | 14664 | 0.3515 | 0.4729 |
| 0.3506 | 48.0 | 14976 | 0.3494 | 0.5379 |
| 0.3518 | 49.0 | 15288 | 0.3483 | 0.5235 |
| 0.3507 | 50.0 | 15600 | 0.3490 | 0.5271 |
| 0.3507 | 51.0 | 15912 | 0.3489 | 0.5379 |
| 0.3514 | 52.0 | 16224 | 0.3490 | 0.5090 |
| 0.3509 | 53.0 | 16536 | 0.3484 | 0.5235 |
| 0.3509 | 54.0 | 16848 | 0.3486 | 0.5199 |
| 0.3499 | 55.0 | 17160 | 0.3485 | 0.5199 |
| 0.3499 | 56.0 | 17472 | 0.3486 | 0.5199 |
| 0.3504 | 57.0 | 17784 | 0.3493 | 0.5415 |
| 0.3495 | 58.0 | 18096 | 0.3486 | 0.5307 |
| 0.3495 | 59.0 | 18408 | 0.3485 | 0.5271 |
| 0.3505 | 60.0 | 18720 | 0.3486 | 0.5235 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
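
To reproduce this environment, the versions above can be pinned directly. A minimal requirements sketch follows; note that the torch +cu118 build is distributed via PyTorch's CUDA 11.8 wheel index rather than PyPI, which is an assumption about how it was installed here.

```
# requirements pin matching the listed versions; the torch +cu118 build
# is published on PyTorch's CUDA 11.8 wheel index, not on PyPI.
transformers==4.26.1
torch==2.0.1+cu118
datasets==2.12.0
tokenizers==0.13.3
```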