
20230822144501

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3480
  • Accuracy: 0.5271
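
The card does not state which SuperGLUE sub-task this checkpoint was trained on, nor its label mapping. As a minimal, hypothetical usage sketch (assuming a sentence-pair classification head; the example sentences and the softmax readout are invented here, not from the card):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: the checkpoint is a sequence-classification model on sentence pairs.
model_id = "dkqjrm/20230822144501"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Hypothetical premise/hypothesis pair for illustration only.
inputs = tokenizer("The cat sat on the mat.", "A cat is sitting.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities; label meaning is not documented
```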

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
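
The card leaves this section blank, but the training log below shows 312 optimizer steps per epoch at batch size 8 (≈2,496 training examples), which is consistent with the size of SuperGLUE's RTE training split. The sketch below therefore loads super_glue/rte as an assumption, not a confirmed fact:

```python
from datasets import load_dataset

# Assumption: "rte" is inferred from the step count, not stated on the card.
dataset = load_dataset("super_glue", "rte")
print(dataset)  # train / validation / test splits with premise, hypothesis, label
```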

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TrainingArguments sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
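
A minimal sketch of the corresponding TrainingArguments, assuming Transformers 4.26.1 as listed under Framework versions; output_dir and the per-epoch evaluation strategy (inferred from the per-epoch rows in the results table) are assumptions, and the model/data wiring is omitted:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters reported above.
training_args = TrainingArguments(
    output_dir="20230822144501",     # assumption, not from the card
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
    evaluation_strategy="epoch",     # inferred from per-epoch validation rows
)
```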

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.3687          | 0.4729   |
| 0.3745        | 2.0   | 624   | 0.3523          | 0.5379   |
| 0.3745        | 3.0   | 936   | 0.3598          | 0.4729   |
| 0.3798        | 4.0   | 1248  | 0.3479          | 0.5271   |
| 0.3752        | 5.0   | 1560  | 0.3593          | 0.4729   |
| 0.3752        | 6.0   | 1872  | 0.3505          | 0.5271   |
| 0.373         | 7.0   | 2184  | 0.3480          | 0.5271   |
| 0.373         | 8.0   | 2496  | 0.3593          | 0.4729   |
| 0.3724        | 9.0   | 2808  | 0.3490          | 0.5271   |
| 0.3669        | 10.0  | 3120  | 0.3489          | 0.5271   |
| 0.3669        | 11.0  | 3432  | 0.3487          | 0.5271   |
| 0.3681        | 12.0  | 3744  | 0.3588          | 0.4729   |
| 0.3636        | 13.0  | 4056  | 0.3519          | 0.5271   |
| 0.3636        | 14.0  | 4368  | 0.3511          | 0.5271   |
| 0.3629        | 15.0  | 4680  | 0.3510          | 0.5271   |
| 0.3629        | 16.0  | 4992  | 0.3478          | 0.5271   |
| 0.3591        | 17.0  | 5304  | 0.3502          | 0.5271   |
| 0.3564        | 18.0  | 5616  | 0.3481          | 0.5271   |
| 0.3564        | 19.0  | 5928  | 0.3511          | 0.5271   |
| 0.3573        | 20.0  | 6240  | 0.3512          | 0.5271   |
| 0.3574        | 21.0  | 6552  | 0.3481          | 0.5271   |
| 0.3574        | 22.0  | 6864  | 0.3488          | 0.5271   |
| 0.3566        | 23.0  | 7176  | 0.3516          | 0.5271   |
| 0.3566        | 24.0  | 7488  | 0.3483          | 0.5271   |
| 0.3571        | 25.0  | 7800  | 0.3478          | 0.5271   |
| 0.3562        | 26.0  | 8112  | 0.3478          | 0.5271   |
| 0.3562        | 27.0  | 8424  | 0.3534          | 0.5271   |
| 0.356         | 28.0  | 8736  | 0.3482          | 0.5271   |
| 0.3564        | 29.0  | 9048  | 0.3479          | 0.5271   |
| 0.3564        | 30.0  | 9360  | 0.3506          | 0.5271   |
| 0.3566        | 31.0  | 9672  | 0.3481          | 0.5271   |
| 0.3566        | 32.0  | 9984  | 0.3480          | 0.5271   |
| 0.3552        | 33.0  | 10296 | 0.3479          | 0.5271   |
| 0.3558        | 34.0  | 10608 | 0.3483          | 0.5271   |
| 0.3558        | 35.0  | 10920 | 0.3482          | 0.5271   |
| 0.3553        | 36.0  | 11232 | 0.3494          | 0.5271   |
| 0.3546        | 37.0  | 11544 | 0.3478          | 0.5271   |
| 0.3546        | 38.0  | 11856 | 0.3491          | 0.5271   |
| 0.3558        | 39.0  | 12168 | 0.3479          | 0.5271   |
| 0.3558        | 40.0  | 12480 | 0.3486          | 0.5271   |
| 0.3558        | 41.0  | 12792 | 0.3480          | 0.5271   |
| 0.3551        | 42.0  | 13104 | 0.3495          | 0.5271   |
| 0.3551        | 43.0  | 13416 | 0.3479          | 0.5271   |
| 0.3563        | 44.0  | 13728 | 0.3480          | 0.5271   |
| 0.3549        | 45.0  | 14040 | 0.3503          | 0.5271   |
| 0.3549        | 46.0  | 14352 | 0.3490          | 0.5271   |
| 0.355         | 47.0  | 14664 | 0.3493          | 0.5271   |
| 0.355         | 48.0  | 14976 | 0.3479          | 0.5271   |
| 0.3551        | 49.0  | 15288 | 0.3484          | 0.5271   |
| 0.3558        | 50.0  | 15600 | 0.3479          | 0.5271   |
| 0.3558        | 51.0  | 15912 | 0.3480          | 0.5271   |
| 0.3542        | 52.0  | 16224 | 0.3488          | 0.5271   |
| 0.3553        | 53.0  | 16536 | 0.3483          | 0.5271   |
| 0.3553        | 54.0  | 16848 | 0.3485          | 0.5271   |
| 0.3544        | 55.0  | 17160 | 0.3481          | 0.5271   |
| 0.3544        | 56.0  | 17472 | 0.3480          | 0.5271   |
| 0.3549        | 57.0  | 17784 | 0.3483          | 0.5271   |
| 0.3544        | 58.0  | 18096 | 0.3481          | 0.5271   |
| 0.3544        | 59.0  | 18408 | 0.3481          | 0.5271   |
| 0.3537        | 60.0  | 18720 | 0.3480          | 0.5271   |

Framework versions

  • Transformers 4.26.1
  • PyTorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3