20230822105333

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

Loss: 0.3480
Accuracy: 0.5271

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.01
train_batch_size: 8
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
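For readers reproducing the run, the hyperparameters above can be written as keyword arguments for `transformers.TrainingArguments` (Transformers 4.26.1, as listed under Framework versions). This is a sketch, not the author's script. The argument names are real `TrainingArguments` fields, but two settings are assumptions:

- `evaluation_strategy="epoch"` is inferred from the once-per-epoch rows in the results table.
- `warmup_steps=0` is assumed because the card lists no warmup.

The helper `linear_lr` is hypothetical. It illustrates a linear schedule decaying to zero over 18,720 total steps (312 steps per epoch × 60 epochs, taken from the results table).

```python
# Hyperparameters from this model card, expressed as keyword arguments for
# transformers.TrainingArguments (Transformers 4.26.1).
# Assumptions NOT stated on the card: per-epoch evaluation and zero warmup.
training_kwargs = {
    "learning_rate": 0.01,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 11,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 60.0,
    "evaluation_strategy": "epoch",  # assumption: one eval row per epoch in the table
    "warmup_steps": 0,               # assumption: no warmup is listed on the card
}
# Usage (requires transformers): TrainingArguments(output_dir="out", **training_kwargs)

STEPS_PER_EPOCH = 312  # from the "Step" column of the results table
TOTAL_STEPS = STEPS_PER_EPOCH * int(training_kwargs["num_train_epochs"])  # 18720


def linear_lr(step: int, base_lr: float = 0.01,
              total_steps: int = TOTAL_STEPS, warmup_steps: int = 0) -> float:
    """Hypothetical helper: learning rate at `step` for linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 4680, 9360, 18720):
    print(f"step {step:>5}: lr = {linear_lr(step):.4f}")
```

Under these assumptions the loop prints learning rates of 0.0100, 0.0075, 0.0050 and 0.0000. In later Transformers releases the evaluation setting is named `eval_strategy` rather than `evaluation_strategy`.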

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	312	2.0240	0.5271
1.1081	2.0	624	0.8435	0.5271
1.1081	3.0	936	0.4636	0.4729
1.109	4.0	1248	0.3964	0.4729
0.9629	5.0	1560	0.3803	0.5271
0.9629	6.0	1872	0.3630	0.5271
0.8211	7.0	2184	0.5683	0.5271
0.8211	8.0	2496	0.3645	0.4729
0.8143	9.0	2808	0.4972	0.5271
0.8375	10.0	3120	0.4557	0.4729
0.8375	11.0	3432	0.4497	0.5271
0.7522	12.0	3744	0.4278	0.4729
0.7584	13.0	4056	0.5233	0.5271
0.7584	14.0	4368	0.4097	0.5271
0.6684	15.0	4680	0.4749	0.4729
0.6684	16.0	4992	0.7626	0.5271
0.6637	17.0	5304	0.6379	0.5271
0.5907	18.0	5616	0.3496	0.5271
0.5907	19.0	5928	0.4018	0.5271
0.5618	20.0	6240	0.3606	0.5271
0.5539	21.0	6552	0.3596	0.4729
0.5539	22.0	6864	0.4662	0.5271
0.537	23.0	7176	0.3488	0.5271
0.537	24.0	7488	0.8345	0.4729
0.5337	25.0	7800	0.3486	0.5271
0.5058	26.0	8112	0.3496	0.5271
0.5058	27.0	8424	0.5283	0.4729
0.5239	28.0	8736	0.3566	0.5271
0.4835	29.0	9048	0.3810	0.4729
0.4835	30.0	9360	0.4577	0.5271
0.4672	31.0	9672	0.4612	0.4729
0.4672	32.0	9984	0.4667	0.5271
0.4699	33.0	10296	0.3585	0.5271
0.4637	34.0	10608	0.3518	0.5271
0.4637	35.0	10920	0.4995	0.4729
0.4539	36.0	11232	0.3777	0.4729
0.4465	37.0	11544	0.3492	0.5271
0.4465	38.0	11856	0.3486	0.5271
0.4446	39.0	12168	0.3482	0.5271
0.4446	40.0	12480	0.3776	0.4729
0.437	41.0	12792	0.3485	0.5271
0.4309	42.0	13104	0.3481	0.5271
0.4309	43.0	13416	0.3657	0.5271
0.424	44.0	13728	0.3484	0.5271
0.4165	45.0	14040	0.3492	0.5271
0.4165	46.0	14352	0.3706	0.4729
0.4206	47.0	14664	0.3490	0.5271
0.4206	48.0	14976	0.3510	0.5271
0.4202	49.0	15288	0.3478	0.5271
0.4038	50.0	15600	0.3621	0.5271
0.4038	51.0	15912	0.3480	0.5271
0.3916	52.0	16224	0.4587	0.4729
0.3901	53.0	16536	0.3506	0.5271
0.3901	54.0	16848	0.3545	0.5271
0.3805	55.0	17160	0.3540	0.4729
0.3805	56.0	17472	0.3626	0.5271
0.3781	57.0	17784	0.3504	0.5271
0.3688	58.0	18096	0.3478	0.5271
0.3688	59.0	18408	0.3527	0.5271
0.3657	60.0	18720	0.3480	0.5271
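Several properties of the table above can be checked mechanically. The sketch below works on the validation columns, transcribed by hand into `VAL_LOSS` and `ACCURACY` (names chosen for illustration). It confirms four things:

1. The headline evaluation results (loss 0.3480, accuracy 0.5271) equal the final epoch-60 row. They do not match the lowest validation loss, 0.3478, which was reached at epochs 49 and 58.
2. Accuracy only ever takes the values 0.5271 and 0.4729.
3. Those two values sum to 1.0. If the task is binary, this pattern is consistent with the model predicting the same class for every example. The card does not name the super_glue task, so this is an inference, not a stated result.
4. The 312 steps per epoch bound the training-set size, assuming a single device, no gradient accumulation and a kept final partial batch.

Separately, the "No log" entry and the repeated training-loss values fit the Trainer's default of logging training loss every 500 steps, so each row likely shows the most recently logged value.

```python
# Validation loss and accuracy per epoch (epochs 1..60), transcribed from the
# training-results table of this model card.
VAL_LOSS = [
    2.0240, 0.8435, 0.4636, 0.3964, 0.3803, 0.3630, 0.5683, 0.3645, 0.4972, 0.4557,
    0.4497, 0.4278, 0.5233, 0.4097, 0.4749, 0.7626, 0.6379, 0.3496, 0.4018, 0.3606,
    0.3596, 0.4662, 0.3488, 0.8345, 0.3486, 0.3496, 0.5283, 0.3566, 0.3810, 0.4577,
    0.4612, 0.4667, 0.3585, 0.3518, 0.4995, 0.3777, 0.3492, 0.3486, 0.3482, 0.3776,
    0.3485, 0.3481, 0.3657, 0.3484, 0.3492, 0.3706, 0.3490, 0.3510, 0.3478, 0.3621,
    0.3480, 0.4587, 0.3506, 0.3545, 0.3540, 0.3626, 0.3504, 0.3478, 0.3527, 0.3480,
]
ACCURACY = [
    0.5271, 0.5271, 0.4729, 0.4729, 0.5271, 0.5271, 0.5271, 0.4729, 0.5271, 0.4729,
    0.5271, 0.4729, 0.5271, 0.5271, 0.4729, 0.5271, 0.5271, 0.5271, 0.5271, 0.5271,
    0.4729, 0.5271, 0.5271, 0.4729, 0.5271, 0.5271, 0.4729, 0.5271, 0.4729, 0.5271,
    0.4729, 0.5271, 0.5271, 0.5271, 0.4729, 0.4729, 0.5271, 0.5271, 0.5271, 0.4729,
    0.5271, 0.5271, 0.5271, 0.5271, 0.5271, 0.4729, 0.5271, 0.5271, 0.5271, 0.5271,
    0.5271, 0.4729, 0.5271, 0.5271, 0.4729, 0.5271, 0.5271, 0.5271, 0.5271, 0.5271,
]
REPORTED_LOSS, REPORTED_ACCURACY = 0.3480, 0.5271  # headline evaluation results

assert len(VAL_LOSS) == len(ACCURACY) == 60

# 1. The headline numbers match the final epoch, not the best-loss epoch.
final_matches = (VAL_LOSS[-1], ACCURACY[-1]) == (REPORTED_LOSS, REPORTED_ACCURACY)
best_loss = min(VAL_LOSS)
best_epochs = [i + 1 for i, v in enumerate(VAL_LOSS) if v == best_loss]

# 2-3. Accuracy takes only two distinct values, and they are complementary.
distinct_acc = sorted(set(ACCURACY))
complementary = len(distinct_acc) == 2 and abs(sum(distinct_acc) - 1.0) < 1e-9

# 4. Assumption: one device, no gradient accumulation, last partial batch kept,
#    so ceil(n / 8) == 312 and n lies in [n_min, n_max].
steps_per_epoch, batch_size = 312, 8
n_min, n_max = (steps_per_epoch - 1) * batch_size + 1, steps_per_epoch * batch_size

print("final epoch matches headline metrics:", final_matches)
print("lowest validation loss:", best_loss, "at epochs", best_epochs)
print("distinct accuracies:", distinct_acc, "sum to 1.0:", complementary)
print("implied training-set size:", n_min, "to", n_max)
```

Run as written, this reports that the final epoch matches, that the lowest loss is 0.3478 at epochs [49, 58], that the two accuracies are complementary, and an implied training-set size of 2489 to 2496 examples.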

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
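To check whether a local environment matches the versions above, installed package versions can be compared against the card with the standard-library `importlib.metadata`. This is a minimal sketch. `CARD_VERSIONS` and `compare_versions` are hypothetical names, and the mapping of "Pytorch" to the pip distribution name `torch` is the only translation applied.

```python
from __future__ import annotations

from importlib import metadata

# Versions listed under "Framework versions" on this card, keyed by the
# pip distribution name ("Pytorch" is distributed as "torch").
CARD_VERSIONS = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def compare_versions(expected: dict[str, str]) -> dict[str, tuple[str, str | None, bool]]:
    """Return {package: (card_version, installed_version_or_None, exact_match)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


for pkg, (wanted, installed, ok) in compare_versions(CARD_VERSIONS).items():
    status = "ok" if ok else "differs"
    print(f"{pkg:<12} card={wanted:<12} installed={installed or 'missing':<12} {status}")
```

The comparison is deliberately exact. A CPU-only PyTorch 2.0.1 wheel, for example, reports no `+cu118` local suffix and is therefore flagged as differing.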

Model repository: dkqjrm/20230822105333

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Dataset used to train dkqjrm/20230822105333

Evaluation results