
20230823073139

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0707
  • Accuracy: 0.4729

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
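
As a sanity check, these hyperparameters are consistent with the results table below: 312 optimizer steps per epoch at a train batch size of 8 imply roughly 2,496 training examples, and 60 epochs give the 18,720 total steps in the final row. A minimal sketch of that arithmetic, assuming a no-warmup linear schedule (the card does not state warmup settings):

```python
# Schedule arithmetic implied by this card's hyperparameters.
# Assumed values are taken from the card itself, not from the training script.

STEPS_PER_EPOCH = 312   # from the "Step" column: 312 steps at epoch 1.0
NUM_EPOCHS = 60
TRAIN_BATCH_SIZE = 8
BASE_LR = 1e-5

total_steps = STEPS_PER_EPOCH * NUM_EPOCHS                   # 18720, matching the last row
approx_train_examples = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE   # ~2496 examples

def linear_lr(step: int, base_lr: float = BASE_LR, total: int = total_steps) -> float:
    """Linear decay from base_lr to 0, assuming no warmup."""
    return base_lr * max(0.0, 1.0 - step / total)

print(total_steps)            # 18720
print(approx_train_examples)  # 2496
print(linear_lr(9360))        # halfway through training: 5e-06
```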

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.0712          | 0.4729   |
| 0.0874        | 2.0   | 624   | 0.0707          | 0.4729   |
| 0.0874        | 3.0   | 936   | 0.0707          | 0.4729   |
| 0.0879        | 4.0   | 1248  | 0.0707          | 0.4729   |
| 0.0869        | 5.0   | 1560  | 0.0710          | 0.4729   |
| 0.0869        | 6.0   | 1872  | 0.0707          | 0.4729   |
| 0.0872        | 7.0   | 2184  | 0.0708          | 0.4729   |
| 0.0872        | 8.0   | 2496  | 0.0708          | 0.4729   |
| 0.0865        | 9.0   | 2808  | 0.0712          | 0.4729   |
| 0.0864        | 10.0  | 3120  | 0.0707          | 0.4729   |
| 0.0864        | 11.0  | 3432  | 0.0707          | 0.4729   |
| 0.0866        | 12.0  | 3744  | 0.0709          | 0.4729   |
| 0.0862        | 13.0  | 4056  | 0.0710          | 0.4729   |
| 0.0862        | 14.0  | 4368  | 0.0709          | 0.4729   |
| 0.0865        | 15.0  | 4680  | 0.0712          | 0.4729   |
| 0.0865        | 16.0  | 4992  | 0.0708          | 0.4729   |
| 0.0859        | 17.0  | 5304  | 0.0706          | 0.4729   |
| 0.0855        | 18.0  | 5616  | 0.0708          | 0.4729   |
| 0.0855        | 19.0  | 5928  | 0.0706          | 0.4729   |
| 0.086         | 20.0  | 6240  | 0.0706          | 0.4729   |
| 0.0852        | 21.0  | 6552  | 0.0707          | 0.4729   |
| 0.0852        | 22.0  | 6864  | 0.0708          | 0.4729   |
| 0.0865        | 23.0  | 7176  | 0.0706          | 0.4729   |
| 0.0865        | 24.0  | 7488  | 0.0706          | 0.4729   |
| 0.0862        | 25.0  | 7800  | 0.0706          | 0.4729   |
| 0.0849        | 26.0  | 8112  | 0.0706          | 0.4729   |
| 0.0849        | 27.0  | 8424  | 0.0711          | 0.4729   |
| 0.0847        | 28.0  | 8736  | 0.0706          | 0.4729   |
| 0.0846        | 29.0  | 9048  | 0.0708          | 0.4729   |
| 0.0846        | 30.0  | 9360  | 0.0705          | 0.4729   |
| 0.0847        | 31.0  | 9672  | 0.0708          | 0.4729   |
| 0.0847        | 32.0  | 9984  | 0.0706          | 0.4729   |
| 0.0854        | 33.0  | 10296 | 0.0706          | 0.4729   |
| 0.084         | 34.0  | 10608 | 0.0706          | 0.4729   |
| 0.084         | 35.0  | 10920 | 0.0709          | 0.4729   |
| 0.0845        | 36.0  | 11232 | 0.0707          | 0.4729   |
| 0.0842        | 37.0  | 11544 | 0.0706          | 0.4729   |
| 0.0842        | 38.0  | 11856 | 0.0706          | 0.4729   |
| 0.0847        | 39.0  | 12168 | 0.0706          | 0.4729   |
| 0.0847        | 40.0  | 12480 | 0.0706          | 0.4729   |
| 0.0839        | 41.0  | 12792 | 0.0705          | 0.4729   |
| 0.0848        | 42.0  | 13104 | 0.0706          | 0.4729   |
| 0.0848        | 43.0  | 13416 | 0.0706          | 0.4729   |
| 0.0841        | 44.0  | 13728 | 0.0706          | 0.4729   |
| 0.0845        | 45.0  | 14040 | 0.0709          | 0.4729   |
| 0.0845        | 46.0  | 14352 | 0.0706          | 0.4729   |
| 0.0842        | 47.0  | 14664 | 0.0707          | 0.4729   |
| 0.0842        | 48.0  | 14976 | 0.0707          | 0.4729   |
| 0.0842        | 49.0  | 15288 | 0.0707          | 0.4729   |
| 0.0837        | 50.0  | 15600 | 0.0706          | 0.4729   |
| 0.0837        | 51.0  | 15912 | 0.0706          | 0.4729   |
| 0.0845        | 52.0  | 16224 | 0.0707          | 0.4729   |
| 0.0844        | 53.0  | 16536 | 0.0707          | 0.4729   |
| 0.0844        | 54.0  | 16848 | 0.0706          | 0.4729   |
| 0.0846        | 55.0  | 17160 | 0.0706          | 0.4729   |
| 0.0846        | 56.0  | 17472 | 0.0706          | 0.4729   |
| 0.0836        | 57.0  | 17784 | 0.0706          | 0.4729   |
| 0.0847        | 58.0  | 18096 | 0.0707          | 0.4729   |
| 0.0847        | 59.0  | 18408 | 0.0707          | 0.4729   |
| 0.0849        | 60.0  | 18720 | 0.0707          | 0.4729   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3