20230823015034

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

Loss: 0.0704
Accuracy: 0.4729

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.01
train_batch_size: 8
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
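
The listed values map onto `transformers.TrainingArguments` roughly as sketched below. This is a minimal sketch, not the original training script. The mapping of `train_batch_size` to `per_device_train_batch_size` assumes a single device. The zero-warmup linear decay is an assumption, since the card lists no warmup setting. `TOTAL_STEPS` is taken from the last row of the training results table, and `output_dir` is a hypothetical name.

```python
# Values copied from the hyperparameter list above; the keys are the
# corresponding transformers.TrainingArguments parameter names.
HYPERPARAMETERS = {
    "learning_rate": 0.01,
    "per_device_train_batch_size": 8,  # assumes a single device
    "per_device_eval_batch_size": 8,
    "seed": 11,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 60.0,
}

TOTAL_STEPS = 18720  # final step reported in the training results table


def linear_lr(step, base_lr=0.01, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate under linear decay to zero.

    warmup_steps=0 is an assumption: the card does not list a warmup value.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def build_training_arguments(output_dir="bert-large-cased-super_glue"):
    """Build TrainingArguments from the values above (output_dir is hypothetical)."""
    from transformers import TrainingArguments  # imported lazily; requires transformers

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)
```

Under these assumptions the learning rate starts at 0.01, is halved by the midpoint of training (step 9360), and reaches zero at step 18720.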

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	312	0.3443	0.5271
0.5033	2.0	624	0.0706	0.4946
0.5033	3.0	936	0.0765	0.5199
0.0871	4.0	1248	0.0783	0.5235
0.089	5.0	1560	0.0730	0.4729
0.089	6.0	1872	0.1001	0.4729
0.089	7.0	2184	0.0714	0.4729
0.089	8.0	2496	0.0726	0.5054
0.0851	9.0	2808	0.0780	0.5271
0.1659	10.0	3120	0.0799	0.5271
0.1659	11.0	3432	0.0717	0.5379
0.0968	12.0	3744	0.0706	0.4729
0.085	13.0	4056	0.0829	0.5271
0.085	14.0	4368	0.0776	0.5271
0.088	15.0	4680	0.0760	0.4729
0.088	16.0	4992	0.2058	0.5271
0.084	17.0	5304	0.0726	0.4910
0.0906	18.0	5616	0.0708	0.4729
0.0906	19.0	5928	0.0714	0.4729
0.0865	20.0	6240	0.0707	0.4729
0.0852	21.0	6552	0.0733	0.4729
0.0852	22.0	6864	0.0843	0.5271
0.0848	23.0	7176	0.0849	0.4729
0.0848	24.0	7488	0.0877	0.4729
0.0837	25.0	7800	0.0704	0.4729
0.0828	26.0	8112	0.0740	0.5271
0.0828	27.0	8424	0.0710	0.4729
0.0856	28.0	8736	0.0717	0.4729
0.0836	29.0	9048	0.0715	0.4729
0.0836	30.0	9360	0.0709	0.4657
0.0813	31.0	9672	0.0891	0.5271
0.0813	32.0	9984	0.0711	0.4874
0.0824	33.0	10296	0.0753	0.4729
0.0825	34.0	10608	0.0797	0.5271
0.0825	35.0	10920	0.0710	0.4729
0.0819	36.0	11232	0.0739	0.4729
0.0811	37.0	11544	0.0743	0.4729
0.0811	38.0	11856	0.0731	0.4729
0.0816	39.0	12168	0.0707	0.4693
0.0816	40.0	12480	0.0706	0.4729
0.0804	41.0	12792	0.0716	0.5451
0.0805	42.0	13104	0.0703	0.4729
0.0805	43.0	13416	0.0720	0.5271
0.0801	44.0	13728	0.0711	0.4729
0.08	45.0	14040	0.0716	0.5307
0.08	46.0	14352	0.0706	0.4729
0.0795	47.0	14664	0.0727	0.4729
0.0795	48.0	14976	0.0703	0.4729
0.0792	49.0	15288	0.0716	0.4729
0.0791	50.0	15600	0.0705	0.4729
0.0791	51.0	15912	0.0706	0.4729
0.0793	52.0	16224	0.0715	0.4729
0.0785	53.0	16536	0.0703	0.4729
0.0785	54.0	16848	0.0704	0.4729
0.0778	55.0	17160	0.0724	0.4729
0.0778	56.0	17472	0.0706	0.4729
0.0779	57.0	17784	0.0706	0.4729
0.0777	58.0	18096	0.0708	0.4729
0.0777	59.0	18408	0.0704	0.4729
0.0777	60.0	18720	0.0704	0.4729
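
The accuracy column can be summarized with a few lines of Python. The sketch below copies the 60 per-epoch accuracy values from the table and counts how often each occurs. Reading the two dominant values (0.4729 and 0.5271, which sum to 1) as the model predicting a single class on a binary task is an interpretation that the card does not confirm.

```python
from collections import Counter

# Accuracy per epoch (epochs 1 to 60), copied from the table above.
accuracy_by_epoch = [
    0.5271, 0.4946, 0.5199, 0.5235, 0.4729, 0.4729, 0.4729, 0.5054, 0.5271, 0.5271,
    0.5379, 0.4729, 0.5271, 0.5271, 0.4729, 0.5271, 0.4910, 0.4729, 0.4729, 0.4729,
    0.4729, 0.5271, 0.4729, 0.4729, 0.4729, 0.5271, 0.4729, 0.4729, 0.4729, 0.4657,
    0.5271, 0.4874, 0.4729, 0.5271, 0.4729, 0.4729, 0.4729, 0.4729, 0.4693, 0.4729,
    0.5451, 0.4729, 0.5271, 0.4729, 0.5307, 0.4729, 0.4729, 0.4729, 0.4729, 0.4729,
    0.4729, 0.4729, 0.4729, 0.4729, 0.4729, 0.4729, 0.4729, 0.4729, 0.4729, 0.4729,
]

STEPS_PER_EPOCH = 312   # step count at epoch 1.0
TOTAL_STEPS = 18720     # step count at epoch 60.0

# How often each accuracy value appears across the 60 evaluations.
counts = Counter(accuracy_by_epoch)

# Epoch (1-indexed) with the highest validation accuracy.
best_epoch = max(range(len(accuracy_by_epoch)), key=accuracy_by_epoch.__getitem__) + 1
best_accuracy = accuracy_by_epoch[best_epoch - 1]

# The two most frequent values are complementary.
complementary = round(0.4729 + 0.5271, 4) == 1.0

# The step column is consistent with a fixed number of steps per epoch.
steps_consistent = STEPS_PER_EPOCH * len(accuracy_by_epoch) == TOTAL_STEPS

print(counts.most_common(2), best_epoch, best_accuracy, complementary, steps_consistent)
```

The reported final accuracy of 0.4729 is the most frequent value in the table, and the highest accuracy reached at any epoch was not retained in the final checkpoint's metrics.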

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
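
To compare a local environment against these versions, a small standard-library check along the following lines may help. The PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are assumed to correspond to the frameworks listed above, and `check_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Versions listed in the card, keyed by the assumed PyPI distribution name.
PINNED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def check_versions(pinned=PINNED):
    """Return {package: (expected, installed, matches)} for each pinned package."""
    report = {}
    for package, expected in pinned.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (expected, installed, installed == expected)
    return report


if __name__ == "__main__":
    for package, (expected, installed, matches) in check_versions().items():
        status = "ok" if matches else "differs"
        print(f"{package}: expected {expected}, installed {installed} ({status})")
```

A mismatch does not necessarily prevent the model from loading; the check only reports where the environment differs from the one recorded here.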

dkqjrm/20230823015034
