20230823054903

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

Loss: 0.0697
Accuracy: 0.5271

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.004
train_batch_size: 8
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
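
The linear scheduler and the step counts implied by these settings can be sketched in plain Python. This is a minimal illustration, not the training script itself. It assumes no warmup (the card lists no warmup setting), a single device, and no gradient accumulation. The 312 steps per epoch are taken from the training results table, and the function names `linear_lr` and `train_set_size_range` are hypothetical.

```python
LEARNING_RATE = 0.004     # learning_rate from the card
TRAIN_BATCH_SIZE = 8      # train_batch_size from the card
NUM_EPOCHS = 60           # num_epochs from the card
STEPS_PER_EPOCH = 312     # step count at epoch 1.0 in the results table
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 18720, the last step in the table


def linear_lr(step, base_lr=LEARNING_RATE, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to 0.

    warmup_steps=0 is an assumption; the card does not report a warmup value.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def train_set_size_range(steps_per_epoch=STEPS_PER_EPOCH, batch_size=TRAIN_BATCH_SIZE):
    """Smallest and largest dataset sizes that give this many batches per epoch,
    assuming the last partial batch is kept."""
    low = (steps_per_epoch - 1) * batch_size + 1
    high = steps_per_epoch * batch_size
    return low, high


if __name__ == "__main__":
    for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(f"step {step:>5}: lr = {linear_lr(step):.6f}")
    print("training examples per epoch (range):", train_set_size_range())
```

Under these assumptions the learning rate falls from 0.004 at step 0 to 0 at step 18720, and each epoch covers between 2489 and 2496 training examples.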

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	312	0.0788	0.5271
0.1715	2.0	624	0.2059	0.4729
0.1715	3.0	936	0.1134	0.4729
0.1222	4.0	1248	0.0831	0.5343
0.1187	5.0	1560	0.0732	0.5704
0.1187	6.0	1872	0.0796	0.4729
0.0991	7.0	2184	0.0745	0.4729
0.0991	8.0	2496	0.0716	0.4982
0.0819	9.0	2808	0.0705	0.4910
0.0807	10.0	3120	0.0702	0.4765
0.0807	11.0	3432	0.0713	0.4729
0.0802	12.0	3744	0.0703	0.4729
0.0795	13.0	4056	0.0784	0.4729
0.0795	14.0	4368	0.0706	0.5307
0.0794	15.0	4680	0.0730	0.4729
0.0794	16.0	4992	0.0706	0.4801
0.0806	17.0	5304	0.0711	0.5596
0.0811	18.0	5616	0.0704	0.4693
0.0811	19.0	5928	0.0701	0.4874
0.0798	20.0	6240	0.0719	0.6101
0.0793	21.0	6552	0.0705	0.4693
0.0793	22.0	6864	0.0707	0.5884
0.0795	23.0	7176	0.0712	0.4729
0.0795	24.0	7488	0.0705	0.4729
0.0796	25.0	7800	0.0789	0.5271
0.0796	26.0	8112	0.0705	0.4801
0.0796	27.0	8424	0.0703	0.4765
0.0787	28.0	8736	0.0703	0.4838
0.079	29.0	9048	0.0716	0.4729
0.079	30.0	9360	0.0739	0.5704
0.0788	31.0	9672	0.0749	0.5632
0.0788	32.0	9984	0.0711	0.4729
0.0789	33.0	10296	0.0705	0.4838
0.0786	34.0	10608	0.0700	0.5199
0.0786	35.0	10920	0.0699	0.4838
0.0785	36.0	11232	0.0715	0.4729
0.0784	37.0	11544	0.0716	0.6354
0.0784	38.0	11856	0.0719	0.4729
0.0781	39.0	12168	0.0700	0.5487
0.0781	40.0	12480	0.0700	0.5848
0.0778	41.0	12792	0.0704	0.6173
0.0778	42.0	13104	0.0705	0.5848
0.0778	43.0	13416	0.0705	0.6209
0.078	44.0	13728	0.0701	0.5199
0.0776	45.0	14040	0.0704	0.5957
0.0776	46.0	14352	0.0702	0.5848
0.0772	47.0	14664	0.0703	0.4765
0.0772	48.0	14976	0.0697	0.5379
0.0773	49.0	15288	0.0696	0.5596
0.0772	50.0	15600	0.0702	0.4765
0.0772	51.0	15912	0.0701	0.4801
0.0776	52.0	16224	0.0706	0.4729
0.0772	53.0	16536	0.0698	0.5054
0.0772	54.0	16848	0.0706	0.6318
0.0766	55.0	17160	0.0708	0.4765
0.0766	56.0	17472	0.0700	0.6209
0.0766	57.0	17784	0.0697	0.5307
0.0767	58.0	18096	0.0700	0.4801
0.0767	59.0	18408	0.0697	0.5235
0.0767	60.0	18720	0.0697	0.5271
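
To make the table easier to read, the short script below finds the best epoch by accuracy and compares it with the final epoch. It is only a sketch: the accuracy values are copied from the table above, and the helper names `best_epoch` and `count_equal` are hypothetical.

```python
# Accuracy per epoch, copied from the results table (index 0 is epoch 1).
ACCURACY = [
    0.5271, 0.4729, 0.4729, 0.5343, 0.5704, 0.4729, 0.4729, 0.4982, 0.4910, 0.4765,
    0.4729, 0.4729, 0.4729, 0.5307, 0.4729, 0.4801, 0.5596, 0.4693, 0.4874, 0.6101,
    0.4693, 0.5884, 0.4729, 0.4729, 0.5271, 0.4801, 0.4765, 0.4838, 0.4729, 0.5704,
    0.5632, 0.4729, 0.4838, 0.5199, 0.4838, 0.4729, 0.6354, 0.4729, 0.5487, 0.5848,
    0.6173, 0.5848, 0.6209, 0.5199, 0.5957, 0.5848, 0.4765, 0.5379, 0.5596, 0.4765,
    0.4801, 0.4729, 0.5054, 0.6318, 0.4765, 0.6209, 0.5307, 0.4801, 0.5235, 0.5271,
]


def best_epoch(values):
    """Return (1-based epoch, value) for the highest value; the earliest epoch wins ties."""
    idx = max(range(len(values)), key=lambda i: values[i])
    return idx + 1, values[idx]


def count_equal(values, target, tol=1e-9):
    """Count how many epochs report exactly `target` (within a small tolerance)."""
    return sum(1 for v in values if abs(v - target) < tol)


epoch, acc = best_epoch(ACCURACY)
print(f"best accuracy:  {acc:.4f} at epoch {epoch}")
print(f"final accuracy: {ACCURACY[-1]:.4f} at epoch {len(ACCURACY)}")
print("epochs at 0.4729:", count_equal(ACCURACY, 0.4729))
print("epochs at 0.5271:", count_equal(ACCURACY, 0.5271))
```

The accuracy reported at the top of the card (0.5271) is the epoch 60 value, while the highest value in the table is 0.6354 at epoch 37. The two most frequent values, 0.4729 and 0.5271, sum to 1.0, which would be consistent with the model predicting a single class on a two-class evaluation set at those epochs, although the table alone does not confirm this.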

Framework versions

Transformers 4.26.1
PyTorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
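
When reproducing the run, it may help to compare the local environment against these versions. The sketch below uses only the standard library. It assumes the packages are installed under their usual PyPI names (`torch` for PyTorch), and the helper names `installed_versions` and `mismatches` are hypothetical.

```python
from importlib import metadata

# Versions listed on the card, keyed by the assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(expected, found):
    """Return {name: (expected, found)} for every package whose version differs."""
    return {
        name: (version, found.get(name))
        for name, version in expected.items()
        if found.get(name) != version
    }


if __name__ == "__main__":
    diff = mismatches(EXPECTED, installed_versions(EXPECTED))
    for name, (want, have) in diff.items():
        print(f"{name}: card lists {want}, installed {have}")
    if not diff:
        print("environment matches the versions on the card")
```

An exact string comparison is deliberately strict; a different CUDA build suffix on `torch`, for example, is reported as a mismatch.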

Dataset used to train dkqjrm/20230823054903: super_glue