20230823034647

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0699
  • Accuracy: 0.5162

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
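
These settings map directly onto `transformers.TrainingArguments`. Below is a minimal reproduction sketch, not the author's actual script: the card does not state which SuperGLUE subtask was used, so the `rte` configuration and its `premise`/`hypothesis` columns are assumptions, and the Trainer's default cross-entropy objective may differ from whatever produced the loss values reported in this card.

```python
# Minimal reproduction sketch for the hyperparameters listed above.
# ASSUMPTIONS: the SuperGLUE subtask ("rte") and its column names are
# not stated in the card; the loss function is the Trainer default.
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

dataset = load_dataset("super_glue", "rte")  # assumed subtask
tokenizer = AutoTokenizer.from_pretrained("bert-large-cased")

def tokenize(batch):
    # RTE provides premise/hypothesis pairs; other subtasks use different columns.
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True)

encoded = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-large-cased", num_labels=2
)

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return accuracy.compute(predictions=np.argmax(logits, axis=-1),
                            references=labels)

args = TrainingArguments(
    output_dir="out",
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60,
    evaluation_strategy="epoch",  # evaluate once per epoch, as in the table below
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()
```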

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.0734          | 0.4729   |
| 0.1062        | 2.0   | 624   | 0.0755          | 0.5235   |
| 0.1062        | 3.0   | 936   | 0.0930          | 0.4729   |
| 0.0892        | 4.0   | 1248  | 0.0712          | 0.5307   |
| 0.0836        | 5.0   | 1560  | 0.0838          | 0.5307   |
| 0.0836        | 6.0   | 1872  | 0.0706          | 0.4801   |
| 0.0839        | 7.0   | 2184  | 0.0738          | 0.4729   |
| 0.0839        | 8.0   | 2496  | 0.0972          | 0.5271   |
| 0.0838        | 9.0   | 2808  | 0.0804          | 0.5415   |
| 0.0842        | 10.0  | 3120  | 0.0705          | 0.5199   |
| 0.0842        | 11.0  | 3432  | 0.0706          | 0.5848   |
| 0.0832        | 12.0  | 3744  | 0.0776          | 0.4729   |
| 0.0838        | 13.0  | 4056  | 0.0972          | 0.4729   |
| 0.0838        | 14.0  | 4368  | 0.0705          | 0.4838   |
| 0.0824        | 15.0  | 4680  | 0.0725          | 0.4693   |
| 0.0824        | 16.0  | 4992  | 0.0711          | 0.5884   |
| 0.0815        | 17.0  | 5304  | 0.0702          | 0.4729   |
| 0.0827        | 18.0  | 5616  | 0.0707          | 0.5921   |
| 0.0827        | 19.0  | 5928  | 0.0865          | 0.5307   |
| 0.0821        | 20.0  | 6240  | 0.0702          | 0.5235   |
| 0.0817        | 21.0  | 6552  | 0.0822          | 0.4729   |
| 0.0817        | 22.0  | 6864  | 0.0753          | 0.5632   |
| 0.0822        | 23.0  | 7176  | 0.0717          | 0.4729   |
| 0.0822        | 24.0  | 7488  | 0.0702          | 0.4765   |
| 0.0812        | 25.0  | 7800  | 0.0769          | 0.5307   |
| 0.0795        | 26.0  | 8112  | 0.0768          | 0.4729   |
| 0.0795        | 27.0  | 8424  | 0.0852          | 0.4729   |
| 0.0802        | 28.0  | 8736  | 0.0718          | 0.4729   |
| 0.0792        | 29.0  | 9048  | 0.0725          | 0.4729   |
| 0.0792        | 30.0  | 9360  | 0.0706          | 0.5668   |
| 0.0794        | 31.0  | 9672  | 0.0720          | 0.5812   |
| 0.0794        | 32.0  | 9984  | 0.0712          | 0.4801   |
| 0.0791        | 33.0  | 10296 | 0.0711          | 0.4801   |
| 0.0782        | 34.0  | 10608 | 0.0703          | 0.5054   |
| 0.0782        | 35.0  | 10920 | 0.0708          | 0.4838   |
| 0.0778        | 36.0  | 11232 | 0.0716          | 0.4729   |
| 0.0777        | 37.0  | 11544 | 0.0711          | 0.6570   |
| 0.0777        | 38.0  | 11856 | 0.0731          | 0.4729   |
| 0.0775        | 39.0  | 12168 | 0.0714          | 0.4729   |
| 0.0775        | 40.0  | 12480 | 0.0710          | 0.6282   |
| 0.0772        | 41.0  | 12792 | 0.0701          | 0.4765   |
| 0.0773        | 42.0  | 13104 | 0.0701          | 0.5307   |
| 0.0773        | 43.0  | 13416 | 0.0707          | 0.5668   |
| 0.0772        | 44.0  | 13728 | 0.0705          | 0.5848   |
| 0.0773        | 45.0  | 14040 | 0.0701          | 0.5235   |
| 0.0773        | 46.0  | 14352 | 0.0699          | 0.5090   |
| 0.0769        | 47.0  | 14664 | 0.0705          | 0.4765   |
| 0.0769        | 48.0  | 14976 | 0.0699          | 0.5451   |
| 0.0768        | 49.0  | 15288 | 0.0701          | 0.5668   |
| 0.0769        | 50.0  | 15600 | 0.0701          | 0.4765   |
| 0.0769        | 51.0  | 15912 | 0.0699          | 0.5271   |
| 0.0774        | 52.0  | 16224 | 0.0700          | 0.4729   |
| 0.0768        | 53.0  | 16536 | 0.0700          | 0.5126   |
| 0.0768        | 54.0  | 16848 | 0.0702          | 0.5957   |
| 0.0765        | 55.0  | 17160 | 0.0706          | 0.4729   |
| 0.0765        | 56.0  | 17472 | 0.0700          | 0.5379   |
| 0.0766        | 57.0  | 17784 | 0.0700          | 0.5343   |
| 0.0767        | 58.0  | 18096 | 0.0701          | 0.4838   |
| 0.0767        | 59.0  | 18408 | 0.0699          | 0.5054   |
| 0.0766        | 60.0  | 18720 | 0.0699          | 0.5162   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
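
With those versions installed, the checkpoint can be loaded by its repo id. A minimal inference sketch, assuming a standard sequence-classification head (the card does not document the SuperGLUE subtask, input format, or label meanings):

```python
# Minimal loading sketch; the repo id comes from this card's title.
# ASSUMPTION: the pipeline task and input format, since the card does
# not state which SuperGLUE subtask the model was fine-tuned on.
from transformers import pipeline

classifier = pipeline("text-classification", model="dkqjrm/20230823034647")
print(classifier("An example input sentence."))
```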