
20230823213602

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6021
  • Accuracy: 0.7076
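The reported accuracy is the fraction of evaluation examples whose predicted class (the argmax over the model's classification logits) matches the gold label. A minimal sketch of that computation, using hypothetical logits and labels rather than outputs from this model:

```python
import numpy as np

def accuracy(logits, labels):
    """Fraction of examples where the argmax prediction matches the gold label."""
    preds = np.argmax(logits, axis=-1)
    return float((preds == labels).mean())

# Toy example: 4 examples, 2 classes (hypothetical values).
logits = np.array([[0.2, 0.8], [1.5, 0.3], [0.1, 0.9], [2.0, 1.0]])
labels = np.array([1, 0, 0, 0])
print(accuracy(logits, labels))  # 3 of 4 predictions correct -> 0.75
```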

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
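The optimizer and linear schedule above can be sketched in plain PyTorch (a hypothetical stand-in model; this assumes the Trainer's default linear schedule with no warmup, which decays the learning rate to zero over all training steps):

```python
import torch

model = torch.nn.Linear(4, 2)  # hypothetical stand-in for the fine-tuned BERT classifier

# Optimizer as listed above: Adam with lr=0.001, betas=(0.9, 0.999), eps=1e-08.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08)

# lr_scheduler_type "linear": decay the learning rate linearly to 0 over all steps.
# 312 optimizer steps per epoch matches the Step column in the training results.
num_epochs, steps_per_epoch = 60, 312
total_steps = num_epochs * steps_per_epoch  # 18720, the final step in the results
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: max(0.0, 1.0 - step / total_steps)
)
```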

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.8868          | 0.4693   |
| 0.7276        | 2.0   | 624   | 0.8623          | 0.4729   |
| 0.7276        | 3.0   | 936   | 0.6658          | 0.5126   |
| 0.7221        | 4.0   | 1248  | 0.7170          | 0.4838   |
| 0.6968        | 5.0   | 1560  | 0.6691          | 0.5668   |
| 0.6968        | 6.0   | 1872  | 0.6427          | 0.5740   |
| 0.6561        | 7.0   | 2184  | 0.8715          | 0.5379   |
| 0.6561        | 8.0   | 2496  | 0.6609          | 0.5668   |
| 0.6562        | 9.0   | 2808  | 0.6147          | 0.5993   |
| 0.6578        | 10.0  | 3120  | 0.6103          | 0.6065   |
| 0.6578        | 11.0  | 3432  | 0.7649          | 0.4838   |
| 0.6252        | 12.0  | 3744  | 0.5990          | 0.6426   |
| 0.6084        | 13.0  | 4056  | 0.5962          | 0.6462   |
| 0.6084        | 14.0  | 4368  | 0.5738          | 0.6679   |
| 0.5841        | 15.0  | 4680  | 0.6292          | 0.6534   |
| 0.5841        | 16.0  | 4992  | 0.7218          | 0.6354   |
| 0.5715        | 17.0  | 5304  | 0.5832          | 0.6643   |
| 0.5619        | 18.0  | 5616  | 0.5680          | 0.6787   |
| 0.5619        | 19.0  | 5928  | 0.7152          | 0.5957   |
| 0.5641        | 20.0  | 6240  | 0.7627          | 0.6462   |
| 0.5432        | 21.0  | 6552  | 0.5672          | 0.6895   |
| 0.5432        | 22.0  | 6864  | 0.6023          | 0.6787   |
| 0.5586        | 23.0  | 7176  | 0.6581          | 0.6859   |
| 0.5586        | 24.0  | 7488  | 0.5614          | 0.6895   |
| 0.5254        | 25.0  | 7800  | 0.7315          | 0.6679   |
| 0.5267        | 26.0  | 8112  | 0.5316          | 0.7076   |
| 0.5267        | 27.0  | 8424  | 0.5391          | 0.7004   |
| 0.5189        | 28.0  | 8736  | 0.5935          | 0.7040   |
| 0.5172        | 29.0  | 9048  | 0.5977          | 0.7076   |
| 0.5172        | 30.0  | 9360  | 0.5918          | 0.7148   |
| 0.5069        | 31.0  | 9672  | 0.7130          | 0.6751   |
| 0.5069        | 32.0  | 9984  | 0.6718          | 0.6931   |
| 0.4976        | 33.0  | 10296 | 0.5982          | 0.7112   |
| 0.4895        | 34.0  | 10608 | 0.5927          | 0.7076   |
| 0.4895        | 35.0  | 10920 | 0.5583          | 0.7148   |
| 0.4916        | 36.0  | 11232 | 0.5706          | 0.7076   |
| 0.4867        | 37.0  | 11544 | 0.6064          | 0.7112   |
| 0.4867        | 38.0  | 11856 | 0.5939          | 0.7040   |
| 0.4914        | 39.0  | 12168 | 0.6528          | 0.7112   |
| 0.4914        | 40.0  | 12480 | 0.5773          | 0.7148   |
| 0.4733        | 41.0  | 12792 | 0.5853          | 0.7148   |
| 0.4796        | 42.0  | 13104 | 0.5876          | 0.7329   |
| 0.4796        | 43.0  | 13416 | 0.6521          | 0.7112   |
| 0.4706        | 44.0  | 13728 | 0.6386          | 0.7004   |
| 0.4655        | 45.0  | 14040 | 0.5846          | 0.7401   |
| 0.4655        | 46.0  | 14352 | 0.6645          | 0.7004   |
| 0.4654        | 47.0  | 14664 | 0.5831          | 0.7292   |
| 0.4654        | 48.0  | 14976 | 0.6665          | 0.7040   |
| 0.4567        | 49.0  | 15288 | 0.5760          | 0.7220   |
| 0.4563        | 50.0  | 15600 | 0.5796          | 0.7292   |
| 0.4563        | 51.0  | 15912 | 0.5656          | 0.7256   |
| 0.4471        | 52.0  | 16224 | 0.5585          | 0.7329   |
| 0.4484        | 53.0  | 16536 | 0.6286          | 0.7076   |
| 0.4484        | 54.0  | 16848 | 0.6116          | 0.7040   |
| 0.4424        | 55.0  | 17160 | 0.5852          | 0.7220   |
| 0.4424        | 56.0  | 17472 | 0.6008          | 0.7040   |
| 0.4439        | 57.0  | 17784 | 0.5777          | 0.7292   |
| 0.442         | 58.0  | 18096 | 0.5915          | 0.7184   |
| 0.442         | 59.0  | 18408 | 0.5930          | 0.7148   |
| 0.4411        | 60.0  | 18720 | 0.6021          | 0.7076   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
