
20230822105337

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3531
  • Accuracy: 0.5271

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
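
With a linear scheduler, no warmup reported, and 312 optimizer steps per epoch (step 312 at epoch 1.0 in the results table), the learning rate decays from 0.05 to 0 over 60 × 312 = 18720 steps. A minimal sketch of that schedule in pure Python; the per-epoch step count and zero-warmup assumption are inferred from the table, not stated in the card:

```python
PEAK_LR = 0.05         # learning_rate from the hyperparameter list
STEPS_PER_EPOCH = 312  # inferred: step 312 logged at epoch 1.0
TOTAL_STEPS = 60 * STEPS_PER_EPOCH  # 18720, matching the final logged step

def linear_lr(step: int) -> float:
    """Linear decay from PEAK_LR at step 0 to 0 at TOTAL_STEPS (no warmup assumed)."""
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / TOTAL_STEPS
```

Under this reading, the rate is half the peak (0.025) at the training midpoint, step 9360.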

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|---------------|-------|-------|-----------------|----------|
| No log        | 1.0   | 312   | 0.3571          | 0.4729   |
| 6.8951        | 2.0   | 624   | 1.4465          | 0.5271   |
| 6.8951        | 3.0   | 936   | 1.4737          | 0.4729   |
| 4.4457        | 4.0   | 1248  | 0.4591          | 0.5271   |
| 3.2957        | 5.0   | 1560  | 0.4022          | 0.4729   |
| 3.2957        | 6.0   | 1872  | 1.2355          | 0.4729   |
| 3.7646        | 7.0   | 2184  | 9.3766          | 0.4729   |
| 3.7646        | 8.0   | 2496  | 0.3764          | 0.5271   |
| 3.3825        | 9.0   | 2808  | 4.6165          | 0.5271   |
| 2.7848        | 10.0  | 3120  | 3.2620          | 0.5271   |
| 2.7848        | 11.0  | 3432  | 2.3010          | 0.5271   |
| 2.3837        | 12.0  | 3744  | 0.3484          | 0.5271   |
| 2.1666        | 13.0  | 4056  | 0.4398          | 0.5271   |
| 2.1666        | 14.0  | 4368  | 1.4703          | 0.4729   |
| 2.107         | 15.0  | 4680  | 1.0550          | 0.5271   |
| 2.107         | 16.0  | 4992  | 1.0008          | 0.4729   |
| 2.161         | 17.0  | 5304  | 0.7810          | 0.4729   |
| 1.927         | 18.0  | 5616  | 0.8418          | 0.4729   |
| 1.927         | 19.0  | 5928  | 0.5166          | 0.4729   |
| 1.8072        | 20.0  | 6240  | 0.3493          | 0.5271   |
| 1.7187        | 21.0  | 6552  | 1.4221          | 0.5271   |
| 1.7187        | 22.0  | 6864  | 2.9356          | 0.5271   |
| 2.1333        | 23.0  | 7176  | 0.8474          | 0.4729   |
| 2.1333        | 24.0  | 7488  | 5.1220          | 0.4729   |
| 2.0017        | 25.0  | 7800  | 0.3589          | 0.4729   |
| 1.6518        | 26.0  | 8112  | 0.3996          | 0.4729   |
| 1.6518        | 27.0  | 8424  | 0.5351          | 0.5271   |
| 1.5012        | 28.0  | 8736  | 0.3479          | 0.5271   |
| 1.4194        | 29.0  | 9048  | 0.3492          | 0.5271   |
| 1.4194        | 30.0  | 9360  | 0.6942          | 0.5271   |
| 1.3048        | 31.0  | 9672  | 0.5089          | 0.5271   |
| 1.3048        | 32.0  | 9984  | 1.1509          | 0.5271   |
| 1.2972        | 33.0  | 10296 | 1.1207          | 0.4729   |
| 1.1774        | 34.0  | 10608 | 1.4443          | 0.4729   |
| 1.1774        | 35.0  | 10920 | 2.3753          | 0.4729   |
| 1.492         | 36.0  | 11232 | 0.3622          | 0.4729   |
| 1.3617        | 37.0  | 11544 | 1.3564          | 0.5271   |
| 1.3617        | 38.0  | 11856 | 0.6944          | 0.5271   |
| 1.4582        | 39.0  | 12168 | 0.5510          | 0.4729   |
| 1.4582        | 40.0  | 12480 | 0.3660          | 0.5271   |
| 1.0904        | 41.0  | 12792 | 0.3480          | 0.5271   |
| 0.9409        | 42.0  | 13104 | 0.4835          | 0.5271   |
| 0.9409        | 43.0  | 13416 | 0.6226          | 0.4729   |
| 0.9404        | 44.0  | 13728 | 0.4021          | 0.4729   |
| 0.8008        | 45.0  | 14040 | 0.5381          | 0.5271   |
| 0.8008        | 46.0  | 14352 | 0.3887          | 0.4729   |
| 0.841         | 47.0  | 14664 | 0.3763          | 0.5271   |
| 0.841         | 48.0  | 14976 | 0.3667          | 0.5271   |
| 0.6912        | 49.0  | 15288 | 0.4490          | 0.4729   |
| 0.6381        | 50.0  | 15600 | 0.7097          | 0.5271   |
| 0.6381        | 51.0  | 15912 | 0.3639          | 0.4729   |
| 0.5792        | 52.0  | 16224 | 0.3798          | 0.5271   |
| 0.53          | 53.0  | 16536 | 0.3854          | 0.4729   |
| 0.53          | 54.0  | 16848 | 0.3884          | 0.4729   |
| 0.4977        | 55.0  | 17160 | 0.3898          | 0.4729   |
| 0.4977        | 56.0  | 17472 | 0.3480          | 0.5271   |
| 0.4596        | 57.0  | 17784 | 0.3542          | 0.4729   |
| 0.4228        | 58.0  | 18096 | 0.3539          | 0.5271   |
| 0.4228        | 59.0  | 18408 | 0.3499          | 0.5271   |
| 0.3933        | 60.0  | 18720 | 0.3531          | 0.5271   |
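
The two accuracy values in the table are exact complements (0.5271 + 0.4729 = 1.0). On a binary SuperGLUE task this is consistent with the model predicting a single class and flipping which class it predicts between epochs, rather than learning the task; the unusually high learning rate of 0.05 for BERT fine-tuning would be one plausible cause. This reading is an inference from the numbers, not something the card states. A quick arithmetic check:

```python
acc_high, acc_low = 0.5271, 0.4729

# If every prediction falls in one class of a binary task, accuracy equals
# that class's frequency, so the two observed values must sum to 1.
complementary = abs((acc_high + acc_low) - 1.0) < 1e-9
```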

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
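
To reproduce this environment, the listed versions can be pinned directly. One possible setup, assuming the CUDA 11.8 PyTorch build is pulled from the standard PyTorch wheel index (the index URL is not stated in the card):

```shell
pip install transformers==4.26.1 datasets==2.12.0 tokenizers==0.13.3
pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
```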
