20230824064723

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (these are the values from the final epoch, epoch 60, in the training results table):

Loss: 0.6742
Accuracy: 0.7076

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.003
train_batch_size: 8
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
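
As a rough illustration of what lr_scheduler_type: linear implies for this run, the sketch below computes the learning rate at a given optimizer step. It assumes zero warmup steps (the card does not state a warmup setting) and takes the total of 18720 steps, i.e. 312 steps per epoch over 60 epochs, from the training results table. The function name linear_lr is hypothetical and is not part of any library.

```python
def linear_lr(step, base_lr=0.003, total_steps=18720, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    warmup_steps=0 is an assumption; the model card does not report it.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


STEPS_PER_EPOCH = 312  # from the Step column: 312 steps at epoch 1.0

if __name__ == "__main__":
    for epoch in (0, 15, 30, 45, 60):
        print(epoch, linear_lr(epoch * STEPS_PER_EPOCH))
```

Under these assumptions the rate starts at 0.003, is halved by the end of epoch 30 (step 9360), and reaches zero at step 18720.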

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	312	1.0968	0.5307
0.8903	2.0	624	0.9977	0.4729
0.8903	3.0	936	0.6500	0.5415
0.813	4.0	1248	0.8148	0.4729
0.7606	5.0	1560	0.6263	0.5993
0.7606	6.0	1872	0.7920	0.6245
0.7342	7.0	2184	1.2811	0.5884
0.7342	8.0	2496	0.5840	0.6462
0.6906	9.0	2808	0.5715	0.6751
0.6551	10.0	3120	0.5806	0.6859
0.6551	11.0	3432	0.5498	0.6823
0.6197	12.0	3744	0.6886	0.6968
0.5972	13.0	4056	1.1724	0.4477
0.5972	14.0	4368	0.6682	0.6101
0.7875	15.0	4680	0.6779	0.5560
0.7875	16.0	4992	0.9667	0.6354
0.6467	17.0	5304	0.9092	0.6606
0.5892	18.0	5616	0.6701	0.4621
0.5892	19.0	5928	0.6021	0.6643
0.6056	20.0	6240	0.8808	0.6787
0.5409	21.0	6552	0.5458	0.6751
0.5409	22.0	6864	0.5723	0.6859
0.5387	23.0	7176	0.9638	0.6679
0.5387	24.0	7488	0.7176	0.6968
0.511	25.0	7800	0.6557	0.6895
0.4744	26.0	8112	0.5338	0.7148
0.4744	27.0	8424	0.5646	0.7076
0.4743	28.0	8736	0.5423	0.7040
0.4598	29.0	9048	0.6324	0.7076
0.4598	30.0	9360	0.7069	0.7004
0.4485	31.0	9672	0.6809	0.6859
0.4485	32.0	9984	0.5675	0.7076
0.442	33.0	10296	0.8006	0.6895
0.4141	34.0	10608	0.5902	0.7112
0.4141	35.0	10920	0.6252	0.7148
0.4054	36.0	11232	0.8398	0.7112
0.3819	37.0	11544	0.7482	0.7004
0.3819	38.0	11856	0.6538	0.7112
0.3825	39.0	12168	0.7720	0.6968
0.3825	40.0	12480	0.6094	0.6931
0.379	41.0	12792	0.5863	0.7040
0.3701	42.0	13104	0.6197	0.7040
0.3701	43.0	13416	0.5795	0.7112
0.3576	44.0	13728	0.6484	0.7076
0.3454	45.0	14040	0.6623	0.6968
0.3454	46.0	14352	0.6562	0.7220
0.3455	47.0	14664	0.5921	0.7184
0.3455	48.0	14976	0.6980	0.7112
0.3344	49.0	15288	0.6210	0.7004
0.3285	50.0	15600	0.5674	0.7184
0.3285	51.0	15912	0.6134	0.7040
0.3295	52.0	16224	0.7118	0.7148
0.3181	53.0	16536	0.6978	0.7040
0.3181	54.0	16848	0.6851	0.7112
0.3021	55.0	17160	0.7702	0.7040
0.3021	56.0	17472	0.7319	0.7040
0.3044	57.0	17784	0.6459	0.7076
0.2938	58.0	18096	0.6386	0.7076
0.2938	59.0	18408	0.6550	0.7004
0.2991	60.0	18720	0.6742	0.7076
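
The reported evaluation figures come from epoch 60, but the table shows better checkpoints earlier in training. The sketch below is one way to pick a checkpoint from such a log. The rows are a hand-copied subset of the table above, and the helper names best_by_accuracy and best_by_loss are hypothetical. Whether the corresponding checkpoints were actually saved is not stated in the card.

```python
# (epoch, validation_loss, accuracy) rows copied from the table above.
# Only a subset is listed to keep the example short.
ROWS = [
    (11, 0.5498, 0.6823),
    (21, 0.5458, 0.6751),
    (26, 0.5338, 0.7148),
    (35, 0.6252, 0.7148),
    (46, 0.6562, 0.7220),
    (47, 0.5921, 0.7184),
    (50, 0.5674, 0.7184),
    (60, 0.6742, 0.7076),
]


def best_by_accuracy(rows):
    """Row with the highest accuracy; ties go to the lower validation loss."""
    return max(rows, key=lambda r: (r[2], -r[1]))


def best_by_loss(rows):
    """Row with the lowest validation loss."""
    return min(rows, key=lambda r: r[1])


if __name__ == "__main__":
    print("best accuracy:", best_by_accuracy(ROWS))
    print("lowest validation loss:", best_by_loss(ROWS))
```

In the full table the highest accuracy is 0.7220 at epoch 46 and the lowest validation loss is 0.5338 at epoch 26, so the two criteria select different checkpoints.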

Framework versions

Transformers 4.26.1
PyTorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
