20230824024310

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3005
  • Accuracy: 0.7509

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
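With a linear scheduler and no warmup steps listed (zero warmup is assumed here), the learning rate decays from 0.003 to 0 over the full run of 60 epochs × 312 steps per epoch = 18720 optimizer steps. A minimal sketch of that schedule in plain Python:

```python
def linear_lr(step, base_lr=0.003, total_steps=60 * 312):
    """Linearly decay the learning rate from base_lr to 0.

    Assumes zero warmup steps, since none are listed in the card.
    """
    return base_lr * max(0.0, 1.0 - step / total_steps)

print(linear_lr(0))      # 0.003 at the start of training
print(linear_lr(9360))   # 0.0015 at the halfway point (end of epoch 30)
print(linear_lr(18720))  # 0.0 at the final step (end of epoch 60)
```

In the Trainer API this corresponds to `lr_scheduler_type="linear"` with the default `warmup_steps=0`.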

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.6323          | 0.5307   |
| 0.5669        | 2.0   | 624   | 0.4749          | 0.5415   |
| 0.5669        | 3.0   | 936   | 0.4812          | 0.5271   |
| 0.5542        | 4.0   | 1248  | 0.3917          | 0.5704   |
| 0.5146        | 5.0   | 1560  | 0.4706          | 0.5523   |
| 0.5146        | 6.0   | 1872  | 0.4418          | 0.6173   |
| 0.464         | 7.0   | 2184  | 0.3863          | 0.6462   |
| 0.464         | 8.0   | 2496  | 0.3326          | 0.6751   |
| 0.4357        | 9.0   | 2808  | 0.3896          | 0.6065   |
| 0.4268        | 10.0  | 3120  | 0.3329          | 0.6823   |
| 0.4268        | 11.0  | 3432  | 0.4012          | 0.6679   |
| 0.4077        | 12.0  | 3744  | 0.3661          | 0.7112   |
| 0.3832        | 13.0  | 4056  | 0.3640          | 0.7112   |
| 0.3832        | 14.0  | 4368  | 0.3328          | 0.7040   |
| 0.3918        | 15.0  | 4680  | 0.3398          | 0.7076   |
| 0.3918        | 16.0  | 4992  | 0.6806          | 0.6282   |
| 0.3741        | 17.0  | 5304  | 0.4620          | 0.6498   |
| 0.3627        | 18.0  | 5616  | 0.3085          | 0.7473   |
| 0.3627        | 19.0  | 5928  | 0.3018          | 0.7256   |
| 0.3392        | 20.0  | 6240  | 0.3790          | 0.6534   |
| 0.3074        | 21.0  | 6552  | 0.2964          | 0.7401   |
| 0.3074        | 22.0  | 6864  | 0.3124          | 0.7401   |
| 0.3076        | 23.0  | 7176  | 0.3907          | 0.6931   |
| 0.3076        | 24.0  | 7488  | 0.3046          | 0.7329   |
| 0.2868        | 25.0  | 7800  | 0.3494          | 0.7365   |
| 0.2757        | 26.0  | 8112  | 0.3811          | 0.7148   |
| 0.2757        | 27.0  | 8424  | 0.3061          | 0.7509   |
| 0.2688        | 28.0  | 8736  | 0.2989          | 0.7401   |
| 0.2638        | 29.0  | 9048  | 0.3090          | 0.7365   |
| 0.2638        | 30.0  | 9360  | 0.3295          | 0.7365   |
| 0.2554        | 31.0  | 9672  | 0.3185          | 0.7401   |
| 0.2554        | 32.0  | 9984  | 0.2872          | 0.7401   |
| 0.2538        | 33.0  | 10296 | 0.3178          | 0.7509   |
| 0.2404        | 34.0  | 10608 | 0.2920          | 0.7473   |
| 0.2404        | 35.0  | 10920 | 0.3001          | 0.7329   |
| 0.2342        | 36.0  | 11232 | 0.3155          | 0.7437   |
| 0.2258        | 37.0  | 11544 | 0.3324          | 0.7437   |
| 0.2258        | 38.0  | 11856 | 0.3179          | 0.7437   |
| 0.2247        | 39.0  | 12168 | 0.3276          | 0.7509   |
| 0.2247        | 40.0  | 12480 | 0.2988          | 0.7401   |
| 0.2184        | 41.0  | 12792 | 0.2916          | 0.7329   |
| 0.215         | 42.0  | 13104 | 0.3033          | 0.7401   |
| 0.215         | 43.0  | 13416 | 0.3209          | 0.7473   |
| 0.2117        | 44.0  | 13728 | 0.2994          | 0.7473   |
| 0.2035        | 45.0  | 14040 | 0.3093          | 0.7473   |
| 0.2035        | 46.0  | 14352 | 0.2984          | 0.7365   |
| 0.203         | 47.0  | 14664 | 0.2866          | 0.7401   |
| 0.203         | 48.0  | 14976 | 0.3140          | 0.7473   |
| 0.2019        | 49.0  | 15288 | 0.3158          | 0.7509   |
| 0.1937        | 50.0  | 15600 | 0.2996          | 0.7545   |
| 0.1937        | 51.0  | 15912 | 0.2814          | 0.7473   |
| 0.1988        | 52.0  | 16224 | 0.3050          | 0.7437   |
| 0.1965        | 53.0  | 16536 | 0.3073          | 0.7473   |
| 0.1965        | 54.0  | 16848 | 0.2994          | 0.7509   |
| 0.1918        | 55.0  | 17160 | 0.2985          | 0.7509   |
| 0.1918        | 56.0  | 17472 | 0.3046          | 0.7509   |
| 0.1902        | 57.0  | 17784 | 0.2991          | 0.7473   |
| 0.1879        | 58.0  | 18096 | 0.2942          | 0.7509   |
| 0.1879        | 59.0  | 18408 | 0.2976          | 0.7509   |
| 0.194         | 60.0  | 18720 | 0.3005          | 0.7509   |
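Note that the final epoch is not the best by either metric: the lowest validation loss (0.2814) occurs at epoch 51 and the highest accuracy (0.7545) at epoch 50, while the headline numbers above come from the last checkpoint. A small sketch of picking the best checkpoint from a handful of rows copied from the table:

```python
# (epoch, validation_loss, accuracy) for a few rows from the table above
rows = [
    (32, 0.2872, 0.7401),
    (47, 0.2866, 0.7401),
    (50, 0.2996, 0.7545),
    (51, 0.2814, 0.7473),
    (60, 0.3005, 0.7509),
]

best_by_loss = min(rows, key=lambda r: r[1])  # lowest validation loss
best_by_acc = max(rows, key=lambda r: r[2])   # highest accuracy

print(best_by_loss)  # (51, 0.2814, 0.7473)
print(best_by_acc)   # (50, 0.2996, 0.7545)
```

The Trainer can do this automatically via `load_best_model_at_end=True` with `metric_for_best_model` set accordingly.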

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
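To reproduce this environment, the versions above can be pinned in a requirements file (a sketch; note the pip package for PyTorch is `torch`, and the `+cu118` build is an assumption that you install from the PyTorch cu118 wheel index):

```
transformers==4.26.1
torch==2.0.1+cu118
datasets==2.12.0
tokenizers==0.13.3
```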