20230822185221

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a usage sketch follows the results):

  • Loss: 0.3289
  • Accuracy: 0.7329
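
Below is a minimal inference sketch. The model id comes from this card; the SuperGLUE subset is not stated, so the sentence-pair input and the class-id reading are illustrative assumptions, not confirmed behavior.

```python
# Minimal inference sketch. Assumptions: the checkpoint carries a
# sequence-classification head and the task is sentence-pair
# classification; the card does not name the SuperGLUE subset.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230822185221"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

premise = "The cat sat on the mat."     # illustrative input
hypothesis = "A cat is on the mat."     # illustrative input
inputs = tokenizer(premise, hypothesis, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
print("predicted class id:", logits.argmax(dim=-1).item())
```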

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
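
The card leaves this section blank. As a hedged note: at batch size 8, the 312 optimizer steps per epoch in the results table imply roughly 2,500 training examples, which is consistent with the SuperGLUE RTE subset, though the card does not confirm this. Under that assumption, the data could be loaded as follows:

```python
# Hedged sketch: "rte" is an inference from the step count
# (312 steps/epoch x batch size 8 ~ 2,500 examples); the card
# itself does not name the SuperGLUE subset.
from datasets import load_dataset

dataset = load_dataset("super_glue", "rte")
print(dataset)               # train/validation/test splits
print(dataset["train"][0])   # fields: premise, hypothesis, idx, label
```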

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

  • learning_rate: 0.002
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
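
A hedged sketch of mirroring these settings with transformers.TrainingArguments is shown below; output_dir is a placeholder, and the Adam betas and epsilon listed above are the TrainingArguments defaults, so they are left implicit.

```python
# Hedged reproduction sketch of the hyperparameters listed above.
# output_dir is a placeholder; adam_beta1=0.9, adam_beta2=0.999 and
# adam_epsilon=1e-8 are the defaults, matching the optimizer line.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./20230822185221",   # placeholder
    learning_rate=0.002,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
    evaluation_strategy="epoch",     # assumption: table shows per-epoch eval
)
```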

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.5077          | 0.5307   |
| 0.4439        | 2.0   | 624   | 0.3971          | 0.4874   |
| 0.4439        | 3.0   | 936   | 0.3574          | 0.5379   |
| 0.4231        | 4.0   | 1248  | 0.3625          | 0.5776   |
| 0.4071        | 5.0   | 1560  | 0.4937          | 0.5343   |
| 0.4071        | 6.0   | 1872  | 0.3738          | 0.5668   |
| 0.3956        | 7.0   | 2184  | 0.4081          | 0.4729   |
| 0.3956        | 8.0   | 2496  | 0.3386          | 0.6209   |
| 0.3905        | 9.0   | 2808  | 0.4147          | 0.4729   |
| 0.3888        | 10.0  | 3120  | 0.3353          | 0.6354   |
| 0.3888        | 11.0  | 3432  | 0.3540          | 0.6282   |
| 0.3992        | 12.0  | 3744  | 0.3453          | 0.5848   |
| 0.372         | 13.0  | 4056  | 0.3265          | 0.6895   |
| 0.372         | 14.0  | 4368  | 0.3575          | 0.6426   |
| 0.3643        | 15.0  | 4680  | 0.3304          | 0.6498   |
| 0.3643        | 16.0  | 4992  | 0.3633          | 0.6715   |
| 0.3666        | 17.0  | 5304  | 0.5230          | 0.5343   |
| 0.3517        | 18.0  | 5616  | 0.3384          | 0.6462   |
| 0.3517        | 19.0  | 5928  | 0.3293          | 0.6823   |
| 0.3519        | 20.0  | 6240  | 0.3613          | 0.6823   |
| 0.338         | 21.0  | 6552  | 0.3242          | 0.7256   |
| 0.338         | 22.0  | 6864  | 0.3399          | 0.7184   |
| 0.3316        | 23.0  | 7176  | 0.3392          | 0.7004   |
| 0.3316        | 24.0  | 7488  | 0.3343          | 0.6534   |
| 0.3266        | 25.0  | 7800  | 0.3467          | 0.7112   |
| 0.3213        | 26.0  | 8112  | 0.3419          | 0.7040   |
| 0.3213        | 27.0  | 8424  | 0.3190          | 0.7112   |
| 0.3177        | 28.0  | 8736  | 0.3205          | 0.6931   |
| 0.3187        | 29.0  | 9048  | 0.3303          | 0.7076   |
| 0.3187        | 30.0  | 9360  | 0.3268          | 0.7148   |
| 0.3162        | 31.0  | 9672  | 0.3274          | 0.7148   |
| 0.3162        | 32.0  | 9984  | 0.3311          | 0.7112   |
| 0.3132        | 33.0  | 10296 | 0.3454          | 0.7148   |
| 0.3087        | 34.0  | 10608 | 0.3250          | 0.7076   |
| 0.3087        | 35.0  | 10920 | 0.3266          | 0.7076   |
| 0.3076        | 36.0  | 11232 | 0.3347          | 0.7292   |
| 0.3071        | 37.0  | 11544 | 0.3308          | 0.7112   |
| 0.3071        | 38.0  | 11856 | 0.3272          | 0.7220   |
| 0.3061        | 39.0  | 12168 | 0.3301          | 0.7148   |
| 0.3061        | 40.0  | 12480 | 0.3226          | 0.7256   |
| 0.3006        | 41.0  | 12792 | 0.3285          | 0.7365   |
| 0.3016        | 42.0  | 13104 | 0.3226          | 0.7148   |
| 0.3016        | 43.0  | 13416 | 0.3291          | 0.7220   |
| 0.2984        | 44.0  | 13728 | 0.3377          | 0.7112   |
| 0.2976        | 45.0  | 14040 | 0.3326          | 0.7220   |
| 0.2976        | 46.0  | 14352 | 0.3341          | 0.7292   |
| 0.2967        | 47.0  | 14664 | 0.3187          | 0.7184   |
| 0.2967        | 48.0  | 14976 | 0.3322          | 0.7148   |
| 0.2953        | 49.0  | 15288 | 0.3269          | 0.7365   |
| 0.2911        | 50.0  | 15600 | 0.3256          | 0.7365   |
| 0.2911        | 51.0  | 15912 | 0.3252          | 0.7256   |
| 0.2929        | 52.0  | 16224 | 0.3251          | 0.7292   |
| 0.2904        | 53.0  | 16536 | 0.3258          | 0.7256   |
| 0.2904        | 54.0  | 16848 | 0.3358          | 0.7220   |
| 0.2895        | 55.0  | 17160 | 0.3219          | 0.7329   |
| 0.2895        | 56.0  | 17472 | 0.3322          | 0.7329   |
| 0.2887        | 57.0  | 17784 | 0.3259          | 0.7365   |
| 0.2883        | 58.0  | 18096 | 0.3260          | 0.7292   |
| 0.2883        | 59.0  | 18408 | 0.3276          | 0.7365   |
| 0.2874        | 60.0  | 18720 | 0.3289          | 0.7329   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3