20230825070638

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3456
  • Accuracy: 0.7329
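
As a quick sanity check, the checkpoint can be loaded like any Hugging Face sequence-classification model. The sketch below is hedged: the repository id dkqjrm/20230825070638 comes from this card, but the specific SuperGLUE sub-task (and therefore the correct input format and label meaning) is not documented here, so the sentence pair is purely illustrative.

```python
# Minimal inference sketch. Assumes the checkpoint is a two-way
# sentence-pair classifier; the actual SuperGLUE sub-task is not
# documented in this card, so the inputs below are placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "dkqjrm/20230825070638"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# Hypothetical sentence pair; real inputs depend on the sub-task format.
inputs = tokenizer("The cat sat on the mat.", "A cat is sitting.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class id
```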

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 0.005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
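
For reproducibility, here is a minimal sketch of how the values above map onto transformers.TrainingArguments. The output_dir is a placeholder, and the dataset/Trainer wiring is omitted, since neither is documented in this card.

```python
# Sketch only: reproduces the listed hyperparameters; everything else
# (output_dir, dataset, model wiring) is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                 # hypothetical path
    learning_rate=5e-3,               # 0.005, as listed above
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
)
```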

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 156   | 0.7894          | 0.5271   |
| No log        | 2.0   | 312   | 0.6658          | 0.5379   |
| No log        | 3.0   | 468   | 0.6408          | 0.5054   |
| 0.886         | 4.0   | 624   | 0.7134          | 0.4729   |
| 0.886         | 5.0   | 780   | 0.6234          | 0.5560   |
| 0.886         | 6.0   | 936   | 0.4782          | 0.6318   |
| 0.7765        | 7.0   | 1092  | 1.1394          | 0.5776   |
| 0.7765        | 8.0   | 1248  | 0.5214          | 0.6534   |
| 0.7765        | 9.0   | 1404  | 0.4206          | 0.6570   |
| 0.7206        | 10.0  | 1560  | 0.5019          | 0.6643   |
| 0.7206        | 11.0  | 1716  | 0.7680          | 0.5343   |
| 0.7206        | 12.0  | 1872  | 0.3433          | 0.7220   |
| 0.6543        | 13.0  | 2028  | 0.3834          | 0.7292   |
| 0.6543        | 14.0  | 2184  | 0.4588          | 0.6751   |
| 0.6543        | 15.0  | 2340  | 0.3413          | 0.7040   |
| 0.6543        | 16.0  | 2496  | 0.4874          | 0.6426   |
| 0.5973        | 17.0  | 2652  | 0.3283          | 0.7256   |
| 0.5973        | 18.0  | 2808  | 0.3605          | 0.7329   |
| 0.5973        | 19.0  | 2964  | 0.3314          | 0.7256   |
| 0.5433        | 20.0  | 3120  | 0.5998          | 0.6606   |
| 0.5433        | 21.0  | 3276  | 0.3489          | 0.6931   |
| 0.5433        | 22.0  | 3432  | 0.4316          | 0.6715   |
| 0.5373        | 23.0  | 3588  | 0.3328          | 0.7076   |
| 0.5373        | 24.0  | 3744  | 0.3379          | 0.7220   |
| 0.5373        | 25.0  | 3900  | 0.3580          | 0.7148   |
| 0.4923        | 26.0  | 4056  | 0.3141          | 0.7329   |
| 0.4923        | 27.0  | 4212  | 0.4341          | 0.7365   |
| 0.4923        | 28.0  | 4368  | 0.3386          | 0.7220   |
| 0.4513        | 29.0  | 4524  | 0.3038          | 0.7220   |
| 0.4513        | 30.0  | 4680  | 0.3775          | 0.7220   |
| 0.4513        | 31.0  | 4836  | 0.4197          | 0.7076   |
| 0.4513        | 32.0  | 4992  | 0.4666          | 0.7220   |
| 0.4041        | 33.0  | 5148  | 0.3355          | 0.7365   |
| 0.4041        | 34.0  | 5304  | 0.3147          | 0.7329   |
| 0.4041        | 35.0  | 5460  | 0.3810          | 0.7184   |
| 0.3705        | 36.0  | 5616  | 0.3184          | 0.7256   |
| 0.3705        | 37.0  | 5772  | 0.3668          | 0.7076   |
| 0.3705        | 38.0  | 5928  | 0.3859          | 0.7220   |
| 0.3556        | 39.0  | 6084  | 0.3010          | 0.7329   |
| 0.3556        | 40.0  | 6240  | 0.3201          | 0.7220   |
| 0.3556        | 41.0  | 6396  | 0.3304          | 0.7329   |
| 0.3089        | 42.0  | 6552  | 0.3634          | 0.7365   |
| 0.3089        | 43.0  | 6708  | 0.3844          | 0.7184   |
| 0.3089        | 44.0  | 6864  | 0.3320          | 0.7220   |
| 0.3015        | 45.0  | 7020  | 0.3696          | 0.7220   |
| 0.3015        | 46.0  | 7176  | 0.3665          | 0.7220   |
| 0.3015        | 47.0  | 7332  | 0.3355          | 0.7256   |
| 0.3015        | 48.0  | 7488  | 0.3568          | 0.7292   |
| 0.2709        | 49.0  | 7644  | 0.3450          | 0.7329   |
| 0.2709        | 50.0  | 7800  | 0.3790          | 0.7148   |
| 0.2709        | 51.0  | 7956  | 0.3516          | 0.7112   |
| 0.2681        | 52.0  | 8112  | 0.3741          | 0.7329   |
| 0.2681        | 53.0  | 8268  | 0.3615          | 0.7220   |
| 0.2681        | 54.0  | 8424  | 0.3479          | 0.7292   |
| 0.2477        | 55.0  | 8580  | 0.3401          | 0.7184   |
| 0.2477        | 56.0  | 8736  | 0.3766          | 0.7329   |
| 0.2477        | 57.0  | 8892  | 0.3562          | 0.7148   |
| 0.2344        | 58.0  | 9048  | 0.3412          | 0.7220   |
| 0.2344        | 59.0  | 9204  | 0.3782          | 0.7437   |
| 0.2344        | 60.0  | 9360  | 0.3723          | 0.7040   |
| 0.2126        | 61.0  | 9516  | 0.3852          | 0.7292   |
| 0.2126        | 62.0  | 9672  | 0.3901          | 0.7256   |
| 0.2126        | 63.0  | 9828  | 0.3698          | 0.7112   |
| 0.2126        | 64.0  | 9984  | 0.3249          | 0.7220   |
| 0.2127        | 65.0  | 10140 | 0.3979          | 0.7004   |
| 0.2127        | 66.0  | 10296 | 0.3705          | 0.7365   |
| 0.2127        | 67.0  | 10452 | 0.3317          | 0.7220   |
| 0.199         | 68.0  | 10608 | 0.3322          | 0.7329   |
| 0.199         | 69.0  | 10764 | 0.3706          | 0.7220   |
| 0.199         | 70.0  | 10920 | 0.3628          | 0.7148   |
| 0.1959        | 71.0  | 11076 | 0.3600          | 0.7437   |
| 0.1959        | 72.0  | 11232 | 0.3349          | 0.7437   |
| 0.1959        | 73.0  | 11388 | 0.3650          | 0.7184   |
| 0.184         | 74.0  | 11544 | 0.3337          | 0.7365   |
| 0.184         | 75.0  | 11700 | 0.3309          | 0.7329   |
| 0.184         | 76.0  | 11856 | 0.3237          | 0.7365   |
| 0.183         | 77.0  | 12012 | 0.3430          | 0.7256   |
| 0.183         | 78.0  | 12168 | 0.3567          | 0.7329   |
| 0.183         | 79.0  | 12324 | 0.3541          | 0.7329   |
| 0.183         | 80.0  | 12480 | 0.3456          | 0.7329   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
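
To approximate the training environment, the versions above can be pinned at install time. This is a sketch that assumes the CUDA 11.8 PyTorch wheel index, matching the 2.0.1+cu118 build listed above.

```
pip install transformers==4.26.1 datasets==2.12.0 tokenizers==0.13.3
pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
```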