
dkqjrm/20230825183854

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a loading sketch follows the list):

  • Loss: 0.3677
  • Accuracy: 0.7329
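
The card does not record which SuperGLUE task the checkpoint was fine-tuned on, so any usage example is necessarily a sketch. The snippet below shows one plausible way to load the checkpoint as a sequence classifier with the standard Transformers classes; the example sentence pair and the meaning of the output classes are illustrative assumptions, not part of this card.

```python
# Minimal loading sketch. The model id comes from this card; the input pair
# and label interpretation are assumptions, since the card does not state
# which SuperGLUE task was used.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230825183854"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Hypothetical sentence-pair input in the style of a SuperGLUE task such as RTE.
inputs = tokenizer(
    "The cat sat on the mat.",
    "A cat is on a mat.",
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # per-class probabilities
```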

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction as TrainingArguments follows the list):

  • learning_rate: 0.005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
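
The original training script is not included in this card, so the following is a reconstruction rather than the authoritative configuration: a sketch of TrainingArguments (Transformers 4.26 Trainer API) matching the values listed above. Options not listed, such as output_dir and evaluation_strategy, are assumptions chosen to match the per-epoch evaluation table below.

```python
# Reconstruction of the hyperparameters above as Transformers TrainingArguments.
# Values marked "listed above" come from this card; everything else is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230825183854",     # assumed; not stated in the card
    learning_rate=5e-3,              # listed above: 0.005
    per_device_train_batch_size=16,  # listed above: train_batch_size 16
    per_device_eval_batch_size=8,    # listed above: eval_batch_size 8
    seed=11,                         # listed above: seed 11
    adam_beta1=0.9,                  # listed above: Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # listed above: epsilon 1e-08
    lr_scheduler_type="linear",      # listed above
    num_train_epochs=80.0,           # listed above: num_epochs 80.0
    evaluation_strategy="epoch",     # assumed from the per-epoch results table
    logging_steps=500,               # Trainer default; would explain the early
                                     # "No log" training-loss rows below
)
```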

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 156 | 0.6538 | 0.5307 |
| No log | 2.0 | 312 | 0.6933 | 0.5162 |
| No log | 3.0 | 468 | 0.7141 | 0.4585 |
| 0.8733 | 4.0 | 624 | 0.6298 | 0.5343 |
| 0.8733 | 5.0 | 780 | 0.6732 | 0.5343 |
| 0.8733 | 6.0 | 936 | 0.5740 | 0.6137 |
| 0.8394 | 7.0 | 1092 | 0.7296 | 0.5632 |
| 0.8394 | 8.0 | 1248 | 0.8035 | 0.5668 |
| 0.8394 | 9.0 | 1404 | 0.6425 | 0.6209 |
| 0.7591 | 10.0 | 1560 | 0.4622 | 0.6643 |
| 0.7591 | 11.0 | 1716 | 0.4437 | 0.6859 |
| 0.7591 | 12.0 | 1872 | 0.4827 | 0.6787 |
| 0.6772 | 13.0 | 2028 | 0.5774 | 0.6715 |
| 0.6772 | 14.0 | 2184 | 0.4063 | 0.7112 |
| 0.6772 | 15.0 | 2340 | 0.5000 | 0.6498 |
| 0.6772 | 16.0 | 2496 | 0.4834 | 0.6570 |
| 0.6497 | 17.0 | 2652 | 0.5429 | 0.6931 |
| 0.6497 | 18.0 | 2808 | 0.4595 | 0.7148 |
| 0.6497 | 19.0 | 2964 | 0.3976 | 0.6787 |
| 0.6063 | 20.0 | 3120 | 0.3676 | 0.7004 |
| 0.6063 | 21.0 | 3276 | 0.4152 | 0.7329 |
| 0.6063 | 22.0 | 3432 | 0.4491 | 0.6643 |
| 0.5763 | 23.0 | 3588 | 0.4205 | 0.6968 |
| 0.5763 | 24.0 | 3744 | 0.3677 | 0.7112 |
| 0.5763 | 25.0 | 3900 | 0.4396 | 0.6606 |
| 0.5433 | 26.0 | 4056 | 0.3519 | 0.7292 |
| 0.5433 | 27.0 | 4212 | 0.4936 | 0.7329 |
| 0.5433 | 28.0 | 4368 | 0.5706 | 0.6209 |
| 0.5217 | 29.0 | 4524 | 0.5359 | 0.6643 |
| 0.5217 | 30.0 | 4680 | 0.3722 | 0.7256 |
| 0.5217 | 31.0 | 4836 | 0.4510 | 0.6498 |
| 0.5217 | 32.0 | 4992 | 0.4153 | 0.7076 |
| 0.4772 | 33.0 | 5148 | 0.4060 | 0.7292 |
| 0.4772 | 34.0 | 5304 | 0.4248 | 0.7112 |
| 0.4772 | 35.0 | 5460 | 0.3862 | 0.7184 |
| 0.46 | 36.0 | 5616 | 0.4376 | 0.6715 |
| 0.46 | 37.0 | 5772 | 0.4369 | 0.6751 |
| 0.46 | 38.0 | 5928 | 0.3735 | 0.7112 |
| 0.4145 | 39.0 | 6084 | 0.3600 | 0.7256 |
| 0.4145 | 40.0 | 6240 | 0.3753 | 0.7401 |
| 0.4145 | 41.0 | 6396 | 0.4377 | 0.7437 |
| 0.4086 | 42.0 | 6552 | 0.4095 | 0.7509 |
| 0.4086 | 43.0 | 6708 | 0.4555 | 0.7112 |
| 0.4086 | 44.0 | 6864 | 0.4092 | 0.7365 |
| 0.3716 | 45.0 | 7020 | 0.4073 | 0.6968 |
| 0.3716 | 46.0 | 7176 | 0.4190 | 0.7220 |
| 0.3716 | 47.0 | 7332 | 0.4445 | 0.7617 |
| 0.3716 | 48.0 | 7488 | 0.4113 | 0.7112 |
| 0.3526 | 49.0 | 7644 | 0.4075 | 0.7184 |
| 0.3526 | 50.0 | 7800 | 0.3924 | 0.7437 |
| 0.3526 | 51.0 | 7956 | 0.3993 | 0.7184 |
| 0.3175 | 52.0 | 8112 | 0.4196 | 0.7292 |
| 0.3175 | 53.0 | 8268 | 0.4894 | 0.6931 |
| 0.3175 | 54.0 | 8424 | 0.4043 | 0.7256 |
| 0.3204 | 55.0 | 8580 | 0.4841 | 0.6895 |
| 0.3204 | 56.0 | 8736 | 0.3880 | 0.7220 |
| 0.3204 | 57.0 | 8892 | 0.5248 | 0.7040 |
| 0.3093 | 58.0 | 9048 | 0.3957 | 0.7220 |
| 0.3093 | 59.0 | 9204 | 0.4407 | 0.7292 |
| 0.3093 | 60.0 | 9360 | 0.3696 | 0.7292 |
| 0.3068 | 61.0 | 9516 | 0.3891 | 0.7148 |
| 0.3068 | 62.0 | 9672 | 0.4251 | 0.7220 |
| 0.3068 | 63.0 | 9828 | 0.4027 | 0.7509 |
| 0.3068 | 64.0 | 9984 | 0.3926 | 0.7329 |
| 0.2853 | 65.0 | 10140 | 0.3853 | 0.7329 |
| 0.2853 | 66.0 | 10296 | 0.3718 | 0.7329 |
| 0.2853 | 67.0 | 10452 | 0.3739 | 0.7401 |
| 0.2705 | 68.0 | 10608 | 0.3705 | 0.7653 |
| 0.2705 | 69.0 | 10764 | 0.3788 | 0.7365 |
| 0.2705 | 70.0 | 10920 | 0.3832 | 0.7329 |
| 0.2643 | 71.0 | 11076 | 0.3846 | 0.7509 |
| 0.2643 | 72.0 | 11232 | 0.3731 | 0.7545 |
| 0.2643 | 73.0 | 11388 | 0.3909 | 0.7329 |
| 0.2604 | 74.0 | 11544 | 0.3711 | 0.7437 |
| 0.2604 | 75.0 | 11700 | 0.3693 | 0.7437 |
| 0.2604 | 76.0 | 11856 | 0.3797 | 0.7292 |
| 0.2573 | 77.0 | 12012 | 0.3761 | 0.7329 |
| 0.2573 | 78.0 | 12168 | 0.3799 | 0.7220 |
| 0.2573 | 79.0 | 12324 | 0.3657 | 0.7473 |
| 0.2573 | 80.0 | 12480 | 0.3677 | 0.7329 |

Framework versions

  • Transformers 4.26.1
  • PyTorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
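
To reproduce the environment, the versions above can be pinned at install time; the short check below simply prints the locally installed versions against the ones listed. The +cu118 suffix on PyTorch denotes the CUDA 11.8 build, which depends on your platform.

```python
# Compare locally installed versions against the versions listed above.
# Pinned installs, e.g. `pip install transformers==4.26.1 datasets==2.12.0
# tokenizers==0.13.3`, are one way to match them; the exact PyTorch build
# (2.0.1+cu118) depends on your CUDA setup.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}
for module in (transformers, torch, datasets, tokenizers):
    name = module.__name__
    print(f"{name}: installed {module.__version__}, expected {expected[name]}")
```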