20230825045636

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set after the final (80th) epoch:

Loss: 0.4379
Accuracy: 0.7690

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.005
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
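Together with the step counts in the training-results table, these hyperparameters determine the learning-rate schedule. The sketch below assumes zero warmup steps, because none are listed. It follows the same formula as the `linear` schedule in Transformers (`get_linear_schedule_with_warmup`), so the rate decays linearly from 0.005 to 0 over 156 × 80 = 12,480 steps. It also shows what 156 steps per epoch at a batch size of 16 implies: roughly 2,481 to 2,496 training examples, assuming a single device, no gradient accumulation and no dropped final batch. That range is consistent with, for example, the 2,490-example training split of the super_glue RTE configuration, but the card does not name the configuration used. The names `linear_lr` and `implied_train_size_range` are illustrative, not part of any library.

```python
# Values taken from the hyperparameter list and the training-results table.
PEAK_LR = 0.005          # learning_rate
TRAIN_BATCH_SIZE = 16    # train_batch_size
NUM_EPOCHS = 80          # num_epochs
STEPS_PER_EPOCH = 156    # "Step" column at epoch 1.0

# Assumption: no warmup, since the card does not list any.
WARMUP_STEPS = 0
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 12480, matching the last table row


def linear_lr(step: int, peak: float = PEAK_LR, warmup: int = WARMUP_STEPS,
              total: int = TOTAL_STEPS) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to 0 at `total`."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


def implied_train_size_range(steps_per_epoch: int, batch_size: int) -> tuple[int, int]:
    """Smallest and largest training-set sizes that produce `steps_per_epoch` batches.

    Assumes one device, no gradient accumulation and a kept (partial) last batch.
    """
    return (steps_per_epoch - 1) * batch_size + 1, steps_per_epoch * batch_size


if __name__ == "__main__":
    for step in (0, 3120, 6240, 12480):
        print(f"step {step:>5}: lr = {linear_lr(step):.6f}")
    print("total steps:", TOTAL_STEPS)
    print("implied train size:", implied_train_size_range(STEPS_PER_EPOCH, TRAIN_BATCH_SIZE))
```

If warmup was in fact used, setting `WARMUP_STEPS` to that value reproduces the corresponding ramp-up before the decay.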

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	156	1.3576	0.5307
No log	2.0	312	0.9952	0.4693
No log	3.0	468	1.0581	0.4765
0.907	4.0	624	0.8017	0.5343
0.907	5.0	780	0.6566	0.5451
0.907	6.0	936	0.5420	0.6245
0.8287	7.0	1092	0.5092	0.6173
0.8287	8.0	1248	0.4948	0.6462
0.8287	9.0	1404	0.4754	0.6895
0.7327	10.0	1560	0.7416	0.6173
0.7327	11.0	1716	1.1722	0.4621
0.7327	12.0	1872	0.5543	0.6895
0.7276	13.0	2028	0.4895	0.6931
0.7276	14.0	2184	0.4304	0.7148
0.7276	15.0	2340	0.4261	0.7401
0.7276	16.0	2496	0.4467	0.6859
0.6207	17.0	2652	0.4700	0.7184
0.6207	18.0	2808	0.6254	0.6751
0.6207	19.0	2964	0.5108	0.7292
0.5699	20.0	3120	0.7519	0.6354
0.5699	21.0	3276	0.4584	0.7184
0.5699	22.0	3432	0.8289	0.6318
0.5829	23.0	3588	0.4071	0.7148
0.5829	24.0	3744	0.4575	0.7365
0.5829	25.0	3900	0.5062	0.6895
0.4913	26.0	4056	0.5308	0.7220
0.4913	27.0	4212	0.4907	0.7473
0.4913	28.0	4368	0.4703	0.7365
0.4679	29.0	4524	0.4244	0.7148
0.4679	30.0	4680	0.4450	0.7365
0.4679	31.0	4836	0.6184	0.6968
0.4679	32.0	4992	0.4378	0.7437
0.4377	33.0	5148	0.4118	0.7437
0.4377	34.0	5304	0.4272	0.7437
0.4377	35.0	5460	0.3998	0.7473
0.4076	36.0	5616	0.5180	0.7581
0.4076	37.0	5772	0.4967	0.7581
0.4076	38.0	5928	0.4595	0.7437
0.372	39.0	6084	0.5050	0.7329
0.372	40.0	6240	0.3900	0.7401
0.372	41.0	6396	0.4596	0.7545
0.3201	42.0	6552	0.4917	0.7690
0.3201	43.0	6708	0.4171	0.7870
0.3201	44.0	6864	0.4851	0.7256
0.3284	45.0	7020	0.4763	0.7401
0.3284	46.0	7176	0.4541	0.7581
0.3284	47.0	7332	0.4909	0.7509
0.3284	48.0	7488	0.5488	0.7329
0.2809	49.0	7644	0.5422	0.7473
0.2809	50.0	7800	0.4695	0.7653
0.2809	51.0	7956	0.5016	0.7581
0.275	52.0	8112	0.4627	0.7690
0.275	53.0	8268	0.4886	0.7401
0.275	54.0	8424	0.4425	0.7690
0.2456	55.0	8580	0.4289	0.7653
0.2456	56.0	8736	0.4891	0.7545
0.2456	57.0	8892	0.4477	0.7437
0.2328	58.0	9048	0.4510	0.7581
0.2328	59.0	9204	0.5283	0.7581
0.2328	60.0	9360	0.4405	0.7653
0.222	61.0	9516	0.5418	0.7509
0.222	62.0	9672	0.4933	0.7617
0.222	63.0	9828	0.4399	0.7653
0.222	64.0	9984	0.4490	0.7726
0.2174	65.0	10140	0.4820	0.7581
0.2174	66.0	10296	0.4732	0.7726
0.2174	67.0	10452	0.4712	0.7690
0.2075	68.0	10608	0.4847	0.7545
0.2075	69.0	10764	0.4704	0.7509
0.2075	70.0	10920	0.4855	0.7581
0.1987	71.0	11076	0.4845	0.7617
0.1987	72.0	11232	0.4724	0.7617
0.1987	73.0	11388	0.4272	0.7690
0.1845	74.0	11544	0.4324	0.7653
0.1845	75.0	11700	0.4343	0.7726
0.1845	76.0	11856	0.4407	0.7762
0.1835	77.0	12012	0.4185	0.7726
0.1835	78.0	12168	0.4363	0.7762
0.1835	79.0	12324	0.4328	0.7762
0.1835	80.0	12480	0.4379	0.7690
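The evaluation results at the top of this card (loss 0.4379, accuracy 0.7690) match the epoch-80 row. They therefore describe the final checkpoint rather than the best one. Across the table, the highest accuracy is 0.7870 at epoch 43 and the lowest validation loss is 0.3900 at epoch 40. Below is a minimal sketch for extracting these from the tab-separated rows, shown on a four-row excerpt copied from the table. `EvalRow` and `parse_rows` are hypothetical helper names. The training-loss column is discarded because early rows read "No log" rather than a number.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    epoch: float
    step: int
    val_loss: float
    accuracy: float


# Four rows copied from the table above (tab-separated, same column order).
TABLE_EXCERPT = (
    "No log\t1.0\t156\t1.3576\t0.5307\n"
    "0.372\t40.0\t6240\t0.3900\t0.7401\n"
    "0.3201\t43.0\t6708\t0.4171\t0.7870\n"
    "0.1835\t80.0\t12480\t0.4379\t0.7690\n"
)


def parse_rows(text: str) -> list[EvalRow]:
    """Parse tab-separated table rows, skipping the Training Loss column."""
    rows = []
    for line in text.strip().splitlines():
        _train_loss, epoch, step, val_loss, acc = line.split("\t")
        rows.append(EvalRow(float(epoch), int(step), float(val_loss), float(acc)))
    return rows


rows = parse_rows(TABLE_EXCERPT)
best_acc = max(rows, key=lambda r: r.accuracy)   # highest validation accuracy
best_loss = min(rows, key=lambda r: r.val_loss)  # lowest validation loss
final = rows[-1]                                 # last logged evaluation

if __name__ == "__main__":
    print("best accuracy:", best_acc)
    print("lowest loss:  ", best_loss)
    print("final:        ", final)
```

The excerpt contains both the global extremes and the final row, so pasting the full table into `TABLE_EXCERPT` gives the same three answers.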

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
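You can compare a local environment against the versions above with a small sketch based on the standard library's `importlib.metadata`. It assumes that "Pytorch" corresponds to the `torch` distribution on PyPI. It also assumes you want an exact string match, including the `+cu118` local build tag; relax the comparison if a different CUDA build is acceptable. `version_mismatches` is an illustrative helper, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Versions listed under "Framework versions"; keys are PyPI distribution names
# (assumption: "Pytorch" maps to the `torch` distribution).
EXPECTED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def version_mismatches(expected: dict[str, str]) -> dict[str, tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed}")
```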

Model repository: dkqjrm/20230825045636
