20230824210912

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a brief usage sketch follows the list):

  • Loss: 1.0575
  • Accuracy: 0.7401
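
The card does not yet document intended usage, so the following is only a minimal loading sketch. It assumes a sequence-classification head; the card does not say which SuperGLUE task the model was fine-tuned on, so the example sentence pair and the meaning of the output classes are hypothetical.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "dkqjrm/20230824210912"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Encode a sentence pair (the task is unspecified in the card, so this
# premise/hypothesis pairing is only an illustration) and predict.
inputs = tokenizer(
    "The cat sat on the mat.",
    "A cat is on a mat.",
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
# Class probabilities; what each class means depends on the (unstated) task.
print(logits.softmax(dim=-1))
```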

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a Trainer sketch follows the list):

  • learning_rate: 0.005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
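
A minimal sketch of how these settings map onto Hugging Face TrainingArguments, assuming a standard Trainer setup; output_dir is a placeholder, and per-epoch evaluation is inferred from the results table below (one validation row per epoch). The Adam betas and epsilon listed above are the optimizer defaults, so they need no explicit arguments.

```python
from transformers import TrainingArguments

# Sketch of TrainingArguments matching the hyperparameters above.
args = TrainingArguments(
    output_dir="./20230824210912",   # placeholder path
    learning_rate=5e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",     # assumed from the per-epoch results table
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults.
)
```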

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 156   | 0.8273          | 0.5307   |
| No log        | 2.0   | 312   | 1.1309          | 0.4729   |
| No log        | 3.0   | 468   | 0.8140          | 0.4765   |
| 0.9525        | 4.0   | 624   | 0.6978          | 0.5776   |
| 0.9525        | 5.0   | 780   | 0.6845          | 0.5704   |
| 0.9525        | 6.0   | 936   | 0.6365          | 0.6282   |
| 0.8192        | 7.0   | 1092  | 0.8362          | 0.6354   |
| 0.8192        | 8.0   | 1248  | 0.5976          | 0.6859   |
| 0.8192        | 9.0   | 1404  | 0.6788          | 0.6751   |
| 0.7543        | 10.0  | 1560  | 0.6672          | 0.6606   |
| 0.7543        | 11.0  | 1716  | 0.6932          | 0.5776   |
| 0.7543        | 12.0  | 1872  | 0.6756          | 0.6895   |
| 0.6718        | 13.0  | 2028  | 0.6336          | 0.7292   |
| 0.6718        | 14.0  | 2184  | 0.6149          | 0.7256   |
| 0.6718        | 15.0  | 2340  | 0.7579          | 0.6570   |
| 0.6718        | 16.0  | 2496  | 0.8701          | 0.6137   |
| 0.6043        | 17.0  | 2652  | 0.5931          | 0.7256   |
| 0.6043        | 18.0  | 2808  | 0.5982          | 0.7256   |
| 0.6043        | 19.0  | 2964  | 0.6829          | 0.7148   |
| 0.5842        | 20.0  | 3120  | 1.3393          | 0.6354   |
| 0.5842        | 21.0  | 3276  | 0.7701          | 0.6823   |
| 0.5842        | 22.0  | 3432  | 0.7801          | 0.6679   |
| 0.5907        | 23.0  | 3588  | 0.6225          | 0.7401   |
| 0.5907        | 24.0  | 3744  | 0.7348          | 0.7292   |
| 0.5907        | 25.0  | 3900  | 0.7832          | 0.6859   |
| 0.5013        | 26.0  | 4056  | 0.5946          | 0.7329   |
| 0.5013        | 27.0  | 4212  | 0.6441          | 0.7365   |
| 0.5013        | 28.0  | 4368  | 0.6992          | 0.7112   |
| 0.4569        | 29.0  | 4524  | 0.8007          | 0.7329   |
| 0.4569        | 30.0  | 4680  | 1.1460          | 0.6643   |
| 0.4569        | 31.0  | 4836  | 1.1331          | 0.6606   |
| 0.4569        | 32.0  | 4992  | 0.7750          | 0.7220   |
| 0.4256        | 33.0  | 5148  | 0.8709          | 0.7256   |
| 0.4256        | 34.0  | 5304  | 0.8764          | 0.7184   |
| 0.4256        | 35.0  | 5460  | 0.8154          | 0.7256   |
| 0.3773        | 36.0  | 5616  | 0.8308          | 0.7329   |
| 0.3773        | 37.0  | 5772  | 0.8417          | 0.7184   |
| 0.3773        | 38.0  | 5928  | 1.1260          | 0.7401   |
| 0.3676        | 39.0  | 6084  | 0.8739          | 0.7401   |
| 0.3676        | 40.0  | 6240  | 0.7295          | 0.7509   |
| 0.3676        | 41.0  | 6396  | 1.0227          | 0.7220   |
| 0.3122        | 42.0  | 6552  | 1.2354          | 0.7184   |
| 0.3122        | 43.0  | 6708  | 0.9760          | 0.7401   |
| 0.3122        | 44.0  | 6864  | 0.8684          | 0.7329   |
| 0.3011        | 45.0  | 7020  | 0.9423          | 0.7545   |
| 0.3011        | 46.0  | 7176  | 1.0446          | 0.7401   |
| 0.3011        | 47.0  | 7332  | 1.2442          | 0.7256   |
| 0.3011        | 48.0  | 7488  | 0.8938          | 0.7292   |
| 0.2606        | 49.0  | 7644  | 1.0857          | 0.7220   |
| 0.2606        | 50.0  | 7800  | 1.1683          | 0.7148   |
| 0.2606        | 51.0  | 7956  | 0.9944          | 0.7220   |
| 0.2496        | 52.0  | 8112  | 0.9914          | 0.7401   |
| 0.2496        | 53.0  | 8268  | 1.0398          | 0.7365   |
| 0.2496        | 54.0  | 8424  | 1.2414          | 0.7256   |
| 0.2293        | 55.0  | 8580  | 1.0096          | 0.7220   |
| 0.2293        | 56.0  | 8736  | 0.9548          | 0.7365   |
| 0.2293        | 57.0  | 8892  | 1.2170          | 0.7220   |
| 0.2182        | 58.0  | 9048  | 1.1249          | 0.7220   |
| 0.2182        | 59.0  | 9204  | 1.1084          | 0.7292   |
| 0.2182        | 60.0  | 9360  | 1.0558          | 0.7292   |
| 0.2111        | 61.0  | 9516  | 1.1070          | 0.7292   |
| 0.2111        | 62.0  | 9672  | 1.1918          | 0.7473   |
| 0.2111        | 63.0  | 9828  | 1.1819          | 0.7220   |
| 0.2111        | 64.0  | 9984  | 1.1041          | 0.7437   |
| 0.2024        | 65.0  | 10140 | 1.2129          | 0.7184   |
| 0.2024        | 66.0  | 10296 | 1.0185          | 0.7437   |
| 0.2024        | 67.0  | 10452 | 0.9763          | 0.7437   |
| 0.1901        | 68.0  | 10608 | 1.0053          | 0.7292   |
| 0.1901        | 69.0  | 10764 | 1.1605          | 0.7292   |
| 0.1901        | 70.0  | 10920 | 1.3683          | 0.7220   |
| 0.1843        | 71.0  | 11076 | 1.0427          | 0.7365   |
| 0.1843        | 72.0  | 11232 | 1.1283          | 0.7437   |
| 0.1843        | 73.0  | 11388 | 1.0405          | 0.7473   |
| 0.1715        | 74.0  | 11544 | 0.9890          | 0.7509   |
| 0.1715        | 75.0  | 11700 | 1.2353          | 0.7329   |
| 0.1715        | 76.0  | 11856 | 1.0175          | 0.7365   |
| 0.1698        | 77.0  | 12012 | 1.0641          | 0.7365   |
| 0.1698        | 78.0  | 12168 | 1.0655          | 0.7292   |
| 0.1698        | 79.0  | 12324 | 1.0779          | 0.7329   |
| 0.1698        | 80.0  | 12480 | 1.0575          | 0.7401   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
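
Matching these versions can help when reproducing the results above. A small sketch, assuming the four packages are importable, that compares the installed versions against the ones listed in this card:

```python
# Environment check: compare installed versions to the ones listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    mark = "OK" if installed[name] == want else "differs"
    print(f"{name}: installed {installed[name]}, card lists {want} ({mark})")
```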