# 20230829231514

This model is a fine-tuned version of [bert-large-cased](https://huggingface.co/bert-large-cased) on the super_glue dataset. It achieves the following results on the evaluation set:
- Loss: 0.5962
- Accuracy: 0.6827
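A minimal loading sketch, assuming the checkpoint carries a sequence-classification head (consistent with the accuracy metric above, but not confirmed by this card). The repo id and example inputs are hypothetical placeholders:

```python
# Hedged usage sketch: "your-username/20230829231514" is a hypothetical
# repo id; substitute the actual Hub path of this checkpoint. A two-label
# sequence-classification head is an assumption.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "your-username/20230829231514"  # hypothetical placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Most small SuperGLUE tasks take a sentence pair; the texts here are dummies.
inputs = tokenizer("Example premise.", "Example hypothesis.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted label id
```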
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.007
- train_batch_size: 16
- eval_batch_size: 8
- seed: 44
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 80.0
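As a rough sketch, these values map onto `transformers.TrainingArguments` as shown below. The output directory and the per-epoch evaluation strategy are assumptions (the latter inferred from the per-epoch validation rows in the results table); only the listed values are taken from this card:

```python
# Hedged reconstruction of the training configuration from the list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230829231514",      # placeholder, not from this card
    learning_rate=0.007,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=44,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",      # assumption: matches the per-epoch rows below
)
# Adam with betas=(0.9, 0.999) and epsilon=1e-08 corresponds to the Trainer's
# default adam_beta1 / adam_beta2 / adam_epsilon settings.
```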
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:---:|:---:|:---:|:---:|:---:|
| No log | 1.0 | 35 | 1.8158 | 0.5865 |
| No log | 2.0 | 70 | 0.5893 | 0.625 |
| No log | 3.0 | 105 | 0.8945 | 0.5962 |
| No log | 4.0 | 140 | 0.5866 | 0.625 |
| No log | 5.0 | 175 | 0.9890 | 0.3846 |
| No log | 6.0 | 210 | 0.8076 | 0.5192 |
| No log | 7.0 | 245 | 0.6353 | 0.5288 |
| No log | 8.0 | 280 | 2.2871 | 0.3846 |
| No log | 9.0 | 315 | 0.7403 | 0.6346 |
| No log | 10.0 | 350 | 1.4011 | 0.4038 |
| No log | 11.0 | 385 | 1.1139 | 0.4038 |
| No log | 12.0 | 420 | 0.9394 | 0.6058 |
| No log | 13.0 | 455 | 0.6693 | 0.5865 |
| No log | 14.0 | 490 | 1.1625 | 0.4231 |
| 1.0588 | 15.0 | 525 | 0.6894 | 0.6346 |
| 1.0588 | 16.0 | 560 | 0.6938 | 0.3942 |
| 1.0588 | 17.0 | 595 | 0.6737 | 0.5 |
| 1.0588 | 18.0 | 630 | 0.7273 | 0.625 |
| 1.0588 | 19.0 | 665 | 0.6071 | 0.5385 |
| 1.0588 | 20.0 | 700 | 1.0395 | 0.5192 |
| 1.0588 | 21.0 | 735 | 0.6420 | 0.6058 |
| 1.0588 | 22.0 | 770 | 0.7194 | 0.6154 |
| 1.0588 | 23.0 | 805 | 1.3367 | 0.3942 |
| 1.0588 | 24.0 | 840 | 0.9467 | 0.4231 |
| 1.0588 | 25.0 | 875 | 0.6453 | 0.6058 |
| 1.0588 | 26.0 | 910 | 0.6247 | 0.6346 |
| 1.0588 | 27.0 | 945 | 0.6118 | 0.5577 |
| 1.0588 | 28.0 | 980 | 0.7381 | 0.4423 |
| 0.8818 | 29.0 | 1015 | 0.5847 | 0.6346 |
| 0.8818 | 30.0 | 1050 | 0.7924 | 0.3654 |
| 0.8818 | 31.0 | 1085 | 0.7978 | 0.4231 |
| 0.8818 | 32.0 | 1120 | 1.1682 | 0.3654 |
| 0.8818 | 33.0 | 1155 | 1.1758 | 0.6346 |
| 0.8818 | 34.0 | 1190 | 0.6784 | 0.6442 |
| 0.8818 | 35.0 | 1225 | 0.6660 | 0.4135 |
| 0.8818 | 36.0 | 1260 | 1.1904 | 0.3654 |
| 0.8818 | 37.0 | 1295 | 0.5965 | 0.6731 |
| 0.8818 | 38.0 | 1330 | 0.6026 | 0.6442 |
| 0.8818 | 39.0 | 1365 | 0.6658 | 0.6346 |
| 0.8818 | 40.0 | 1400 | 0.7463 | 0.3846 |
| 0.8818 | 41.0 | 1435 | 1.2989 | 0.3654 |
| 0.8818 | 42.0 | 1470 | 0.9206 | 0.3654 |
| 0.8069 | 43.0 | 1505 | 0.6119 | 0.6346 |
| 0.8069 | 44.0 | 1540 | 0.7291 | 0.4038 |
| 0.8069 | 45.0 | 1575 | 0.9749 | 0.3654 |
| 0.8069 | 46.0 | 1610 | 0.6391 | 0.4808 |
| 0.8069 | 47.0 | 1645 | 0.5934 | 0.6442 |
| 0.8069 | 48.0 | 1680 | 0.6020 | 0.6346 |
| 0.8069 | 49.0 | 1715 | 0.6096 | 0.6346 |
| 0.8069 | 50.0 | 1750 | 0.7630 | 0.3654 |
| 0.8069 | 51.0 | 1785 | 0.8983 | 0.3654 |
| 0.8069 | 52.0 | 1820 | 0.6252 | 0.5481 |
| 0.8069 | 53.0 | 1855 | 0.9840 | 0.3654 |
| 0.8069 | 54.0 | 1890 | 0.7640 | 0.3846 |
| 0.8069 | 55.0 | 1925 | 0.6074 | 0.6346 |
| 0.8069 | 56.0 | 1960 | 0.5978 | 0.6346 |
| 0.8069 | 57.0 | 1995 | 0.7187 | 0.375 |
| 0.7258 | 58.0 | 2030 | 0.6309 | 0.4423 |
| 0.7258 | 59.0 | 2065 | 0.6101 | 0.6442 |
| 0.7258 | 60.0 | 2100 | 0.6555 | 0.6346 |
| 0.7258 | 61.0 | 2135 | 0.6048 | 0.6346 |
| 0.7258 | 62.0 | 2170 | 0.6749 | 0.4038 |
| 0.7258 | 63.0 | 2205 | 0.6003 | 0.6538 |
| 0.7258 | 64.0 | 2240 | 0.6711 | 0.6346 |
| 0.7258 | 65.0 | 2275 | 0.5839 | 0.6346 |
| 0.7258 | 66.0 | 2310 | 0.5848 | 0.6346 |
| 0.7258 | 67.0 | 2345 | 0.6198 | 0.6346 |
| 0.7258 | 68.0 | 2380 | 0.6282 | 0.4904 |
| 0.7258 | 69.0 | 2415 | 0.5936 | 0.6346 |
| 0.7258 | 70.0 | 2450 | 0.5954 | 0.6346 |
| 0.7258 | 71.0 | 2485 | 0.5858 | 0.6346 |
| 0.6781 | 72.0 | 2520 | 0.6104 | 0.5769 |
| 0.6781 | 73.0 | 2555 | 0.6286 | 0.5192 |
| 0.6781 | 74.0 | 2590 | 0.6538 | 0.4231 |
| 0.6781 | 75.0 | 2625 | 0.6025 | 0.625 |
| 0.6781 | 76.0 | 2660 | 0.5940 | 0.6635 |
| 0.6781 | 77.0 | 2695 | 0.7307 | 0.3846 |
| 0.6781 | 78.0 | 2730 | 0.6168 | 0.5673 |
| 0.6781 | 79.0 | 2765 | 0.5995 | 0.6635 |
| 0.6781 | 80.0 | 2800 | 0.5962 | 0.6827 |
### Framework versions
- Transformers 4.26.1
- Pytorch 2.0.1+cu118
- Datasets 2.12.0
- Tokenizers 0.13.3
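For reproducibility, a small sketch for checking that the local environment matches these versions; the expected strings are simply the values listed above:

```python
# Compare installed library versions against the ones this model was trained with.
import datasets
import tokenizers
import torch
import transformers

expected = {
    transformers: "4.26.1",
    torch: "2.0.1+cu118",
    datasets: "2.12.0",
    tokenizers: "0.13.3",
}
for module, wanted in expected.items():
    found = module.__version__
    status = "OK" if found == wanted else f"MISMATCH (found {found})"
    print(f"{module.__name__} {wanted}: {status}")
```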