
20230826100510

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5641
  • Accuracy: 0.76
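
The card does not say which SuperGLUE task this checkpoint was fine-tuned on, so the following is only a minimal inference sketch: it assumes the model loads with a standard sequence-classification head and that the task is a sentence-pair problem (both assumptions, not confirmed by this card).

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230826100510"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Sentence-pair input is an assumption; adapt to the actual SuperGLUE task.
inputs = tokenizer(
    "A premise sentence.",
    "A hypothesis sentence.",
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted label id
```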

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of an equivalent Trainer setup follows the list):

  • learning_rate: 0.05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
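
The sketch below mirrors the hyperparameters listed above. The original training script is not part of this card, so the SuperGLUE task ("rte") and the premise/hypothesis preprocessing are placeholder assumptions:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-large-cased")
model = AutoModelForSequenceClassification.from_pretrained("bert-large-cased")

raw = load_dataset("super_glue", "rte")  # placeholder task, not confirmed by the card

def tokenize(batch):
    # Sentence-pair encoding; the field names are specific to the assumed task.
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, padding="max_length", max_length=128)

encoded = raw.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="20230826100510",
    learning_rate=0.05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",  # the results table reports one eval per epoch
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
)
trainer.train()
```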

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 25   | 0.7263          | 0.4      |
| No log        | 2.0   | 50   | 0.6115          | 0.6      |
| No log        | 3.0   | 75   | 0.5427          | 0.62     |
| No log        | 4.0   | 100  | 0.5319          | 0.61     |
| No log        | 5.0   | 125  | 0.5818          | 0.55     |
| No log        | 6.0   | 150  | 0.5093          | 0.68     |
| No log        | 7.0   | 175  | 0.7841          | 0.63     |
| No log        | 8.0   | 200  | 0.7629          | 0.68     |
| No log        | 9.0   | 225  | 0.5874          | 0.69     |
| No log        | 10.0  | 250  | 0.5228          | 0.71     |
| No log        | 11.0  | 275  | 0.8439          | 0.74     |
| No log        | 12.0  | 300  | 0.8243          | 0.71     |
| No log        | 13.0  | 325  | 0.5670          | 0.65     |
| No log        | 14.0  | 350  | 0.5601          | 0.61     |
| No log        | 15.0  | 375  | 0.6452          | 0.64     |
| No log        | 16.0  | 400  | 0.5239          | 0.69     |
| No log        | 17.0  | 425  | 0.7315          | 0.66     |
| No log        | 18.0  | 450  | 0.6651          | 0.67     |
| No log        | 19.0  | 475  | 0.9040          | 0.72     |
| 1.3727        | 20.0  | 500  | 0.5786          | 0.73     |
| 1.3727        | 21.0  | 525  | 0.7333          | 0.69     |
| 1.3727        | 22.0  | 550  | 0.7584          | 0.7      |
| 1.3727        | 23.0  | 575  | 0.9901          | 0.71     |
| 1.3727        | 24.0  | 600  | 0.5711          | 0.7      |
| 1.3727        | 25.0  | 625  | 0.5870          | 0.67     |
| 1.3727        | 26.0  | 650  | 0.5832          | 0.7      |
| 1.3727        | 27.0  | 675  | 0.9777          | 0.72     |
| 1.3727        | 28.0  | 700  | 0.6448          | 0.71     |
| 1.3727        | 29.0  | 725  | 0.8739          | 0.71     |
| 1.3727        | 30.0  | 750  | 0.6710          | 0.68     |
| 1.3727        | 31.0  | 775  | 0.5919          | 0.71     |
| 1.3727        | 32.0  | 800  | 0.7616          | 0.7      |
| 1.3727        | 33.0  | 825  | 0.5837          | 0.72     |
| 1.3727        | 34.0  | 850  | 1.0103          | 0.74     |
| 1.3727        | 35.0  | 875  | 0.7008          | 0.73     |
| 1.3727        | 36.0  | 900  | 1.0161          | 0.72     |
| 1.3727        | 37.0  | 925  | 0.6911          | 0.75     |
| 1.3727        | 38.0  | 950  | 0.6451          | 0.75     |
| 1.3727        | 39.0  | 975  | 0.7190          | 0.74     |
| 0.7534        | 40.0  | 1000 | 0.5164          | 0.74     |
| 0.7534        | 41.0  | 1025 | 0.4995          | 0.72     |
| 0.7534        | 42.0  | 1050 | 0.5840          | 0.75     |
| 0.7534        | 43.0  | 1075 | 0.7395          | 0.75     |
| 0.7534        | 44.0  | 1100 | 0.6374          | 0.72     |
| 0.7534        | 45.0  | 1125 | 0.7467          | 0.73     |
| 0.7534        | 46.0  | 1150 | 0.6876          | 0.74     |
| 0.7534        | 47.0  | 1175 | 0.5959          | 0.74     |
| 0.7534        | 48.0  | 1200 | 0.5625          | 0.74     |
| 0.7534        | 49.0  | 1225 | 0.6837          | 0.75     |
| 0.7534        | 50.0  | 1250 | 0.6766          | 0.76     |
| 0.7534        | 51.0  | 1275 | 0.6266          | 0.75     |
| 0.7534        | 52.0  | 1300 | 0.6642          | 0.74     |
| 0.7534        | 53.0  | 1325 | 0.6202          | 0.74     |
| 0.7534        | 54.0  | 1350 | 0.6398          | 0.75     |
| 0.7534        | 55.0  | 1375 | 0.6689          | 0.75     |
| 0.7534        | 56.0  | 1400 | 0.6629          | 0.76     |
| 0.7534        | 57.0  | 1425 | 0.5903          | 0.76     |
| 0.7534        | 58.0  | 1450 | 0.6133          | 0.77     |
| 0.7534        | 59.0  | 1475 | 0.6885          | 0.76     |
| 0.4477        | 60.0  | 1500 | 0.5950          | 0.76     |
| 0.4477        | 61.0  | 1525 | 0.5715          | 0.75     |
| 0.4477        | 62.0  | 1550 | 0.6111          | 0.76     |
| 0.4477        | 63.0  | 1575 | 0.6023          | 0.76     |
| 0.4477        | 64.0  | 1600 | 0.5793          | 0.76     |
| 0.4477        | 65.0  | 1625 | 0.5727          | 0.74     |
| 0.4477        | 66.0  | 1650 | 0.5606          | 0.76     |
| 0.4477        | 67.0  | 1675 | 0.5970          | 0.76     |
| 0.4477        | 68.0  | 1700 | 0.5602          | 0.76     |
| 0.4477        | 69.0  | 1725 | 0.5781          | 0.75     |
| 0.4477        | 70.0  | 1750 | 0.6142          | 0.76     |
| 0.4477        | 71.0  | 1775 | 0.5758          | 0.76     |
| 0.4477        | 72.0  | 1800 | 0.5650          | 0.75     |
| 0.4477        | 73.0  | 1825 | 0.5823          | 0.76     |
| 0.4477        | 74.0  | 1850 | 0.5547          | 0.76     |
| 0.4477        | 75.0  | 1875 | 0.5637          | 0.76     |
| 0.4477        | 76.0  | 1900 | 0.5806          | 0.76     |
| 0.4477        | 77.0  | 1925 | 0.5602          | 0.76     |
| 0.4477        | 78.0  | 1950 | 0.5708          | 0.76     |
| 0.4477        | 79.0  | 1975 | 0.5624          | 0.76     |
| 0.3287        | 80.0  | 2000 | 0.5641          | 0.76     |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
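
A quick way to confirm a local environment matches the versions above (a convenience snippet, not part of the original card):

```python
import datasets, tokenizers, torch, transformers

print("Transformers:", transformers.__version__)  # card reports 4.26.1
print("PyTorch:", torch.__version__)              # card reports 2.0.1+cu118
print("Datasets:", datasets.__version__)          # card reports 2.12.0
print("Tokenizers:", tokenizers.__version__)      # card reports 0.13.3
```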