# 20230826161130
This model is a fine-tuned version of [bert-large-cased](https://huggingface.co/bert-large-cased) on the super_glue dataset. It achieves the following results on the evaluation set:
- Loss: 0.1582
- Accuracy: 0.39
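
Since the card reports accuracy on a SuperGLUE evaluation set, the checkpoint presumably carries a sequence-classification head; the sketch below shows one way it might be loaded with the standard `transformers` Auto classes. The checkpoint path and the premise/hypothesis input format are assumptions, as the card does not state the specific SuperGLUE task.

```python
# A minimal usage sketch, assuming a sequence-classification checkpoint.
# "path/to/checkpoint" is a placeholder; the card does not give a Hub ID.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/checkpoint")
model = AutoModelForSequenceClassification.from_pretrained("path/to/checkpoint")

# Sentence-pair input is an assumption based on typical SuperGLUE tasks.
inputs = tokenizer("Example premise.", "Example hypothesis.", return_tensors="pt")
logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)
```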
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a reconstruction sketch follows the list):
- learning_rate: 0.02
- train_batch_size: 16
- eval_batch_size: 8
- seed: 11
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 80.0
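
As a hedged reconstruction, these values map onto the `transformers` `TrainingArguments` roughly as follows; the output directory and the per-epoch evaluation strategy are assumptions inferred from this card, not taken from the original training script.

```python
# A minimal sketch of how the hyperparameters above map onto the standard
# transformers Trainer API. Only the values themselves come from this card;
# everything else (output_dir, evaluation_strategy) is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230826161130",  # assumed from the model name
    learning_rate=2e-2,           # 0.02, as reported above
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    num_train_epochs=80.0,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="epoch",  # matches the per-epoch validation rows below
)
```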
### Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
No log | 1.0 | 25 | 0.6392 | 0.43 |
No log | 2.0 | 50 | 0.1729 | 0.41 |
No log | 3.0 | 75 | 0.1658 | 0.61 |
No log | 4.0 | 100 | 0.1579 | 0.57 |
No log | 5.0 | 125 | 0.1678 | 0.4 |
No log | 6.0 | 150 | 0.1583 | 0.55 |
No log | 7.0 | 175 | 0.1650 | 0.6 |
No log | 8.0 | 200 | 0.1643 | 0.62 |
No log | 9.0 | 225 | 0.1594 | 0.48 |
No log | 10.0 | 250 | 0.1572 | 0.61 |
No log | 11.0 | 275 | 0.1660 | 0.4 |
No log | 12.0 | 300 | 0.1570 | 0.63 |
No log | 13.0 | 325 | 0.1589 | 0.51 |
No log | 14.0 | 350 | 0.1581 | 0.42 |
No log | 15.0 | 375 | 0.1582 | 0.5 |
No log | 16.0 | 400 | 0.1576 | 0.53 |
No log | 17.0 | 425 | 0.1580 | 0.52 |
No log | 18.0 | 450 | 0.1581 | 0.55 |
No log | 19.0 | 475 | 0.1583 | 0.45 |
0.621 | 20.0 | 500 | 0.1606 | 0.52 |
0.621 | 21.0 | 525 | 0.1583 | 0.52 |
0.621 | 22.0 | 550 | 0.1573 | 0.49 |
0.621 | 23.0 | 575 | 0.1582 | 0.43 |
0.621 | 24.0 | 600 | 0.1581 | 0.53 |
0.621 | 25.0 | 625 | 0.1582 | 0.49 |
0.621 | 26.0 | 650 | 0.1582 | 0.5 |
0.621 | 27.0 | 675 | 0.1583 | 0.53 |
0.621 | 28.0 | 700 | 0.1586 | 0.47 |
0.621 | 29.0 | 725 | 0.1585 | 0.48 |
0.621 | 30.0 | 750 | 0.1584 | 0.46 |
0.621 | 31.0 | 775 | 0.1582 | 0.55 |
0.621 | 32.0 | 800 | 0.1582 | 0.53 |
0.621 | 33.0 | 825 | 0.1583 | 0.51 |
0.621 | 34.0 | 850 | 0.1585 | 0.39 |
0.621 | 35.0 | 875 | 0.1582 | 0.69 |
0.621 | 36.0 | 900 | 0.1583 | 0.48 |
0.621 | 37.0 | 925 | 0.1582 | 0.61 |
0.621 | 38.0 | 950 | 0.1580 | 0.63 |
0.621 | 39.0 | 975 | 0.1581 | 0.47 |
0.4969 | 40.0 | 1000 | 0.1582 | 0.49 |
0.4969 | 41.0 | 1025 | 0.1583 | 0.49 |
0.4969 | 42.0 | 1050 | 0.1583 | 0.47 |
0.4969 | 43.0 | 1075 | 0.1581 | 0.52 |
0.4969 | 44.0 | 1100 | 0.1584 | 0.47 |
0.4969 | 45.0 | 1125 | 0.1584 | 0.35 |
0.4969 | 46.0 | 1150 | 0.1582 | 0.56 |
0.4969 | 47.0 | 1175 | 0.1582 | 0.54 |
0.4969 | 48.0 | 1200 | 0.1582 | 0.53 |
0.4969 | 49.0 | 1225 | 0.1582 | 0.56 |
0.4969 | 50.0 | 1250 | 0.1582 | 0.54 |
0.4969 | 51.0 | 1275 | 0.1582 | 0.57 |
0.4969 | 52.0 | 1300 | 0.1582 | 0.52 |
0.4969 | 53.0 | 1325 | 0.1581 | 0.59 |
0.4969 | 54.0 | 1350 | 0.1582 | 0.55 |
0.4969 | 55.0 | 1375 | 0.1585 | 0.41 |
0.4969 | 56.0 | 1400 | 0.1584 | 0.45 |
0.4969 | 57.0 | 1425 | 0.1583 | 0.54 |
0.4969 | 58.0 | 1450 | 0.1583 | 0.41 |
0.4969 | 59.0 | 1475 | 0.1583 | 0.42 |
0.4428 | 60.0 | 1500 | 0.1583 | 0.4 |
0.4428 | 61.0 | 1525 | 0.1583 | 0.59 |
0.4428 | 62.0 | 1550 | 0.1582 | 0.65 |
0.4428 | 63.0 | 1575 | 0.1581 | 0.64 |
0.4428 | 64.0 | 1600 | 0.1581 | 0.59 |
0.4428 | 65.0 | 1625 | 0.1583 | 0.42 |
0.4428 | 66.0 | 1650 | 0.1582 | 0.5 |
0.4428 | 67.0 | 1675 | 0.1583 | 0.43 |
0.4428 | 68.0 | 1700 | 0.1584 | 0.39 |
0.4428 | 69.0 | 1725 | 0.1583 | 0.5 |
0.4428 | 70.0 | 1750 | 0.1583 | 0.49 |
0.4428 | 71.0 | 1775 | 0.1583 | 0.48 |
0.4428 | 72.0 | 1800 | 0.1584 | 0.29 |
0.4428 | 73.0 | 1825 | 0.1583 | 0.4 |
0.4428 | 74.0 | 1850 | 0.1582 | 0.59 |
0.4428 | 75.0 | 1875 | 0.1582 | 0.59 |
0.4428 | 76.0 | 1900 | 0.1582 | 0.53 |
0.4428 | 77.0 | 1925 | 0.1583 | 0.33 |
0.4428 | 78.0 | 1950 | 0.1583 | 0.35 |
0.4428 | 79.0 | 1975 | 0.1583 | 0.36 |
0.4082 | 80.0 | 2000 | 0.1582 | 0.39 |
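
The accuracy column was presumably computed from argmax predictions over the logits. Below is a sketch of such a metric function, compatible with the Trainer's `compute_metrics` hook; the use of the `evaluate` library is an assumption, since this card's framework list does not pin it.

```python
# A hedged sketch of an accuracy metric function for the Trainer's
# compute_metrics hook; assumes the `evaluate` library is installed.
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```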
### Framework versions
- Transformers 4.26.1
- Pytorch 2.0.1+cu118
- Datasets 2.12.0
- Tokenizers 0.13.3