
20230826105341

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4258
  • Accuracy: 0.4
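
The checkpoint can be loaded with the standard Transformers API. Below is a minimal usage sketch, assuming the model carries a sequence-classification head; the card does not state the pipeline type or the SuperGLUE subset, so the sentence-pair input format and label meaning are assumptions, not documented behavior.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Model id taken from this card; the task-specific input format is assumed.
model_id = "dkqjrm/20230826105341"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Hypothetical sentence-pair input, as used by most SuperGLUE tasks.
inputs = tokenizer("premise text", "hypothesis text", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```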

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
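
For reference, these settings map onto the Transformers `TrainingArguments` as sketched below. This is a reconstruction from the list above, not the original training script; the output directory name is an assumption.

```python
from transformers import TrainingArguments

# Reconstructed from the hyperparameter list in this card.
training_args = TrainingArguments(
    output_dir="20230826105341",   # assumed; not stated in the card
    learning_rate=0.05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
)
```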

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.4625 | 0.45 |
| No log | 2.0 | 50 | 0.4859 | 0.61 |
| No log | 3.0 | 75 | 0.4227 | 0.61 |
| No log | 4.0 | 100 | 0.4247 | 0.53 |
| No log | 5.0 | 125 | 0.4481 | 0.43 |
| No log | 6.0 | 150 | 0.4310 | 0.57 |
| No log | 7.0 | 175 | 0.4267 | 0.47 |
| No log | 8.0 | 200 | 0.4246 | 0.5 |
| No log | 9.0 | 225 | 0.4267 | 0.44 |
| No log | 10.0 | 250 | 0.4260 | 0.51 |
| No log | 11.0 | 275 | 0.4226 | 0.52 |
| No log | 12.0 | 300 | 0.4271 | 0.44 |
| No log | 13.0 | 325 | 0.4266 | 0.49 |
| No log | 14.0 | 350 | 0.4244 | 0.58 |
| No log | 15.0 | 375 | 0.4253 | 0.55 |
| No log | 16.0 | 400 | 0.4256 | 0.51 |
| No log | 17.0 | 425 | 0.4265 | 0.44 |
| No log | 18.0 | 450 | 0.4261 | 0.42 |
| No log | 19.0 | 475 | 0.4262 | 0.46 |
| 1.4009 | 20.0 | 500 | 0.4260 | 0.47 |
| 1.4009 | 21.0 | 525 | 0.4285 | 0.42 |
| 1.4009 | 22.0 | 550 | 0.4260 | 0.5 |
| 1.4009 | 23.0 | 575 | 0.4245 | 0.54 |
| 1.4009 | 24.0 | 600 | 0.4251 | 0.54 |
| 1.4009 | 25.0 | 625 | 0.4271 | 0.46 |
| 1.4009 | 26.0 | 650 | 0.4261 | 0.46 |
| 1.4009 | 27.0 | 675 | 0.4257 | 0.49 |
| 1.4009 | 28.0 | 700 | 0.4255 | 0.55 |
| 1.4009 | 29.0 | 725 | 0.4254 | 0.52 |
| 1.4009 | 30.0 | 750 | 0.4260 | 0.52 |
| 1.4009 | 31.0 | 775 | 0.4256 | 0.49 |
| 1.4009 | 32.0 | 800 | 0.4257 | 0.55 |
| 1.4009 | 33.0 | 825 | 0.4255 | 0.53 |
| 1.4009 | 34.0 | 850 | 0.4256 | 0.54 |
| 1.4009 | 35.0 | 875 | 0.4262 | 0.44 |
| 1.4009 | 36.0 | 900 | 0.4257 | 0.51 |
| 1.4009 | 37.0 | 925 | 0.4267 | 0.4 |
| 1.4009 | 38.0 | 950 | 0.4259 | 0.48 |
| 1.4009 | 39.0 | 975 | 0.4255 | 0.55 |
| 0.9833 | 40.0 | 1000 | 0.4254 | 0.49 |
| 0.9833 | 41.0 | 1025 | 0.4257 | 0.49 |
| 0.9833 | 42.0 | 1050 | 0.4254 | 0.58 |
| 0.9833 | 43.0 | 1075 | 0.4261 | 0.48 |
| 0.9833 | 44.0 | 1100 | 0.4260 | 0.5 |
| 0.9833 | 45.0 | 1125 | 0.4257 | 0.51 |
| 0.9833 | 46.0 | 1150 | 0.4254 | 0.52 |
| 0.9833 | 47.0 | 1175 | 0.4255 | 0.5 |
| 0.9833 | 48.0 | 1200 | 0.4257 | 0.48 |
| 0.9833 | 49.0 | 1225 | 0.4261 | 0.41 |
| 0.9833 | 50.0 | 1250 | 0.4251 | 0.57 |
| 0.9833 | 51.0 | 1275 | 0.4258 | 0.47 |
| 0.9833 | 52.0 | 1300 | 0.4255 | 0.52 |
| 0.9833 | 53.0 | 1325 | 0.4257 | 0.53 |
| 0.9833 | 54.0 | 1350 | 0.4256 | 0.52 |
| 0.9833 | 55.0 | 1375 | 0.4257 | 0.51 |
| 0.9833 | 56.0 | 1400 | 0.4257 | 0.5 |
| 0.9833 | 57.0 | 1425 | 0.4257 | 0.49 |
| 0.9833 | 58.0 | 1450 | 0.4257 | 0.51 |
| 0.9833 | 59.0 | 1475 | 0.4255 | 0.57 |
| 0.7428 | 60.0 | 1500 | 0.4259 | 0.46 |
| 0.7428 | 61.0 | 1525 | 0.4257 | 0.51 |
| 0.7428 | 62.0 | 1550 | 0.4255 | 0.55 |
| 0.7428 | 63.0 | 1575 | 0.4256 | 0.55 |
| 0.7428 | 64.0 | 1600 | 0.4258 | 0.4 |
| 0.7428 | 65.0 | 1625 | 0.4258 | 0.44 |
| 0.7428 | 66.0 | 1650 | 0.4259 | 0.41 |
| 0.7428 | 67.0 | 1675 | 0.4260 | 0.38 |
| 0.7428 | 68.0 | 1700 | 0.4257 | 0.52 |
| 0.7428 | 69.0 | 1725 | 0.4259 | 0.35 |
| 0.7428 | 70.0 | 1750 | 0.4259 | 0.38 |
| 0.7428 | 71.0 | 1775 | 0.4259 | 0.44 |
| 0.7428 | 72.0 | 1800 | 0.4260 | 0.41 |
| 0.7428 | 73.0 | 1825 | 0.4257 | 0.45 |
| 0.7428 | 74.0 | 1850 | 0.4258 | 0.42 |
| 0.7428 | 75.0 | 1875 | 0.4258 | 0.41 |
| 0.7428 | 76.0 | 1900 | 0.4258 | 0.4 |
| 0.7428 | 77.0 | 1925 | 0.4258 | 0.45 |
| 0.7428 | 78.0 | 1950 | 0.4258 | 0.43 |
| 0.7428 | 79.0 | 1975 | 0.4258 | 0.44 |
| 0.6138 | 80.0 | 2000 | 0.4258 | 0.4 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3