20230826093525

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6263
  • Accuracy: 0.44
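
The Hub does not declare a pipeline type for this checkpoint. Below is a minimal usage sketch that assumes it is a sequence-classification fine-tune (consistent with the accuracy metric reported above); the example sentence pair is purely illustrative.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo = "dkqjrm/20230826093525"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

# Illustrative sentence pair; the actual input format depends on the
# (unstated) SuperGLUE task this checkpoint was fine-tuned for.
inputs = tokenizer("The premise sentence.", "The hypothesis sentence.",
                   return_tensors="pt")
logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())
```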

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
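
The card does not name the specific SuperGLUE task. As a hedged illustration only: the step counts in the results table imply roughly 400 training examples (25 steps × batch size 16 per epoch), and the two-decimal accuracies suggest about 100 evaluation examples, which happens to match the copa config; the choice of "copa" below is a guess, not a documented fact.

```python
from datasets import load_dataset

# "copa" is purely illustrative; the actual config used is not stated.
dataset = load_dataset("super_glue", "copa")
print(dataset)
```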

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
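
For reference, a sketch of the `TrainingArguments` corresponding to the values above (the output directory and the per-epoch evaluation cadence are assumptions, the latter inferred from the results table below):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230826093525",   # assumed
    learning_rate=0.05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",   # assumed from the per-epoch results
)
```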

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 25   | 0.8357          | 0.4      |
| No log        | 2.0   | 50   | 0.6364          | 0.62     |
| No log        | 3.0   | 75   | 0.7513          | 0.62     |
| No log        | 4.0   | 100  | 0.5950          | 0.6      |
| No log        | 5.0   | 125  | 0.6111          | 0.49     |
| No log        | 6.0   | 150  | 0.7314          | 0.59     |
| No log        | 7.0   | 175  | 0.6188          | 0.67     |
| No log        | 8.0   | 200  | 1.2028          | 0.58     |
| No log        | 9.0   | 225  | 0.6303          | 0.71     |
| No log        | 10.0  | 250  | 0.8705          | 0.65     |
| No log        | 11.0  | 275  | 0.5481          | 0.68     |
| No log        | 12.0  | 300  | 0.8700          | 0.7      |
| No log        | 13.0  | 325  | 0.7616          | 0.62     |
| No log        | 14.0  | 350  | 0.7385          | 0.71     |
| No log        | 15.0  | 375  | 0.8501          | 0.55     |
| No log        | 16.0  | 400  | 0.6954          | 0.49     |
| No log        | 17.0  | 425  | 0.6255          | 0.55     |
| No log        | 18.0  | 450  | 0.6264          | 0.38     |
| No log        | 19.0  | 475  | 0.6275          | 0.42     |
| 1.5048        | 20.0  | 500  | 0.6259          | 0.61     |
| 1.5048        | 21.0  | 525  | 0.6270          | 0.42     |
| 1.5048        | 22.0  | 550  | 0.6275          | 0.42     |
| 1.5048        | 23.0  | 575  | 0.6249          | 0.59     |
| 1.5048        | 24.0  | 600  | 0.6269          | 0.4      |
| 1.5048        | 25.0  | 625  | 0.6254          | 0.57     |
| 1.5048        | 26.0  | 650  | 0.6265          | 0.45     |
| 1.5048        | 27.0  | 675  | 0.6262          | 0.62     |
| 1.5048        | 28.0  | 700  | 0.6247          | 0.54     |
| 1.5048        | 29.0  | 725  | 0.6241          | 0.59     |
| 1.5048        | 30.0  | 750  | 0.6247          | 0.56     |
| 1.5048        | 31.0  | 775  | 0.6262          | 0.5      |
| 1.5048        | 32.0  | 800  | 0.6261          | 0.6      |
| 1.5048        | 33.0  | 825  | 0.6261          | 0.55     |
| 1.5048        | 34.0  | 850  | 0.6264          | 0.44     |
| 1.5048        | 35.0  | 875  | 0.6266          | 0.43     |
| 1.5048        | 36.0  | 900  | 0.6265          | 0.44     |
| 1.5048        | 37.0  | 925  | 0.6262          | 0.47     |
| 1.5048        | 38.0  | 950  | 0.6264          | 0.48     |
| 1.5048        | 39.0  | 975  | 0.6264          | 0.43     |
| 1.2203        | 40.0  | 1000 | 0.6262          | 0.63     |
| 1.2203        | 41.0  | 1025 | 0.6263          | 0.53     |
| 1.2203        | 42.0  | 1050 | 0.6262          | 0.59     |
| 1.2203        | 43.0  | 1075 | 0.6265          | 0.38     |
| 1.2203        | 44.0  | 1100 | 0.6262          | 0.61     |
| 1.2203        | 45.0  | 1125 | 0.6262          | 0.64     |
| 1.2203        | 46.0  | 1150 | 0.6263          | 0.5      |
| 1.2203        | 47.0  | 1175 | 0.6262          | 0.6      |
| 1.2203        | 48.0  | 1200 | 0.6263          | 0.55     |
| 1.2203        | 49.0  | 1225 | 0.6265          | 0.39     |
| 1.2203        | 50.0  | 1250 | 0.6262          | 0.62     |
| 1.2203        | 51.0  | 1275 | 0.6262          | 0.51     |
| 1.2203        | 52.0  | 1300 | 0.6261          | 0.57     |
| 1.2203        | 53.0  | 1325 | 0.6262          | 0.58     |
| 1.2203        | 54.0  | 1350 | 0.6261          | 0.58     |
| 1.2203        | 55.0  | 1375 | 0.6260          | 0.61     |
| 1.2203        | 56.0  | 1400 | 0.6261          | 0.64     |
| 1.2203        | 57.0  | 1425 | 0.6263          | 0.41     |
| 1.2203        | 58.0  | 1450 | 0.6264          | 0.41     |
| 1.2203        | 59.0  | 1475 | 0.6263          | 0.45     |
| 0.9516        | 60.0  | 1500 | 0.6263          | 0.54     |
| 0.9516        | 61.0  | 1525 | 0.6263          | 0.47     |
| 0.9516        | 62.0  | 1550 | 0.6261          | 0.61     |
| 0.9516        | 63.0  | 1575 | 0.6263          | 0.59     |
| 0.9516        | 64.0  | 1600 | 0.6261          | 0.63     |
| 0.9516        | 65.0  | 1625 | 0.6263          | 0.5      |
| 0.9516        | 66.0  | 1650 | 0.6265          | 0.39     |
| 0.9516        | 67.0  | 1675 | 0.6262          | 0.59     |
| 0.9516        | 68.0  | 1700 | 0.6264          | 0.38     |
| 0.9516        | 69.0  | 1725 | 0.6262          | 0.59     |
| 0.9516        | 70.0  | 1750 | 0.6263          | 0.51     |
| 0.9516        | 71.0  | 1775 | 0.6261          | 0.6      |
| 0.9516        | 72.0  | 1800 | 0.6263          | 0.4      |
| 0.9516        | 73.0  | 1825 | 0.6262          | 0.6      |
| 0.9516        | 74.0  | 1850 | 0.6263          | 0.48     |
| 0.9516        | 75.0  | 1875 | 0.6262          | 0.62     |
| 0.9516        | 76.0  | 1900 | 0.6263          | 0.44     |
| 0.9516        | 77.0  | 1925 | 0.6263          | 0.43     |
| 0.9516        | 78.0  | 1950 | 0.6263          | 0.45     |
| 0.9516        | 79.0  | 1975 | 0.6263          | 0.42     |
| 0.7734        | 80.0  | 2000 | 0.6263          | 0.44     |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
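
A quick environment check against the versions listed above (assuming the packages are importable under their usual names):

```python
import transformers, torch, datasets, tokenizers

# Expected versions per this card:
print(transformers.__version__)  # 4.26.1
print(torch.__version__)         # 2.0.1+cu118
print(datasets.__version__)      # 2.12.0
print(tokenizers.__version__)    # 0.13.3
```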