
20230826092050

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4268
  • Accuracy: 0.37
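
The card does not record a pipeline type, but since the base model is bert-large-cased and the metric is accuracy, a sequence-classification head is a reasonable assumption. A minimal loading sketch under that assumption:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumes a sequence-classification head; the card does not record the pipeline type.
model_id = "dkqjrm/20230826092050"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
```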

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a matching TrainingArguments sketch follows the list):

  • learning_rate: 0.05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
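
A hedged sketch of TrainingArguments mirroring the values above. The output_dir is a placeholder, and per-device batch sizes equal the reported totals on the assumption of single-device training; all other arguments are Transformers defaults:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./20230826092050",    # hypothetical output path
    learning_rate=0.05,
    per_device_train_batch_size=16,   # assumes a single device
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
)
```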

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 25   | 0.5239          | 0.61     |
| No log        | 2.0   | 50   | 0.4231          | 0.45     |
| No log        | 3.0   | 75   | 0.4342          | 0.48     |
| No log        | 4.0   | 100  | 0.4309          | 0.43     |
| No log        | 5.0   | 125  | 0.4262          | 0.58     |
| No log        | 6.0   | 150  | 0.4267          | 0.49     |
| No log        | 7.0   | 175  | 0.4263          | 0.61     |
| No log        | 8.0   | 200  | 0.4268          | 0.49     |
| No log        | 9.0   | 225  | 0.4267          | 0.56     |
| No log        | 10.0  | 250  | 0.4268          | 0.51     |
| No log        | 11.0  | 275  | 0.4275          | 0.4      |
| No log        | 12.0  | 300  | 0.4269          | 0.46     |
| No log        | 13.0  | 325  | 0.4267          | 0.62     |
| No log        | 14.0  | 350  | 0.4267          | 0.55     |
| No log        | 15.0  | 375  | 0.4268          | 0.42     |
| No log        | 16.0  | 400  | 0.4268          | 0.45     |
| No log        | 17.0  | 425  | 0.4270          | 0.44     |
| No log        | 18.0  | 450  | 0.4267          | 0.6      |
| No log        | 19.0  | 475  | 0.4268          | 0.61     |
| 1.2569        | 20.0  | 500  | 0.4268          | 0.38     |
| 1.2569        | 21.0  | 525  | 0.4268          | 0.57     |
| 1.2569        | 22.0  | 550  | 0.4267          | 0.61     |
| 1.2569        | 23.0  | 575  | 0.4267          | 0.59     |
| 1.2569        | 24.0  | 600  | 0.4267          | 0.54     |
| 1.2569        | 25.0  | 625  | 0.4268          | 0.53     |
| 1.2569        | 26.0  | 650  | 0.4268          | 0.38     |
| 1.2569        | 27.0  | 675  | 0.4267          | 0.61     |
| 1.2569        | 28.0  | 700  | 0.4268          | 0.43     |
| 1.2569        | 29.0  | 725  | 0.4268          | 0.61     |
| 1.2569        | 30.0  | 750  | 0.4268          | 0.43     |
| 1.2569        | 31.0  | 775  | 0.4268          | 0.43     |
| 1.2569        | 32.0  | 800  | 0.4268          | 0.54     |
| 1.2569        | 33.0  | 825  | 0.4268          | 0.47     |
| 1.2569        | 34.0  | 850  | 0.4268          | 0.43     |
| 1.2569        | 35.0  | 875  | 0.4268          | 0.43     |
| 1.2569        | 36.0  | 900  | 0.4268          | 0.64     |
| 1.2569        | 37.0  | 925  | 0.4268          | 0.45     |
| 1.2569        | 38.0  | 950  | 0.4268          | 0.43     |
| 1.2569        | 39.0  | 975  | 0.4268          | 0.41     |
| 0.9505        | 40.0  | 1000 | 0.4267          | 0.58     |
| 0.9505        | 41.0  | 1025 | 0.4267          | 0.59     |
| 0.9505        | 42.0  | 1050 | 0.4268          | 0.56     |
| 0.9505        | 43.0  | 1075 | 0.4268          | 0.43     |
| 0.9505        | 44.0  | 1100 | 0.4268          | 0.49     |
| 0.9505        | 45.0  | 1125 | 0.4268          | 0.58     |
| 0.9505        | 46.0  | 1150 | 0.4267          | 0.59     |
| 0.9505        | 47.0  | 1175 | 0.4267          | 0.6      |
| 0.9505        | 48.0  | 1200 | 0.4267          | 0.63     |
| 0.9505        | 49.0  | 1225 | 0.4268          | 0.44     |
| 0.9505        | 50.0  | 1250 | 0.4268          | 0.52     |
| 0.9505        | 51.0  | 1275 | 0.4268          | 0.4      |
| 0.9505        | 52.0  | 1300 | 0.4268          | 0.46     |
| 0.9505        | 53.0  | 1325 | 0.4268          | 0.47     |
| 0.9505        | 54.0  | 1350 | 0.4268          | 0.51     |
| 0.9505        | 55.0  | 1375 | 0.4268          | 0.44     |
| 0.9505        | 56.0  | 1400 | 0.4268          | 0.55     |
| 0.9505        | 57.0  | 1425 | 0.4267          | 0.54     |
| 0.9505        | 58.0  | 1450 | 0.4267          | 0.55     |
| 0.9505        | 59.0  | 1475 | 0.4267          | 0.54     |
| 0.7437        | 60.0  | 1500 | 0.4267          | 0.58     |
| 0.7437        | 61.0  | 1525 | 0.4268          | 0.57     |
| 0.7437        | 62.0  | 1550 | 0.4268          | 0.42     |
| 0.7437        | 63.0  | 1575 | 0.4268          | 0.41     |
| 0.7437        | 64.0  | 1600 | 0.4268          | 0.44     |
| 0.7437        | 65.0  | 1625 | 0.4268          | 0.47     |
| 0.7437        | 66.0  | 1650 | 0.4268          | 0.41     |
| 0.7437        | 67.0  | 1675 | 0.4268          | 0.54     |
| 0.7437        | 68.0  | 1700 | 0.4268          | 0.4      |
| 0.7437        | 69.0  | 1725 | 0.4268          | 0.41     |
| 0.7437        | 70.0  | 1750 | 0.4268          | 0.4      |
| 0.7437        | 71.0  | 1775 | 0.4268          | 0.41     |
| 0.7437        | 72.0  | 1800 | 0.4268          | 0.42     |
| 0.7437        | 73.0  | 1825 | 0.4268          | 0.43     |
| 0.7437        | 74.0  | 1850 | 0.4268          | 0.41     |
| 0.7437        | 75.0  | 1875 | 0.4268          | 0.41     |
| 0.7437        | 76.0  | 1900 | 0.4268          | 0.4      |
| 0.7437        | 77.0  | 1925 | 0.4268          | 0.4      |
| 0.7437        | 78.0  | 1950 | 0.4268          | 0.41     |
| 0.7437        | 79.0  | 1975 | 0.4268          | 0.38     |
| 0.6146        | 80.0  | 2000 | 0.4268          | 0.37     |
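
Validation loss plateaus near 0.4268 from epoch 8 onward while accuracy fluctuates, so re-checking the final checkpoint yourself may be useful. A hedged evaluation sketch follows; the card does not record which super_glue config was used, so "boolq" and its fields below are purely placeholder assumptions:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230826092050"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# "boolq" is a placeholder: the actual super_glue config and eval split
# used for this card are not recorded.
ds = load_dataset("super_glue", "boolq", split="validation")

correct = 0
for ex in ds:
    inputs = tokenizer(ex["question"], ex["passage"],
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(dim=-1).item()
    correct += int(pred == ex["label"])

print(f"accuracy = {correct / len(ds):.2f}")
```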

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
