
20230826083404

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a loading sketch follows the metrics):

  • Loss: 0.5588
  • Accuracy: 0.56
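
Below is a minimal loading sketch. The card does not state a pipeline type, so this assumes a sequence-classification head (typical for BERT models fine-tuned on SuperGLUE tasks); the example sentence pair is a placeholder, not from the training data.

```python
# Hedged sketch: assumes AutoModelForSequenceClassification is the right
# head for this checkpoint, since the card does not state the task.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230826083404"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Placeholder sentence pair; most SuperGLUE tasks take text-pair inputs.
inputs = tokenizer("Example premise.", "Example hypothesis.", return_tensors="pt")
logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)
```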

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 0.05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
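
As a concrete reference, the list above maps onto transformers.TrainingArguments roughly as follows. This is a hedged sketch: the output_dir name is hypothetical, and the model, dataset, and metric wiring of the actual run is not shown in the card.

```python
# Sketch of the reported hyperparameters as TrainingArguments
# (Transformers 4.26.1). Only the values listed above are from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230826083404",   # hypothetical; the real path is not given
    learning_rate=0.05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
)
```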

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.6769 | 0.61 |
| No log | 2.0 | 50 | 0.5349 | 0.59 |
| No log | 3.0 | 75 | 0.6615 | 0.58 |
| No log | 4.0 | 100 | 0.6596 | 0.64 |
| No log | 5.0 | 125 | 0.5523 | 0.71 |
| No log | 6.0 | 150 | 0.8447 | 0.67 |
| No log | 7.0 | 175 | 0.7506 | 0.66 |
| No log | 8.0 | 200 | 0.8463 | 0.68 |
| No log | 9.0 | 225 | 0.9064 | 0.56 |
| No log | 10.0 | 250 | 0.5533 | 0.58 |
| No log | 11.0 | 275 | 0.5701 | 0.41 |
| No log | 12.0 | 300 | 0.5593 | 0.51 |
| No log | 13.0 | 325 | 0.5599 | 0.52 |
| No log | 14.0 | 350 | 0.5619 | 0.37 |
| No log | 15.0 | 375 | 0.5591 | 0.56 |
| No log | 16.0 | 400 | 0.5569 | 0.55 |
| No log | 17.0 | 425 | 0.5511 | 0.56 |
| No log | 18.0 | 450 | 0.5599 | 0.52 |
| No log | 19.0 | 475 | 0.5561 | 0.59 |
| 1.4827 | 20.0 | 500 | 0.5577 | 0.57 |
| 1.4827 | 21.0 | 525 | 0.5537 | 0.58 |
| 1.4827 | 22.0 | 550 | 0.5616 | 0.43 |
| 1.4827 | 23.0 | 575 | 0.5607 | 0.34 |
| 1.4827 | 24.0 | 600 | 0.5616 | 0.39 |
| 1.4827 | 25.0 | 625 | 0.5597 | 0.56 |
| 1.4827 | 26.0 | 650 | 0.5623 | 0.41 |
| 1.4827 | 27.0 | 675 | 0.5612 | 0.43 |
| 1.4827 | 28.0 | 700 | 0.5573 | 0.57 |
| 1.4827 | 29.0 | 725 | 0.5631 | 0.42 |
| 1.4827 | 30.0 | 750 | 0.5594 | 0.51 |
| 1.4827 | 31.0 | 775 | 0.5593 | 0.56 |
| 1.4827 | 32.0 | 800 | 0.5646 | 0.43 |
| 1.4827 | 33.0 | 825 | 0.5664 | 0.44 |
| 1.4827 | 34.0 | 850 | 0.5597 | 0.56 |
| 1.4827 | 35.0 | 875 | 0.5629 | 0.41 |
| 1.4827 | 36.0 | 900 | 0.5610 | 0.43 |
| 1.4827 | 37.0 | 925 | 0.5572 | 0.58 |
| 1.4827 | 38.0 | 950 | 0.5592 | 0.6 |
| 1.4827 | 39.0 | 975 | 0.5553 | 0.59 |
| 1.1505 | 40.0 | 1000 | 0.5597 | 0.58 |
| 1.1505 | 41.0 | 1025 | 0.5570 | 0.62 |
| 1.1505 | 42.0 | 1050 | 0.5582 | 0.6 |
| 1.1505 | 43.0 | 1075 | 0.5601 | 0.46 |
| 1.1505 | 44.0 | 1100 | 0.5598 | 0.55 |
| 1.1505 | 45.0 | 1125 | 0.5574 | 0.59 |
| 1.1505 | 46.0 | 1150 | 0.5591 | 0.52 |
| 1.1505 | 47.0 | 1175 | 0.5601 | 0.5 |
| 1.1505 | 48.0 | 1200 | 0.5593 | 0.56 |
| 1.1505 | 49.0 | 1225 | 0.5600 | 0.48 |
| 1.1505 | 50.0 | 1250 | 0.5620 | 0.39 |
| 1.1505 | 51.0 | 1275 | 0.5598 | 0.51 |
| 1.1505 | 52.0 | 1300 | 0.5616 | 0.39 |
| 1.1505 | 53.0 | 1325 | 0.5601 | 0.43 |
| 1.1505 | 54.0 | 1350 | 0.5617 | 0.4 |
| 1.1505 | 55.0 | 1375 | 0.5619 | 0.41 |
| 1.1505 | 56.0 | 1400 | 0.5625 | 0.39 |
| 1.1505 | 57.0 | 1425 | 0.5591 | 0.56 |
| 1.1505 | 58.0 | 1450 | 0.5588 | 0.59 |
| 1.1505 | 59.0 | 1475 | 0.5580 | 0.59 |
| 0.9071 | 60.0 | 1500 | 0.5584 | 0.62 |
| 0.9071 | 61.0 | 1525 | 0.5590 | 0.58 |
| 0.9071 | 62.0 | 1550 | 0.5585 | 0.57 |
| 0.9071 | 63.0 | 1575 | 0.5586 | 0.59 |
| 0.9071 | 64.0 | 1600 | 0.5589 | 0.57 |
| 0.9071 | 65.0 | 1625 | 0.5587 | 0.59 |
| 0.9071 | 66.0 | 1650 | 0.5588 | 0.61 |
| 0.9071 | 67.0 | 1675 | 0.5592 | 0.57 |
| 0.9071 | 68.0 | 1700 | 0.5579 | 0.58 |
| 0.9071 | 69.0 | 1725 | 0.5586 | 0.56 |
| 0.9071 | 70.0 | 1750 | 0.5590 | 0.57 |
| 0.9071 | 71.0 | 1775 | 0.5590 | 0.57 |
| 0.9071 | 72.0 | 1800 | 0.5590 | 0.59 |
| 0.9071 | 73.0 | 1825 | 0.5591 | 0.56 |
| 0.9071 | 74.0 | 1850 | 0.5586 | 0.56 |
| 0.9071 | 75.0 | 1875 | 0.5590 | 0.56 |
| 0.9071 | 76.0 | 1900 | 0.5592 | 0.57 |
| 0.9071 | 77.0 | 1925 | 0.5587 | 0.53 |
| 0.9071 | 78.0 | 1950 | 0.5588 | 0.56 |
| 0.9071 | 79.0 | 1975 | 0.5589 | 0.58 |
| 0.7248 | 80.0 | 2000 | 0.5588 | 0.56 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
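
To reproduce results against this exact stack, one option is to assert the versions above at runtime. A minimal sketch, assuming all four packages are importable:

```python
# Sanity-check the environment against the versions listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    transformers: "4.26.1",
    torch: "2.0.1+cu118",
    datasets: "2.12.0",
    tokenizers: "0.13.3",
}
for module, version in expected.items():
    assert module.__version__ == version, f"{module.__name__}: {module.__version__} != {version}"
```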