
20230826100309

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2920
  • Accuracy: 0.4
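
The pipeline type of this checkpoint is not declared, so the snippet below is a minimal inference sketch under assumptions: it assumes the checkpoint carries a sequence-classification head (the accuracy metric suggests a SuperGLUE classification task), and the example sentence pair is a placeholder, not a real SuperGLUE input. The repo id dkqjrm/20230826100309 is taken from this card.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: the checkpoint has a sequence-classification head; the
# input pair below is a placeholder, not a real SuperGLUE example.
model_id = "dkqjrm/20230826100309"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

inputs = tokenizer(
    "An example premise.",
    "An example hypothesis.",
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities
```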

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
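
For reproducibility, the snippet below re-expresses these hyperparameters as transformers TrainingArguments. This is a hedged sketch: the card does not state that the standard Trainer was used, the output_dir is hypothetical, and dataset loading and preprocessing are omitted. The per-epoch evaluation setting is inferred from the results table below.

```python
from transformers import TrainingArguments

# Sketch only: output_dir is hypothetical; use of the standard Trainer
# and per-epoch evaluation are assumptions inferred from this card.
training_args = TrainingArguments(
    output_dir="20230826100309",
    learning_rate=0.05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",
)
```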

Training results

("No log" in the Training Loss column means the training loss had not yet been logged at that point; the Trainer logs it every 500 steps.)

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 25   | 0.3608          | 0.44     |
| No log        | 2.0   | 50   | 0.2890          | 0.57     |
| No log        | 3.0   | 75   | 0.2961          | 0.58     |
| No log        | 4.0   | 100  | 0.2865          | 0.65     |
| No log        | 5.0   | 125  | 0.2901          | 0.58     |
| No log        | 6.0   | 150  | 0.2933          | 0.46     |
| No log        | 7.0   | 175  | 0.3291          | 0.64     |
| No log        | 8.0   | 200  | 0.2864          | 0.62     |
| No log        | 9.0   | 225  | 0.2979          | 0.42     |
| No log        | 10.0  | 250  | 0.3035          | 0.63     |
| No log        | 11.0  | 275  | 0.2902          | 0.59     |
| No log        | 12.0  | 300  | 0.2917          | 0.5      |
| No log        | 13.0  | 325  | 0.2935          | 0.44     |
| No log        | 14.0  | 350  | 0.3057          | 0.44     |
| No log        | 15.0  | 375  | 0.2980          | 0.45     |
| No log        | 16.0  | 400  | 0.2947          | 0.47     |
| No log        | 17.0  | 425  | 0.2945          | 0.5      |
| No log        | 18.0  | 450  | 0.2924          | 0.49     |
| No log        | 19.0  | 475  | 0.2922          | 0.55     |
| 1.1902        | 20.0  | 500  | 0.2923          | 0.45     |
| 1.1902        | 21.0  | 525  | 0.2864          | 0.55     |
| 1.1902        | 22.0  | 550  | 0.2925          | 0.42     |
| 1.1902        | 23.0  | 575  | 0.2910          | 0.58     |
| 1.1902        | 24.0  | 600  | 0.2895          | 0.58     |
| 1.1902        | 25.0  | 625  | 0.2918          | 0.62     |
| 1.1902        | 26.0  | 650  | 0.2921          | 0.42     |
| 1.1902        | 27.0  | 675  | 0.2918          | 0.58     |
| 1.1902        | 28.0  | 700  | 0.2910          | 0.6      |
| 1.1902        | 29.0  | 725  | 0.2919          | 0.57     |
| 1.1902        | 30.0  | 750  | 0.2920          | 0.48     |
| 1.1902        | 31.0  | 775  | 0.2922          | 0.41     |
| 1.1902        | 32.0  | 800  | 0.2920          | 0.53     |
| 1.1902        | 33.0  | 825  | 0.2920          | 0.51     |
| 1.1902        | 34.0  | 850  | 0.2919          | 0.54     |
| 1.1902        | 35.0  | 875  | 0.2920          | 0.52     |
| 1.1902        | 36.0  | 900  | 0.2921          | 0.39     |
| 1.1902        | 37.0  | 925  | 0.2920          | 0.53     |
| 1.1902        | 38.0  | 950  | 0.2920          | 0.49     |
| 1.1902        | 39.0  | 975  | 0.2922          | 0.4      |
| 0.8276        | 40.0  | 1000 | 0.2919          | 0.58     |
| 0.8276        | 41.0  | 1025 | 0.2918          | 0.62     |
| 0.8276        | 42.0  | 1050 | 0.2918          | 0.61     |
| 0.8276        | 43.0  | 1075 | 0.2922          | 0.42     |
| 0.8276        | 44.0  | 1100 | 0.2921          | 0.43     |
| 0.8276        | 45.0  | 1125 | 0.2920          | 0.42     |
| 0.8276        | 46.0  | 1150 | 0.2920          | 0.42     |
| 0.8276        | 47.0  | 1175 | 0.2920          | 0.35     |
| 0.8276        | 48.0  | 1200 | 0.2920          | 0.54     |
| 0.8276        | 49.0  | 1225 | 0.2920          | 0.6      |
| 0.8276        | 50.0  | 1250 | 0.2920          | 0.52     |
| 0.8276        | 51.0  | 1275 | 0.2920          | 0.37     |
| 0.8276        | 52.0  | 1300 | 0.2920          | 0.45     |
| 0.8276        | 53.0  | 1325 | 0.2920          | 0.44     |
| 0.8276        | 54.0  | 1350 | 0.2920          | 0.59     |
| 0.8276        | 55.0  | 1375 | 0.2920          | 0.44     |
| 0.8276        | 56.0  | 1400 | 0.2920          | 0.58     |
| 0.8276        | 57.0  | 1425 | 0.2920          | 0.57     |
| 0.8276        | 58.0  | 1450 | 0.2920          | 0.46     |
| 0.8276        | 59.0  | 1475 | 0.2920          | 0.42     |
| 0.6389        | 60.0  | 1500 | 0.2920          | 0.37     |
| 0.6389        | 61.0  | 1525 | 0.2919          | 0.6      |
| 0.6389        | 62.0  | 1550 | 0.2919          | 0.6      |
| 0.6389        | 63.0  | 1575 | 0.2920          | 0.55     |
| 0.6389        | 64.0  | 1600 | 0.2920          | 0.52     |
| 0.6389        | 65.0  | 1625 | 0.2920          | 0.5      |
| 0.6389        | 66.0  | 1650 | 0.2920          | 0.36     |
| 0.6389        | 67.0  | 1675 | 0.2920          | 0.58     |
| 0.6389        | 68.0  | 1700 | 0.2920          | 0.38     |
| 0.6389        | 69.0  | 1725 | 0.2920          | 0.58     |
| 0.6389        | 70.0  | 1750 | 0.2920          | 0.53     |
| 0.6389        | 71.0  | 1775 | 0.2920          | 0.37     |
| 0.6389        | 72.0  | 1800 | 0.2920          | 0.39     |
| 0.6389        | 73.0  | 1825 | 0.2920          | 0.36     |
| 0.6389        | 74.0  | 1850 | 0.2920          | 0.43     |
| 0.6389        | 75.0  | 1875 | 0.2920          | 0.38     |
| 0.6389        | 76.0  | 1900 | 0.2920          | 0.43     |
| 0.6389        | 77.0  | 1925 | 0.2920          | 0.37     |
| 0.6389        | 78.0  | 1950 | 0.2920          | 0.37     |
| 0.6389        | 79.0  | 1975 | 0.2920          | 0.38     |
| 0.5225        | 80.0  | 2000 | 0.2920          | 0.4      |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
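
To compare a local environment against the versions listed above, here is a small verification snippet (a convenience sketch, not part of the original card):

```python
# Prints installed versions for comparison with the list above.
import transformers, torch, datasets, tokenizers

print("Transformers:", transformers.__version__)  # expected 4.26.1
print("PyTorch:", torch.__version__)              # expected 2.0.1+cu118
print("Datasets:", datasets.__version__)          # expected 2.12.0
print("Tokenizers:", tokenizers.__version__)      # expected 0.13.3
```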
