
20230826054840

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4136
  • Accuracy: 0.71
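
The card does not state which SuperGLUE task or pipeline type this checkpoint targets, so the snippet below is only a minimal loading sketch; treating the checkpoint as a sequence classifier (AutoModelForSequenceClassification) and the sentence-pair input are assumptions, not facts from the card.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Minimal loading sketch. The card does not specify the SuperGLUE task or
# pipeline type, so loading as a sequence classifier is an assumption.
tokenizer = AutoTokenizer.from_pretrained("dkqjrm/20230826054840")
model = AutoModelForSequenceClassification.from_pretrained("dkqjrm/20230826054840")

# Hypothetical sentence-pair input; the actual input format depends on the task.
inputs = tokenizer("Example premise.", "Example hypothesis.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))
```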

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.02
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
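
As a rough sketch, these hyperparameters map onto Hugging Face TrainingArguments as follows. The output_dir and the per-epoch evaluation strategy are assumptions (the latter inferred from the per-epoch results table below); the Adam betas and epsilon listed above are the Trainer defaults.

```python
from transformers import TrainingArguments

# Sketch of TrainingArguments matching the hyperparameters above.
# output_dir and evaluation_strategy are assumptions not stated in the card.
training_args = TrainingArguments(
    output_dir="./20230826054840",  # hypothetical output path
    learning_rate=0.02,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    # Adam settings below are the Trainer defaults, matching the card:
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="epoch",  # assumed from the per-epoch results table
)
```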

Training results

Training loss is only logged every 500 steps (the Trainer default for logging_steps), so rows before step 500 show "No log" and each logged value repeats until the next logging step.

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 25   | 0.6183          | 0.53     |
| No log        | 2.0   | 50   | 0.4189          | 0.62     |
| No log        | 3.0   | 75   | 0.4351          | 0.6      |
| No log        | 4.0   | 100  | 0.4181          | 0.6      |
| No log        | 5.0   | 125  | 0.4105          | 0.62     |
| No log        | 6.0   | 150  | 0.4140          | 0.63     |
| No log        | 7.0   | 175  | 0.4052          | 0.66     |
| No log        | 8.0   | 200  | 0.4322          | 0.66     |
| No log        | 9.0   | 225  | 0.4364          | 0.41     |
| No log        | 10.0  | 250  | 0.4247          | 0.55     |
| No log        | 11.0  | 275  | 0.4261          | 0.53     |
| No log        | 12.0  | 300  | 0.4176          | 0.6      |
| No log        | 13.0  | 325  | 0.4108          | 0.58     |
| No log        | 14.0  | 350  | 0.4305          | 0.51     |
| No log        | 15.0  | 375  | 0.4064          | 0.61     |
| No log        | 16.0  | 400  | 0.4032          | 0.59     |
| No log        | 17.0  | 425  | 0.4098          | 0.63     |
| No log        | 18.0  | 450  | 0.4132          | 0.61     |
| No log        | 19.0  | 475  | 0.3925          | 0.65     |
| 0.7171        | 20.0  | 500  | 0.3957          | 0.69     |
| 0.7171        | 21.0  | 525  | 0.4292          | 0.64     |
| 0.7171        | 22.0  | 550  | 0.4025          | 0.63     |
| 0.7171        | 23.0  | 575  | 0.3997          | 0.69     |
| 0.7171        | 24.0  | 600  | 0.4115          | 0.62     |
| 0.7171        | 25.0  | 625  | 0.4044          | 0.67     |
| 0.7171        | 26.0  | 650  | 0.4098          | 0.69     |
| 0.7171        | 27.0  | 675  | 0.4051          | 0.65     |
| 0.7171        | 28.0  | 700  | 0.4244          | 0.72     |
| 0.7171        | 29.0  | 725  | 0.4032          | 0.64     |
| 0.7171        | 30.0  | 750  | 0.4136          | 0.7      |
| 0.7171        | 31.0  | 775  | 0.3993          | 0.68     |
| 0.7171        | 32.0  | 800  | 0.4170          | 0.72     |
| 0.7171        | 33.0  | 825  | 0.4038          | 0.71     |
| 0.7171        | 34.0  | 850  | 0.4251          | 0.72     |
| 0.7171        | 35.0  | 875  | 0.4079          | 0.66     |
| 0.7171        | 36.0  | 900  | 0.4119          | 0.71     |
| 0.7171        | 37.0  | 925  | 0.4075          | 0.67     |
| 0.7171        | 38.0  | 950  | 0.4406          | 0.73     |
| 0.7171        | 39.0  | 975  | 0.4081          | 0.72     |
| 0.4731        | 40.0  | 1000 | 0.4191          | 0.67     |
| 0.4731        | 41.0  | 1025 | 0.4217          | 0.68     |
| 0.4731        | 42.0  | 1050 | 0.3983          | 0.73     |
| 0.4731        | 43.0  | 1075 | 0.4092          | 0.66     |
| 0.4731        | 44.0  | 1100 | 0.4248          | 0.69     |
| 0.4731        | 45.0  | 1125 | 0.4218          | 0.68     |
| 0.4731        | 46.0  | 1150 | 0.4371          | 0.7      |
| 0.4731        | 47.0  | 1175 | 0.4099          | 0.69     |
| 0.4731        | 48.0  | 1200 | 0.4300          | 0.69     |
| 0.4731        | 49.0  | 1225 | 0.4094          | 0.72     |
| 0.4731        | 50.0  | 1250 | 0.4206          | 0.71     |
| 0.4731        | 51.0  | 1275 | 0.4241          | 0.72     |
| 0.4731        | 52.0  | 1300 | 0.4253          | 0.66     |
| 0.4731        | 53.0  | 1325 | 0.4117          | 0.66     |
| 0.4731        | 54.0  | 1350 | 0.4174          | 0.67     |
| 0.4731        | 55.0  | 1375 | 0.4131          | 0.67     |
| 0.4731        | 56.0  | 1400 | 0.4231          | 0.67     |
| 0.4731        | 57.0  | 1425 | 0.4059          | 0.7      |
| 0.4731        | 58.0  | 1450 | 0.4168          | 0.72     |
| 0.4731        | 59.0  | 1475 | 0.4236          | 0.68     |
| 0.4204        | 60.0  | 1500 | 0.4001          | 0.68     |
| 0.4204        | 61.0  | 1525 | 0.4158          | 0.71     |
| 0.4204        | 62.0  | 1550 | 0.4303          | 0.68     |
| 0.4204        | 63.0  | 1575 | 0.4155          | 0.65     |
| 0.4204        | 64.0  | 1600 | 0.4195          | 0.66     |
| 0.4204        | 65.0  | 1625 | 0.4315          | 0.67     |
| 0.4204        | 66.0  | 1650 | 0.4240          | 0.71     |
| 0.4204        | 67.0  | 1675 | 0.4191          | 0.68     |
| 0.4204        | 68.0  | 1700 | 0.4214          | 0.71     |
| 0.4204        | 69.0  | 1725 | 0.4170          | 0.71     |
| 0.4204        | 70.0  | 1750 | 0.4158          | 0.68     |
| 0.4204        | 71.0  | 1775 | 0.4230          | 0.69     |
| 0.4204        | 72.0  | 1800 | 0.4106          | 0.69     |
| 0.4204        | 73.0  | 1825 | 0.4255          | 0.68     |
| 0.4204        | 74.0  | 1850 | 0.4223          | 0.67     |
| 0.4204        | 75.0  | 1875 | 0.4124          | 0.7      |
| 0.4204        | 76.0  | 1900 | 0.4114          | 0.7      |
| 0.4204        | 77.0  | 1925 | 0.4115          | 0.71     |
| 0.4204        | 78.0  | 1950 | 0.4136          | 0.71     |
| 0.4204        | 79.0  | 1975 | 0.4150          | 0.71     |
| 0.3939        | 80.0  | 2000 | 0.4136          | 0.71     |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
