20230826121217

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set after the final epoch (epoch 80, step 2000):

Loss: 0.4150
Accuracy: 0.63

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
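
The hyperparameters above, together with the step counts in the training log (2000 steps over 80 epochs), are enough to sketch the learning-rate trajectory and to bound the training-set size. The snippet below is a minimal sketch, not the training code itself. It assumes zero warmup steps (no warmup is listed), a single device, and no gradient accumulation; the helper name linear_lr is hypothetical.

```python
def linear_lr(step: int, base_lr: float = 1e-3, total_steps: int = 2000,
              warmup_steps: int = 0) -> float:
    """Learning rate after `step` optimizer updates under a linear schedule.

    Assumption: warmup_steps = 0, since the card lists no warmup setting.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# 2000 logged steps over 80 epochs gives the number of updates per epoch.
steps_per_epoch = 2000 // 80                       # 25
train_batch_size = 16

# Assuming one device and no gradient accumulation, 25 batches of 16
# bound the number of training examples seen per epoch.
max_examples = steps_per_epoch * train_batch_size              # 400
min_examples = (steps_per_epoch - 1) * train_batch_size + 1    # 385

if __name__ == "__main__":
    for step in (0, 500, 1000, 1500, 2000):
        print(step, linear_lr(step))
    print(steps_per_epoch, min_examples, max_examples)
```

Under these assumptions the rate starts at 0.001, is halved by step 1000 (epoch 40), and reaches zero at step 2000.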

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	25	0.4146	0.66
No log	2.0	50	0.4116	0.66
No log	3.0	75	0.4139	0.66
No log	4.0	100	0.4170	0.64
No log	5.0	125	0.4182	0.65
No log	6.0	150	0.4208	0.57
No log	7.0	175	0.4115	0.66
No log	8.0	200	0.4157	0.66
No log	9.0	225	0.4229	0.64
No log	10.0	250	0.4205	0.65
No log	11.0	275	0.4178	0.64
No log	12.0	300	0.4131	0.67
No log	13.0	325	0.4146	0.65
No log	14.0	350	0.4202	0.63
No log	15.0	375	0.4331	0.62
No log	16.0	400	0.4120	0.66
No log	17.0	425	0.4144	0.63
No log	18.0	450	0.4182	0.64
No log	19.0	475	0.4184	0.59
0.5392	20.0	500	0.4161	0.65
0.5392	21.0	525	0.4185	0.64
0.5392	22.0	550	0.4187	0.59
0.5392	23.0	575	0.4186	0.62
0.5392	24.0	600	0.4159	0.65
0.5392	25.0	625	0.4152	0.64
0.5392	26.0	650	0.4151	0.62
0.5392	27.0	675	0.4136	0.63
0.5392	28.0	700	0.4190	0.65
0.5392	29.0	725	0.4225	0.61
0.5392	30.0	750	0.4209	0.57
0.5392	31.0	775	0.4167	0.63
0.5392	32.0	800	0.4153	0.62
0.5392	33.0	825	0.4236	0.6
0.5392	34.0	850	0.4191	0.58
0.5392	35.0	875	0.4160	0.61
0.5392	36.0	900	0.4163	0.62
0.5392	37.0	925	0.4193	0.59
0.5392	38.0	950	0.4208	0.62
0.5392	39.0	975	0.4163	0.6
0.5359	40.0	1000	0.4159	0.6
0.5359	41.0	1025	0.4146	0.62
0.5359	42.0	1050	0.4158	0.6
0.5359	43.0	1075	0.4211	0.59
0.5359	44.0	1100	0.4203	0.59
0.5359	45.0	1125	0.4217	0.57
0.5359	46.0	1150	0.4183	0.6
0.5359	47.0	1175	0.4138	0.63
0.5359	48.0	1200	0.4124	0.63
0.5359	49.0	1225	0.4140	0.63
0.5359	50.0	1250	0.4118	0.64
0.5359	51.0	1275	0.4137	0.62
0.5359	52.0	1300	0.4113	0.63
0.5359	53.0	1325	0.4112	0.62
0.5359	54.0	1350	0.4140	0.63
0.5359	55.0	1375	0.4129	0.64
0.5359	56.0	1400	0.4151	0.64
0.5359	57.0	1425	0.4155	0.63
0.5359	58.0	1450	0.4140	0.63
0.5359	59.0	1475	0.4145	0.64
0.5347	60.0	1500	0.4158	0.63
0.5347	61.0	1525	0.4148	0.62
0.5347	62.0	1550	0.4147	0.6
0.5347	63.0	1575	0.4153	0.64
0.5347	64.0	1600	0.4156	0.63
0.5347	65.0	1625	0.4152	0.64
0.5347	66.0	1650	0.4146	0.64
0.5347	67.0	1675	0.4151	0.64
0.5347	68.0	1700	0.4145	0.61
0.5347	69.0	1725	0.4153	0.61
0.5347	70.0	1750	0.4147	0.64
0.5347	71.0	1775	0.4146	0.64
0.5347	72.0	1800	0.4134	0.62
0.5347	73.0	1825	0.4140	0.63
0.5347	74.0	1850	0.4141	0.64
0.5347	75.0	1875	0.4151	0.63
0.5347	76.0	1900	0.4150	0.62
0.5347	77.0	1925	0.4148	0.61
0.5347	78.0	1950	0.4149	0.62
0.5347	79.0	1975	0.4150	0.63
0.5285	80.0	2000	0.4150	0.63
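
The reported final-epoch numbers are not the best ones in the log: accuracy peaks at 0.67 in epoch 12, and validation loss is lowest (0.4112) in epoch 53. The sketch below shows one way to pick those rows out of a tab-separated log like the table above. It embeds only a four-row excerpt of the table for brevity, and the function name parse_log is hypothetical.

```python
import csv
import io

# Excerpt of the training log above (tab-separated); the full table has 80 rows.
LOG_EXCERPT = "\n".join([
    "Training Loss\tEpoch\tStep\tValidation Loss\tAccuracy",
    "No log\t1.0\t25\t0.4146\t0.66",
    "No log\t12.0\t300\t0.4131\t0.67",
    "0.5359\t53.0\t1325\t0.4112\t0.62",
    "0.5285\t80.0\t2000\t0.4150\t0.63",
])


def parse_log(text: str) -> list:
    """Parse a tab-separated training log into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        rows.append({
            "epoch": float(record["Epoch"]),
            "step": int(record["Step"]),
            "val_loss": float(record["Validation Loss"]),
            "accuracy": float(record["Accuracy"]),
        })
    return rows


rows = parse_log(LOG_EXCERPT)
best_by_accuracy = max(rows, key=lambda r: r["accuracy"])
best_by_loss = min(rows, key=lambda r: r["val_loss"])

if __name__ == "__main__":
    print("highest accuracy:", best_by_accuracy)
    print("lowest validation loss:", best_by_loss)
```

Which criterion to use for checkpoint selection depends on the downstream goal; the card does not say which checkpoint was saved.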

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
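
When reproducing the run, it can help to compare the local environment against the versions listed above. The following is a minimal sketch using only the standard library. The pip distribution names (in particular torch for Pytorch) are assumptions, and the helper name compare_versions is hypothetical.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed pip distribution name.
EXPECTED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def compare_versions(expected: dict) -> dict:
    """Return {name: (expected, installed_or_None, matches)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in compare_versions(EXPECTED).items():
        status = "ok" if ok else "differs"
        print(f"{name}: expected {wanted}, found {installed} ({status})")
```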

dkqjrm/20230826121217
