
20230826040634

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4165
  • Accuracy: 0.67
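
The SuperGLUE task used for fine-tuning is not recorded here, which is also why the Hub cannot infer a pipeline type. Below is a minimal loading sketch, assuming the checkpoint carries a sequence-classification head and takes a sentence pair as input (both assumptions, not confirmed by this card):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230826040634"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Assumption: the fine-tuning objective was (pair) sequence classification.
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("First sentence.", "Second sentence.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```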

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.01
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
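
The training script itself is not included; as a reference point, here is a sketch of `TrainingArguments` that mirrors the list above. The output directory and the evaluation strategy are assumptions (per-epoch evaluation is inferred from the results table below):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="20230826040634",   # assumed, not recorded in this card
    learning_rate=0.01,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",   # assumed from the per-epoch results below
)
```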

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.6199 | 0.4 |
| No log | 2.0 | 50 | 0.6643 | 0.59 |
| No log | 3.0 | 75 | 0.5067 | 0.54 |
| No log | 4.0 | 100 | 0.4272 | 0.63 |
| No log | 5.0 | 125 | 0.4341 | 0.49 |
| No log | 6.0 | 150 | 0.4488 | 0.44 |
| No log | 7.0 | 175 | 0.4092 | 0.69 |
| No log | 8.0 | 200 | 0.4564 | 0.61 |
| No log | 9.0 | 225 | 0.4367 | 0.6 |
| No log | 10.0 | 250 | 0.4343 | 0.65 |
| No log | 11.0 | 275 | 0.4121 | 0.66 |
| No log | 12.0 | 300 | 0.4300 | 0.64 |
| No log | 13.0 | 325 | 0.4239 | 0.67 |
| No log | 14.0 | 350 | 0.4148 | 0.65 |
| No log | 15.0 | 375 | 0.4311 | 0.67 |
| No log | 16.0 | 400 | 0.4143 | 0.62 |
| No log | 17.0 | 425 | 0.4166 | 0.65 |
| No log | 18.0 | 450 | 0.4120 | 0.63 |
| No log | 19.0 | 475 | 0.4121 | 0.63 |
| 0.6423 | 20.0 | 500 | 0.4066 | 0.67 |
| 0.6423 | 21.0 | 525 | 0.4047 | 0.64 |
| 0.6423 | 22.0 | 550 | 0.4215 | 0.63 |
| 0.6423 | 23.0 | 575 | 0.4074 | 0.61 |
| 0.6423 | 24.0 | 600 | 0.4068 | 0.66 |
| 0.6423 | 25.0 | 625 | 0.4191 | 0.61 |
| 0.6423 | 26.0 | 650 | 0.4035 | 0.6 |
| 0.6423 | 27.0 | 675 | 0.4228 | 0.58 |
| 0.6423 | 28.0 | 700 | 0.4242 | 0.66 |
| 0.6423 | 29.0 | 725 | 0.4238 | 0.64 |
| 0.6423 | 30.0 | 750 | 0.4788 | 0.62 |
| 0.6423 | 31.0 | 775 | 0.4214 | 0.64 |
| 0.6423 | 32.0 | 800 | 0.4283 | 0.63 |
| 0.6423 | 33.0 | 825 | 0.4222 | 0.64 |
| 0.6423 | 34.0 | 850 | 0.4233 | 0.66 |
| 0.6423 | 35.0 | 875 | 0.4401 | 0.67 |
| 0.6423 | 36.0 | 900 | 0.4584 | 0.66 |
| 0.6423 | 37.0 | 925 | 0.4362 | 0.68 |
| 0.6423 | 38.0 | 950 | 0.3989 | 0.67 |
| 0.6423 | 39.0 | 975 | 0.4379 | 0.67 |
| 0.5234 | 40.0 | 1000 | 0.4094 | 0.7 |
| 0.5234 | 41.0 | 1025 | 0.4683 | 0.68 |
| 0.5234 | 42.0 | 1050 | 0.4360 | 0.65 |
| 0.5234 | 43.0 | 1075 | 0.4382 | 0.65 |
| 0.5234 | 44.0 | 1100 | 0.4057 | 0.67 |
| 0.5234 | 45.0 | 1125 | 0.4300 | 0.65 |
| 0.5234 | 46.0 | 1150 | 0.4253 | 0.67 |
| 0.5234 | 47.0 | 1175 | 0.4346 | 0.65 |
| 0.5234 | 48.0 | 1200 | 0.4167 | 0.66 |
| 0.5234 | 49.0 | 1225 | 0.4572 | 0.65 |
| 0.5234 | 50.0 | 1250 | 0.4413 | 0.67 |
| 0.5234 | 51.0 | 1275 | 0.4160 | 0.66 |
| 0.5234 | 52.0 | 1300 | 0.4044 | 0.67 |
| 0.5234 | 53.0 | 1325 | 0.4246 | 0.67 |
| 0.5234 | 54.0 | 1350 | 0.4075 | 0.69 |
| 0.5234 | 55.0 | 1375 | 0.4202 | 0.68 |
| 0.5234 | 56.0 | 1400 | 0.4382 | 0.68 |
| 0.5234 | 57.0 | 1425 | 0.4282 | 0.68 |
| 0.5234 | 58.0 | 1450 | 0.4145 | 0.67 |
| 0.5234 | 59.0 | 1475 | 0.4202 | 0.67 |
| 0.4334 | 60.0 | 1500 | 0.4233 | 0.68 |
| 0.4334 | 61.0 | 1525 | 0.4285 | 0.67 |
| 0.4334 | 62.0 | 1550 | 0.4272 | 0.67 |
| 0.4334 | 63.0 | 1575 | 0.4233 | 0.67 |
| 0.4334 | 64.0 | 1600 | 0.4339 | 0.67 |
| 0.4334 | 65.0 | 1625 | 0.4171 | 0.67 |
| 0.4334 | 66.0 | 1650 | 0.4095 | 0.67 |
| 0.4334 | 67.0 | 1675 | 0.4198 | 0.67 |
| 0.4334 | 68.0 | 1700 | 0.4170 | 0.67 |
| 0.4334 | 69.0 | 1725 | 0.4264 | 0.67 |
| 0.4334 | 70.0 | 1750 | 0.4363 | 0.67 |
| 0.4334 | 71.0 | 1775 | 0.4206 | 0.67 |
| 0.4334 | 72.0 | 1800 | 0.4197 | 0.67 |
| 0.4334 | 73.0 | 1825 | 0.4302 | 0.67 |
| 0.4334 | 74.0 | 1850 | 0.4257 | 0.68 |
| 0.4334 | 75.0 | 1875 | 0.4187 | 0.68 |
| 0.4334 | 76.0 | 1900 | 0.4252 | 0.68 |
| 0.4334 | 77.0 | 1925 | 0.4272 | 0.68 |
| 0.4334 | 78.0 | 1950 | 0.4203 | 0.68 |
| 0.4334 | 79.0 | 1975 | 0.4160 | 0.67 |
| 0.4063 | 80.0 | 2000 | 0.4165 | 0.67 |
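
Validation loss reaches its minimum (0.3989) at epoch 38 and accuracy peaks at 0.7 at epoch 40; both then plateau, and the final checkpoint matches the headline numbers above. (Training loss is only logged every 500 steps, hence the repeated values and the early "No log" entries.) The metric function used is not recorded; below is a sketch of the conventional accuracy computation for a `Trainer`'s `compute_metrics` hook, assuming argmax over logits:

```python
import numpy as np

def compute_metrics(eval_pred):
    # Conventional classification accuracy: argmax over logits vs. gold labels.
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}
```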

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
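
If you want to match this environment when reproducing the run, a quick version check (illustrative only):

```python
# Print installed versions to compare against the list above.
import datasets, tokenizers, torch, transformers

for name, mod in [("Transformers", transformers), ("PyTorch", torch),
                  ("Datasets", datasets), ("Tokenizers", tokenizers)]:
    print(name, mod.__version__)
```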