
20230826130948

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5311
  • Accuracy: 0.65
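
The checkpoint can be loaded with the standard transformers API. Below is a minimal inference sketch; it assumes the model carries a sequence-classification head over a sentence pair, since the specific SuperGLUE subtask is not stated in this card.

```python
# Minimal inference sketch. Assumption: the checkpoint is a
# sequence-classification head over a sentence pair; the exact
# SuperGLUE subtask (and hence the label meanings) is not stated
# in this card.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230826130948"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

inputs = tokenizer(
    "The first input sentence.",   # placeholder text
    "The second input sentence.",  # placeholder text
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```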

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch mapping them onto TrainingArguments follows the list:

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
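
A hedged sketch of how the values above map onto transformers TrainingArguments. The output path is hypothetical, and the dataset/task wiring is omitted because the card does not name the SuperGLUE subtask.

```python
# Sketch reconstructing the listed hyperparameters as TrainingArguments.
# "./output" is a hypothetical path, not taken from this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",          # hypothetical
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",    # assumption: one eval per epoch,
                                    # consistent with the results table
)
# Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the optimizer
# Trainer builds by default, so no explicit optimizer override is needed.
```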

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 25   | 0.5416          | 0.66     |
| No log        | 2.0   | 50   | 0.5394          | 0.64     |
| No log        | 3.0   | 75   | 0.5376          | 0.65     |
| No log        | 4.0   | 100  | 0.5476          | 0.65     |
| No log        | 5.0   | 125  | 0.5371          | 0.64     |
| No log        | 6.0   | 150  | 0.5442          | 0.63     |
| No log        | 7.0   | 175  | 0.5413          | 0.65     |
| No log        | 8.0   | 200  | 0.5381          | 0.65     |
| No log        | 9.0   | 225  | 0.5366          | 0.65     |
| No log        | 10.0  | 250  | 0.5402          | 0.65     |
| No log        | 11.0  | 275  | 0.5405          | 0.65     |
| No log        | 12.0  | 300  | 0.5396          | 0.65     |
| No log        | 13.0  | 325  | 0.5379          | 0.66     |
| No log        | 14.0  | 350  | 0.5375          | 0.66     |
| No log        | 15.0  | 375  | 0.5393          | 0.65     |
| No log        | 16.0  | 400  | 0.5371          | 0.66     |
| No log        | 17.0  | 425  | 0.5286          | 0.66     |
| No log        | 18.0  | 450  | 0.5313          | 0.65     |
| No log        | 19.0  | 475  | 0.5427          | 0.62     |
| 0.616         | 20.0  | 500  | 0.5469          | 0.63     |
| 0.616         | 21.0  | 525  | 0.5348          | 0.65     |
| 0.616         | 22.0  | 550  | 0.5352          | 0.64     |
| 0.616         | 23.0  | 575  | 0.5434          | 0.63     |
| 0.616         | 24.0  | 600  | 0.5437          | 0.62     |
| 0.616         | 25.0  | 625  | 0.5344          | 0.65     |
| 0.616         | 26.0  | 650  | 0.5344          | 0.66     |
| 0.616         | 27.0  | 675  | 0.5319          | 0.66     |
| 0.616         | 28.0  | 700  | 0.5329          | 0.66     |
| 0.616         | 29.0  | 725  | 0.5313          | 0.66     |
| 0.616         | 30.0  | 750  | 0.5321          | 0.66     |
| 0.616         | 31.0  | 775  | 0.5342          | 0.65     |
| 0.616         | 32.0  | 800  | 0.5364          | 0.66     |
| 0.616         | 33.0  | 825  | 0.5350          | 0.65     |
| 0.616         | 34.0  | 850  | 0.5382          | 0.65     |
| 0.616         | 35.0  | 875  | 0.5330          | 0.65     |
| 0.616         | 36.0  | 900  | 0.5361          | 0.64     |
| 0.616         | 37.0  | 925  | 0.5379          | 0.63     |
| 0.616         | 38.0  | 950  | 0.5314          | 0.64     |
| 0.616         | 39.0  | 975  | 0.5308          | 0.65     |
| 0.6054        | 40.0  | 1000 | 0.5348          | 0.65     |
| 0.6054        | 41.0  | 1025 | 0.5374          | 0.64     |
| 0.6054        | 42.0  | 1050 | 0.5363          | 0.64     |
| 0.6054        | 43.0  | 1075 | 0.5361          | 0.64     |
| 0.6054        | 44.0  | 1100 | 0.5333          | 0.65     |
| 0.6054        | 45.0  | 1125 | 0.5346          | 0.65     |
| 0.6054        | 46.0  | 1150 | 0.5354          | 0.65     |
| 0.6054        | 47.0  | 1175 | 0.5338          | 0.64     |
| 0.6054        | 48.0  | 1200 | 0.5332          | 0.65     |
| 0.6054        | 49.0  | 1225 | 0.5334          | 0.65     |
| 0.6054        | 50.0  | 1250 | 0.5361          | 0.65     |
| 0.6054        | 51.0  | 1275 | 0.5311          | 0.65     |
| 0.6054        | 52.0  | 1300 | 0.5332          | 0.66     |
| 0.6054        | 53.0  | 1325 | 0.5312          | 0.65     |
| 0.6054        | 54.0  | 1350 | 0.5334          | 0.65     |
| 0.6054        | 55.0  | 1375 | 0.5306          | 0.66     |
| 0.6054        | 56.0  | 1400 | 0.5326          | 0.65     |
| 0.6054        | 57.0  | 1425 | 0.5336          | 0.65     |
| 0.6054        | 58.0  | 1450 | 0.5361          | 0.65     |
| 0.6054        | 59.0  | 1475 | 0.5359          | 0.63     |
| 0.5996        | 60.0  | 1500 | 0.5342          | 0.65     |
| 0.5996        | 61.0  | 1525 | 0.5346          | 0.66     |
| 0.5996        | 62.0  | 1550 | 0.5333          | 0.64     |
| 0.5996        | 63.0  | 1575 | 0.5322          | 0.65     |
| 0.5996        | 64.0  | 1600 | 0.5307          | 0.65     |
| 0.5996        | 65.0  | 1625 | 0.5298          | 0.65     |
| 0.5996        | 66.0  | 1650 | 0.5300          | 0.65     |
| 0.5996        | 67.0  | 1675 | 0.5306          | 0.65     |
| 0.5996        | 68.0  | 1700 | 0.5311          | 0.65     |
| 0.5996        | 69.0  | 1725 | 0.5318          | 0.65     |
| 0.5996        | 70.0  | 1750 | 0.5320          | 0.65     |
| 0.5996        | 71.0  | 1775 | 0.5320          | 0.65     |
| 0.5996        | 72.0  | 1800 | 0.5309          | 0.65     |
| 0.5996        | 73.0  | 1825 | 0.5307          | 0.65     |
| 0.5996        | 74.0  | 1850 | 0.5306          | 0.65     |
| 0.5996        | 75.0  | 1875 | 0.5314          | 0.65     |
| 0.5996        | 76.0  | 1900 | 0.5311          | 0.65     |
| 0.5996        | 77.0  | 1925 | 0.5311          | 0.65     |
| 0.5996        | 78.0  | 1950 | 0.5311          | 0.65     |
| 0.5996        | 79.0  | 1975 | 0.5311          | 0.65     |
| 0.596         | 80.0  | 2000 | 0.5311          | 0.65     |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
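
To reproduce the environment, the version pins above can be verified at runtime; a small sanity-check sketch:

```python
# Sanity-check that the runtime matches the versions listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": ("4.26.1", transformers.__version__),
    "torch": ("2.0.1+cu118", torch.__version__),
    "datasets": ("2.12.0", datasets.__version__),
    "tokenizers": ("0.13.3", tokenizers.__version__),
}
for name, (want, have) in expected.items():
    if have != want:
        print(f"warning: {name} is {have}, card was produced with {want}")
```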
