20230826040158

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

Loss: 0.5369
Accuracy: 0.72

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.01
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
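
As a rough illustration of what the linear scheduler setting implies, the sketch below computes the learning rate at a given optimizer step. It assumes zero warmup steps (the list above does not mention any) and takes the total of 2000 steps from the training results table. `linear_lr` is a hypothetical helper written for this example, not a Transformers function.

```python
# Hypothetical helper: linear decay from the initial learning rate down to zero.
# Assumption: no warmup steps, which the hyperparameter list does not confirm.

INITIAL_LR = 0.01     # learning_rate from the hyperparameter list
TOTAL_STEPS = 2000    # final step in the training results (80 epochs x 25 steps)


def linear_lr(step: int, initial_lr: float = INITIAL_LR, total_steps: int = TOTAL_STEPS) -> float:
    """Return the learning rate after `step` optimizer updates under linear decay."""
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    remaining_fraction = (total_steps - step) / total_steps
    return initial_lr * remaining_fraction


if __name__ == "__main__":
    # Print the rate at the start, at two intermediate points, and at the end.
    for step in (0, 500, 1000, 2000):
        print(step, linear_lr(step))
```

Under these assumptions the rate is halved by step 1000 (epoch 40) and reaches zero at step 2000.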

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	25	0.6901	0.36
No log	2.0	50	0.7172	0.56
No log	3.0	75	0.6264	0.58
No log	4.0	100	0.5864	0.61
No log	5.0	125	0.5717	0.47
No log	6.0	150	0.6748	0.4
No log	7.0	175	0.5272	0.66
No log	8.0	200	0.5651	0.64
No log	9.0	225	0.5785	0.65
No log	10.0	250	0.5773	0.65
No log	11.0	275	0.5287	0.65
No log	12.0	300	0.5612	0.64
No log	13.0	325	0.5734	0.66
No log	14.0	350	0.5196	0.65
No log	15.0	375	0.5491	0.66
No log	16.0	400	0.5137	0.63
No log	17.0	425	0.5333	0.67
No log	18.0	450	0.5518	0.66
No log	19.0	475	0.5222	0.66
0.7077	20.0	500	0.4976	0.67
0.7077	21.0	525	0.4995	0.67
0.7077	22.0	550	0.5837	0.65
0.7077	23.0	575	0.5801	0.62
0.7077	24.0	600	0.5377	0.63
0.7077	25.0	625	0.5509	0.63
0.7077	26.0	650	0.5863	0.67
0.7077	27.0	675	0.5980	0.65
0.7077	28.0	700	0.6482	0.67
0.7077	29.0	725	0.5851	0.66
0.7077	30.0	750	0.6651	0.67
0.7077	31.0	775	0.5497	0.69
0.7077	32.0	800	0.5907	0.72
0.7077	33.0	825	0.5805	0.68
0.7077	34.0	850	0.5844	0.69
0.7077	35.0	875	0.5750	0.69
0.7077	36.0	900	0.6175	0.7
0.7077	37.0	925	0.5754	0.68
0.7077	38.0	950	0.5758	0.69
0.7077	39.0	975	0.6013	0.69
0.4491	40.0	1000	0.5384	0.68
0.4491	41.0	1025	0.5931	0.7
0.4491	42.0	1050	0.6030	0.7
0.4491	43.0	1075	0.5630	0.67
0.4491	44.0	1100	0.5599	0.67
0.4491	45.0	1125	0.5799	0.66
0.4491	46.0	1150	0.5545	0.69
0.4491	47.0	1175	0.5643	0.68
0.4491	48.0	1200	0.5845	0.7
0.4491	49.0	1225	0.5781	0.69
0.4491	50.0	1250	0.5623	0.7
0.4491	51.0	1275	0.5528	0.69
0.4491	52.0	1300	0.5442	0.71
0.4491	53.0	1325	0.5498	0.69
0.4491	54.0	1350	0.5391	0.7
0.4491	55.0	1375	0.5570	0.71
0.4491	56.0	1400	0.5729	0.71
0.4491	57.0	1425	0.5352	0.72
0.4491	58.0	1450	0.5538	0.7
0.4491	59.0	1475	0.5563	0.71
0.3353	60.0	1500	0.5704	0.71
0.3353	61.0	1525	0.5726	0.7
0.3353	62.0	1550	0.5694	0.7
0.3353	63.0	1575	0.5714	0.71
0.3353	64.0	1600	0.5551	0.7
0.3353	65.0	1625	0.5548	0.7
0.3353	66.0	1650	0.5430	0.7
0.3353	67.0	1675	0.5449	0.71
0.3353	68.0	1700	0.5461	0.71
0.3353	69.0	1725	0.5440	0.71
0.3353	70.0	1750	0.5590	0.71
0.3353	71.0	1775	0.5391	0.71
0.3353	72.0	1800	0.5516	0.71
0.3353	73.0	1825	0.5474	0.72
0.3353	74.0	1850	0.5477	0.72
0.3353	75.0	1875	0.5372	0.71
0.3353	76.0	1900	0.5445	0.71
0.3353	77.0	1925	0.5421	0.71
0.3353	78.0	1950	0.5376	0.7
0.3353	79.0	1975	0.5358	0.72
0.3108	80.0	2000	0.5369	0.72
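
The final checkpoint is not necessarily the best one on every metric. The sketch below parses a few rows copied from the table above (an excerpt, not the full log) and picks the row with the lowest validation loss and the first row that reaches the highest accuracy. The function and variable names are hypothetical and chosen for this example only.

```python
# Excerpt of the results table, tab-separated:
# training loss, epoch, step, validation loss, accuracy.
ROWS = """\
No log\t1.0\t25\t0.6901\t0.36
0.7077\t20.0\t500\t0.4976\t0.67
0.7077\t32.0\t800\t0.5907\t0.72
0.4491\t57.0\t1425\t0.5352\t0.72
0.3108\t80.0\t2000\t0.5369\t0.72
"""


def parse_rows(text):
    """Turn tab-separated table rows into a list of dictionaries."""
    records = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, accuracy = line.split("\t")
        records.append({
            # "No log" means no training loss had been logged yet at that step.
            "train_loss": None if train_loss == "No log" else float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "accuracy": float(accuracy),
        })
    return records


records = parse_rows(ROWS)
lowest_loss = min(records, key=lambda r: r["val_loss"])
# max() returns the first maximal element, so ties resolve to the earliest epoch.
best_accuracy = max(records, key=lambda r: r["accuracy"])

if __name__ == "__main__":
    print("lowest validation loss:", lowest_loss["epoch"], lowest_loss["val_loss"])
    print("first best accuracy:", best_accuracy["epoch"], best_accuracy["accuracy"])
```

In the full table, the lowest validation loss (0.4976) occurs at epoch 20, while the top accuracy of 0.72 is first reached at epoch 32 and is matched again at the final epoch.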

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
