20230826065732

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.5294
  • Accuracy: 0.67
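The repository metadata does not declare a pipeline type, so the exact SuperGLUE task is unknown. Assuming the checkpoint carries a sequence-classification head (typical for SuperGLUE fine-tunes), a minimal loading sketch might look like the following; the sentence-pair input is purely illustrative and may not match the format used during training:

```python
# Minimal sketch: load the checkpoint as a sequence classifier.
# Assumption: the model was fine-tuned with a classification head; the
# SuperGLUE task and expected input format are not documented in this card.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230826065732"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Illustrative sentence-pair input (many SuperGLUE tasks are pair classification).
inputs = tokenizer("The cat sat on the mat.", "A cat is on a mat.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))
```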

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 0.02
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
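
These values map directly onto Hugging Face TrainingArguments; a sketch reproducing them is below. The output directory name is a placeholder, the Adam betas/epsilon match the Trainer's AdamW defaults, and per-epoch evaluation is inferred from the results table:

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters above as TrainingArguments (Transformers 4.26).
# "output_dir" is a placeholder; evaluation_strategy="epoch" is inferred from
# the per-epoch rows in the results table below.
training_args = TrainingArguments(
    output_dir="20230826065732",
    learning_rate=0.02,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",
)
```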

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.6448 | 0.4 |
| No log | 2.0 | 50 | 0.7950 | 0.65 |
| No log | 3.0 | 75 | 0.6181 | 0.54 |
| No log | 4.0 | 100 | 0.5601 | 0.6 |
| No log | 5.0 | 125 | 0.5816 | 0.42 |
| No log | 6.0 | 150 | 0.5957 | 0.43 |
| No log | 7.0 | 175 | 0.5331 | 0.61 |
| No log | 8.0 | 200 | 0.5507 | 0.61 |
| No log | 9.0 | 225 | 0.5438 | 0.62 |
| No log | 10.0 | 250 | 0.5455 | 0.65 |
| No log | 11.0 | 275 | 0.5141 | 0.65 |
| No log | 12.0 | 300 | 0.5019 | 0.71 |
| No log | 13.0 | 325 | 0.6824 | 0.7 |
| No log | 14.0 | 350 | 0.5735 | 0.73 |
| No log | 15.0 | 375 | 0.5578 | 0.69 |
| No log | 16.0 | 400 | 0.5607 | 0.72 |
| No log | 17.0 | 425 | 0.5974 | 0.71 |
| No log | 18.0 | 450 | 0.8102 | 0.71 |
| No log | 19.0 | 475 | 0.6757 | 0.73 |
| 0.7598 | 20.0 | 500 | 0.5266 | 0.74 |
| 0.7598 | 21.0 | 525 | 0.6271 | 0.69 |
| 0.7598 | 22.0 | 550 | 0.6341 | 0.7 |
| 0.7598 | 23.0 | 575 | 0.6874 | 0.7 |
| 0.7598 | 24.0 | 600 | 0.5264 | 0.72 |
| 0.7598 | 25.0 | 625 | 0.5148 | 0.73 |
| 0.7598 | 26.0 | 650 | 0.5760 | 0.77 |
| 0.7598 | 27.0 | 675 | 0.6581 | 0.71 |
| 0.7598 | 28.0 | 700 | 0.6479 | 0.71 |
| 0.7598 | 29.0 | 725 | 0.6960 | 0.69 |
| 0.7598 | 30.0 | 750 | 0.6919 | 0.7 |
| 0.7598 | 31.0 | 775 | 0.6421 | 0.68 |
| 0.7598 | 32.0 | 800 | 0.5681 | 0.68 |
| 0.7598 | 33.0 | 825 | 0.5631 | 0.68 |
| 0.7598 | 34.0 | 850 | 0.5676 | 0.66 |
| 0.7598 | 35.0 | 875 | 0.5389 | 0.68 |
| 0.7598 | 36.0 | 900 | 0.6267 | 0.68 |
| 0.7598 | 37.0 | 925 | 0.6107 | 0.65 |
| 0.7598 | 38.0 | 950 | 0.5359 | 0.66 |
| 0.7598 | 39.0 | 975 | 0.5741 | 0.67 |
| 0.4266 | 40.0 | 1000 | 0.5928 | 0.69 |
| 0.4266 | 41.0 | 1025 | 0.5307 | 0.68 |
| 0.4266 | 42.0 | 1050 | 0.5909 | 0.66 |
| 0.4266 | 43.0 | 1075 | 0.5733 | 0.66 |
| 0.4266 | 44.0 | 1100 | 0.5561 | 0.66 |
| 0.4266 | 45.0 | 1125 | 0.5600 | 0.69 |
| 0.4266 | 46.0 | 1150 | 0.5228 | 0.66 |
| 0.4266 | 47.0 | 1175 | 0.5383 | 0.7 |
| 0.4266 | 48.0 | 1200 | 0.5643 | 0.69 |
| 0.4266 | 49.0 | 1225 | 0.5493 | 0.7 |
| 0.4266 | 50.0 | 1250 | 0.5576 | 0.7 |
| 0.4266 | 51.0 | 1275 | 0.5543 | 0.68 |
| 0.4266 | 52.0 | 1300 | 0.5615 | 0.69 |
| 0.4266 | 53.0 | 1325 | 0.5358 | 0.67 |
| 0.4266 | 54.0 | 1350 | 0.5405 | 0.69 |
| 0.4266 | 55.0 | 1375 | 0.5327 | 0.69 |
| 0.4266 | 56.0 | 1400 | 0.5645 | 0.67 |
| 0.4266 | 57.0 | 1425 | 0.5240 | 0.67 |
| 0.4266 | 58.0 | 1450 | 0.5402 | 0.67 |
| 0.4266 | 59.0 | 1475 | 0.5495 | 0.68 |
| 0.3249 | 60.0 | 1500 | 0.5624 | 0.66 |
| 0.3249 | 61.0 | 1525 | 0.5513 | 0.67 |
| 0.3249 | 62.0 | 1550 | 0.5537 | 0.68 |
| 0.3249 | 63.0 | 1575 | 0.5444 | 0.68 |
| 0.3249 | 64.0 | 1600 | 0.5553 | 0.68 |
| 0.3249 | 65.0 | 1625 | 0.5221 | 0.68 |
| 0.3249 | 66.0 | 1650 | 0.5136 | 0.68 |
| 0.3249 | 67.0 | 1675 | 0.5231 | 0.69 |
| 0.3249 | 68.0 | 1700 | 0.5305 | 0.69 |
| 0.3249 | 69.0 | 1725 | 0.5278 | 0.68 |
| 0.3249 | 70.0 | 1750 | 0.5440 | 0.66 |
| 0.3249 | 71.0 | 1775 | 0.5411 | 0.67 |
| 0.3249 | 72.0 | 1800 | 0.5346 | 0.69 |
| 0.3249 | 73.0 | 1825 | 0.5241 | 0.67 |
| 0.3249 | 74.0 | 1850 | 0.5425 | 0.67 |
| 0.3249 | 75.0 | 1875 | 0.5213 | 0.67 |
| 0.3249 | 76.0 | 1900 | 0.5405 | 0.66 |
| 0.3249 | 77.0 | 1925 | 0.5251 | 0.67 |
| 0.3249 | 78.0 | 1950 | 0.5300 | 0.67 |
| 0.3249 | 79.0 | 1975 | 0.5285 | 0.67 |
| 0.2946 | 80.0 | 2000 | 0.5294 | 0.67 |
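
Training loss was apparently logged every 500 steps, which is why earlier rows show "No log". Note that validation accuracy peaked at 0.77 around epoch 26 but the final checkpoint reports 0.67, so this run did not retain the best checkpoint. A hedged sketch of how that could be configured with the Trainer (not what this run used):

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Sketch (not this run's actual setup): keep the checkpoint with the best
# eval accuracy instead of the last one, since accuracy peaked at 0.77
# (epoch 26) but ended at 0.67.
args = TrainingArguments(
    output_dir="20230826065732-best",  # placeholder directory name
    evaluation_strategy="epoch",
    save_strategy="epoch",             # must match evaluation_strategy
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",  # assumes compute_metrics returns "accuracy"
    greater_is_better=True,
)
# Pass to Trainer(..., callbacks=[EarlyStoppingCallback(early_stopping_patience=5)])
# to also stop once the metric stops improving.
```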

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
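
For reproducibility, a small sketch (assuming all four packages are installed) to compare the local environment against these versions:

```python
# Sketch: print installed versions next to the ones this model was trained with.
import transformers, torch, datasets, tokenizers

expected = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}
for name, mod in [("transformers", transformers), ("torch", torch),
                  ("datasets", datasets), ("tokenizers", tokenizers)]:
    print(f"{name}: installed {mod.__version__}, trained with {expected[name]}")
```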