20230826114434

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5403
  • Accuracy: 0.64
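
For reference, a minimal inference sketch follows. The card does not state the pipeline task; the sketch assumes a sequence-classification head (consistent with the accuracy metric above), and the example sentence pair is purely illustrative.

```python
# Minimal inference sketch; assumes a sequence-classification head.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230826114434"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# The SuperGLUE subset (and hence the expected input format) is not documented;
# this sentence pair is a placeholder.
inputs = tokenizer("The cat sat on the mat.", "A cat is on a mat.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted label id
```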

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
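
Pending those details, a hedged sketch of loading the base dataset with the datasets library is shown below. The SuperGLUE subset actually used is not documented, so the "boolq" config here is a placeholder, not a confirmed choice.

```python
from datasets import load_dataset

# "boolq" is a placeholder config; the card does not say which SuperGLUE subset was used.
dataset = load_dataset("super_glue", "boolq")
print(dataset["train"][0])             # one raw training example
print(dataset["validation"].features)  # field names and label schema
```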

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
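
Taken together, these settings can be reconstructed as a TrainingArguments sketch (written against transformers 4.26.1, the version listed below). Only the values listed above come from the card; the output directory and the commented Trainer wiring are placeholders.

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./20230826114434",   # placeholder path
    learning_rate=1e-3,
    per_device_train_batch_size=16,  # reported as train_batch_size above
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",     # matches the per-epoch results table below
)

# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```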

Training results

("No log" indicates that no training loss had been recorded yet; the training loss is logged every 500 steps, so the first value appears at step 500.)

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 25   | 0.5448          | 0.65     |
| No log        | 2.0   | 50   | 0.5352          | 0.66     |
| No log        | 3.0   | 75   | 0.5467          | 0.65     |
| No log        | 4.0   | 100  | 0.5432          | 0.65     |
| No log        | 5.0   | 125  | 0.5446          | 0.66     |
| No log        | 6.0   | 150  | 0.5419          | 0.63     |
| No log        | 7.0   | 175  | 0.5364          | 0.66     |
| No log        | 8.0   | 200  | 0.5400          | 0.66     |
| No log        | 9.0   | 225  | 0.5460          | 0.66     |
| No log        | 10.0  | 250  | 0.5479          | 0.66     |
| No log        | 11.0  | 275  | 0.5429          | 0.66     |
| No log        | 12.0  | 300  | 0.5363          | 0.66     |
| No log        | 13.0  | 325  | 0.5432          | 0.66     |
| No log        | 14.0  | 350  | 0.5446          | 0.63     |
| No log        | 15.0  | 375  | 0.5619          | 0.65     |
| No log        | 16.0  | 400  | 0.5400          | 0.66     |
| No log        | 17.0  | 425  | 0.5395          | 0.66     |
| No log        | 18.0  | 450  | 0.5439          | 0.66     |
| No log        | 19.0  | 475  | 0.5420          | 0.66     |
| 0.6126        | 20.0  | 500  | 0.5402          | 0.66     |
| 0.6126        | 21.0  | 525  | 0.5431          | 0.65     |
| 0.6126        | 22.0  | 550  | 0.5421          | 0.62     |
| 0.6126        | 23.0  | 575  | 0.5432          | 0.65     |
| 0.6126        | 24.0  | 600  | 0.5438          | 0.65     |
| 0.6126        | 25.0  | 625  | 0.5364          | 0.64     |
| 0.6126        | 26.0  | 650  | 0.5414          | 0.63     |
| 0.6126        | 27.0  | 675  | 0.5395          | 0.65     |
| 0.6126        | 28.0  | 700  | 0.5440          | 0.65     |
| 0.6126        | 29.0  | 725  | 0.5446          | 0.63     |
| 0.6126        | 30.0  | 750  | 0.5472          | 0.59     |
| 0.6126        | 31.0  | 775  | 0.5419          | 0.65     |
| 0.6126        | 32.0  | 800  | 0.5413          | 0.62     |
| 0.6126        | 33.0  | 825  | 0.5530          | 0.62     |
| 0.6126        | 34.0  | 850  | 0.5461          | 0.62     |
| 0.6126        | 35.0  | 875  | 0.5440          | 0.64     |
| 0.6126        | 36.0  | 900  | 0.5437          | 0.64     |
| 0.6126        | 37.0  | 925  | 0.5435          | 0.63     |
| 0.6126        | 38.0  | 950  | 0.5482          | 0.63     |
| 0.6126        | 39.0  | 975  | 0.5449          | 0.64     |
| 0.6037        | 40.0  | 1000 | 0.5442          | 0.64     |
| 0.6037        | 41.0  | 1025 | 0.5377          | 0.62     |
| 0.6037        | 42.0  | 1050 | 0.5411          | 0.64     |
| 0.6037        | 43.0  | 1075 | 0.5482          | 0.59     |
| 0.6037        | 44.0  | 1100 | 0.5494          | 0.62     |
| 0.6037        | 45.0  | 1125 | 0.5510          | 0.60     |
| 0.6037        | 46.0  | 1150 | 0.5472          | 0.61     |
| 0.6037        | 47.0  | 1175 | 0.5416          | 0.64     |
| 0.6037        | 48.0  | 1200 | 0.5397          | 0.64     |
| 0.6037        | 49.0  | 1225 | 0.5417          | 0.64     |
| 0.6037        | 50.0  | 1250 | 0.5390          | 0.64     |
| 0.6037        | 51.0  | 1275 | 0.5389          | 0.63     |
| 0.6037        | 52.0  | 1300 | 0.5366          | 0.64     |
| 0.6037        | 53.0  | 1325 | 0.5368          | 0.64     |
| 0.6037        | 54.0  | 1350 | 0.5393          | 0.64     |
| 0.6037        | 55.0  | 1375 | 0.5378          | 0.64     |
| 0.6037        | 56.0  | 1400 | 0.5391          | 0.64     |
| 0.6037        | 57.0  | 1425 | 0.5383          | 0.63     |
| 0.6037        | 58.0  | 1450 | 0.5379          | 0.63     |
| 0.6037        | 59.0  | 1475 | 0.5381          | 0.64     |
| 0.6021        | 60.0  | 1500 | 0.5410          | 0.64     |
| 0.6021        | 61.0  | 1525 | 0.5401          | 0.64     |
| 0.6021        | 62.0  | 1550 | 0.5403          | 0.64     |
| 0.6021        | 63.0  | 1575 | 0.5411          | 0.64     |
| 0.6021        | 64.0  | 1600 | 0.5415          | 0.64     |
| 0.6021        | 65.0  | 1625 | 0.5415          | 0.64     |
| 0.6021        | 66.0  | 1650 | 0.5409          | 0.64     |
| 0.6021        | 67.0  | 1675 | 0.5419          | 0.64     |
| 0.6021        | 68.0  | 1700 | 0.5401          | 0.64     |
| 0.6021        | 69.0  | 1725 | 0.5424          | 0.64     |
| 0.6021        | 70.0  | 1750 | 0.5420          | 0.64     |
| 0.6021        | 71.0  | 1775 | 0.5415          | 0.64     |
| 0.6021        | 72.0  | 1800 | 0.5391          | 0.64     |
| 0.6021        | 73.0  | 1825 | 0.5396          | 0.64     |
| 0.6021        | 74.0  | 1850 | 0.5396          | 0.64     |
| 0.6021        | 75.0  | 1875 | 0.5405          | 0.64     |
| 0.6021        | 76.0  | 1900 | 0.5404          | 0.64     |
| 0.6021        | 77.0  | 1925 | 0.5400          | 0.64     |
| 0.6021        | 78.0  | 1950 | 0.5401          | 0.64     |
| 0.6021        | 79.0  | 1975 | 0.5403          | 0.64     |
| 0.5946        | 80.0  | 2000 | 0.5403          | 0.64     |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3