
20230903230355

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the results):

  • Loss: 0.6499
  • Accuracy: 0.5
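The card does not state which SuperGLUE subtask this checkpoint was trained on, so the following is only a loading sketch: it assumes the checkpoint exposes a standard sequence-classification head and accepts a sentence-pair input. The model id `dkqjrm/20230903230355` is taken from this card; the input texts are placeholders.

```python
# Hedged loading sketch; the SuperGLUE subtask and label meaning are not
# documented in this card, so inputs and label interpretation are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230903230355"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Placeholder sentence pair; replace with inputs for the actual subtask.
inputs = tokenizer("first sentence", "second sentence", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```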

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 0.0002
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
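These settings map directly onto `transformers.TrainingArguments`. The sketch below expresses only the values reported above; the output directory is a placeholder, and any flags the original run may have used beyond these (e.g. evaluation or logging strategy) are not documented here and are omitted.

```python
# Hedged sketch: the reported hyperparameters as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./20230903230355",   # placeholder output path
    learning_rate=2e-4,              # learning_rate: 0.0002
    per_device_train_batch_size=16,  # train_batch_size: 16
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    seed=11,                         # seed: 11
    adam_beta1=0.9,                  # optimizer: Adam, betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # epsilon: 1e-08
    lr_scheduler_type="linear",      # lr_scheduler_type: linear
    num_train_epochs=80.0,           # num_epochs: 80.0
)
```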

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 340   | 0.6198          | 0.5      |
| 0.6345        | 2.0   | 680   | 0.6217          | 0.5      |
| 0.6271        | 3.0   | 1020  | 0.6081          | 0.5      |
| 0.6271        | 4.0   | 1360  | 0.6146          | 0.5      |
| 0.6166        | 5.0   | 1700  | 0.6180          | 0.5      |
| 0.619         | 6.0   | 2040  | 0.6220          | 0.5      |
| 0.619         | 7.0   | 2380  | 0.6023          | 0.5      |
| 0.605         | 8.0   | 2720  | 0.5987          | 0.5      |
| 0.5863        | 9.0   | 3060  | 0.6086          | 0.5016   |
| 0.5863        | 10.0  | 3400  | 0.6292          | 0.5047   |
| 0.5789        | 11.0  | 3740  | 0.6150          | 0.5016   |
| 0.5716        | 12.0  | 4080  | 0.5969          | 0.5      |
| 0.5716        | 13.0  | 4420  | 0.6045          | 0.5      |
| 0.5599        | 14.0  | 4760  | 0.6281          | 0.4969   |
| 0.5555        | 15.0  | 5100  | 0.6021          | 0.5      |
| 0.5555        | 16.0  | 5440  | 0.6161          | 0.5      |
| 0.553         | 17.0  | 5780  | 0.6050          | 0.5      |
| 0.5412        | 18.0  | 6120  | 0.6483          | 0.4984   |
| 0.5412        | 19.0  | 6460  | 0.6169          | 0.5      |
| 0.5403        | 20.0  | 6800  | 0.6287          | 0.5      |
| 0.5349        | 21.0  | 7140  | 0.6369          | 0.5      |
| 0.5349        | 22.0  | 7480  | 0.6163          | 0.5      |
| 0.5341        | 23.0  | 7820  | 0.6180          | 0.4984   |
| 0.5264        | 24.0  | 8160  | 0.6171          | 0.5      |
| 0.5265        | 25.0  | 8500  | 0.6289          | 0.5      |
| 0.5265        | 26.0  | 8840  | 0.6161          | 0.5      |
| 0.5218        | 27.0  | 9180  | 0.6542          | 0.4984   |
| 0.5204        | 28.0  | 9520  | 0.6246          | 0.5      |
| 0.5204        | 29.0  | 9860  | 0.6192          | 0.5      |
| 0.5164        | 30.0  | 10200 | 0.6213          | 0.5      |
| 0.5136        | 31.0  | 10540 | 0.6256          | 0.5      |
| 0.5136        | 32.0  | 10880 | 0.6605          | 0.5      |
| 0.5113        | 33.0  | 11220 | 0.6310          | 0.5      |
| 0.5101        | 34.0  | 11560 | 0.6348          | 0.5      |
| 0.5101        | 35.0  | 11900 | 0.6392          | 0.5      |
| 0.5095        | 36.0  | 12240 | 0.6291          | 0.5      |
| 0.5058        | 37.0  | 12580 | 0.6399          | 0.5      |
| 0.5058        | 38.0  | 12920 | 0.6546          | 0.5      |
| 0.5022        | 39.0  | 13260 | 0.6294          | 0.5      |
| 0.5009        | 40.0  | 13600 | 0.6348          | 0.5      |
| 0.5009        | 41.0  | 13940 | 0.6261          | 0.5      |
| 0.5005        | 42.0  | 14280 | 0.6442          | 0.5      |
| 0.4952        | 43.0  | 14620 | 0.6338          | 0.5      |
| 0.4952        | 44.0  | 14960 | 0.6358          | 0.5      |
| 0.5019        | 45.0  | 15300 | 0.6387          | 0.5      |
| 0.4968        | 46.0  | 15640 | 0.6383          | 0.5      |
| 0.4968        | 47.0  | 15980 | 0.6361          | 0.5      |
| 0.4972        | 48.0  | 16320 | 0.6428          | 0.4984   |
| 0.4947        | 49.0  | 16660 | 0.6308          | 0.5      |
| 0.4958        | 50.0  | 17000 | 0.6443          | 0.5      |
| 0.4958        | 51.0  | 17340 | 0.6520          | 0.5      |
| 0.4926        | 52.0  | 17680 | 0.6491          | 0.5      |
| 0.4942        | 53.0  | 18020 | 0.6400          | 0.5      |
| 0.4942        | 54.0  | 18360 | 0.6373          | 0.5      |
| 0.4895        | 55.0  | 18700 | 0.6579          | 0.5      |
| 0.4908        | 56.0  | 19040 | 0.6611          | 0.5      |
| 0.4908        | 57.0  | 19380 | 0.6474          | 0.5      |
| 0.4916        | 58.0  | 19720 | 0.6537          | 0.5      |
| 0.492         | 59.0  | 20060 | 0.6507          | 0.5      |
| 0.492         | 60.0  | 20400 | 0.6582          | 0.5      |
| 0.4855        | 61.0  | 20740 | 0.6578          | 0.5      |
| 0.4874        | 62.0  | 21080 | 0.6498          | 0.5      |
| 0.4874        | 63.0  | 21420 | 0.6445          | 0.5      |
| 0.485         | 64.0  | 21760 | 0.6470          | 0.5      |
| 0.4889        | 65.0  | 22100 | 0.6483          | 0.5      |
| 0.4889        | 66.0  | 22440 | 0.6412          | 0.5      |
| 0.4778        | 67.0  | 22780 | 0.6437          | 0.5      |
| 0.4862        | 68.0  | 23120 | 0.6509          | 0.5      |
| 0.4862        | 69.0  | 23460 | 0.6491          | 0.5      |
| 0.4834        | 70.0  | 23800 | 0.6485          | 0.5      |
| 0.4802        | 71.0  | 24140 | 0.6444          | 0.5      |
| 0.4802        | 72.0  | 24480 | 0.6460          | 0.5      |
| 0.4818        | 73.0  | 24820 | 0.6500          | 0.5      |
| 0.4815        | 74.0  | 25160 | 0.6549          | 0.5      |
| 0.4804        | 75.0  | 25500 | 0.6577          | 0.5      |
| 0.4804        | 76.0  | 25840 | 0.6533          | 0.5      |
| 0.4812        | 77.0  | 26180 | 0.6516          | 0.5      |
| 0.4801        | 78.0  | 26520 | 0.6513          | 0.5      |
| 0.4801        | 79.0  | 26860 | 0.6519          | 0.5      |
| 0.48          | 80.0  | 27200 | 0.6499          | 0.5      |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3