
20230903172232

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.5325
  • Accuracy: 0.6552
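
The snippet below is a minimal sketch of loading this checkpoint for sequence classification. The repo id comes from this card, but the card does not state which SuperGLUE subtask was used, so the single-sentence input (and whether a sentence pair is actually expected) is an assumption:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumption: the checkpoint is a standard sequence-classification head on
# bert-large-cased; the repo id is taken from this card.
model_id = "dkqjrm/20230903172232"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Hypothetical input: the exact SuperGLUE subtask (and hence whether the
# model expects one sentence or a pair) is not stated in this card.
inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())
```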

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
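
A minimal sketch of TrainingArguments mirroring the list above, using the Transformers 4.26 Trainer API; the output directory and the two commented flags are assumptions, not values stated in the card:

```python
from transformers import TrainingArguments

# Adam betas=(0.9, 0.999), epsilon=1e-08, and the linear schedule are the
# Trainer defaults, so they need no explicit flags here.
training_args = TrainingArguments(
    output_dir="./20230903172232",  # placeholder, not from the card
    learning_rate=2e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",  # assumption: matches the per-epoch results table
    logging_steps=500,  # assumption: would explain the "No log" first row below
)
```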

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 340   | 0.4929          | 0.5      |
| 0.5083        | 2.0   | 680   | 0.4935          | 0.5      |
| 0.5024        | 3.0   | 1020  | 0.4979          | 0.5      |
| 0.5024        | 4.0   | 1360  | 0.4791          | 0.5      |
| 0.4909        | 5.0   | 1700  | 0.4873          | 0.5470   |
| 0.4734        | 6.0   | 2040  | 0.4883          | 0.5345   |
| 0.4734        | 7.0   | 2380  | 0.4729          | 0.5502   |
| 0.4616        | 8.0   | 2720  | 0.4693          | 0.4922   |
| 0.446         | 9.0   | 3060  | 0.4777          | 0.5549   |
| 0.446         | 10.0  | 3400  | 0.5197          | 0.6348   |
| 0.444         | 11.0  | 3740  | 0.4987          | 0.6207   |
| 0.4366        | 12.0  | 4080  | 0.4764          | 0.5846   |
| 0.4366        | 13.0  | 4420  | 0.4807          | 0.5705   |
| 0.4257        | 14.0  | 4760  | 0.5061          | 0.6332   |
| 0.4205        | 15.0  | 5100  | 0.4879          | 0.5204   |
| 0.4205        | 16.0  | 5440  | 0.5076          | 0.6301   |
| 0.419         | 17.0  | 5780  | 0.4885          | 0.5909   |
| 0.4103        | 18.0  | 6120  | 0.5273          | 0.6583   |
| 0.4103        | 19.0  | 6460  | 0.4833          | 0.5423   |
| 0.4107        | 20.0  | 6800  | 0.5060          | 0.5784   |
| 0.4015        | 21.0  | 7140  | 0.5064          | 0.6489   |
| 0.4015        | 22.0  | 7480  | 0.4873          | 0.5298   |
| 0.4032        | 23.0  | 7820  | 0.5016          | 0.6458   |
| 0.3949        | 24.0  | 8160  | 0.4993          | 0.6301   |
| 0.3961        | 25.0  | 8500  | 0.4975          | 0.6113   |
| 0.3961        | 26.0  | 8840  | 0.4924          | 0.5674   |
| 0.3917        | 27.0  | 9180  | 0.5187          | 0.6708   |
| 0.3894        | 28.0  | 9520  | 0.4951          | 0.5909   |
| 0.3894        | 29.0  | 9860  | 0.5029          | 0.6113   |
| 0.3867        | 30.0  | 10200 | 0.5276          | 0.6677   |
| 0.3842        | 31.0  | 10540 | 0.5023          | 0.6285   |
| 0.3842        | 32.0  | 10880 | 0.5175          | 0.6599   |
| 0.3845        | 33.0  | 11220 | 0.5094          | 0.6348   |
| 0.3798        | 34.0  | 11560 | 0.5120          | 0.6411   |
| 0.3798        | 35.0  | 11900 | 0.5237          | 0.6646   |
| 0.3799        | 36.0  | 12240 | 0.5030          | 0.5737   |
| 0.3807        | 37.0  | 12580 | 0.5234          | 0.6520   |
| 0.3807        | 38.0  | 12920 | 0.5183          | 0.6536   |
| 0.373         | 39.0  | 13260 | 0.5078          | 0.6034   |
| 0.375         | 40.0  | 13600 | 0.5172          | 0.6536   |
| 0.375         | 41.0  | 13940 | 0.5164          | 0.6505   |
| 0.3738        | 42.0  | 14280 | 0.5180          | 0.6332   |
| 0.369         | 43.0  | 14620 | 0.5145          | 0.6301   |
| 0.369         | 44.0  | 14960 | 0.5153          | 0.6223   |
| 0.3722        | 45.0  | 15300 | 0.5289          | 0.6818   |
| 0.3685        | 46.0  | 15640 | 0.5203          | 0.6567   |
| 0.3685        | 47.0  | 15980 | 0.5210          | 0.6285   |
| 0.3688        | 48.0  | 16320 | 0.5113          | 0.6144   |
| 0.3661        | 49.0  | 16660 | 0.5097          | 0.5439   |
| 0.3657        | 50.0  | 17000 | 0.5166          | 0.6536   |
| 0.3657        | 51.0  | 17340 | 0.5208          | 0.6552   |
| 0.3656        | 52.0  | 17680 | 0.5249          | 0.6646   |
| 0.3643        | 53.0  | 18020 | 0.5056          | 0.5940   |
| 0.3643        | 54.0  | 18360 | 0.5122          | 0.6583   |
| 0.3611        | 55.0  | 18700 | 0.5247          | 0.6395   |
| 0.3629        | 56.0  | 19040 | 0.5301          | 0.6599   |
| 0.3629        | 57.0  | 19380 | 0.5284          | 0.6473   |
| 0.3597        | 58.0  | 19720 | 0.5316          | 0.6473   |
| 0.361         | 59.0  | 20060 | 0.5315          | 0.6552   |
| 0.361         | 60.0  | 20400 | 0.5424          | 0.6567   |
| 0.3587        | 61.0  | 20740 | 0.5338          | 0.6442   |
| 0.3557        | 62.0  | 21080 | 0.5283          | 0.6285   |
| 0.3557        | 63.0  | 21420 | 0.5287          | 0.6599   |
| 0.3556        | 64.0  | 21760 | 0.5307          | 0.6426   |
| 0.3578        | 65.0  | 22100 | 0.5326          | 0.6489   |
| 0.3578        | 66.0  | 22440 | 0.5207          | 0.5784   |
| 0.3504        | 67.0  | 22780 | 0.5271          | 0.6348   |
| 0.3588        | 68.0  | 23120 | 0.5338          | 0.6489   |
| 0.3588        | 69.0  | 23460 | 0.5386          | 0.6583   |
| 0.3553        | 70.0  | 23800 | 0.5308          | 0.6567   |
| 0.3511        | 71.0  | 24140 | 0.5325          | 0.6473   |
| 0.3511        | 72.0  | 24480 | 0.5403          | 0.6614   |
| 0.3522        | 73.0  | 24820 | 0.5319          | 0.6379   |
| 0.3534        | 74.0  | 25160 | 0.5332          | 0.6505   |
| 0.3495        | 75.0  | 25500 | 0.5343          | 0.6505   |
| 0.3495        | 76.0  | 25840 | 0.5312          | 0.6567   |
| 0.3535        | 77.0  | 26180 | 0.5356          | 0.6505   |
| 0.3491        | 78.0  | 26520 | 0.5342          | 0.6536   |
| 0.3491        | 79.0  | 26860 | 0.5327          | 0.6552   |
| 0.3518        | 80.0  | 27200 | 0.5325          | 0.6552   |
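
The per-epoch accuracy column above is the kind of value a Trainer compute_metrics callback reports. A minimal sketch, assuming the standard evaluate library (not listed under the framework versions below) and argmax decoding of the logits:

```python
import numpy as np
import evaluate  # assumption: not listed under framework versions

accuracy_metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred is the (logits, labels) pair the Trainer passes in.
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy_metric.compute(predictions=predictions, references=labels)
```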

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3