20230831144955

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6197
  • Accuracy: 0.5
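
The card does not include a usage snippet, so here is a minimal inference sketch. It assumes the checkpoint loads as a sequence-classification model over a SuperGLUE-style sentence-pair task; the specific subtask, label meaning, and example inputs below are illustrative assumptions, not documented behavior.

```python
# Minimal inference sketch (assumptions noted above; inputs are illustrative).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230831144955"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Hypothetical sentence pair; replace with task-appropriate inputs.
inputs = tokenizer(
    "The cat sat on the mat.",
    "A cat is sitting on a mat.",
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```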

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
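
The training script itself is not published; the block below is a hedged sketch of a transformers.TrainingArguments configuration matching the hyperparameters above. The output_dir, evaluation_strategy, and logging_steps values are assumptions (per-epoch evaluation and step-based logging are inferred from the results table, where the first-epoch training loss shows "No log").

```python
# Sketch of TrainingArguments matching the reported hyperparameters.
# Values marked as assumptions are not stated in the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230831144955",   # assumption: hypothetical output path
    learning_rate=5e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",   # assumption: matches per-epoch rows below
    logging_steps=500,             # assumption: explains "No log" at step 340
)
```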

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|---------------|-------|-------|-----------------|----------|
| No log        | 1.0   | 340   | 0.6175          | 0.5      |
| 0.6359        | 2.0   | 680   | 0.6236          | 0.5      |
| 0.635         | 3.0   | 1020  | 0.6211          | 0.5      |
| 0.635         | 4.0   | 1360  | 0.6253          | 0.5      |
| 0.6306        | 5.0   | 1700  | 0.6421          | 0.5      |
| 0.6268        | 6.0   | 2040  | 0.6297          | 0.5      |
| 0.6268        | 7.0   | 2380  | 0.6351          | 0.5      |
| 0.6314        | 8.0   | 2720  | 0.6053          | 0.5      |
| 0.6135        | 9.0   | 3060  | 0.6185          | 0.5      |
| 0.6135        | 10.0  | 3400  | 0.6316          | 0.5      |
| 0.6245        | 11.0  | 3740  | 0.6219          | 0.5      |
| 0.6198        | 12.0  | 4080  | 0.6203          | 0.5      |
| 0.6198        | 13.0  | 4420  | 0.6516          | 0.5      |
| 0.6151        | 14.0  | 4760  | 0.6231          | 0.5      |
| 0.6223        | 15.0  | 5100  | 0.6235          | 0.5      |
| 0.6223        | 16.0  | 5440  | 0.6204          | 0.5      |
| 0.6216        | 17.0  | 5780  | 0.6225          | 0.5      |
| 0.6168        | 18.0  | 6120  | 0.6176          | 0.5      |
| 0.6168        | 19.0  | 6460  | 0.6204          | 0.5      |
| 0.6179        | 20.0  | 6800  | 0.6179          | 0.5      |
| 0.6169        | 21.0  | 7140  | 0.6193          | 0.5      |
| 0.6169        | 22.0  | 7480  | 0.6414          | 0.5      |
| 0.6206        | 23.0  | 7820  | 0.6196          | 0.5      |
| 0.6181        | 24.0  | 8160  | 0.6248          | 0.5      |
| 0.6269        | 25.0  | 8500  | 0.6173          | 0.5      |
| 0.6269        | 26.0  | 8840  | 0.6234          | 0.5      |
| 0.6201        | 27.0  | 9180  | 0.6239          | 0.5      |
| 0.6162        | 28.0  | 9520  | 0.6182          | 0.5      |
| 0.6162        | 29.0  | 9860  | 0.6260          | 0.5      |
| 0.6166        | 30.0  | 10200 | 0.6190          | 0.5      |
| 0.6159        | 31.0  | 10540 | 0.6192          | 0.5      |
| 0.6159        | 32.0  | 10880 | 0.6261          | 0.5      |
| 0.6158        | 33.0  | 11220 | 0.6295          | 0.5      |
| 0.6166        | 34.0  | 11560 | 0.6238          | 0.5      |
| 0.6166        | 35.0  | 11900 | 0.6221          | 0.5      |
| 0.6163        | 36.0  | 12240 | 0.6198          | 0.5      |
| 0.6177        | 37.0  | 12580 | 0.6177          | 0.5      |
| 0.6177        | 38.0  | 12920 | 0.6202          | 0.5      |
| 0.6158        | 39.0  | 13260 | 0.6231          | 0.5      |
| 0.6147        | 40.0  | 13600 | 0.6209          | 0.5      |
| 0.6147        | 41.0  | 13940 | 0.6191          | 0.5      |
| 0.6173        | 42.0  | 14280 | 0.6195          | 0.5      |
| 0.6129        | 43.0  | 14620 | 0.6213          | 0.5      |
| 0.6129        | 44.0  | 14960 | 0.6245          | 0.5      |
| 0.6173        | 45.0  | 15300 | 0.6235          | 0.5      |
| 0.6128        | 46.0  | 15640 | 0.6184          | 0.5      |
| 0.6128        | 47.0  | 15980 | 0.6252          | 0.5      |
| 0.6174        | 48.0  | 16320 | 0.6216          | 0.5      |
| 0.6157        | 49.0  | 16660 | 0.6248          | 0.5      |
| 0.6151        | 50.0  | 17000 | 0.6191          | 0.5      |
| 0.6151        | 51.0  | 17340 | 0.6212          | 0.5      |
| 0.6132        | 52.0  | 17680 | 0.6197          | 0.5      |
| 0.6173        | 53.0  | 18020 | 0.6233          | 0.5      |
| 0.6173        | 54.0  | 18360 | 0.6223          | 0.5      |
| 0.6132        | 55.0  | 18700 | 0.6173          | 0.5      |
| 0.6129        | 56.0  | 19040 | 0.6218          | 0.5      |
| 0.6129        | 57.0  | 19380 | 0.6178          | 0.5      |
| 0.614         | 58.0  | 19720 | 0.6239          | 0.5      |
| 0.616         | 59.0  | 20060 | 0.6258          | 0.5      |
| 0.616         | 60.0  | 20400 | 0.6181          | 0.5      |
| 0.6136        | 61.0  | 20740 | 0.6195          | 0.5      |
| 0.6132        | 62.0  | 21080 | 0.6205          | 0.5      |
| 0.6132        | 63.0  | 21420 | 0.6177          | 0.5      |
| 0.6121        | 64.0  | 21760 | 0.6221          | 0.5      |
| 0.6164        | 65.0  | 22100 | 0.6190          | 0.5      |
| 0.6164        | 66.0  | 22440 | 0.6225          | 0.5      |
| 0.6073        | 67.0  | 22780 | 0.6205          | 0.5      |
| 0.615         | 68.0  | 23120 | 0.6189          | 0.5      |
| 0.615         | 69.0  | 23460 | 0.6188          | 0.5      |
| 0.6136        | 70.0  | 23800 | 0.6200          | 0.5      |
| 0.6127        | 71.0  | 24140 | 0.6197          | 0.5      |
| 0.6127        | 72.0  | 24480 | 0.6213          | 0.5      |
| 0.6111        | 73.0  | 24820 | 0.6197          | 0.5      |
| 0.6133        | 74.0  | 25160 | 0.6215          | 0.5      |
| 0.6113        | 75.0  | 25500 | 0.6197          | 0.5      |
| 0.6113        | 76.0  | 25840 | 0.6209          | 0.5      |
| 0.6124        | 77.0  | 26180 | 0.6192          | 0.5      |
| 0.6112        | 78.0  | 26520 | 0.6200          | 0.5      |
| 0.6112        | 79.0  | 26860 | 0.6198          | 0.5      |
| 0.612         | 80.0  | 27200 | 0.6197          | 0.5      |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3