20230831190406

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set, which match the final row (epoch 80, step 27200) of the training results:

Loss: 0.6234
Accuracy: 0.5

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0007
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
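
As a rough illustration of what lr_scheduler_type: linear means for these settings, the sketch below computes the learning rate at a given optimizer step. It takes the total of 27200 steps from the last row of the training results table and assumes zero warmup steps, since the card lists no warmup setting. The function name linear_lr is illustrative and is not part of any library.

```python
BASE_LR = 0.0007      # learning_rate from the hyperparameter list
TOTAL_STEPS = 27200   # final step in the training results table
WARMUP_STEPS = 0      # assumption: the card does not list a warmup setting


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS,
              warmup_steps=WARMUP_STEPS):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly so the rate reaches 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Start, midpoint (end of epoch 40), and end of training.
    for step in (0, 13600, 27200):
        print(step, linear_lr(step))
```

Under these assumptions the rate starts at 0.0007, is halved by the end of epoch 40, and reaches zero at the final step.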

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	340	0.6536	0.5
0.6466	2.0	680	0.6207	0.5
0.6506	3.0	1020	0.6654	0.5
0.6506	4.0	1360	0.6698	0.5
0.6458	5.0	1700	0.6234	0.5
0.6363	6.0	2040	0.6246	0.5
0.6363	7.0	2380	0.6367	0.5
0.6401	8.0	2720	0.6582	0.5
0.6347	9.0	3060	0.6257	0.5
0.6347	10.0	3400	0.6281	0.5
0.6378	11.0	3740	0.6234	0.5
0.637	12.0	4080	0.6274	0.5
0.637	13.0	4420	0.6362	0.5
0.6313	14.0	4760	0.6290	0.5
0.6359	15.0	5100	0.6302	0.5
0.6359	16.0	5440	0.6246	0.5
0.639	17.0	5780	0.6319	0.5
0.6302	18.0	6120	0.6255	0.5
0.6302	19.0	6460	0.6325	0.5
0.6329	20.0	6800	0.6434	0.5
0.6309	21.0	7140	0.6238	0.5
0.6309	22.0	7480	0.6237	0.5
0.6325	23.0	7820	0.6296	0.5
0.6303	24.0	8160	0.6249	0.5
0.6357	25.0	8500	0.6235	0.5
0.6357	26.0	8840	0.6258	0.5
0.6327	27.0	9180	0.6442	0.5
0.6309	28.0	9520	0.6329	0.5
0.6309	29.0	9860	0.6374	0.5
0.6304	30.0	10200	0.6243	0.5
0.6311	31.0	10540	0.6302	0.5
0.6311	32.0	10880	0.6247	0.5
0.6294	33.0	11220	0.6233	0.5
0.6303	34.0	11560	0.6252	0.5
0.6303	35.0	11900	0.6365	0.5
0.63	36.0	12240	0.6300	0.5
0.6304	37.0	12580	0.6290	0.5
0.6304	38.0	12920	0.6243	0.5
0.6288	39.0	13260	0.6440	0.5
0.6298	40.0	13600	0.6260	0.5
0.6298	41.0	13940	0.6296	0.5
0.6292	42.0	14280	0.6245	0.5
0.6255	43.0	14620	0.6253	0.5
0.6255	44.0	14960	0.6459	0.5
0.631	45.0	15300	0.6321	0.5
0.6248	46.0	15640	0.6314	0.5
0.6248	47.0	15980	0.6335	0.5
0.6293	48.0	16320	0.6240	0.5
0.6285	49.0	16660	0.6238	0.5
0.6277	50.0	17000	0.6247	0.5
0.6277	51.0	17340	0.6378	0.5
0.625	52.0	17680	0.6237	0.5
0.6301	53.0	18020	0.6246	0.5
0.6301	54.0	18360	0.6236	0.5
0.6247	55.0	18700	0.6237	0.5
0.6253	56.0	19040	0.6252	0.5
0.6253	57.0	19380	0.6261	0.5
0.6243	58.0	19720	0.6250	0.5
0.6268	59.0	20060	0.6387	0.5
0.6268	60.0	20400	0.6233	0.5
0.625	61.0	20740	0.6239	0.5
0.6245	62.0	21080	0.6233	0.5
0.6245	63.0	21420	0.6256	0.5
0.6232	64.0	21760	0.6263	0.5
0.6279	65.0	22100	0.6233	0.5
0.6279	66.0	22440	0.6339	0.5
0.6185	67.0	22780	0.6237	0.5
0.627	68.0	23120	0.6246	0.5
0.627	69.0	23460	0.6241	0.5
0.6242	70.0	23800	0.6254	0.5
0.6229	71.0	24140	0.6236	0.5
0.6229	72.0	24480	0.6242	0.5
0.621	73.0	24820	0.6238	0.5
0.6226	74.0	25160	0.6237	0.5
0.6222	75.0	25500	0.6233	0.5
0.6222	76.0	25840	0.6244	0.5
0.6224	77.0	26180	0.6234	0.5
0.6212	78.0	26520	0.6239	0.5
0.6212	79.0	26860	0.6238	0.5
0.6222	80.0	27200	0.6234	0.5
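
The table above can be summarized programmatically. The following is a minimal sketch that parses a few rows copied from the table (tab-separated, as shown) and derives the steps per epoch, the row with the lowest validation loss among the sampled rows, and the set of distinct accuracy values. The helper name parse_rows and the dictionary keys are illustrative. The bound on examples per epoch assumes a single device and no gradient accumulation, neither of which the card states.

```python
TRAIN_BATCH_SIZE = 16  # train_batch_size from the hyperparameter list

# A few rows copied from the training results table (tab-separated).
ROWS = """\
No log\t1.0\t340\t0.6536\t0.5
0.6466\t2.0\t680\t0.6207\t0.5
0.6506\t3.0\t1020\t0.6654\t0.5
0.6222\t80.0\t27200\t0.6234\t0.5
"""


def parse_rows(text):
    """Parse tab-separated table rows into a list of dictionaries."""
    records = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, acc = line.split("\t")
        records.append({
            # "No log" means no training loss had been logged yet.
            "train_loss": None if train_loss == "No log" else float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "accuracy": float(acc),
        })
    return records


records = parse_rows(ROWS)
steps_per_epoch = int(records[0]["step"] / records[0]["epoch"])
# Upper bound only: the last batch of an epoch may be partially filled.
max_examples_per_epoch = steps_per_epoch * TRAIN_BATCH_SIZE
best = min(records, key=lambda r: r["val_loss"])
accuracies = {r["accuracy"] for r in records}

if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch)
    print("max examples per epoch:", max_examples_per_epoch)
    print("lowest validation loss in sample:", best["val_loss"], "at epoch", best["epoch"])
    print("distinct accuracy values:", accuracies)
```

Pasting all 80 rows into ROWS gives the same kind of summary for the full run.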

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
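
To compare a local environment against the versions listed above, one option is a small check built on the standard library's importlib.metadata. This is a sketch: the distribution names (transformers, torch, datasets, tokenizers) are assumed to be the pip package names for the frameworks listed, and version_mismatches is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the card, keyed by assumed pip distribution name.
PINNED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def version_mismatches(pinned=PINNED):
    """Return {name: (wanted, installed)} for packages that differ.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in version_mismatches().items():
        print(f"{name}: card lists {wanted}, found {installed}")
```

An empty result means the installed versions match the card exactly.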
