20230831143012

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6198
  • Accuracy: 0.5
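
This checkpoint can be loaded like any other Transformers classification model. The snippet below is a minimal sketch, not from the original card: the Hub repo id dkqjrm/20230831143012 is taken from this page, and the sequence-classification head is an assumption based on the accuracy metric reported above.

```python
# Minimal loading sketch (assumption: a sequence-classification head,
# inferred from the accuracy metric; repo id taken from this model page).
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dkqjrm/20230831143012")
model = AutoModelForSequenceClassification.from_pretrained("dkqjrm/20230831143012")
```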

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
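
The card names super_glue but not which subtask was used. For reference, here is a minimal sketch of loading one SuperGLUE configuration with the datasets library; the "boolq" config is an illustrative assumption, not the author's confirmed choice.

```python
# Illustrative only: the card does not state the SuperGLUE subtask,
# so the "boolq" configuration below is an assumption.
from datasets import load_dataset

dataset = load_dataset("super_glue", "boolq")
print(dataset)              # DatasetDict with train/validation/test splits
print(dataset["train"][0])  # one example: question, passage, idx, label
```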

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
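
As a reference point, the sketch below shows how these values map onto transformers.TrainingArguments. It is a reconstruction under the assumption that the standard Trainer API was used; the output_dir is hypothetical, and all other values are copied from the list above.

```python
# Sketch mapping the listed hyperparameters onto TrainingArguments.
# Assumption: the run used the standard Trainer API; output_dir is hypothetical.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230831143012",     # hypothetical output path
    learning_rate=3e-4,              # learning_rate: 0.0003
    per_device_train_batch_size=16,  # train_batch_size: 16
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # epsilon=1e-08
)
```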

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 340   | 0.6258          | 0.5      |
| 0.6312        | 2.0   | 680   | 0.6164          | 0.5      |
| 0.6295        | 3.0   | 1020  | 0.6237          | 0.5      |
| 0.6295        | 4.0   | 1360  | 0.6170          | 0.5      |
| 0.6241        | 5.0   | 1700  | 0.6181          | 0.5      |
| 0.6236        | 6.0   | 2040  | 0.6191          | 0.5      |
| 0.6236        | 7.0   | 2380  | 0.6189          | 0.5      |
| 0.6239        | 8.0   | 2720  | 0.6261          | 0.5      |
| 0.6189        | 9.0   | 3060  | 0.6188          | 0.5      |
| 0.6189        | 10.0  | 3400  | 0.6264          | 0.5      |
| 0.623         | 11.0  | 3740  | 0.6200          | 0.5      |
| 0.6207        | 12.0  | 4080  | 0.6273          | 0.5      |
| 0.6207        | 13.0  | 4420  | 0.6450          | 0.5      |
| 0.6183        | 14.0  | 4760  | 0.6217          | 0.5      |
| 0.6235        | 15.0  | 5100  | 0.6226          | 0.5      |
| 0.6235        | 16.0  | 5440  | 0.6237          | 0.5      |
| 0.623         | 17.0  | 5780  | 0.6185          | 0.5      |
| 0.6176        | 18.0  | 6120  | 0.6202          | 0.5      |
| 0.6176        | 19.0  | 6460  | 0.6180          | 0.5      |
| 0.6204        | 20.0  | 6800  | 0.6195          | 0.5      |
| 0.6186        | 21.0  | 7140  | 0.6174          | 0.5      |
| 0.6186        | 22.0  | 7480  | 0.6283          | 0.5      |
| 0.621         | 23.0  | 7820  | 0.6254          | 0.5      |
| 0.6196        | 24.0  | 8160  | 0.6169          | 0.5      |
| 0.6218        | 25.0  | 8500  | 0.6170          | 0.5      |
| 0.6218        | 26.0  | 8840  | 0.6256          | 0.5      |
| 0.621         | 27.0  | 9180  | 0.6479          | 0.5      |
| 0.6189        | 28.0  | 9520  | 0.6170          | 0.5      |
| 0.6189        | 29.0  | 9860  | 0.6219          | 0.5      |
| 0.619         | 30.0  | 10200 | 0.6169          | 0.5      |
| 0.6175        | 31.0  | 10540 | 0.6169          | 0.5      |
| 0.6175        | 32.0  | 10880 | 0.6379          | 0.5      |
| 0.6181        | 33.0  | 11220 | 0.6193          | 0.5      |
| 0.6185        | 34.0  | 11560 | 0.6219          | 0.5      |
| 0.6185        | 35.0  | 11900 | 0.6188          | 0.5      |
| 0.6186        | 36.0  | 12240 | 0.6196          | 0.5      |
| 0.6185        | 37.0  | 12580 | 0.6170          | 0.5      |
| 0.6185        | 38.0  | 12920 | 0.6238          | 0.5      |
| 0.6167        | 39.0  | 13260 | 0.6332          | 0.5      |
| 0.6164        | 40.0  | 13600 | 0.6207          | 0.5      |
| 0.6164        | 41.0  | 13940 | 0.6176          | 0.5      |
| 0.6174        | 42.0  | 14280 | 0.6190          | 0.5      |
| 0.6137        | 43.0  | 14620 | 0.6190          | 0.5      |
| 0.6137        | 44.0  | 14960 | 0.6175          | 0.5      |
| 0.6179        | 45.0  | 15300 | 0.6263          | 0.5      |
| 0.6141        | 46.0  | 15640 | 0.6183          | 0.5      |
| 0.6141        | 47.0  | 15980 | 0.6275          | 0.5      |
| 0.6176        | 48.0  | 16320 | 0.6174          | 0.5      |
| 0.616         | 49.0  | 16660 | 0.6224          | 0.5      |
| 0.6162        | 50.0  | 17000 | 0.6173          | 0.5      |
| 0.6162        | 51.0  | 17340 | 0.6191          | 0.5      |
| 0.6135        | 52.0  | 17680 | 0.6187          | 0.5      |
| 0.6186        | 53.0  | 18020 | 0.6232          | 0.5      |
| 0.6186        | 54.0  | 18360 | 0.6191          | 0.5      |
| 0.6135        | 55.0  | 18700 | 0.6184          | 0.5      |
| 0.6138        | 56.0  | 19040 | 0.6186          | 0.5      |
| 0.6138        | 57.0  | 19380 | 0.6176          | 0.5      |
| 0.6137        | 58.0  | 19720 | 0.6236          | 0.5      |
| 0.6153        | 59.0  | 20060 | 0.6251          | 0.5      |
| 0.6153        | 60.0  | 20400 | 0.6166          | 0.5      |
| 0.6132        | 61.0  | 20740 | 0.6175          | 0.5      |
| 0.6131        | 62.0  | 21080 | 0.6199          | 0.5      |
| 0.6131        | 63.0  | 21420 | 0.6178          | 0.5      |
| 0.6121        | 64.0  | 21760 | 0.6212          | 0.5      |
| 0.6169        | 65.0  | 22100 | 0.6183          | 0.5      |
| 0.6169        | 66.0  | 22440 | 0.6252          | 0.5      |
| 0.6079        | 67.0  | 22780 | 0.6191          | 0.5      |
| 0.6151        | 68.0  | 23120 | 0.6170          | 0.5      |
| 0.6151        | 69.0  | 23460 | 0.6182          | 0.5      |
| 0.6128        | 70.0  | 23800 | 0.6191          | 0.5      |
| 0.6118        | 71.0  | 24140 | 0.6194          | 0.5      |
| 0.6118        | 72.0  | 24480 | 0.6224          | 0.5      |
| 0.6112        | 73.0  | 24820 | 0.6199          | 0.5      |
| 0.6129        | 74.0  | 25160 | 0.6210          | 0.5      |
| 0.6109        | 75.0  | 25500 | 0.6193          | 0.5      |
| 0.6109        | 76.0  | 25840 | 0.6210          | 0.5      |
| 0.612         | 77.0  | 26180 | 0.6187          | 0.5      |
| 0.6109        | 78.0  | 26520 | 0.6203          | 0.5      |
| 0.6109        | 79.0  | 26860 | 0.6197          | 0.5      |
| 0.6115        | 80.0  | 27200 | 0.6198          | 0.5      |

Note that validation accuracy stayed at 0.5 for all 80 epochs; assuming a balanced binary SuperGLUE task, this is chance-level performance, so this run does not appear to have learned the task beyond a trivial baseline.

Framework versions

  • Transformers 4.26.1
  • PyTorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3