
20230831201806

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6291
  • Accuracy: 0.5

Model description

More information needed

Intended uses & limitations

More information needed
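
Since the card does not document usage, here is a minimal sketch of loading the checkpoint for inference. The SuperGLUE subset and label mapping are not recorded in this card, so the sentence-pair input format and the binary head below are assumptions, not confirmed details.

```python
# Minimal inference sketch for dkqjrm/20230831201806.
# Assumption: the fine-tuning task was a binary sentence-pair task
# (the card only says "super_glue"), so the example inputs and the
# two-label interpretation are illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "dkqjrm/20230831201806"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Illustrative sentence pair; the real input format depends on the
# (undocumented) SuperGLUE subset used for fine-tuning.
inputs = tokenizer(
    "The cat sat on the mat.",
    "A cat is resting on a mat.",
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities
```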

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding TrainingArguments follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
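
As referenced above, this is a sketch of how the listed values map onto `TrainingArguments` in Transformers 4.26.1 (the version in the framework list below). Only the values from this card are real; the output directory and evaluation cadence are assumptions.

```python
# Sketch: the hyperparameters above expressed as TrainingArguments
# for transformers 4.26.1. Values not listed in the card are marked
# as placeholders or assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230831201806",     # placeholder, not from the card
    learning_rate=5e-4,              # learning_rate: 0.0005
    per_device_train_batch_size=16,  # train_batch_size: 16
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    seed=11,                         # seed: 11
    adam_beta1=0.9,                  # optimizer: Adam, betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # epsilon: 1e-08
    lr_scheduler_type="linear",      # lr_scheduler_type: linear
    num_train_epochs=80.0,           # num_epochs: 80.0
    evaluation_strategy="epoch",     # assumption: per-epoch eval, matching
                                     # the results table below
)
```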

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 340 | 0.6240 | 0.5 |
| 0.6425 | 2.0 | 680 | 0.6234 | 0.5 |
| 0.6397 | 3.0 | 1020 | 0.6203 | 0.5 |
| 0.6397 | 4.0 | 1360 | 0.6364 | 0.5 |
| 0.6363 | 5.0 | 1700 | 0.7003 | 0.5 |
| 0.6373 | 6.0 | 2040 | 0.6233 | 0.5 |
| 0.6373 | 7.0 | 2380 | 0.6233 | 0.5 |
| 0.637 | 8.0 | 2720 | 0.6515 | 0.5 |
| 0.631 | 9.0 | 3060 | 0.6234 | 0.5 |
| 0.631 | 10.0 | 3400 | 0.6299 | 0.5 |
| 0.633 | 11.0 | 3740 | 0.6315 | 0.5 |
| 0.6325 | 12.0 | 4080 | 0.6281 | 0.5 |
| 0.6325 | 13.0 | 4420 | 0.6434 | 0.5 |
| 0.6267 | 14.0 | 4760 | 0.6233 | 0.5 |
| 0.6323 | 15.0 | 5100 | 0.6253 | 0.5 |
| 0.6323 | 16.0 | 5440 | 0.6233 | 0.5 |
| 0.6325 | 17.0 | 5780 | 0.6314 | 0.5 |
| 0.6274 | 18.0 | 6120 | 0.6265 | 0.5 |
| 0.6274 | 19.0 | 6460 | 0.6298 | 0.5 |
| 0.6301 | 20.0 | 6800 | 0.6363 | 0.5 |
| 0.6268 | 21.0 | 7140 | 0.6296 | 0.5 |
| 0.6268 | 22.0 | 7480 | 0.6402 | 0.5 |
| 0.6316 | 23.0 | 7820 | 0.6282 | 0.5 |
| 0.6272 | 24.0 | 8160 | 0.6233 | 0.5 |
| 0.6314 | 25.0 | 8500 | 0.6245 | 0.5 |
| 0.6314 | 26.0 | 8840 | 0.6702 | 0.5 |
| 0.6298 | 27.0 | 9180 | 0.6484 | 0.5 |
| 0.6282 | 28.0 | 9520 | 0.6235 | 0.5 |
| 0.6282 | 29.0 | 9860 | 0.6524 | 0.5 |
| 0.6259 | 30.0 | 10200 | 0.6245 | 0.5 |
| 0.6271 | 31.0 | 10540 | 0.6233 | 0.5 |
| 0.6271 | 32.0 | 10880 | 0.6320 | 0.5 |
| 0.6264 | 33.0 | 11220 | 0.6240 | 0.5 |
| 0.6265 | 34.0 | 11560 | 0.6325 | 0.5 |
| 0.6265 | 35.0 | 11900 | 0.6329 | 0.5 |
| 0.6268 | 36.0 | 12240 | 0.6377 | 0.5 |
| 0.6261 | 37.0 | 12580 | 0.6234 | 0.5 |
| 0.6261 | 38.0 | 12920 | 0.6323 | 0.5 |
| 0.626 | 39.0 | 13260 | 0.6402 | 0.5 |
| 0.6245 | 40.0 | 13600 | 0.6264 | 0.5 |
| 0.6245 | 41.0 | 13940 | 0.6245 | 0.5 |
| 0.6253 | 42.0 | 14280 | 0.6278 | 0.5 |
| 0.6223 | 43.0 | 14620 | 0.6260 | 0.5 |
| 0.6223 | 44.0 | 14960 | 0.6236 | 0.5 |
| 0.6266 | 45.0 | 15300 | 0.6378 | 0.5 |
| 0.6219 | 46.0 | 15640 | 0.6349 | 0.5 |
| 0.6219 | 47.0 | 15980 | 0.6393 | 0.5 |
| 0.6256 | 48.0 | 16320 | 0.6266 | 0.5 |
| 0.6241 | 49.0 | 16660 | 0.6338 | 0.5 |
| 0.624 | 50.0 | 17000 | 0.6237 | 0.5 |
| 0.624 | 51.0 | 17340 | 0.6265 | 0.5 |
| 0.6214 | 52.0 | 17680 | 0.6259 | 0.5 |
| 0.627 | 53.0 | 18020 | 0.6324 | 0.5 |
| 0.627 | 54.0 | 18360 | 0.6257 | 0.5 |
| 0.6218 | 55.0 | 18700 | 0.6246 | 0.5 |
| 0.621 | 56.0 | 19040 | 0.6242 | 0.5 |
| 0.621 | 57.0 | 19380 | 0.6336 | 0.5 |
| 0.6212 | 58.0 | 19720 | 0.6236 | 0.5 |
| 0.6239 | 59.0 | 20060 | 0.6489 | 0.5 |
| 0.6239 | 60.0 | 20400 | 0.6256 | 0.5 |
| 0.6218 | 61.0 | 20740 | 0.6251 | 0.5 |
| 0.6216 | 62.0 | 21080 | 0.6279 | 0.5 |
| 0.6216 | 63.0 | 21420 | 0.6305 | 0.5 |
| 0.6196 | 64.0 | 21760 | 0.6326 | 0.5 |
| 0.6251 | 65.0 | 22100 | 0.6288 | 0.5 |
| 0.6251 | 66.0 | 22440 | 0.6412 | 0.5 |
| 0.6162 | 67.0 | 22780 | 0.6270 | 0.5 |
| 0.6231 | 68.0 | 23120 | 0.6261 | 0.5 |
| 0.6231 | 69.0 | 23460 | 0.6254 | 0.5 |
| 0.6215 | 70.0 | 23800 | 0.6237 | 0.5 |
| 0.6202 | 71.0 | 24140 | 0.6265 | 0.5 |
| 0.6202 | 72.0 | 24480 | 0.6329 | 0.5 |
| 0.6184 | 73.0 | 24820 | 0.6292 | 0.5 |
| 0.6207 | 74.0 | 25160 | 0.6304 | 0.5 |
| 0.6193 | 75.0 | 25500 | 0.6271 | 0.5 |
| 0.6193 | 76.0 | 25840 | 0.6301 | 0.5 |
| 0.6202 | 77.0 | 26180 | 0.6261 | 0.5 |
| 0.6188 | 78.0 | 26520 | 0.6289 | 0.5 |
| 0.6188 | 79.0 | 26860 | 0.6293 | 0.5 |
| 0.6197 | 80.0 | 27200 | 0.6291 | 0.5 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
