20230831142618

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6294
  • Accuracy: 0.5016
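
The card does not state which SuperGLUE task the model was fine-tuned on, so the following is only a minimal loading sketch, assuming the checkpoint carries a sequence-classification head; the example sentence pair and the meaning of the label indices are assumptions, not documented here:

```python
# Minimal loading sketch for dkqjrm/20230831142618. Assumes a
# sequence-classification head; the SuperGLUE task and label names
# are not documented on this card.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230831142618"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Most SuperGLUE tasks are text-pair classification, so we pass a pair.
inputs = tokenizer(
    "The cat sat on the mat.", "A cat is on a mat.", return_tensors="pt"
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```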

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):

  • learning_rate: 0.0007
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
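
As a hedged illustration, these settings correspond roughly to the `TrainingArguments` below; only the listed values come from the card, while `output_dir` and the rest of the training script are assumptions:

```python
# Sketch of the reported hyperparameters as transformers TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230831142618",   # assumed, not documented on the card
    learning_rate=7e-4,            # 0.0007
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    num_train_epochs=80.0,
    lr_scheduler_type="linear",
    adam_beta1=0.9,                # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```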

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 340   | 0.6452          | 0.5      |
| 0.6361        | 2.0   | 680   | 0.6325          | 0.5      |
| 0.6378        | 3.0   | 1020  | 0.6175          | 0.5      |
| 0.6378        | 4.0   | 1360  | 0.6705          | 0.5      |
| 0.6367        | 5.0   | 1700  | 0.6476          | 0.5      |
| 0.6284        | 6.0   | 2040  | 0.6180          | 0.5      |
| 0.6284        | 7.0   | 2380  | 0.6174          | 0.5      |
| 0.6308        | 8.0   | 2720  | 0.6441          | 0.5      |
| 0.622         | 9.0   | 3060  | 0.6199          | 0.5      |
| 0.622         | 10.0  | 3400  | 0.6222          | 0.5      |
| 0.6264        | 11.0  | 3740  | 0.6177          | 0.5      |
| 0.6249        | 12.0  | 4080  | 0.6172          | 0.5      |
| 0.6249        | 13.0  | 4420  | 0.6872          | 0.5      |
| 0.6205        | 14.0  | 4760  | 0.6174          | 0.5      |
| 0.6347        | 15.0  | 5100  | 0.6236          | 0.5      |
| 0.6347        | 16.0  | 5440  | 0.6170          | 0.5      |
| 0.6369        | 17.0  | 5780  | 0.6180          | 0.5      |
| 0.623         | 18.0  | 6120  | 0.6256          | 0.5      |
| 0.623         | 19.0  | 6460  | 0.6349          | 0.5      |
| 0.6278        | 20.0  | 6800  | 0.6554          | 0.5      |
| 0.6255        | 21.0  | 7140  | 0.6173          | 0.5      |
| 0.6255        | 22.0  | 7480  | 0.6215          | 0.5      |
| 0.6286        | 23.0  | 7820  | 0.6201          | 0.5      |
| 0.6235        | 24.0  | 8160  | 0.6176          | 0.5      |
| 0.6289        | 25.0  | 8500  | 0.6216          | 0.5      |
| 0.6289        | 26.0  | 8840  | 0.6522          | 0.5      |
| 0.6236        | 27.0  | 9180  | 0.6193          | 0.5      |
| 0.6227        | 28.0  | 9520  | 0.6175          | 0.5      |
| 0.6227        | 29.0  | 9860  | 0.6504          | 0.5      |
| 0.6211        | 30.0  | 10200 | 0.6442          | 0.5      |
| 0.623         | 31.0  | 10540 | 0.6181          | 0.5      |
| 0.623         | 32.0  | 10880 | 0.6220          | 0.5      |
| 0.6206        | 33.0  | 11220 | 0.6185          | 0.5      |
| 0.621         | 34.0  | 11560 | 0.6238          | 0.5      |
| 0.621         | 35.0  | 11900 | 0.6277          | 0.5      |
| 0.6216        | 36.0  | 12240 | 0.6352          | 0.5      |
| 0.6211        | 37.0  | 12580 | 0.6170          | 0.5      |
| 0.6211        | 38.0  | 12920 | 0.6169          | 0.5      |
| 0.6203        | 39.0  | 13260 | 0.6410          | 0.5      |
| 0.619         | 40.0  | 13600 | 0.6190          | 0.5      |
| 0.619         | 41.0  | 13940 | 0.6228          | 0.5      |
| 0.6214        | 42.0  | 14280 | 0.6214          | 0.5      |
| 0.617         | 43.0  | 14620 | 0.6212          | 0.5      |
| 0.617         | 44.0  | 14960 | 0.6172          | 0.5      |
| 0.6211        | 45.0  | 15300 | 0.6309          | 0.5      |
| 0.6168        | 46.0  | 15640 | 0.6250          | 0.5      |
| 0.6168        | 47.0  | 15980 | 0.6371          | 0.5      |
| 0.621         | 48.0  | 16320 | 0.6187          | 0.5      |
| 0.6179        | 49.0  | 16660 | 0.6272          | 0.5      |
| 0.6185        | 50.0  | 17000 | 0.6184          | 0.5      |
| 0.6185        | 51.0  | 17340 | 0.6207          | 0.5      |
| 0.6154        | 52.0  | 17680 | 0.6187          | 0.5      |
| 0.6204        | 53.0  | 18020 | 0.6225          | 0.5      |
| 0.6204        | 54.0  | 18360 | 0.6177          | 0.5      |
| 0.6161        | 55.0  | 18700 | 0.6319          | 0.5      |
| 0.6231        | 56.0  | 19040 | 0.6109          | 0.5      |
| 0.6231        | 57.0  | 19380 | 0.6058          | 0.5      |
| 0.6051        | 58.0  | 19720 | 0.6064          | 0.5      |
| 0.5939        | 59.0  | 20060 | 0.6035          | 0.5      |
| 0.5939        | 60.0  | 20400 | 0.6428          | 0.5125   |
| 0.5818        | 61.0  | 20740 | 0.5962          | 0.5      |
| 0.5724        | 62.0  | 21080 | 0.5954          | 0.5      |
| 0.5724        | 63.0  | 21420 | 0.5971          | 0.5      |
| 0.565         | 64.0  | 21760 | 0.6361          | 0.5047   |
| 0.563         | 65.0  | 22100 | 0.6182          | 0.5016   |
| 0.563         | 66.0  | 22440 | 0.6006          | 0.5      |
| 0.5456        | 67.0  | 22780 | 0.6329          | 0.5016   |
| 0.5507        | 68.0  | 23120 | 0.6332          | 0.5031   |
| 0.5507        | 69.0  | 23460 | 0.6358          | 0.5      |
| 0.5446        | 70.0  | 23800 | 0.6326          | 0.5031   |
| 0.5364        | 71.0  | 24140 | 0.6283          | 0.5016   |
| 0.5364        | 72.0  | 24480 | 0.6214          | 0.5016   |
| 0.5335        | 73.0  | 24820 | 0.6173          | 0.5      |
| 0.532         | 74.0  | 25160 | 0.6214          | 0.5016   |
| 0.5274        | 75.0  | 25500 | 0.6298          | 0.5016   |
| 0.5274        | 76.0  | 25840 | 0.6313          | 0.5016   |
| 0.5265        | 77.0  | 26180 | 0.6241          | 0.5      |
| 0.5233        | 78.0  | 26520 | 0.6215          | 0.5      |
| 0.5233        | 79.0  | 26860 | 0.6280          | 0.5016   |
| 0.5235        | 80.0  | 27200 | 0.6294          | 0.5016   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
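
To reproduce this environment, pinning the versions above should suffice; a minimal requirements sketch (how the original environment was actually built, including the cu118 PyTorch wheel source, is an assumption):

```
transformers==4.26.1
torch==2.0.1
datasets==2.12.0
tokenizers==0.13.3
```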