
20230826105641

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6024
  • Accuracy: 0.64
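
The card does not state the pipeline type or the specific SuperGLUE task. Below is a minimal loading sketch, assuming a sequence-classification head and a sentence-pair input format (both assumptions; only the repo id, dkqjrm/20230826105641, comes from this card):

```python
# Hedged loading sketch: the task, input format, and label semantics are
# assumptions -- the card only states "bert-large-cased on super_glue".
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230826105641"  # repo id from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Sentence-pair input is a placeholder; the actual task format is undocumented.
inputs = tokenizer("Example premise.", "Example hypothesis.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```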

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged Trainer sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
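
A sketch of these settings expressed with the Transformers TrainingArguments API; the output_dir name and the evaluation/logging cadence are assumptions inferred from the results table below, not stated in the card:

```python
from transformers import TrainingArguments

# Sketch only: reproduces the listed hyperparameters; everything else
# (output_dir, eval/logging cadence, warmup) is inferred or assumed.
training_args = TrainingArguments(
    output_dir="20230826105641",
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    num_train_epochs=80.0,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="epoch",  # matches the one-eval-per-epoch rows below
    logging_steps=500,            # matches "No log" until step 500 in the table
)
```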

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.6078 | 0.65 |
| No log | 2.0 | 50 | 0.5963 | 0.66 |
| No log | 3.0 | 75 | 0.6125 | 0.65 |
| No log | 4.0 | 100 | 0.6042 | 0.66 |
| No log | 5.0 | 125 | 0.6065 | 0.66 |
| No log | 6.0 | 150 | 0.6020 | 0.65 |
| No log | 7.0 | 175 | 0.5987 | 0.65 |
| No log | 8.0 | 200 | 0.6016 | 0.66 |
| No log | 9.0 | 225 | 0.6066 | 0.66 |
| No log | 10.0 | 250 | 0.6112 | 0.66 |
| No log | 11.0 | 275 | 0.6085 | 0.66 |
| No log | 12.0 | 300 | 0.5976 | 0.66 |
| No log | 13.0 | 325 | 0.6074 | 0.66 |
| No log | 14.0 | 350 | 0.6060 | 0.65 |
| No log | 15.0 | 375 | 0.6254 | 0.65 |
| No log | 16.0 | 400 | 0.6031 | 0.66 |
| No log | 17.0 | 425 | 0.6011 | 0.67 |
| No log | 18.0 | 450 | 0.6063 | 0.66 |
| No log | 19.0 | 475 | 0.6031 | 0.65 |
| 0.6484 | 20.0 | 500 | 0.6013 | 0.65 |
| 0.6484 | 21.0 | 525 | 0.6041 | 0.65 |
| 0.6484 | 22.0 | 550 | 0.6037 | 0.65 |
| 0.6484 | 23.0 | 575 | 0.6046 | 0.65 |
| 0.6484 | 24.0 | 600 | 0.6072 | 0.66 |
| 0.6484 | 25.0 | 625 | 0.5980 | 0.66 |
| 0.6484 | 26.0 | 650 | 0.6039 | 0.64 |
| 0.6484 | 27.0 | 675 | 0.6025 | 0.65 |
| 0.6484 | 28.0 | 700 | 0.6062 | 0.65 |
| 0.6484 | 29.0 | 725 | 0.6056 | 0.64 |
| 0.6484 | 30.0 | 750 | 0.6091 | 0.61 |
| 0.6484 | 31.0 | 775 | 0.6037 | 0.65 |
| 0.6484 | 32.0 | 800 | 0.6037 | 0.63 |
| 0.6484 | 33.0 | 825 | 0.6175 | 0.64 |
| 0.6484 | 34.0 | 850 | 0.6089 | 0.62 |
| 0.6484 | 35.0 | 875 | 0.6076 | 0.64 |
| 0.6484 | 36.0 | 900 | 0.6073 | 0.64 |
| 0.6484 | 37.0 | 925 | 0.6059 | 0.64 |
| 0.6484 | 38.0 | 950 | 0.6109 | 0.63 |
| 0.6484 | 39.0 | 975 | 0.6090 | 0.64 |
| 0.6362 | 40.0 | 1000 | 0.6080 | 0.64 |
| 0.6362 | 41.0 | 1025 | 0.5994 | 0.64 |
| 0.6362 | 42.0 | 1050 | 0.6034 | 0.64 |
| 0.6362 | 43.0 | 1075 | 0.6113 | 0.60 |
| 0.6362 | 44.0 | 1100 | 0.6131 | 0.64 |
| 0.6362 | 45.0 | 1125 | 0.6150 | 0.61 |
| 0.6362 | 46.0 | 1150 | 0.6115 | 0.63 |
| 0.6362 | 47.0 | 1175 | 0.6055 | 0.64 |
| 0.6362 | 48.0 | 1200 | 0.6033 | 0.64 |
| 0.6362 | 49.0 | 1225 | 0.6047 | 0.64 |
| 0.6362 | 50.0 | 1250 | 0.6037 | 0.64 |
| 0.6362 | 51.0 | 1275 | 0.6010 | 0.63 |
| 0.6362 | 52.0 | 1300 | 0.5988 | 0.64 |
| 0.6362 | 53.0 | 1325 | 0.5991 | 0.64 |
| 0.6362 | 54.0 | 1350 | 0.6019 | 0.64 |
| 0.6362 | 55.0 | 1375 | 0.6002 | 0.64 |
| 0.6362 | 56.0 | 1400 | 0.6006 | 0.64 |
| 0.6362 | 57.0 | 1425 | 0.5992 | 0.63 |
| 0.6362 | 58.0 | 1450 | 0.5992 | 0.63 |
| 0.6362 | 59.0 | 1475 | 0.5992 | 0.64 |
| 0.6341 | 60.0 | 1500 | 0.6026 | 0.64 |
| 0.6341 | 61.0 | 1525 | 0.6022 | 0.64 |
| 0.6341 | 62.0 | 1550 | 0.6026 | 0.64 |
| 0.6341 | 63.0 | 1575 | 0.6036 | 0.64 |
| 0.6341 | 64.0 | 1600 | 0.6039 | 0.64 |
| 0.6341 | 65.0 | 1625 | 0.6041 | 0.64 |
| 0.6341 | 66.0 | 1650 | 0.6034 | 0.64 |
| 0.6341 | 67.0 | 1675 | 0.6049 | 0.64 |
| 0.6341 | 68.0 | 1700 | 0.6027 | 0.64 |
| 0.6341 | 69.0 | 1725 | 0.6057 | 0.64 |
| 0.6341 | 70.0 | 1750 | 0.6056 | 0.64 |
| 0.6341 | 71.0 | 1775 | 0.6048 | 0.64 |
| 0.6341 | 72.0 | 1800 | 0.6019 | 0.64 |
| 0.6341 | 73.0 | 1825 | 0.6021 | 0.64 |
| 0.6341 | 74.0 | 1850 | 0.6018 | 0.64 |
| 0.6341 | 75.0 | 1875 | 0.6027 | 0.64 |
| 0.6341 | 76.0 | 1900 | 0.6025 | 0.64 |
| 0.6341 | 77.0 | 1925 | 0.6021 | 0.64 |
| 0.6341 | 78.0 | 1950 | 0.6023 | 0.64 |
| 0.6341 | 79.0 | 1975 | 0.6024 | 0.64 |
| 0.626 | 80.0 | 2000 | 0.6024 | 0.64 |
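
The Accuracy column implies a per-epoch accuracy metric was attached to the Trainer. A minimal sketch of a compute_metrics callback that would produce such a column, assuming the evaluate library (the actual metric code is not included in the card):

```python
import numpy as np
import evaluate

# Assumed metric setup; the card does not document how accuracy was computed.
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # class with highest logit
    return accuracy.compute(predictions=predictions, references=labels)
```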

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3