
20230826065621

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6391
  • Accuracy: 0.67
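
Since the card does not declare a pipeline type, a checkpoint like this is typically loaded directly with the Auto classes. The sketch below is an assumption rather than documented usage: it presumes a sequence-classification head and an entailment-style text pair, since the card does not name the SuperGLUE subset or the label meanings.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical loading sketch; the SuperGLUE subset and label semantics
# are not documented in this card.
model_id = "dkqjrm/20230826065621"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Example text pair; replace with inputs matching the actual task format.
inputs = tokenizer("It was raining all day.", "The ground is wet.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted label id
```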

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.02
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
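
These settings map onto transformers.TrainingArguments as in the sketch below. This is a reconstruction, not the original training script: evaluation_strategy="epoch" is inferred from the per-epoch validation results in the table that follows, and output_dir is a placeholder.

```python
from transformers import TrainingArguments

# Reconstructed from the hyperparameters listed above; the original
# training script is not included in this card.
training_args = TrainingArguments(
    output_dir="20230826065621",  # placeholder
    learning_rate=0.02,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    num_train_epochs=80.0,
    lr_scheduler_type="linear",
    evaluation_strategy="epoch",  # inferred from the per-epoch results below
    adam_beta1=0.9,               # Adam settings as listed in the card
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```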

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 25   | 0.9872          | 0.34     |
| No log        | 2.0   | 50   | 0.8547          | 0.59     |
| No log        | 3.0   | 75   | 0.6062          | 0.64     |
| No log        | 4.0   | 100  | 0.6097          | 0.61     |
| No log        | 5.0   | 125  | 0.6064          | 0.62     |
| No log        | 6.0   | 150  | 0.5974          | 0.63     |
| No log        | 7.0   | 175  | 0.5723          | 0.66     |
| No log        | 8.0   | 200  | 0.6179          | 0.63     |
| No log        | 9.0   | 225  | 0.5842          | 0.62     |
| No log        | 10.0  | 250  | 0.6117          | 0.68     |
| No log        | 11.0  | 275  | 0.5444          | 0.64     |
| No log        | 12.0  | 300  | 0.7898          | 0.68     |
| No log        | 13.0  | 325  | 0.6851          | 0.68     |
| No log        | 14.0  | 350  | 0.7716          | 0.69     |
| No log        | 15.0  | 375  | 0.6750          | 0.71     |
| No log        | 16.0  | 400  | 0.7645          | 0.7      |
| No log        | 17.0  | 425  | 0.7338          | 0.7      |
| No log        | 18.0  | 450  | 0.8156          | 0.66     |
| No log        | 19.0  | 475  | 0.7524          | 0.68     |
| 0.7431        | 20.0  | 500  | 0.8516          | 0.65     |
| 0.7431        | 21.0  | 525  | 0.8224          | 0.65     |
| 0.7431        | 22.0  | 550  | 1.0607          | 0.67     |
| 0.7431        | 23.0  | 575  | 0.8977          | 0.66     |
| 0.7431        | 24.0  | 600  | 0.7860          | 0.66     |
| 0.7431        | 25.0  | 625  | 0.7285          | 0.66     |
| 0.7431        | 26.0  | 650  | 0.7097          | 0.64     |
| 0.7431        | 27.0  | 675  | 0.7292          | 0.64     |
| 0.7431        | 28.0  | 700  | 0.7131          | 0.65     |
| 0.7431        | 29.0  | 725  | 0.8039          | 0.65     |
| 0.7431        | 30.0  | 750  | 0.7988          | 0.65     |
| 0.7431        | 31.0  | 775  | 0.7809          | 0.64     |
| 0.7431        | 32.0  | 800  | 0.7544          | 0.64     |
| 0.7431        | 33.0  | 825  | 0.7492          | 0.62     |
| 0.7431        | 34.0  | 850  | 0.8206          | 0.64     |
| 0.7431        | 35.0  | 875  | 0.6409          | 0.66     |
| 0.7431        | 36.0  | 900  | 0.7144          | 0.63     |
| 0.7431        | 37.0  | 925  | 0.7414          | 0.63     |
| 0.7431        | 38.0  | 950  | 0.7423          | 0.65     |
| 0.7431        | 39.0  | 975  | 0.7766          | 0.65     |
| 0.3363        | 40.0  | 1000 | 0.7182          | 0.67     |
| 0.3363        | 41.0  | 1025 | 0.7375          | 0.67     |
| 0.3363        | 42.0  | 1050 | 0.7236          | 0.67     |
| 0.3363        | 43.0  | 1075 | 0.7218          | 0.66     |
| 0.3363        | 44.0  | 1100 | 0.7324          | 0.67     |
| 0.3363        | 45.0  | 1125 | 0.7291          | 0.67     |
| 0.3363        | 46.0  | 1150 | 0.6803          | 0.67     |
| 0.3363        | 47.0  | 1175 | 0.6637          | 0.67     |
| 0.3363        | 48.0  | 1200 | 0.7064          | 0.65     |
| 0.3363        | 49.0  | 1225 | 0.6534          | 0.65     |
| 0.3363        | 50.0  | 1250 | 0.7230          | 0.67     |
| 0.3363        | 51.0  | 1275 | 0.7338          | 0.65     |
| 0.3363        | 52.0  | 1300 | 0.6495          | 0.62     |
| 0.3363        | 53.0  | 1325 | 0.6540          | 0.63     |
| 0.3363        | 54.0  | 1350 | 0.6994          | 0.62     |
| 0.3363        | 55.0  | 1375 | 0.7040          | 0.63     |
| 0.3363        | 56.0  | 1400 | 0.6775          | 0.63     |
| 0.3363        | 57.0  | 1425 | 0.6425          | 0.65     |
| 0.3363        | 58.0  | 1450 | 0.6424          | 0.66     |
| 0.3363        | 59.0  | 1475 | 0.6782          | 0.66     |
| 0.2375        | 60.0  | 1500 | 0.6770          | 0.68     |
| 0.2375        | 61.0  | 1525 | 0.7029          | 0.68     |
| 0.2375        | 62.0  | 1550 | 0.6824          | 0.68     |
| 0.2375        | 63.0  | 1575 | 0.6847          | 0.68     |
| 0.2375        | 64.0  | 1600 | 0.6767          | 0.68     |
| 0.2375        | 65.0  | 1625 | 0.6362          | 0.67     |
| 0.2375        | 66.0  | 1650 | 0.6292          | 0.67     |
| 0.2375        | 67.0  | 1675 | 0.6470          | 0.67     |
| 0.2375        | 68.0  | 1700 | 0.6661          | 0.67     |
| 0.2375        | 69.0  | 1725 | 0.6305          | 0.67     |
| 0.2375        | 70.0  | 1750 | 0.6492          | 0.67     |
| 0.2375        | 71.0  | 1775 | 0.6525          | 0.67     |
| 0.2375        | 72.0  | 1800 | 0.6339          | 0.67     |
| 0.2375        | 73.0  | 1825 | 0.6621          | 0.67     |
| 0.2375        | 74.0  | 1850 | 0.6562          | 0.67     |
| 0.2375        | 75.0  | 1875 | 0.6397          | 0.67     |
| 0.2375        | 76.0  | 1900 | 0.6496          | 0.67     |
| 0.2375        | 77.0  | 1925 | 0.6402          | 0.67     |
| 0.2375        | 78.0  | 1950 | 0.6382          | 0.67     |
| 0.2375        | 79.0  | 1975 | 0.6407          | 0.67     |
| 0.2102        | 80.0  | 2000 | 0.6391          | 0.67     |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
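
A quick way to confirm a matching environment is to check the installed versions against the list above (a minimal sketch):

```python
# Compare installed versions against the ones this card reports.
import datasets, tokenizers, torch, transformers

print(transformers.__version__)  # expected: 4.26.1
print(torch.__version__)         # expected: 2.0.1+cu118
print(datasets.__version__)      # expected: 2.12.0
print(tokenizers.__version__)    # expected: 0.13.3
```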
