
20230826022757

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):

  • Loss: 0.5491
  • Accuracy: 0.74
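The checkpoint loads with the standard sequence-classification classes. A minimal inference sketch, assuming the repo id dkqjrm/20230826022757 and a two-label sentence-pair task; the card does not say which SuperGLUE subset was used, so the example inputs are placeholders:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed repo id; adjust if the checkpoint lives under another name.
model_id = "dkqjrm/20230826022757"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Most SuperGLUE subsets are sentence-pair classification, so the two
# texts are encoded together; the actual input format expected by this
# checkpoint is undocumented.
inputs = tokenizer("First sentence.", "Second sentence.", return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(probs)
```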

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

  • learning_rate: 0.01
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
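These values map directly onto transformers.TrainingArguments. A minimal reproduction sketch under stated assumptions: the card does not name the SuperGLUE subset, so "boolq" below is a hypothetical stand-in (the field names in the tokenize function match that subset only), and the output_dir is arbitrary:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical subset; swap in the actual SuperGLUE task if known.
raw = load_dataset("super_glue", "boolq")
tokenizer = AutoTokenizer.from_pretrained("bert-large-cased")

def tokenize(batch):
    # "question"/"passage" are BoolQ field names; other subsets differ.
    return tokenizer(batch["question"], batch["passage"],
                     truncation=True, max_length=256)

encoded = raw.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-large-cased", num_labels=2)

# Hyperparameters copied from the list above.
args = TrainingArguments(
    output_dir="20230826022757",
    learning_rate=0.01,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    num_train_epochs=80.0,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    evaluation_strategy="epoch",  # the results table shows one eval per epoch
)

trainer = Trainer(model=model, args=args,
                  tokenizer=tokenizer,  # enables dynamic padding in collation
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["validation"])
trainer.train()
```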

Training results

Training loss is the running average logged every 500 steps (the Trainer default), so epochs completed before step 500 show "No log".

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 25   | 0.6588          | 0.44     |
| No log        | 2.0   | 50   | 0.6258          | 0.63     |
| No log        | 3.0   | 75   | 0.6839          | 0.66     |
| No log        | 4.0   | 100  | 0.6238          | 0.63     |
| No log        | 5.0   | 125  | 0.5878          | 0.64     |
| No log        | 6.0   | 150  | 0.5895          | 0.61     |
| No log        | 7.0   | 175  | 0.5951          | 0.63     |
| No log        | 8.0   | 200  | 0.6701          | 0.62     |
| No log        | 9.0   | 225  | 0.5858          | 0.62     |
| No log        | 10.0  | 250  | 0.6603          | 0.64     |
| No log        | 11.0  | 275  | 0.5708          | 0.65     |
| No log        | 12.0  | 300  | 0.5657          | 0.63     |
| No log        | 13.0  | 325  | 0.5691          | 0.68     |
| No log        | 14.0  | 350  | 0.5820          | 0.67     |
| No log        | 15.0  | 375  | 0.5245          | 0.70     |
| No log        | 16.0  | 400  | 0.6291          | 0.70     |
| No log        | 17.0  | 425  | 0.6177          | 0.70     |
| No log        | 18.0  | 450  | 0.7375          | 0.70     |
| No log        | 19.0  | 475  | 0.6500          | 0.68     |
| 0.6647        | 20.0  | 500  | 0.6727          | 0.71     |
| 0.6647        | 21.0  | 525  | 0.7042          | 0.72     |
| 0.6647        | 22.0  | 550  | 0.7448          | 0.71     |
| 0.6647        | 23.0  | 575  | 0.6157          | 0.72     |
| 0.6647        | 24.0  | 600  | 0.7661          | 0.72     |
| 0.6647        | 25.0  | 625  | 0.6832          | 0.72     |
| 0.6647        | 26.0  | 650  | 0.6971          | 0.72     |
| 0.6647        | 27.0  | 675  | 0.6274          | 0.72     |
| 0.6647        | 28.0  | 700  | 0.6846          | 0.73     |
| 0.6647        | 29.0  | 725  | 0.6319          | 0.73     |
| 0.6647        | 30.0  | 750  | 0.7387          | 0.74     |
| 0.6647        | 31.0  | 775  | 0.6482          | 0.74     |
| 0.6647        | 32.0  | 800  | 0.6043          | 0.73     |
| 0.6647        | 33.0  | 825  | 0.6589          | 0.72     |
| 0.6647        | 34.0  | 850  | 0.7023          | 0.74     |
| 0.6647        | 35.0  | 875  | 0.6197          | 0.74     |
| 0.6647        | 36.0  | 900  | 0.6325          | 0.75     |
| 0.6647        | 37.0  | 925  | 0.6264          | 0.75     |
| 0.6647        | 38.0  | 950  | 0.6198          | 0.73     |
| 0.6647        | 39.0  | 975  | 0.6239          | 0.74     |
| 0.2917        | 40.0  | 1000 | 0.6072          | 0.74     |
| 0.2917        | 41.0  | 1025 | 0.6354          | 0.74     |
| 0.2917        | 42.0  | 1050 | 0.5724          | 0.74     |
| 0.2917        | 43.0  | 1075 | 0.5799          | 0.74     |
| 0.2917        | 44.0  | 1100 | 0.5863          | 0.75     |
| 0.2917        | 45.0  | 1125 | 0.6033          | 0.74     |
| 0.2917        | 46.0  | 1150 | 0.6735          | 0.73     |
| 0.2917        | 47.0  | 1175 | 0.6068          | 0.73     |
| 0.2917        | 48.0  | 1200 | 0.6064          | 0.73     |
| 0.2917        | 49.0  | 1225 | 0.6205          | 0.74     |
| 0.2917        | 50.0  | 1250 | 0.5605          | 0.74     |
| 0.2917        | 51.0  | 1275 | 0.6015          | 0.75     |
| 0.2917        | 52.0  | 1300 | 0.5771          | 0.75     |
| 0.2917        | 53.0  | 1325 | 0.5400          | 0.75     |
| 0.2917        | 54.0  | 1350 | 0.5911          | 0.76     |
| 0.2917        | 55.0  | 1375 | 0.5665          | 0.76     |
| 0.2917        | 56.0  | 1400 | 0.5658          | 0.75     |
| 0.2917        | 57.0  | 1425 | 0.5775          | 0.75     |
| 0.2917        | 58.0  | 1450 | 0.5690          | 0.74     |
| 0.2917        | 59.0  | 1475 | 0.5689          | 0.75     |
| 0.2234        | 60.0  | 1500 | 0.5793          | 0.74     |
| 0.2234        | 61.0  | 1525 | 0.5490          | 0.75     |
| 0.2234        | 62.0  | 1550 | 0.5899          | 0.75     |
| 0.2234        | 63.0  | 1575 | 0.5612          | 0.75     |
| 0.2234        | 64.0  | 1600 | 0.5451          | 0.75     |
| 0.2234        | 65.0  | 1625 | 0.5690          | 0.74     |
| 0.2234        | 66.0  | 1650 | 0.5391          | 0.74     |
| 0.2234        | 67.0  | 1675 | 0.5607          | 0.74     |
| 0.2234        | 68.0  | 1700 | 0.5451          | 0.74     |
| 0.2234        | 69.0  | 1725 | 0.5675          | 0.74     |
| 0.2234        | 70.0  | 1750 | 0.5486          | 0.74     |
| 0.2234        | 71.0  | 1775 | 0.5502          | 0.74     |
| 0.2234        | 72.0  | 1800 | 0.5445          | 0.74     |
| 0.2234        | 73.0  | 1825 | 0.5577          | 0.74     |
| 0.2234        | 74.0  | 1850 | 0.5533          | 0.74     |
| 0.2234        | 75.0  | 1875 | 0.5534          | 0.74     |
| 0.2234        | 76.0  | 1900 | 0.5549          | 0.74     |
| 0.2234        | 77.0  | 1925 | 0.5495          | 0.74     |
| 0.2234        | 78.0  | 1950 | 0.5492          | 0.74     |
| 0.2234        | 79.0  | 1975 | 0.5488          | 0.74     |
| 0.2032        | 80.0  | 2000 | 0.5491          | 0.74     |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
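
To reproduce the environment, the versions above can be pinned; a minimal requirements sketch (note that the +cu118 build of torch is served from the PyTorch CUDA 11.8 wheel index, not PyPI):

```
transformers==4.26.1
torch==2.0.1+cu118
datasets==2.12.0
tokenizers==0.13.3
```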
