
20230826081833

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a usage sketch follows the results):

  • Loss: 0.6393
  • Accuracy: 0.69
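
The card does not state which SuperGLUE task this checkpoint targets, so the snippet below is only a minimal, hedged usage sketch: it assumes a sequence-classification head and a sentence-pair input, and the example texts and label index are purely illustrative.

```python
# Minimal usage sketch (assumptions: sequence-classification head, pair input).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230826081833"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Illustrative sentence pair; replace with the actual SuperGLUE task's inputs.
inputs = tokenizer(
    "The cat sat on the mat.",
    "A cat is sitting on a mat.",
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted label index
```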

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
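
As a reproduction aid, here is a minimal sketch mapping the list above onto `TrainingArguments`. Only the settings stated in this card are filled in; `output_dir` and the per-epoch evaluation strategy are assumptions, and the data pipeline is omitted because the card does not specify the SuperGLUE subset or preprocessing.

```python
# Hedged TrainingArguments sketch; values mirror the hyperparameter list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230826081833",  # assumption: any local path works
    learning_rate=0.05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,               # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",  # assumption: the table reports one eval per epoch
)
```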

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 25   | 0.7348          | 0.6      |
| No log        | 2.0   | 50   | 0.6045          | 0.61     |
| No log        | 3.0   | 75   | 0.9239          | 0.62     |
| No log        | 4.0   | 100  | 0.6379          | 0.69     |
| No log        | 5.0   | 125  | 0.5724          | 0.72     |
| No log        | 6.0   | 150  | 1.2083          | 0.69     |
| No log        | 7.0   | 175  | 1.3074          | 0.67     |
| No log        | 8.0   | 200  | 1.1626          | 0.7      |
| No log        | 9.0   | 225  | 1.0019          | 0.64     |
| No log        | 10.0  | 250  | 0.6240          | 0.73     |
| No log        | 11.0  | 275  | 1.0829          | 0.66     |
| No log        | 12.0  | 300  | 0.8053          | 0.66     |
| No log        | 13.0  | 325  | 1.1526          | 0.63     |
| No log        | 14.0  | 350  | 1.2006          | 0.69     |
| No log        | 15.0  | 375  | 1.1382          | 0.67     |
| No log        | 16.0  | 400  | 1.1345          | 0.71     |
| No log        | 17.0  | 425  | 1.5029          | 0.67     |
| No log        | 18.0  | 450  | 1.3780          | 0.67     |
| No log        | 19.0  | 475  | 1.1811          | 0.66     |
| 1.3151        | 20.0  | 500  | 1.2461          | 0.7      |
| 1.3151        | 21.0  | 525  | 1.2269          | 0.68     |
| 1.3151        | 22.0  | 550  | 1.1515          | 0.68     |
| 1.3151        | 23.0  | 575  | 0.9944          | 0.66     |
| 1.3151        | 24.0  | 600  | 1.2708          | 0.67     |
| 1.3151        | 25.0  | 625  | 1.5817          | 0.65     |
| 1.3151        | 26.0  | 650  | 1.0934          | 0.71     |
| 1.3151        | 27.0  | 675  | 1.4179          | 0.67     |
| 1.3151        | 28.0  | 700  | 1.4260          | 0.65     |
| 1.3151        | 29.0  | 725  | 1.3818          | 0.65     |
| 1.3151        | 30.0  | 750  | 1.7166          | 0.66     |
| 1.3151        | 31.0  | 775  | 1.1710          | 0.64     |
| 1.3151        | 32.0  | 800  | 1.0660          | 0.64     |
| 1.3151        | 33.0  | 825  | 1.0127          | 0.69     |
| 1.3151        | 34.0  | 850  | 0.9810          | 0.68     |
| 1.3151        | 35.0  | 875  | 1.1077          | 0.7      |
| 1.3151        | 36.0  | 900  | 1.0629          | 0.66     |
| 1.3151        | 37.0  | 925  | 1.5933          | 0.69     |
| 1.3151        | 38.0  | 950  | 1.1322          | 0.71     |
| 1.3151        | 39.0  | 975  | 1.0735          | 0.73     |
| 0.6791        | 40.0  | 1000 | 0.8940          | 0.72     |
| 0.6791        | 41.0  | 1025 | 0.9349          | 0.67     |
| 0.6791        | 42.0  | 1050 | 0.8962          | 0.67     |
| 0.6791        | 43.0  | 1075 | 1.0663          | 0.69     |
| 0.6791        | 44.0  | 1100 | 0.9681          | 0.69     |
| 0.6791        | 45.0  | 1125 | 0.7694          | 0.68     |
| 0.6791        | 46.0  | 1150 | 1.0311          | 0.71     |
| 0.6791        | 47.0  | 1175 | 0.7407          | 0.7      |
| 0.6791        | 48.0  | 1200 | 0.6861          | 0.69     |
| 0.6791        | 49.0  | 1225 | 0.9920          | 0.69     |
| 0.6791        | 50.0  | 1250 | 0.7187          | 0.69     |
| 0.6791        | 51.0  | 1275 | 0.7602          | 0.72     |
| 0.6791        | 52.0  | 1300 | 0.7285          | 0.69     |
| 0.6791        | 53.0  | 1325 | 0.8233          | 0.68     |
| 0.6791        | 54.0  | 1350 | 0.7932          | 0.7      |
| 0.6791        | 55.0  | 1375 | 0.8861          | 0.71     |
| 0.6791        | 56.0  | 1400 | 0.7877          | 0.71     |
| 0.6791        | 57.0  | 1425 | 0.7689          | 0.7      |
| 0.6791        | 58.0  | 1450 | 0.7919          | 0.7      |
| 0.6791        | 59.0  | 1475 | 0.7441          | 0.7      |
| 0.3594        | 60.0  | 1500 | 0.8327          | 0.69     |
| 0.3594        | 61.0  | 1525 | 0.6414          | 0.71     |
| 0.3594        | 62.0  | 1550 | 0.6702          | 0.71     |
| 0.3594        | 63.0  | 1575 | 0.6862          | 0.71     |
| 0.3594        | 64.0  | 1600 | 0.6349          | 0.68     |
| 0.3594        | 65.0  | 1625 | 0.6800          | 0.69     |
| 0.3594        | 66.0  | 1650 | 0.7005          | 0.69     |
| 0.3594        | 67.0  | 1675 | 0.7058          | 0.71     |
| 0.3594        | 68.0  | 1700 | 0.6880          | 0.73     |
| 0.3594        | 69.0  | 1725 | 0.6774          | 0.72     |
| 0.3594        | 70.0  | 1750 | 0.6816          | 0.73     |
| 0.3594        | 71.0  | 1775 | 0.7138          | 0.72     |
| 0.3594        | 72.0  | 1800 | 0.6311          | 0.69     |
| 0.3594        | 73.0  | 1825 | 0.6579          | 0.69     |
| 0.3594        | 74.0  | 1850 | 0.6956          | 0.69     |
| 0.3594        | 75.0  | 1875 | 0.6341          | 0.69     |
| 0.3594        | 76.0  | 1900 | 0.6722          | 0.7      |
| 0.3594        | 77.0  | 1925 | 0.6459          | 0.7      |
| 0.3594        | 78.0  | 1950 | 0.6351          | 0.68     |
| 0.3594        | 79.0  | 1975 | 0.6436          | 0.68     |
| 0.2323        | 80.0  | 2000 | 0.6393          | 0.69     |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
