
20230826083203

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2932
  • Accuracy: 0.6
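
Since the card does not state the pipeline type, the snippet below is only a minimal usage sketch: it assumes the checkpoint exposes a sequence-classification head (plausible, given that accuracy is the reported metric), and takes the hub id dkqjrm/20230826083203 from the dataset listing at the end of this card.

```python
# Hedged usage sketch -- assumes a sequence-classification head;
# the card does not confirm the pipeline type.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230826083203"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("Example input text", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))  # predicted class index
```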

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch reconstructing this configuration follows the list):

  • learning_rate: 0.05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
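
As a hedged reconstruction, these hyperparameters map onto a transformers Trainer configuration roughly as below. The SuperGLUE task name ("boolq") and the tokenization field names are placeholders, since the card does not say which SuperGLUE task was used; the optimizer settings listed above correspond to the Trainer's default AdamW parameters, made explicit here.

```python
# Sketch of the training setup implied by the hyperparameters above.
# Assumptions: the SuperGLUE task ("boolq") and its field names are
# placeholders -- the card does not specify the task.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

dataset = load_dataset("super_glue", "boolq")  # task name is a placeholder
tokenizer = AutoTokenizer.from_pretrained("bert-large-cased")
model = AutoModelForSequenceClassification.from_pretrained("bert-large-cased")

def tokenize(batch):
    # Field names depend on the SuperGLUE task; "question"/"passage" fit BoolQ.
    return tokenizer(batch["question"], batch["passage"], truncation=True)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="out",
    learning_rate=0.05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    num_train_epochs=80,
    lr_scheduler_type="linear",
    adam_beta1=0.9,      # Trainer's AdamW defaults, stated explicitly
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="epoch",  # matches the per-epoch results table below
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
)
trainer.train()
```

Note that the default logging interval of 500 steps is consistent with the "No log" entries in the training-loss column below, which only fill in every 500 steps.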

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.5569 | 0.62 |
| No log | 2.0 | 50 | 0.3272 | 0.4 |
| No log | 3.0 | 75 | 0.2999 | 0.48 |
| No log | 4.0 | 100 | 0.3037 | 0.58 |
| No log | 5.0 | 125 | 0.3092 | 0.39 |
| No log | 6.0 | 150 | 0.3147 | 0.37 |
| No log | 7.0 | 175 | 0.2872 | 0.61 |
| No log | 8.0 | 200 | 0.2897 | 0.68 |
| No log | 9.0 | 225 | 0.2950 | 0.41 |
| No log | 10.0 | 250 | 0.2779 | 0.63 |
| No log | 11.0 | 275 | 0.2977 | 0.41 |
| No log | 12.0 | 300 | 0.2909 | 0.59 |
| No log | 13.0 | 325 | 0.2940 | 0.49 |
| No log | 14.0 | 350 | 0.2929 | 0.49 |
| No log | 15.0 | 375 | 0.2948 | 0.49 |
| No log | 16.0 | 400 | 0.2935 | 0.57 |
| No log | 17.0 | 425 | 0.2949 | 0.43 |
| No log | 18.0 | 450 | 0.2925 | 0.59 |
| No log | 19.0 | 475 | 0.2927 | 0.57 |
| 1.2287 | 20.0 | 500 | 0.2934 | 0.58 |
| 1.2287 | 21.0 | 525 | 0.2947 | 0.44 |
| 1.2287 | 22.0 | 550 | 0.2934 | 0.6 |
| 1.2287 | 23.0 | 575 | 0.2930 | 0.6 |
| 1.2287 | 24.0 | 600 | 0.2944 | 0.4 |
| 1.2287 | 25.0 | 625 | 0.2970 | 0.39 |
| 1.2287 | 26.0 | 650 | 0.2949 | 0.39 |
| 1.2287 | 27.0 | 675 | 0.2942 | 0.43 |
| 1.2287 | 28.0 | 700 | 0.2940 | 0.43 |
| 1.2287 | 29.0 | 725 | 0.2933 | 0.58 |
| 1.2287 | 30.0 | 750 | 0.2930 | 0.62 |
| 1.2287 | 31.0 | 775 | 0.2934 | 0.6 |
| 1.2287 | 32.0 | 800 | 0.2934 | 0.57 |
| 1.2287 | 33.0 | 825 | 0.2932 | 0.54 |
| 1.2287 | 34.0 | 850 | 0.2921 | 0.54 |
| 1.2287 | 35.0 | 875 | 0.2950 | 0.44 |
| 1.2287 | 36.0 | 900 | 0.2944 | 0.41 |
| 1.2287 | 37.0 | 925 | 0.2941 | 0.43 |
| 1.2287 | 38.0 | 950 | 0.2930 | 0.55 |
| 1.2287 | 39.0 | 975 | 0.2932 | 0.57 |
| 0.8805 | 40.0 | 1000 | 0.2923 | 0.57 |
| 0.8805 | 41.0 | 1025 | 0.2932 | 0.61 |
| 0.8805 | 42.0 | 1050 | 0.2936 | 0.46 |
| 0.8805 | 43.0 | 1075 | 0.2924 | 0.55 |
| 0.8805 | 44.0 | 1100 | 0.2937 | 0.44 |
| 0.8805 | 45.0 | 1125 | 0.2927 | 0.55 |
| 0.8805 | 46.0 | 1150 | 0.2923 | 0.56 |
| 0.8805 | 47.0 | 1175 | 0.2930 | 0.6 |
| 0.8805 | 48.0 | 1200 | 0.2936 | 0.43 |
| 0.8805 | 49.0 | 1225 | 0.2935 | 0.56 |
| 0.8805 | 50.0 | 1250 | 0.2937 | 0.46 |
| 0.8805 | 51.0 | 1275 | 0.2929 | 0.59 |
| 0.8805 | 52.0 | 1300 | 0.2932 | 0.55 |
| 0.8805 | 53.0 | 1325 | 0.2940 | 0.48 |
| 0.8805 | 54.0 | 1350 | 0.2933 | 0.53 |
| 0.8805 | 55.0 | 1375 | 0.2934 | 0.55 |
| 0.8805 | 56.0 | 1400 | 0.2936 | 0.49 |
| 0.8805 | 57.0 | 1425 | 0.2928 | 0.59 |
| 0.8805 | 58.0 | 1450 | 0.2927 | 0.53 |
| 0.8805 | 59.0 | 1475 | 0.2930 | 0.6 |
| 0.6612 | 60.0 | 1500 | 0.2936 | 0.47 |
| 0.6612 | 61.0 | 1525 | 0.2933 | 0.53 |
| 0.6612 | 62.0 | 1550 | 0.2932 | 0.62 |
| 0.6612 | 63.0 | 1575 | 0.2937 | 0.41 |
| 0.6612 | 64.0 | 1600 | 0.2932 | 0.54 |
| 0.6612 | 65.0 | 1625 | 0.2940 | 0.42 |
| 0.6612 | 66.0 | 1650 | 0.2931 | 0.56 |
| 0.6612 | 67.0 | 1675 | 0.2937 | 0.36 |
| 0.6612 | 68.0 | 1700 | 0.2930 | 0.63 |
| 0.6612 | 69.0 | 1725 | 0.2934 | 0.63 |
| 0.6612 | 70.0 | 1750 | 0.2937 | 0.36 |
| 0.6612 | 71.0 | 1775 | 0.2930 | 0.63 |
| 0.6612 | 72.0 | 1800 | 0.2932 | 0.63 |
| 0.6612 | 73.0 | 1825 | 0.2930 | 0.61 |
| 0.6612 | 74.0 | 1850 | 0.2932 | 0.53 |
| 0.6612 | 75.0 | 1875 | 0.2932 | 0.58 |
| 0.6612 | 76.0 | 1900 | 0.2935 | 0.53 |
| 0.6612 | 77.0 | 1925 | 0.2931 | 0.62 |
| 0.6612 | 78.0 | 1950 | 0.2933 | 0.54 |
| 0.6612 | 79.0 | 1975 | 0.2932 | 0.61 |
| 0.5295 | 80.0 | 2000 | 0.2932 | 0.6 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3

Dataset used to train dkqjrm/20230826083203

  • super_glue