20230826114726

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set. These are the final-epoch values (epoch 80, step 2000) from the training results table:

Loss: 0.2883
Accuracy: 0.59

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
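Together, these hyperparameters determine the learning-rate trajectory. The sketch below rests on three assumptions: zero warmup (none is listed, and 0 is the Hugging Face Trainer default), a single device, and no gradient accumulation. Under those assumptions, the linear scheduler decays the rate from 0.001 to 0 over 80 × 25 = 2000 optimizer steps. linear_lr is a hypothetical helper that mirrors the usual linear warmup/decay formula; it is not code taken from the training run.

```python
def linear_lr(step: int, base_lr: float = 1e-3, total_steps: int = 2000, warmup_steps: int = 0) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to 0.

    Hypothetical helper; warmup_steps=0 is an assumption (the card lists no warmup).
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# 25 optimizer steps per epoch (read off the Step column) times 80 epochs.
STEPS_PER_EPOCH = 25
NUM_EPOCHS = 80
TRAIN_BATCH_SIZE = 16
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 2000, matching the last row of the results table

# Assuming one device and no gradient accumulation, 25 batches of at most 16 examples
# (last batch possibly partial) implies a training set of 385 to 400 examples.
min_examples = (STEPS_PER_EPOCH - 1) * TRAIN_BATCH_SIZE + 1
max_examples = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE

if __name__ == "__main__":
    for step in (0, 500, 1000, 1500, 2000):
        print(f"step {step:4d}: lr = {linear_lr(step, total_steps=TOTAL_STEPS):.6f}")
    print(f"implied training-set size: {min_examples} to {max_examples} examples")
```

Under these assumptions, the rate at the midpoint (step 1000, end of epoch 40) is half the base rate.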

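Training results

Validation loss and accuracy were computed at the end of each epoch (25 steps). The Training Loss column shows the most recently logged training loss, which appears to have been logged every 500 steps, so rows before step 500 show "No log".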

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	25	0.2910	0.6
No log	2.0	50	0.2911	0.64
No log	3.0	75	0.2875	0.65
No log	4.0	100	0.2909	0.62
No log	5.0	125	0.2935	0.62
No log	6.0	150	0.2977	0.58
No log	7.0	175	0.2854	0.65
No log	8.0	200	0.2900	0.65
No log	9.0	225	0.2985	0.53
No log	10.0	250	0.2906	0.64
No log	11.0	275	0.2979	0.63
No log	12.0	300	0.2891	0.63
No log	13.0	325	0.2885	0.63
No log	14.0	350	0.2904	0.64
No log	15.0	375	0.3056	0.58
No log	16.0	400	0.2860	0.65
No log	17.0	425	0.2887	0.62
No log	18.0	450	0.2968	0.59
No log	19.0	475	0.2927	0.51
0.4646	20.0	500	0.2887	0.59
0.4646	21.0	525	0.2917	0.62
0.4646	22.0	550	0.2940	0.53
0.4646	23.0	575	0.2914	0.58
0.4646	24.0	600	0.2875	0.61
0.4646	25.0	625	0.2928	0.63
0.4646	26.0	650	0.2887	0.57
0.4646	27.0	675	0.2871	0.58
0.4646	28.0	700	0.2925	0.64
0.4646	29.0	725	0.2963	0.6
0.4646	30.0	750	0.2922	0.56
0.4646	31.0	775	0.2902	0.59
0.4646	32.0	800	0.2885	0.59
0.4646	33.0	825	0.2940	0.57
0.4646	34.0	850	0.2912	0.53
0.4646	35.0	875	0.2879	0.59
0.4646	36.0	900	0.2880	0.59
0.4646	37.0	925	0.2945	0.47
0.4646	38.0	950	0.2918	0.6
0.4646	39.0	975	0.2887	0.58
0.4656	40.0	1000	0.2874	0.59
0.4656	41.0	1025	0.2898	0.56
0.4656	42.0	1050	0.2897	0.59
0.4656	43.0	1075	0.2924	0.5
0.4656	44.0	1100	0.2898	0.58
0.4656	45.0	1125	0.2921	0.58
0.4656	46.0	1150	0.2895	0.56
0.4656	47.0	1175	0.2862	0.59
0.4656	48.0	1200	0.2869	0.57
0.4656	49.0	1225	0.2855	0.61
0.4656	50.0	1250	0.2859	0.59
0.4656	51.0	1275	0.2899	0.58
0.4656	52.0	1300	0.2851	0.59
0.4656	53.0	1325	0.2852	0.61
0.4656	54.0	1350	0.2887	0.6
0.4656	55.0	1375	0.2870	0.59
0.4656	56.0	1400	0.2895	0.63
0.4656	57.0	1425	0.2893	0.62
0.4656	58.0	1450	0.2891	0.63
0.4656	59.0	1475	0.2890	0.62
0.4637	60.0	1500	0.2890	0.62
0.4637	61.0	1525	0.2883	0.59
0.4637	62.0	1550	0.2882	0.58
0.4637	63.0	1575	0.2883	0.63
0.4637	64.0	1600	0.2884	0.59
0.4637	65.0	1625	0.2876	0.63
0.4637	66.0	1650	0.2871	0.62
0.4637	67.0	1675	0.2879	0.6
0.4637	68.0	1700	0.2879	0.58
0.4637	69.0	1725	0.2877	0.59
0.4637	70.0	1750	0.2871	0.6
0.4637	71.0	1775	0.2875	0.6
0.4637	72.0	1800	0.2870	0.59
0.4637	73.0	1825	0.2875	0.59
0.4637	74.0	1850	0.2879	0.59
0.4637	75.0	1875	0.2887	0.59
0.4637	76.0	1900	0.2883	0.59
0.4637	77.0	1925	0.2882	0.58
0.4637	78.0	1950	0.2883	0.59
0.4637	79.0	1975	0.2884	0.59
0.4587	80.0	2000	0.2883	0.59
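The reported evaluation results equal the final row. However, the highest validation accuracy in the table (0.65) was first reached at epoch 3, and the lowest validation loss (0.2851) occurred at epoch 52. The sketch below parses an excerpt of the table and picks a checkpoint by each criterion. The excerpt contains the header, those rows, one other 0.65 row, and the final row. parse_results is a hypothetical helper, not part of the original training code. A checkpoint chosen on the validation set also gives an optimistic estimate on that same set.

```python
import csv
import io

# Excerpt of the tab-separated training results table (header plus selected rows).
RESULTS_TSV = """Training Loss\tEpoch\tStep\tValidation Loss\tAccuracy
No log\t3.0\t75\t0.2875\t0.65
No log\t7.0\t175\t0.2854\t0.65
0.4656\t52.0\t1300\t0.2851\t0.59
0.4587\t80.0\t2000\t0.2883\t0.59
"""


def parse_results(tsv_text: str) -> list:
    """Convert each table row into a dict of typed values (training loss is ignored)."""
    rows = []
    for record in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        rows.append({
            "epoch": float(record["Epoch"]),
            "step": int(record["Step"]),
            "val_loss": float(record["Validation Loss"]),
            "accuracy": float(record["Accuracy"]),
        })
    return rows


rows = parse_results(RESULTS_TSV)
# max()/min() return the first extreme row, so ties resolve to the earliest epoch.
best_by_accuracy = max(rows, key=lambda r: r["accuracy"])
best_by_loss = min(rows, key=lambda r: r["val_loss"])
final = rows[-1]

if __name__ == "__main__":
    for name, row in (("best accuracy", best_by_accuracy), ("lowest val loss", best_by_loss), ("final", final)):
        print(f"{name}: epoch {row['epoch']:.0f}, step {row['step']}, "
              f"val loss {row['val_loss']}, accuracy {row['accuracy']}")
```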

Framework versions

Transformers 4.26.1
PyTorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
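Reproducing the run requires installed packages that match the versions listed above. The following minimal check compares them using importlib.metadata. It assumes the PyPI distribution names transformers, torch, datasets and tokenizers (PyTorch is published as torch). version_mismatches is a hypothetical helper. Its exact string comparison is deliberately strict: it flags even a different CUDA build suffix.

```python
from importlib import metadata

# Versions listed under "Framework versions"; "torch" is the PyPI distribution name for PyTorch.
EXPECTED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def version_mismatches(expected: dict, lookup=metadata.version) -> dict:
    """Return {package: (expected, installed_or_None)} for packages that differ or are missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in version_mismatches(EXPECTED).items():
        print(f"{package}: card lists {wanted}, found {installed or 'not installed'}")
```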

Model repository: dkqjrm/20230826114726