20230826035826

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a usage sketch follows the results):

  • Loss: 0.2806
  • Accuracy: 0.72
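
Since the card does not state the task or pipeline type, the following is a minimal usage sketch, assuming the checkpoint carries a sequence-classification head; the input text and label meaning are placeholders, not confirmed by the card:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Model id as published on the Hub; the classification head is an assumption.
model_id = "dkqjrm/20230826035826"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Placeholder input; the actual SuperGLUE task (and thus input format) is unstated.
inputs = tokenizer("A sample sentence to classify.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # raw label index; meaning depends on the task
```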

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.01
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
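
As a reproduction aid, these settings map onto the Hugging Face `TrainingArguments` API roughly as follows. This is a sketch only, since the card does not specify the SuperGLUE task or data pipeline; `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above. The reported "Adam" optimizer
# corresponds to the Trainer's default AdamW with these beta/epsilon values.
training_args = TrainingArguments(
    output_dir="./20230826035826",   # placeholder
    learning_rate=0.01,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",     # assumption: the results table shows one eval per epoch
)
```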

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.3229 | 0.4 |
| No log | 2.0 | 50 | 0.3507 | 0.63 |
| No log | 3.0 | 75 | 0.3165 | 0.39 |
| No log | 4.0 | 100 | 0.3159 | 0.59 |
| No log | 5.0 | 125 | 0.3276 | 0.35 |
| No log | 6.0 | 150 | 0.3255 | 0.37 |
| No log | 7.0 | 175 | 0.2893 | 0.63 |
| No log | 8.0 | 200 | 0.3066 | 0.63 |
| No log | 9.0 | 225 | 0.3015 | 0.64 |
| No log | 10.0 | 250 | 0.2933 | 0.62 |
| No log | 11.0 | 275 | 0.2953 | 0.45 |
| No log | 12.0 | 300 | 0.2943 | 0.62 |
| No log | 13.0 | 325 | 0.2867 | 0.62 |
| No log | 14.0 | 350 | 0.2882 | 0.59 |
| No log | 15.0 | 375 | 0.2922 | 0.63 |
| No log | 16.0 | 400 | 0.2895 | 0.59 |
| No log | 17.0 | 425 | 0.2901 | 0.65 |
| No log | 18.0 | 450 | 0.2877 | 0.64 |
| No log | 19.0 | 475 | 0.2909 | 0.6 |
| 0.5537 | 20.0 | 500 | 0.2871 | 0.62 |
| 0.5537 | 21.0 | 525 | 0.2855 | 0.61 |
| 0.5537 | 22.0 | 550 | 0.2863 | 0.64 |
| 0.5537 | 23.0 | 575 | 0.2859 | 0.61 |
| 0.5537 | 24.0 | 600 | 0.2854 | 0.6 |
| 0.5537 | 25.0 | 625 | 0.2839 | 0.59 |
| 0.5537 | 26.0 | 650 | 0.2859 | 0.56 |
| 0.5537 | 27.0 | 675 | 0.2821 | 0.58 |
| 0.5537 | 28.0 | 700 | 0.2831 | 0.64 |
| 0.5537 | 29.0 | 725 | 0.2813 | 0.66 |
| 0.5537 | 30.0 | 750 | 0.2812 | 0.67 |
| 0.5537 | 31.0 | 775 | 0.2790 | 0.64 |
| 0.5537 | 32.0 | 800 | 0.2801 | 0.64 |
| 0.5537 | 33.0 | 825 | 0.2805 | 0.65 |
| 0.5537 | 34.0 | 850 | 0.2850 | 0.64 |
| 0.5537 | 35.0 | 875 | 0.2781 | 0.66 |
| 0.5537 | 36.0 | 900 | 0.2800 | 0.65 |
| 0.5537 | 37.0 | 925 | 0.2864 | 0.64 |
| 0.5537 | 38.0 | 950 | 0.2816 | 0.65 |
| 0.5537 | 39.0 | 975 | 0.2886 | 0.67 |
| 0.5047 | 40.0 | 1000 | 0.3101 | 0.67 |
| 0.5047 | 41.0 | 1025 | 0.2826 | 0.66 |
| 0.5047 | 42.0 | 1050 | 0.2801 | 0.62 |
| 0.5047 | 43.0 | 1075 | 0.2907 | 0.68 |
| 0.5047 | 44.0 | 1100 | 0.2894 | 0.64 |
| 0.5047 | 45.0 | 1125 | 0.2855 | 0.68 |
| 0.5047 | 46.0 | 1150 | 0.2811 | 0.67 |
| 0.5047 | 47.0 | 1175 | 0.2947 | 0.7 |
| 0.5047 | 48.0 | 1200 | 0.2952 | 0.69 |
| 0.5047 | 49.0 | 1225 | 0.2832 | 0.69 |
| 0.5047 | 50.0 | 1250 | 0.2954 | 0.68 |
| 0.5047 | 51.0 | 1275 | 0.2840 | 0.68 |
| 0.5047 | 52.0 | 1300 | 0.3079 | 0.67 |
| 0.5047 | 53.0 | 1325 | 0.2796 | 0.66 |
| 0.5047 | 54.0 | 1350 | 0.2862 | 0.67 |
| 0.5047 | 55.0 | 1375 | 0.2853 | 0.69 |
| 0.5047 | 56.0 | 1400 | 0.2969 | 0.69 |
| 0.5047 | 57.0 | 1425 | 0.2866 | 0.69 |
| 0.5047 | 58.0 | 1450 | 0.2895 | 0.69 |
| 0.5047 | 59.0 | 1475 | 0.3058 | 0.69 |
| 0.4502 | 60.0 | 1500 | 0.2998 | 0.68 |
| 0.4502 | 61.0 | 1525 | 0.2974 | 0.69 |
| 0.4502 | 62.0 | 1550 | 0.2788 | 0.69 |
| 0.4502 | 63.0 | 1575 | 0.2882 | 0.69 |
| 0.4502 | 64.0 | 1600 | 0.2893 | 0.7 |
| 0.4502 | 65.0 | 1625 | 0.2834 | 0.7 |
| 0.4502 | 66.0 | 1650 | 0.2889 | 0.72 |
| 0.4502 | 67.0 | 1675 | 0.2851 | 0.73 |
| 0.4502 | 68.0 | 1700 | 0.2773 | 0.7 |
| 0.4502 | 69.0 | 1725 | 0.2855 | 0.72 |
| 0.4502 | 70.0 | 1750 | 0.2903 | 0.69 |
| 0.4502 | 71.0 | 1775 | 0.2851 | 0.7 |
| 0.4502 | 72.0 | 1800 | 0.2892 | 0.69 |
| 0.4502 | 73.0 | 1825 | 0.2811 | 0.71 |
| 0.4502 | 74.0 | 1850 | 0.2881 | 0.71 |
| 0.4502 | 75.0 | 1875 | 0.2892 | 0.71 |
| 0.4502 | 76.0 | 1900 | 0.2835 | 0.71 |
| 0.4502 | 77.0 | 1925 | 0.2800 | 0.72 |
| 0.4502 | 78.0 | 1950 | 0.2809 | 0.72 |
| 0.4502 | 79.0 | 1975 | 0.2801 | 0.71 |
| 0.4329 | 80.0 | 2000 | 0.2806 | 0.72 |

"No log" means the training loss had not yet been reported: it is emitted only at logging steps, which in this run fall every 500 steps.

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
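
A quick way to check that a local environment matches these versions (a convenience snippet, not from the card):

```python
import transformers, torch, datasets, tokenizers

# Expected values per the card; newer versions will usually load the checkpoint too.
print(transformers.__version__)  # 4.26.1
print(torch.__version__)         # 2.0.1+cu118
print(datasets.__version__)      # 2.12.0
print(tokenizers.__version__)    # 0.13.3
```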