20230826064921

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2753
  • Accuracy: 0.71
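
The repository does not declare a pipeline type. As a starting point, here is a minimal loading sketch, assuming the checkpoint carries a sequence-classification head (an assumption suggested by the accuracy metric; adjust the head class if the task differs):

```python
# Hedged loading sketch. Assumption: the checkpoint has a
# sequence-classification head; the example sentence pair is illustrative.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dkqjrm/20230826064921")
model = AutoModelForSequenceClassification.from_pretrained("dkqjrm/20230826064921")

# Encode a premise/hypothesis pair and read off the predicted class.
inputs = tokenizer("Example premise.", "Example hypothesis.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.argmax(dim=-1))
```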

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
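
The card names super_glue but not a specific configuration. If the subset becomes known, the data can be inspected with the datasets library; the "cb" configuration below is purely an illustrative placeholder:

```python
# Hedged sketch: "cb" is a placeholder configuration, not a documented
# fact about this model. Swap in the actual super_glue subset once known.
from datasets import load_dataset

dataset = load_dataset("super_glue", "cb")
print(dataset["train"][0])          # inspect one training example
print(dataset["validation"].num_rows)  # size of the evaluation split
```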

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged reconstruction sketch follows the list):

  • learning_rate: 0.02
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
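
These values map onto Transformers TrainingArguments. The sketch below is a hedged reconstruction only: the original training script is not published, so output_dir and everything not listed above are assumptions or library defaults.

```python
# Hedged reconstruction of the logged configuration with the Trainer API.
# Only the values listed above are sourced; the rest are defaults/assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                 # assumption, not logged in the card
    learning_rate=0.02,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
)
```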

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 25   | 0.3969          | 0.6      |
| No log        | 2.0   | 50   | 0.4709          | 0.5      |
| No log        | 3.0   | 75   | 0.3341          | 0.42     |
| No log        | 4.0   | 100  | 0.3011          | 0.54     |
| No log        | 5.0   | 125  | 0.3119          | 0.36     |
| No log        | 6.0   | 150  | 0.3297          | 0.37     |
| No log        | 7.0   | 175  | 0.2928          | 0.53     |
| No log        | 8.0   | 200  | 0.3079          | 0.63     |
| No log        | 9.0   | 225  | 0.2875          | 0.61     |
| No log        | 10.0  | 250  | 0.2906          | 0.54     |
| No log        | 11.0  | 275  | 0.2904          | 0.62     |
| No log        | 12.0  | 300  | 0.2946          | 0.52     |
| No log        | 13.0  | 325  | 0.2942          | 0.51     |
| No log        | 14.0  | 350  | 0.2935          | 0.56     |
| No log        | 15.0  | 375  | 0.2913          | 0.58     |
| No log        | 16.0  | 400  | 0.2886          | 0.6      |
| No log        | 17.0  | 425  | 0.2900          | 0.6      |
| No log        | 18.0  | 450  | 0.2874          | 0.59     |
| No log        | 19.0  | 475  | 0.2910          | 0.6      |
| 0.6674        | 20.0  | 500  | 0.2931          | 0.47     |
| 0.6674        | 21.0  | 525  | 0.2909          | 0.51     |
| 0.6674        | 22.0  | 550  | 0.2855          | 0.62     |
| 0.6674        | 23.0  | 575  | 0.2881          | 0.61     |
| 0.6674        | 24.0  | 600  | 0.2878          | 0.6      |
| 0.6674        | 25.0  | 625  | 0.2874          | 0.57     |
| 0.6674        | 26.0  | 650  | 0.2857          | 0.54     |
| 0.6674        | 27.0  | 675  | 0.2871          | 0.6      |
| 0.6674        | 28.0  | 700  | 0.2864          | 0.59     |
| 0.6674        | 29.0  | 725  | 0.2862          | 0.62     |
| 0.6674        | 30.0  | 750  | 0.2866          | 0.58     |
| 0.6674        | 31.0  | 775  | 0.2837          | 0.63     |
| 0.6674        | 32.0  | 800  | 0.2859          | 0.58     |
| 0.6674        | 33.0  | 825  | 0.2841          | 0.59     |
| 0.6674        | 34.0  | 850  | 0.2878          | 0.62     |
| 0.6674        | 35.0  | 875  | 0.2889          | 0.61     |
| 0.6674        | 36.0  | 900  | 0.2830          | 0.59     |
| 0.6674        | 37.0  | 925  | 0.2824          | 0.59     |
| 0.6674        | 38.0  | 950  | 0.2801          | 0.63     |
| 0.6674        | 39.0  | 975  | 0.2931          | 0.65     |
| 0.5477        | 40.0  | 1000 | 0.2788          | 0.64     |
| 0.5477        | 41.0  | 1025 | 0.2892          | 0.63     |
| 0.5477        | 42.0  | 1050 | 0.2937          | 0.58     |
| 0.5477        | 43.0  | 1075 | 0.2886          | 0.66     |
| 0.5477        | 44.0  | 1100 | 0.2842          | 0.62     |
| 0.5477        | 45.0  | 1125 | 0.2857          | 0.6      |
| 0.5477        | 46.0  | 1150 | 0.2834          | 0.62     |
| 0.5477        | 47.0  | 1175 | 0.2824          | 0.56     |
| 0.5477        | 48.0  | 1200 | 0.2866          | 0.65     |
| 0.5477        | 49.0  | 1225 | 0.2801          | 0.63     |
| 0.5477        | 50.0  | 1250 | 0.2851          | 0.62     |
| 0.5477        | 51.0  | 1275 | 0.2829          | 0.6      |
| 0.5477        | 52.0  | 1300 | 0.2900          | 0.59     |
| 0.5477        | 53.0  | 1325 | 0.2782          | 0.59     |
| 0.5477        | 54.0  | 1350 | 0.2793          | 0.59     |
| 0.5477        | 55.0  | 1375 | 0.2809          | 0.6      |
| 0.5477        | 56.0  | 1400 | 0.2815          | 0.64     |
| 0.5477        | 57.0  | 1425 | 0.2798          | 0.68     |
| 0.5477        | 58.0  | 1450 | 0.2831          | 0.67     |
| 0.5477        | 59.0  | 1475 | 0.2795          | 0.66     |
| 0.4601        | 60.0  | 1500 | 0.2747          | 0.68     |
| 0.4601        | 61.0  | 1525 | 0.2725          | 0.73     |
| 0.4601        | 62.0  | 1550 | 0.2840          | 0.66     |
| 0.4601        | 63.0  | 1575 | 0.2739          | 0.67     |
| 0.4601        | 64.0  | 1600 | 0.2796          | 0.69     |
| 0.4601        | 65.0  | 1625 | 0.2782          | 0.65     |
| 0.4601        | 66.0  | 1650 | 0.2757          | 0.7      |
| 0.4601        | 67.0  | 1675 | 0.2759          | 0.69     |
| 0.4601        | 68.0  | 1700 | 0.2779          | 0.67     |
| 0.4601        | 69.0  | 1725 | 0.2822          | 0.67     |
| 0.4601        | 70.0  | 1750 | 0.2813          | 0.65     |
| 0.4601        | 71.0  | 1775 | 0.2818          | 0.68     |
| 0.4601        | 72.0  | 1800 | 0.2865          | 0.69     |
| 0.4601        | 73.0  | 1825 | 0.2770          | 0.71     |
| 0.4601        | 74.0  | 1850 | 0.2822          | 0.69     |
| 0.4601        | 75.0  | 1875 | 0.2783          | 0.71     |
| 0.4601        | 76.0  | 1900 | 0.2764          | 0.71     |
| 0.4601        | 77.0  | 1925 | 0.2772          | 0.69     |
| 0.4601        | 78.0  | 1950 | 0.2759          | 0.7      |
| 0.4601        | 79.0  | 1975 | 0.2751          | 0.72     |
| 0.4329        | 80.0  | 2000 | 0.2753          | 0.71     |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3