
20230825071702

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2804
  • Accuracy: 0.7617

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent `TrainingArguments` sketch follows the list):

  • learning_rate: 0.005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
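
These values map one-to-one onto `transformers.TrainingArguments` fields. Below is a minimal sketch of that mapping, assuming the standard `Trainer` API from Transformers 4.26; the `output_dir` and the per-epoch evaluation strategy are assumptions (the latter inferred from the once-per-epoch rows in the results table below), not details stated in the card.

```python
# Sketch only: reproduces the listed hyperparameters as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230825071702",     # assumed output path, not stated in the card
    learning_rate=5e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",     # assumption: matches the per-epoch eval rows below
)
```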

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 156   | 0.6793          | 0.5307   |
| No log        | 2.0   | 312   | 0.9039          | 0.4765   |
| No log        | 3.0   | 468   | 0.7107          | 0.4729   |
| 0.8982        | 4.0   | 624   | 0.6969          | 0.5199   |
| 0.8982        | 5.0   | 780   | 0.5729          | 0.5560   |
| 0.8982        | 6.0   | 936   | 0.6447          | 0.5596   |
| 0.8495        | 7.0   | 1092  | 0.6093          | 0.5921   |
| 0.8495        | 8.0   | 1248  | 0.4289          | 0.6679   |
| 0.8495        | 9.0   | 1404  | 0.4954          | 0.6282   |
| 0.751         | 10.0  | 1560  | 0.3952          | 0.6715   |
| 0.751         | 11.0  | 1716  | 0.6147          | 0.6462   |
| 0.751         | 12.0  | 1872  | 0.4183          | 0.7004   |
| 0.6407        | 13.0  | 2028  | 0.3743          | 0.6968   |
| 0.6407        | 14.0  | 2184  | 0.3907          | 0.7292   |
| 0.6407        | 15.0  | 2340  | 0.3409          | 0.7148   |
| 0.6407        | 16.0  | 2496  | 0.5288          | 0.6426   |
| 0.6476        | 17.0  | 2652  | 0.4492          | 0.7220   |
| 0.6476        | 18.0  | 2808  | 0.3312          | 0.7220   |
| 0.6476        | 19.0  | 2964  | 0.4062          | 0.6606   |
| 0.6425        | 20.0  | 3120  | 0.3715          | 0.6859   |
| 0.6425        | 21.0  | 3276  | 0.3305          | 0.7256   |
| 0.6425        | 22.0  | 3432  | 0.6557          | 0.6245   |
| 0.5658        | 23.0  | 3588  | 0.3943          | 0.6859   |
| 0.5658        | 24.0  | 3744  | 0.3394          | 0.7040   |
| 0.5658        | 25.0  | 3900  | 0.4640          | 0.6823   |
| 0.5333        | 26.0  | 4056  | 0.3419          | 0.7220   |
| 0.5333        | 27.0  | 4212  | 0.3646          | 0.7112   |
| 0.5333        | 28.0  | 4368  | 0.3626          | 0.7184   |
| 0.5164        | 29.0  | 4524  | 0.3215          | 0.7473   |
| 0.5164        | 30.0  | 4680  | 0.2941          | 0.7581   |
| 0.5164        | 31.0  | 4836  | 0.4957          | 0.6173   |
| 0.5164        | 32.0  | 4992  | 0.3362          | 0.7329   |
| 0.4676        | 33.0  | 5148  | 0.3116          | 0.7437   |
| 0.4676        | 34.0  | 5304  | 0.3344          | 0.7401   |
| 0.4676        | 35.0  | 5460  | 0.4769          | 0.7220   |
| 0.4443        | 36.0  | 5616  | 0.2822          | 0.7509   |
| 0.4443        | 37.0  | 5772  | 0.3748          | 0.6859   |
| 0.4443        | 38.0  | 5928  | 0.2989          | 0.7509   |
| 0.4179        | 39.0  | 6084  | 0.3193          | 0.7292   |
| 0.4179        | 40.0  | 6240  | 0.3725          | 0.6715   |
| 0.4179        | 41.0  | 6396  | 0.3336          | 0.7509   |
| 0.3974        | 42.0  | 6552  | 0.2967          | 0.7365   |
| 0.3974        | 43.0  | 6708  | 0.2908          | 0.7545   |
| 0.3974        | 44.0  | 6864  | 0.2887          | 0.7473   |
| 0.3774        | 45.0  | 7020  | 0.3012          | 0.7401   |
| 0.3774        | 46.0  | 7176  | 0.3437          | 0.7509   |
| 0.3774        | 47.0  | 7332  | 0.3390          | 0.7292   |
| 0.3774        | 48.0  | 7488  | 0.2952          | 0.7473   |
| 0.3419        | 49.0  | 7644  | 0.3116          | 0.7401   |
| 0.3419        | 50.0  | 7800  | 0.2856          | 0.7473   |
| 0.3419        | 51.0  | 7956  | 0.3227          | 0.7256   |
| 0.3275        | 52.0  | 8112  | 0.2861          | 0.7509   |
| 0.3275        | 53.0  | 8268  | 0.3534          | 0.7401   |
| 0.3275        | 54.0  | 8424  | 0.3395          | 0.7256   |
| 0.3225        | 55.0  | 8580  | 0.3113          | 0.7401   |
| 0.3225        | 56.0  | 8736  | 0.2932          | 0.7473   |
| 0.3225        | 57.0  | 8892  | 0.4312          | 0.7112   |
| 0.3104        | 58.0  | 9048  | 0.3085          | 0.7509   |
| 0.3104        | 59.0  | 9204  | 0.3164          | 0.7545   |
| 0.3104        | 60.0  | 9360  | 0.2758          | 0.7473   |
| 0.3164        | 61.0  | 9516  | 0.3183          | 0.7220   |
| 0.3164        | 62.0  | 9672  | 0.3571          | 0.7220   |
| 0.3164        | 63.0  | 9828  | 0.3156          | 0.7365   |
| 0.3164        | 64.0  | 9984  | 0.2756          | 0.7653   |
| 0.2939        | 65.0  | 10140 | 0.2859          | 0.7437   |
| 0.2939        | 66.0  | 10296 | 0.2934          | 0.7545   |
| 0.2939        | 67.0  | 10452 | 0.2977          | 0.7690   |
| 0.2826        | 68.0  | 10608 | 0.2871          | 0.7653   |
| 0.2826        | 69.0  | 10764 | 0.2903          | 0.7653   |
| 0.2826        | 70.0  | 10920 | 0.2974          | 0.7581   |
| 0.2663        | 71.0  | 11076 | 0.2778          | 0.7509   |
| 0.2663        | 72.0  | 11232 | 0.2849          | 0.7365   |
| 0.2663        | 73.0  | 11388 | 0.2970          | 0.7653   |
| 0.2637        | 74.0  | 11544 | 0.3025          | 0.7545   |
| 0.2637        | 75.0  | 11700 | 0.2793          | 0.7617   |
| 0.2637        | 76.0  | 11856 | 0.2778          | 0.7545   |
| 0.2699        | 77.0  | 12012 | 0.2861          | 0.7617   |
| 0.2699        | 78.0  | 12168 | 0.2857          | 0.7690   |
| 0.2699        | 79.0  | 12324 | 0.2774          | 0.7617   |
| 0.2699        | 80.0  | 12480 | 0.2804          | 0.7617   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
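
A hedged usage sketch, not part of the original card: it assumes the checkpoint is published on the Hub as `dkqjrm/20230825071702` and carries a sequence-classification head (the card reports accuracy on super_glue); the exact SuperGLUE task, label names, and example inputs below are placeholders, since the card does not state them.

```python
# Sketch only: loading the fine-tuned checkpoint for classification.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230825071702"  # assumed Hub id, from the card's title
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# SuperGLUE tasks are typically sentence-pair problems, so two text inputs;
# these example sentences are placeholders.
inputs = tokenizer("A premise sentence.", "A hypothesis sentence.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```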