20230825183835

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set, which match the final (epoch 80) row of the training results:

Loss: 0.3648
Accuracy: 0.7473

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.005
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
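The linear scheduler decays the learning rate from 0.005 toward zero over the run, and each optimizer step applies an Adam update with the betas and epsilon listed above. The following is a minimal plain-Python sketch of both. It assumes no warmup, because warmup is not listed in this card and the Trainer's default is 0 steps. It also assumes a total of 12480 optimizer steps, which is the final Step value in the training results table. The helper names linear_lr and adam_step are hypothetical and are not Transformers or PyTorch APIs. The Trainer's actual optimizer implementation may also differ in details such as weight decay.

```python
import math

# Values taken from the hyperparameter list above.
BASE_LR = 0.005
BETAS = (0.9, 0.999)
EPS = 1e-8
# Assumption: total optimizer steps = final Step value in the results table.
TOTAL_STEPS = 12480
# Assumption: no warmup (not listed in the card).
WARMUP_STEPS = 0


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=WARMUP_STEPS):
    """Hypothetical helper: learning rate at `step` for linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining


def adam_step(param, grad, m, v, t, lr, betas=BETAS, eps=EPS):
    """Hypothetical helper: one textbook Adam update for a scalar parameter.

    `t` is the 1-based step count; returns the new (param, m, v).
    """
    beta1, beta2 = betas
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections for zero-initialized moments
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    # Learning rate at a few points of the 80-epoch run.
    for step in (0, 3120, 6240, 9360, 12480):
        print(step, linear_lr(step))
    # First update on a toy scalar: after bias correction the step size is roughly lr.
    p, m, v = adam_step(param=0.0, grad=2.0, m=0.0, v=0.0, t=1, lr=linear_lr(0))
    print(p)
```

With zero warmup, the rate falls by the same amount every step. At the halfway point, step 6240, it is half the base value.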

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	156	0.8052	0.5307
No log	2.0	312	0.6957	0.4801
No log	3.0	468	0.9722	0.4801
0.8916	4.0	624	0.7219	0.5560
0.8916	5.0	780	0.5572	0.5921
0.8916	6.0	936	0.4803	0.6534
0.8141	7.0	1092	0.6885	0.6318
0.8141	8.0	1248	0.4588	0.6895
0.8141	9.0	1404	1.0159	0.4729
0.7176	10.0	1560	0.4835	0.6823
0.7176	11.0	1716	0.5513	0.6823
0.7176	12.0	1872	0.4150	0.7184
0.6445	13.0	2028	0.4789	0.7148
0.6445	14.0	2184	0.4414	0.7220
0.6445	15.0	2340	0.3778	0.6968
0.6445	16.0	2496	0.5422	0.6823
0.6267	17.0	2652	0.3654	0.7220
0.6267	18.0	2808	0.7434	0.6390
0.6267	19.0	2964	0.3713	0.7112
0.5715	20.0	3120	0.3942	0.6931
0.5715	21.0	3276	0.3785	0.7112
0.5715	22.0	3432	0.5429	0.6570
0.5015	23.0	3588	0.3600	0.7365
0.5015	24.0	3744	0.4567	0.7473
0.5015	25.0	3900	0.3680	0.7148
0.4739	26.0	4056	0.3348	0.7292
0.4739	27.0	4212	0.4191	0.7437
0.4739	28.0	4368	0.4034	0.7401
0.4139	29.0	4524	0.3887	0.7112
0.4139	30.0	4680	0.4222	0.7004
0.4139	31.0	4836	0.3804	0.7220
0.4139	32.0	4992	0.3842	0.7256
0.3958	33.0	5148	0.3851	0.7365
0.3958	34.0	5304	0.4758	0.7040
0.3958	35.0	5460	0.3569	0.7473
0.3561	36.0	5616	0.3971	0.7256
0.3561	37.0	5772	0.4006	0.7545
0.3561	38.0	5928	0.5292	0.7220
0.3349	39.0	6084	0.4014	0.7329
0.3349	40.0	6240	0.3285	0.7473
0.3349	41.0	6396	0.3665	0.7581
0.2946	42.0	6552	0.3843	0.7690
0.2946	43.0	6708	0.3634	0.7509
0.2946	44.0	6864	0.3518	0.7437
0.2813	45.0	7020	0.4009	0.7473
0.2813	46.0	7176	0.4073	0.7653
0.2813	47.0	7332	0.3974	0.7473
0.2813	48.0	7488	0.4134	0.7437
0.2601	49.0	7644	0.3661	0.7437
0.2601	50.0	7800	0.3733	0.7437
0.2601	51.0	7956	0.3425	0.7509
0.242	52.0	8112	0.4186	0.7473
0.242	53.0	8268	0.4262	0.7401
0.242	54.0	8424	0.3627	0.7437
0.2356	55.0	8580	0.3966	0.7473
0.2356	56.0	8736	0.3819	0.7509
0.2356	57.0	8892	0.4087	0.7473
0.2198	58.0	9048	0.3691	0.7365
0.2198	59.0	9204	0.4938	0.7437
0.2198	60.0	9360	0.4097	0.7581
0.1995	61.0	9516	0.3870	0.7509
0.1995	62.0	9672	0.4417	0.7473
0.1995	63.0	9828	0.3596	0.7509
0.1995	64.0	9984	0.3483	0.7473
0.1933	65.0	10140	0.4424	0.7545
0.1933	66.0	10296	0.3443	0.7437
0.1933	67.0	10452	0.3820	0.7437
0.1898	68.0	10608	0.3889	0.7473
0.1898	69.0	10764	0.3841	0.7437
0.1898	70.0	10920	0.4081	0.7581
0.1813	71.0	11076	0.3680	0.7473
0.1813	72.0	11232	0.3775	0.7473
0.1813	73.0	11388	0.3713	0.7473
0.1688	74.0	11544	0.3765	0.7473
0.1688	75.0	11700	0.3580	0.7509
0.1688	76.0	11856	0.3485	0.7437
0.1663	77.0	12012	0.3601	0.7509
0.1663	78.0	12168	0.3721	0.7509
0.1663	79.0	12324	0.3633	0.7473
0.1663	80.0	12480	0.3648	0.7473
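In the table above, the Step column advances by 156 per epoch, while the Training Loss column changes only every few epochs. Both patterns are consistent with two assumptions. First, the Trainer logs the training loss every 500 steps. That is its default logging_steps, assumed here because the card does not list it. Second, each row shows the most recent logged value. Under these assumptions, epochs 1 to 3 (steps up to 468) show "No log".

A small sketch of that bookkeeping follows. It also assumes a single device, no gradient accumulation, and that the last partial batch is kept. Under those assumptions, 156 steps per epoch at batch size 16 bounds the training-set size. All function names are hypothetical.

```python
STEPS_PER_EPOCH = 156   # from the Step column: 156, 312, 468, ...
NUM_EPOCHS = 80         # num_epochs from the hyperparameter list
TRAIN_BATCH_SIZE = 16   # train_batch_size from the hyperparameter list
LOGGING_STEPS = 500     # assumption: Trainer default, not stated in the card


def last_logged_step(global_step, logging_steps=LOGGING_STEPS):
    """Hypothetical helper: latest step <= global_step at which training loss was logged, or None."""
    last = (global_step // logging_steps) * logging_steps
    return last if last > 0 else None


def training_loss_source(num_epochs=NUM_EPOCHS, steps_per_epoch=STEPS_PER_EPOCH):
    """Hypothetical helper: for each epoch, (epoch, step, step whose loss is shown or "No log")."""
    rows = []
    for epoch in range(1, num_epochs + 1):
        step = epoch * steps_per_epoch
        logged = last_logged_step(step)
        rows.append((epoch, step, "No log" if logged is None else logged))
    return rows


def train_size_bounds(steps_per_epoch=STEPS_PER_EPOCH, batch_size=TRAIN_BATCH_SIZE):
    """Hypothetical helper: inclusive range of N with ceil(N / batch_size) == steps_per_epoch."""
    return (steps_per_epoch - 1) * batch_size + 1, steps_per_epoch * batch_size


if __name__ == "__main__":
    rows = training_loss_source()
    print(rows[-1])  # (80, 12480, 12000): the last loss log before the final evaluation
    # Epochs at which a new training-loss value first appears.
    changes = [rows[i][0] for i in range(1, len(rows)) if rows[i][2] != rows[i - 1][2]]
    print(changes[:6])  # [4, 7, 10, 13, 17, 20], matching the table
    print(train_size_bounds())  # (2481, 2496)
```

Over all 80 epochs, the epochs where this sketch predicts a new logged value line up with every change in the Training Loss column. If the run used several devices or gradient accumulation, the effective batch per step would be larger and the size bound would change accordingly.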

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3

dkqjrm/20230825183835