
20230825093306

This model (dkqjrm/20230825093306) is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1749
  • Accuracy: 0.7545

Model description

More information needed

Intended uses & limitations

More information needed
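
The card does not state which SuperGLUE task the model was tuned on. The step counts in the training results below (156 optimizer steps per epoch at batch size 16, about 2,490 training examples) are consistent with the RTE subset, but that is an inference, not documentation. Assuming a standard sequence-classification head, a minimal usage sketch looks like this:

```python
# A minimal usage sketch, not from the card. Assumptions: the checkpoint has a
# sequence-classification head, and the task is a SuperGLUE sentence-pair task
# (RTE inferred from the step counts in the training results; unverified).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "dkqjrm/20230825093306"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
model.eval()

premise = "The cat sat on the mat."
hypothesis = "An animal is on the mat."
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted label index
```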

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hypothetical TrainingArguments reconstruction follows the list):

  • learning_rate: 0.005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
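
As a reference only, these map onto transformers.TrainingArguments roughly as follows; the training script itself is not published, so everything beyond the listed values is an assumption:

```python
# Hypothetical reconstruction of the listed hyperparameters; the actual
# training script is not part of this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230825093306",   # assumption: output name matches the model name
    learning_rate=5e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",   # assumption: the results table logs one eval per epoch
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the library defaults
    # (adam_beta1, adam_beta2, adam_epsilon), so they need no explicit setting.
)
```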

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:---:|:---:|:---:|:---:|:---:|
| No log | 1.0 | 156 | 0.9098 | 0.5307 |
| No log | 2.0 | 312 | 1.3904 | 0.4765 |
| No log | 3.0 | 468 | 1.0371 | 0.4729 |
| 0.9793 | 4.0 | 624 | 0.6882 | 0.5090 |
| 0.9793 | 5.0 | 780 | 0.5519 | 0.5523 |
| 0.9793 | 6.0 | 936 | 0.6019 | 0.5560 |
| 0.8653 | 7.0 | 1092 | 0.6463 | 0.5596 |
| 0.8653 | 8.0 | 1248 | 0.4313 | 0.6245 |
| 0.8653 | 9.0 | 1404 | 0.3395 | 0.6787 |
| 0.7164 | 10.0 | 1560 | 0.6637 | 0.5921 |
| 0.7164 | 11.0 | 1716 | 0.2853 | 0.6859 |
| 0.7164 | 12.0 | 1872 | 0.3014 | 0.7112 |
| 0.6696 | 13.0 | 2028 | 0.3778 | 0.6895 |
| 0.6696 | 14.0 | 2184 | 0.2711 | 0.7184 |
| 0.6696 | 15.0 | 2340 | 0.2947 | 0.6643 |
| 0.6696 | 16.0 | 2496 | 0.4965 | 0.6282 |
| 0.5962 | 17.0 | 2652 | 0.3037 | 0.7184 |
| 0.5962 | 18.0 | 2808 | 0.4431 | 0.7184 |
| 0.5962 | 19.0 | 2964 | 0.2407 | 0.7184 |
| 0.5972 | 20.0 | 3120 | 0.2475 | 0.7148 |
| 0.5972 | 21.0 | 3276 | 0.2248 | 0.7329 |
| 0.5972 | 22.0 | 3432 | 0.3476 | 0.6643 |
| 0.567 | 23.0 | 3588 | 0.2318 | 0.7112 |
| 0.567 | 24.0 | 3744 | 0.3517 | 0.7292 |
| 0.567 | 25.0 | 3900 | 0.3102 | 0.6643 |
| 0.5253 | 26.0 | 4056 | 0.2331 | 0.7148 |
| 0.5253 | 27.0 | 4212 | 0.3600 | 0.7292 |
| 0.5253 | 28.0 | 4368 | 0.1932 | 0.7292 |
| 0.5076 | 29.0 | 4524 | 0.1979 | 0.7292 |
| 0.5076 | 30.0 | 4680 | 0.2349 | 0.7437 |
| 0.5076 | 31.0 | 4836 | 0.2877 | 0.6715 |
| 0.5076 | 32.0 | 4992 | 0.2023 | 0.7401 |
| 0.4592 | 33.0 | 5148 | 0.2016 | 0.7437 |
| 0.4592 | 34.0 | 5304 | 0.2073 | 0.7076 |
| 0.4592 | 35.0 | 5460 | 0.2725 | 0.7617 |
| 0.434 | 36.0 | 5616 | 0.3714 | 0.6534 |
| 0.434 | 37.0 | 5772 | 0.2117 | 0.7112 |
| 0.434 | 38.0 | 5928 | 0.2338 | 0.6968 |
| 0.4114 | 39.0 | 6084 | 0.2117 | 0.7148 |
| 0.4114 | 40.0 | 6240 | 0.2254 | 0.7148 |
| 0.4114 | 41.0 | 6396 | 0.1978 | 0.7509 |
| 0.3906 | 42.0 | 6552 | 0.1965 | 0.7401 |
| 0.3906 | 43.0 | 6708 | 0.1828 | 0.7329 |
| 0.3906 | 44.0 | 6864 | 0.1891 | 0.7473 |
| 0.3651 | 45.0 | 7020 | 0.1917 | 0.7509 |
| 0.3651 | 46.0 | 7176 | 0.1888 | 0.7329 |
| 0.3651 | 47.0 | 7332 | 0.2906 | 0.7690 |
| 0.3651 | 48.0 | 7488 | 0.1945 | 0.7365 |
| 0.3358 | 49.0 | 7644 | 0.2083 | 0.7401 |
| 0.3358 | 50.0 | 7800 | 0.1822 | 0.7437 |
| 0.3358 | 51.0 | 7956 | 0.1848 | 0.7437 |
| 0.324 | 52.0 | 8112 | 0.1706 | 0.7437 |
| 0.324 | 53.0 | 8268 | 0.2049 | 0.7365 |
| 0.324 | 54.0 | 8424 | 0.1933 | 0.7509 |
| 0.3105 | 55.0 | 8580 | 0.1782 | 0.7365 |
| 0.3105 | 56.0 | 8736 | 0.1809 | 0.7365 |
| 0.3105 | 57.0 | 8892 | 0.1788 | 0.7292 |
| 0.2976 | 58.0 | 9048 | 0.2209 | 0.7617 |
| 0.2976 | 59.0 | 9204 | 0.1784 | 0.7473 |
| 0.2976 | 60.0 | 9360 | 0.1750 | 0.7617 |
| 0.2867 | 61.0 | 9516 | 0.1884 | 0.7401 |
| 0.2867 | 62.0 | 9672 | 0.1805 | 0.7509 |
| 0.2867 | 63.0 | 9828 | 0.1828 | 0.7509 |
| 0.2867 | 64.0 | 9984 | 0.1863 | 0.7545 |
| 0.2852 | 65.0 | 10140 | 0.1818 | 0.7581 |
| 0.2852 | 66.0 | 10296 | 0.1778 | 0.7545 |
| 0.2852 | 67.0 | 10452 | 0.1908 | 0.7581 |
| 0.2663 | 68.0 | 10608 | 0.1799 | 0.7545 |
| 0.2663 | 69.0 | 10764 | 0.1808 | 0.7581 |
| 0.2663 | 70.0 | 10920 | 0.1797 | 0.7437 |
| 0.2681 | 71.0 | 11076 | 0.1835 | 0.7581 |
| 0.2681 | 72.0 | 11232 | 0.1812 | 0.7581 |
| 0.2681 | 73.0 | 11388 | 0.1799 | 0.7617 |
| 0.2564 | 74.0 | 11544 | 0.1874 | 0.7581 |
| 0.2564 | 75.0 | 11700 | 0.1766 | 0.7581 |
| 0.2564 | 76.0 | 11856 | 0.1782 | 0.7545 |
| 0.2633 | 77.0 | 12012 | 0.1772 | 0.7545 |
| 0.2633 | 78.0 | 12168 | 0.1743 | 0.7617 |
| 0.2633 | 79.0 | 12324 | 0.1749 | 0.7545 |
| 0.2633 | 80.0 | 12480 | 0.1749 | 0.7545 |
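
The final row matches the headline numbers above (validation loss 0.1749, accuracy 0.7545); the best accuracy in the run, 0.7690, occurs at epoch 47. A sketch for recomputing validation accuracy with datasets, again assuming the RTE subset:

```python
# Evaluation sketch; "rte" is an assumption (see the usage note above).
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "dkqjrm/20230825093306"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
model.eval()

dataset = load_dataset("super_glue", "rte", split="validation")
correct = 0
for example in dataset:
    inputs = tokenizer(example["premise"], example["hypothesis"],
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(dim=-1).item()
    correct += int(pred == example["label"])
print(f"validation accuracy: {correct / len(dataset):.4f}")
```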

Framework versions

  • Transformers 4.26.1
  • PyTorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3