
dkqjrm/20230825183837

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the metrics):

  • Loss: 0.5509
  • Accuracy: 0.7401
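
Because the reported metric is accuracy, the checkpoint presumably carries a sequence-classification head and can be loaded with AutoModelForSequenceClassification. The sketch below is a minimal inference example under that assumption; the exact SuperGLUE subtask, the label meanings, and the input sentences are hypothetical, not recorded in this card.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Repo id taken from this card; the classification head is an assumption
# based on the accuracy metric.
model_id = "dkqjrm/20230825183837"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Most SuperGLUE tasks are sentence-pair classification, so a pair is
# encoded here; both sentences are hypothetical example inputs.
inputs = tokenizer(
    "The cat sat on the mat.",
    "A cat is on the mat.",
    return_tensors="pt",
    truncation=True,
)

with torch.no_grad():
    logits = model(**inputs).logits

print(logits.softmax(dim=-1))        # class probabilities
print(logits.argmax(dim=-1).item())  # predicted class id (label names unknown)
```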

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative reproduction sketch follows the list):

  • learning_rate: 0.005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
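
These values map directly onto transformers.TrainingArguments. The sketch below is a minimal, hypothetical reproduction, not the author's actual script: the train/eval datasets are placeholders, the output directory name is assumed, and all unlisted options are left at their Transformers 4.26 defaults, which match the Adam betas and epsilon shown above.

```python
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Values copied from the hyperparameter list above.
args = TrainingArguments(
    output_dir="20230825183837",       # assumed output directory name
    learning_rate=5e-3,                # 0.005 as listed (unusually high for BERT fine-tuning)
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",       # the card logs validation metrics once per epoch
)

model = AutoModelForSequenceClassification.from_pretrained("bert-large-cased")

# Placeholders: tokenized SuperGLUE splits must be prepared separately.
train_dataset = None
eval_dataset = None

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
# trainer.train()  # uncomment once real datasets are supplied
```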

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 156   | 1.0350          | 0.5307   |
| No log        | 2.0   | 312   | 0.7083          | 0.5199   |
| No log        | 3.0   | 468   | 0.8268          | 0.4801   |
| 0.9653        | 4.0   | 624   | 0.7385          | 0.5199   |
| 0.9653        | 5.0   | 780   | 0.6701          | 0.5271   |
| 0.9653        | 6.0   | 936   | 0.6090          | 0.6029   |
| 0.8296        | 7.0   | 1092  | 0.5400          | 0.6282   |
| 0.8296        | 8.0   | 1248  | 0.5084          | 0.6715   |
| 0.8296        | 9.0   | 1404  | 0.5534          | 0.6606   |
| 0.7744        | 10.0  | 1560  | 0.4802          | 0.6895   |
| 0.7744        | 11.0  | 1716  | 0.5757          | 0.6715   |
| 0.7744        | 12.0  | 1872  | 0.5599          | 0.6787   |
| 0.6735        | 13.0  | 2028  | 0.4614          | 0.7220   |
| 0.6735        | 14.0  | 2184  | 0.4656          | 0.7004   |
| 0.6735        | 15.0  | 2340  | 0.5463          | 0.6859   |
| 0.6735        | 16.0  | 2496  | 0.5148          | 0.6968   |
| 0.642         | 17.0  | 2652  | 0.4414          | 0.7292   |
| 0.642         | 18.0  | 2808  | 0.6131          | 0.6931   |
| 0.642         | 19.0  | 2964  | 0.4674          | 0.7184   |
| 0.6495        | 20.0  | 3120  | 0.5114          | 0.7004   |
| 0.6495        | 21.0  | 3276  | 0.4827          | 0.7365   |
| 0.6495        | 22.0  | 3432  | 0.7846          | 0.6245   |
| 0.5629        | 23.0  | 3588  | 0.4956          | 0.7148   |
| 0.5629        | 24.0  | 3744  | 0.4705          | 0.7617   |
| 0.5629        | 25.0  | 3900  | 0.4782          | 0.7220   |
| 0.5208        | 26.0  | 4056  | 0.4177          | 0.7365   |
| 0.5208        | 27.0  | 4212  | 0.6597          | 0.6931   |
| 0.5208        | 28.0  | 4368  | 0.5945          | 0.6931   |
| 0.5051        | 29.0  | 4524  | 0.5733          | 0.7184   |
| 0.5051        | 30.0  | 4680  | 0.4994          | 0.7437   |
| 0.5051        | 31.0  | 4836  | 0.5630          | 0.6895   |
| 0.5051        | 32.0  | 4992  | 0.5061          | 0.7437   |
| 0.4822        | 33.0  | 5148  | 0.5961          | 0.6968   |
| 0.4822        | 34.0  | 5304  | 0.5072          | 0.7329   |
| 0.4822        | 35.0  | 5460  | 0.5716          | 0.7473   |
| 0.4437        | 36.0  | 5616  | 0.5670          | 0.7076   |
| 0.4437        | 37.0  | 5772  | 0.5414          | 0.7112   |
| 0.4437        | 38.0  | 5928  | 0.5748          | 0.6931   |
| 0.436         | 39.0  | 6084  | 0.5068          | 0.7545   |
| 0.436         | 40.0  | 6240  | 0.5532          | 0.7076   |
| 0.436         | 41.0  | 6396  | 0.5705          | 0.7545   |
| 0.3882        | 42.0  | 6552  | 0.5622          | 0.7545   |
| 0.3882        | 43.0  | 6708  | 0.5511          | 0.7112   |
| 0.3882        | 44.0  | 6864  | 0.5306          | 0.7473   |
| 0.3639        | 45.0  | 7020  | 0.5418          | 0.7148   |
| 0.3639        | 46.0  | 7176  | 0.5856          | 0.7256   |
| 0.3639        | 47.0  | 7332  | 0.5920          | 0.7581   |
| 0.3639        | 48.0  | 7488  | 0.6323          | 0.7112   |
| 0.3344        | 49.0  | 7644  | 0.5837          | 0.7256   |
| 0.3344        | 50.0  | 7800  | 0.5591          | 0.7329   |
| 0.3344        | 51.0  | 7956  | 0.6241          | 0.7401   |
| 0.3131        | 52.0  | 8112  | 0.5855          | 0.7365   |
| 0.3131        | 53.0  | 8268  | 0.5593          | 0.7401   |
| 0.3131        | 54.0  | 8424  | 0.5920          | 0.7401   |
| 0.319         | 55.0  | 8580  | 0.5000          | 0.7401   |
| 0.319         | 56.0  | 8736  | 0.6601          | 0.7004   |
| 0.319         | 57.0  | 8892  | 0.7536          | 0.7076   |
| 0.2995        | 58.0  | 9048  | 0.5308          | 0.7256   |
| 0.2995        | 59.0  | 9204  | 0.7136          | 0.7365   |
| 0.2995        | 60.0  | 9360  | 0.5192          | 0.7581   |
| 0.2865        | 61.0  | 9516  | 0.5491          | 0.7365   |
| 0.2865        | 62.0  | 9672  | 0.5884          | 0.7292   |
| 0.2865        | 63.0  | 9828  | 0.5730          | 0.7329   |
| 0.2865        | 64.0  | 9984  | 0.5539          | 0.7365   |
| 0.2779        | 65.0  | 10140 | 0.5626          | 0.7401   |
| 0.2779        | 66.0  | 10296 | 0.5826          | 0.7545   |
| 0.2779        | 67.0  | 10452 | 0.6070          | 0.7473   |
| 0.2621        | 68.0  | 10608 | 0.5399          | 0.7509   |
| 0.2621        | 69.0  | 10764 | 0.5598          | 0.7437   |
| 0.2621        | 70.0  | 10920 | 0.5688          | 0.7401   |
| 0.2549        | 71.0  | 11076 | 0.5407          | 0.7437   |
| 0.2549        | 72.0  | 11232 | 0.5516          | 0.7473   |
| 0.2549        | 73.0  | 11388 | 0.5699          | 0.7148   |
| 0.2453        | 74.0  | 11544 | 0.5284          | 0.7437   |
| 0.2453        | 75.0  | 11700 | 0.5615          | 0.7401   |
| 0.2453        | 76.0  | 11856 | 0.5336          | 0.7365   |
| 0.2478        | 77.0  | 12012 | 0.5502          | 0.7401   |
| 0.2478        | 78.0  | 12168 | 0.5507          | 0.7401   |
| 0.2478        | 79.0  | 12324 | 0.5451          | 0.7401   |
| 0.2478        | 80.0  | 12480 | 0.5509          | 0.7401   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3