20230824164344

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set, measured after the final (80th) training epoch:

Loss: 0.9440
Accuracy: 0.7329

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.005
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
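A minimal sketch of how the hyperparameters above might map onto keyword arguments for `transformers.TrainingArguments` (Transformers 4.26.1), together with the learning rate implied by the linear schedule. Several things are assumptions not stated in the card: training ran on a single device with no gradient accumulation (so the per-device batch size equals the listed 16), there were no warmup steps, and evaluation ran once per epoch. The 156 optimizer steps per epoch are read from the Step column of the training results.

```python
# Hyperparameters from this card as TrainingArguments keyword arguments.
# Assumption: single device, no gradient accumulation, so the total train
# batch size of 16 equals the per-device batch size.
TRAINING_KWARGS = {
    "learning_rate": 0.005,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 8,
    "seed": 11,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 80.0,
    "evaluation_strategy": "epoch",  # assumption: one eval row per epoch in the table
}

STEPS_PER_EPOCH = 156  # from the Step column of the training results
TOTAL_STEPS = STEPS_PER_EPOCH * int(TRAINING_KWARGS["num_train_epochs"])  # 12480


def linear_lr(step, base_lr=TRAINING_KWARGS["learning_rate"],
              total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate after `step` optimizer steps.

    Intended to follow the linear warmup/decay formula used by
    transformers.get_linear_schedule_with_warmup; warmup_steps=0 is an
    assumption because the card lists no warmup.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print(TOTAL_STEPS, linear_lr(6240))
```

With Transformers installed, these could be passed as `TrainingArguments(output_dir="out", **TRAINING_KWARGS)`, where `output_dir="out"` is only a placeholder. Under these assumptions the learning rate falls from 0.005 to 0.0025 at step 6240 (epoch 40) and reaches 0 at step 12480.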

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	156	0.7410	0.5162
No log	2.0	312	1.0443	0.4729
No log	3.0	468	0.6773	0.5054
0.9803	4.0	624	0.8278	0.5343
0.9803	5.0	780	0.6367	0.6137
0.9803	6.0	936	0.6217	0.6426
0.8339	7.0	1092	1.2109	0.5776
0.8339	8.0	1248	0.5718	0.6859
0.8339	9.0	1404	0.7100	0.6606
0.7334	10.0	1560	1.3794	0.5993
0.7334	11.0	1716	0.7077	0.5668
0.7334	12.0	1872	0.5683	0.7040
0.6828	13.0	2028	0.5391	0.7329
0.6828	14.0	2184	0.7041	0.7292
0.6828	15.0	2340	0.7170	0.6679
0.6828	16.0	2496	1.1745	0.6029
0.622	17.0	2652	0.6299	0.7112
0.622	18.0	2808	0.5566	0.7437
0.622	19.0	2964	0.5614	0.7509
0.5718	20.0	3120	1.6971	0.6390
0.5718	21.0	3276	0.6663	0.7076
0.5718	22.0	3432	0.6859	0.6498
0.5554	23.0	3588	0.7722	0.7112
0.5554	24.0	3744	0.6040	0.7256
0.5554	25.0	3900	0.8333	0.7329
0.4565	26.0	4056	0.5782	0.7220
0.4565	27.0	4212	0.6536	0.6968
0.4565	28.0	4368	0.8468	0.7292
0.4326	29.0	4524	0.7304	0.7148
0.4326	30.0	4680	0.8690	0.6968
0.4326	31.0	4836	0.8080	0.7148
0.4326	32.0	4992	0.6306	0.7292
0.3528	33.0	5148	0.8862	0.7220
0.3528	34.0	5304	0.8333	0.7365
0.3528	35.0	5460	0.6612	0.7329
0.3155	36.0	5616	0.7407	0.7401
0.3155	37.0	5772	0.8019	0.7365
0.3155	38.0	5928	0.9540	0.7401
0.2632	39.0	6084	0.9973	0.7365
0.2632	40.0	6240	0.7745	0.7401
0.2632	41.0	6396	0.7636	0.7473
0.2516	42.0	6552	0.8117	0.7401
0.2516	43.0	6708	0.8688	0.7329
0.2516	44.0	6864	0.8390	0.7509
0.219	45.0	7020	0.9181	0.7401
0.219	46.0	7176	0.8596	0.7509
0.219	47.0	7332	0.9130	0.7437
0.219	48.0	7488	0.9129	0.7437
0.2039	49.0	7644	0.7271	0.7545
0.2039	50.0	7800	0.8405	0.7437
0.2039	51.0	7956	0.8249	0.7653
0.1809	52.0	8112	0.8916	0.7581
0.1809	53.0	8268	0.9851	0.7437
0.1809	54.0	8424	0.8449	0.7653
0.1588	55.0	8580	0.8400	0.7437
0.1588	56.0	8736	0.9869	0.7473
0.1588	57.0	8892	0.7289	0.7509
0.1563	58.0	9048	0.9168	0.7437
0.1563	59.0	9204	1.0048	0.7401
0.1563	60.0	9360	0.9174	0.7581
0.1434	61.0	9516	1.0328	0.7437
0.1434	62.0	9672	0.9543	0.7509
0.1434	63.0	9828	0.9841	0.7509
0.1434	64.0	9984	0.9057	0.7509
0.1345	65.0	10140	0.9597	0.7509
0.1345	66.0	10296	0.9686	0.7509
0.1345	67.0	10452	0.9621	0.7581
0.1363	68.0	10608	1.0869	0.7292
0.1363	69.0	10764	1.0265	0.7365
0.1363	70.0	10920	0.9629	0.7509
0.1166	71.0	11076	0.8672	0.7509
0.1166	72.0	11232	0.9515	0.7401
0.1166	73.0	11388	0.9453	0.7401
0.1196	74.0	11544	0.9168	0.7473
0.1196	75.0	11700	0.9455	0.7437
0.1196	76.0	11856	0.9246	0.7437
0.1184	77.0	12012	1.0048	0.7329
0.1184	78.0	12168	0.9510	0.7329
0.1184	79.0	12324	0.9356	0.7365
0.1184	80.0	12480	0.9440	0.7329
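The evaluation results reported at the top of this card are the final-epoch row (epoch 80). Elsewhere in the table the highest validation accuracy is 0.7653 (epochs 51 and 54) and the lowest validation loss is 0.5391 (epoch 13). The sketch below uses hypothetical helpers, not part of any library, to parse the tab-separated table (treating "No log" as a missing training loss) and pick out the best rows; `EXCERPT` holds five rows copied from the table above, and the full table can be pasted in the same way.

```python
def parse_results(table):
    """Parse the tab-separated training-results table into a list of dicts.

    'No log' in the Training Loss column becomes None; Step is an int and
    every other column is a float.
    """
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].split("\t")]
    rows = []
    for ln in lines[1:]:
        cells = [c.strip() for c in ln.split("\t")]
        row = {}
        for name, cell in zip(header, cells):
            if cell == "No log":
                row[name] = None
            elif name == "Step":
                row[name] = int(cell)
            else:
                row[name] = float(cell)
        rows.append(row)
    return rows


def best_rows(rows, key, maximize=True):
    """Return every row that attains the best value of `key` (ties kept)."""
    values = [r[key] for r in rows if r[key] is not None]
    target = max(values) if maximize else min(values)
    return [r for r in rows if r[key] == target]


# Five rows copied from the training results table.
EXCERPT = (
    "Training Loss\tEpoch\tStep\tValidation Loss\tAccuracy\n"
    "No log\t1.0\t156\t0.7410\t0.5162\n"
    "0.6828\t13.0\t2028\t0.5391\t0.7329\n"
    "0.2039\t51.0\t7956\t0.8249\t0.7653\n"
    "0.1809\t54.0\t8424\t0.8449\t0.7653\n"
    "0.1184\t80.0\t12480\t0.9440\t0.7329\n"
)

rows = parse_results(EXCERPT)
print([r["Epoch"] for r in best_rows(rows, "Accuracy")])
print([r["Epoch"] for r in best_rows(rows, "Validation Loss", maximize=False)])
```

Keeping ties matters here because two epochs share the top accuracy.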

Framework versions

Transformers 4.26.1
PyTorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
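A small sketch for checking a local environment against the framework versions above with Python's standard `importlib.metadata`. It assumes the PyPI distribution names `transformers`, `torch`, `datasets` and `tokenizers`, and compares versions with any local label (such as `+cu118`, which marks the CUDA 11.8 build of PyTorch) removed. The helper names are hypothetical.

```python
from importlib import metadata

# Versions listed under "Framework versions" in this card.
# Assumption: PyTorch is installed under the distribution name "torch".
EXPECTED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def public_version(v):
    """Drop a local version label such as '+cu118'."""
    return v.split("+", 1)[0]


def version_mismatches(expected=EXPECTED, lookup=metadata.version):
    """Return {package: (expected, installed)} for each package whose
    installed public version differs from the card; installed is None
    when the package is missing."""
    mismatches = {}
    for pkg, want in expected.items():
        try:
            have = lookup(pkg)
        except metadata.PackageNotFoundError:
            have = None
        if have is None or public_version(have) != public_version(want):
            mismatches[pkg] = (want, have)
    return mismatches


if __name__ == "__main__":
    print(version_mismatches())
```

`version_mismatches()` returns an empty dict when every listed package matches; the `lookup` parameter exists so the comparison can be exercised without the packages installed.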

Dataset used to train dkqjrm/20230824164344: super_glue
