20230824164412

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 1.4601
  • Accuracy: 0.7617

Model description

More information needed

Intended uses & limitations

More information needed
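
Pending more details from the author, here is a minimal inference sketch. It assumes the checkpoint is published under the repo id dkqjrm/20230824164412 (taken from this card) with a standard sequence-classification head; since the specific SuperGLUE task is not documented, the sentence-pair input below is purely illustrative.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumption: the repo id comes from this card; the SuperGLUE subset and the
# meaning of the labels are not documented, so the input pair is illustrative.
model_id = "dkqjrm/20230824164412"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

inputs = tokenizer("The cat sat on the mat.", "A cat is on a mat.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted label id
```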

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 0.005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
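
For reference, a minimal sketch of how these values map onto transformers.TrainingArguments (4.26.1). The actual training script is not included in this card, and output_dir is a placeholder.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameter list above; output_dir is a placeholder.
args = TrainingArguments(
    output_dir="out",
    learning_rate=5e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
)
```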

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 156   | 1.2623          | 0.4729   |
| No log        | 2.0   | 312   | 1.7988          | 0.4729   |
| No log        | 3.0   | 468   | 1.1894          | 0.5596   |
| 1.5423        | 4.0   | 624   | 1.1452          | 0.6029   |
| 1.5423        | 5.0   | 780   | 1.5302          | 0.5704   |
| 1.5423        | 6.0   | 936   | 1.0779          | 0.6643   |
| 1.2833        | 7.0   | 1092  | 1.3023          | 0.6643   |
| 1.2833        | 8.0   | 1248  | 1.0901          | 0.6787   |
| 1.2833        | 9.0   | 1404  | 1.0524          | 0.7040   |
| 1.137         | 10.0  | 1560  | 1.1486          | 0.7040   |
| 1.137         | 11.0  | 1716  | 0.9741          | 0.7220   |
| 1.137         | 12.0  | 1872  | 0.9392          | 0.7401   |
| 1.0902        | 13.0  | 2028  | 0.9919          | 0.7329   |
| 1.0902        | 14.0  | 2184  | 0.9693          | 0.7292   |
| 1.0902        | 15.0  | 2340  | 1.3303          | 0.6570   |
| 1.0902        | 16.0  | 2496  | 1.6827          | 0.6245   |
| 0.9851        | 17.0  | 2652  | 1.0073          | 0.7220   |
| 0.9851        | 18.0  | 2808  | 1.0058          | 0.7220   |
| 0.9851        | 19.0  | 2964  | 1.0158          | 0.7437   |
| 0.8583        | 20.0  | 3120  | 1.9128          | 0.6679   |
| 0.8583        | 21.0  | 3276  | 1.0963          | 0.7148   |
| 0.8583        | 22.0  | 3432  | 1.3230          | 0.7184   |
| 0.7482        | 23.0  | 3588  | 1.3272          | 0.7040   |
| 0.7482        | 24.0  | 3744  | 1.2003          | 0.7401   |
| 0.7482        | 25.0  | 3900  | 1.4140          | 0.7076   |
| 0.6935        | 26.0  | 4056  | 1.1536          | 0.7509   |
| 0.6935        | 27.0  | 4212  | 1.1267          | 0.7401   |
| 0.6935        | 28.0  | 4368  | 1.1595          | 0.7473   |
| 0.6056        | 29.0  | 4524  | 1.4403          | 0.7401   |
| 0.6056        | 30.0  | 4680  | 1.3101          | 0.7617   |
| 0.6056        | 31.0  | 4836  | 1.8018          | 0.7040   |
| 0.6056        | 32.0  | 4992  | 1.1681          | 0.7653   |
| 0.5191        | 33.0  | 5148  | 1.5214          | 0.7690   |
| 0.5191        | 34.0  | 5304  | 1.2349          | 0.7509   |
| 0.5191        | 35.0  | 5460  | 1.3993          | 0.7437   |
| 0.4549        | 36.0  | 5616  | 1.5260          | 0.7040   |
| 0.4549        | 37.0  | 5772  | 1.5437          | 0.7401   |
| 0.4549        | 38.0  | 5928  | 1.4679          | 0.7401   |
| 0.4181        | 39.0  | 6084  | 1.5237          | 0.7437   |
| 0.4181        | 40.0  | 6240  | 1.2788          | 0.7545   |
| 0.4181        | 41.0  | 6396  | 1.2741          | 0.7581   |
| 0.3694        | 42.0  | 6552  | 1.4069          | 0.7653   |
| 0.3694        | 43.0  | 6708  | 1.6243          | 0.7473   |
| 0.3694        | 44.0  | 6864  | 1.5139          | 0.7509   |
| 0.3477        | 45.0  | 7020  | 1.3648          | 0.7617   |
| 0.3477        | 46.0  | 7176  | 1.3082          | 0.7581   |
| 0.3477        | 47.0  | 7332  | 1.3837          | 0.7509   |
| 0.3477        | 48.0  | 7488  | 1.4072          | 0.7726   |
| 0.3048        | 49.0  | 7644  | 1.3494          | 0.7690   |
| 0.3048        | 50.0  | 7800  | 1.5970          | 0.7437   |
| 0.3048        | 51.0  | 7956  | 1.5230          | 0.7509   |
| 0.2879        | 52.0  | 8112  | 1.4555          | 0.7690   |
| 0.2879        | 53.0  | 8268  | 1.6442          | 0.7437   |
| 0.2879        | 54.0  | 8424  | 1.4267          | 0.7473   |
| 0.2545        | 55.0  | 8580  | 1.4977          | 0.7473   |
| 0.2545        | 56.0  | 8736  | 1.5389          | 0.7509   |
| 0.2545        | 57.0  | 8892  | 1.2889          | 0.7581   |
| 0.2434        | 58.0  | 9048  | 1.5166          | 0.7545   |
| 0.2434        | 59.0  | 9204  | 1.5143          | 0.7581   |
| 0.2434        | 60.0  | 9360  | 1.6968          | 0.7437   |
| 0.2309        | 61.0  | 9516  | 1.6144          | 0.7545   |
| 0.2309        | 62.0  | 9672  | 1.5494          | 0.7581   |
| 0.2309        | 63.0  | 9828  | 1.4832          | 0.7545   |
| 0.2309        | 64.0  | 9984  | 1.4073          | 0.7581   |
| 0.2194        | 65.0  | 10140 | 1.4524          | 0.7581   |
| 0.2194        | 66.0  | 10296 | 1.4490          | 0.7509   |
| 0.2194        | 67.0  | 10452 | 1.5948          | 0.7545   |
| 0.2037        | 68.0  | 10608 | 1.5180          | 0.7653   |
| 0.2037        | 69.0  | 10764 | 1.6394          | 0.7581   |
| 0.2037        | 70.0  | 10920 | 1.5999          | 0.7617   |
| 0.2017        | 71.0  | 11076 | 1.3414          | 0.7653   |
| 0.2017        | 72.0  | 11232 | 1.4794          | 0.7617   |
| 0.2017        | 73.0  | 11388 | 1.3894          | 0.7653   |
| 0.1889        | 74.0  | 11544 | 1.3723          | 0.7690   |
| 0.1889        | 75.0  | 11700 | 1.4901          | 0.7581   |
| 0.1889        | 76.0  | 11856 | 1.4329          | 0.7617   |
| 0.1929        | 77.0  | 12012 | 1.4548          | 0.7653   |
| 0.1929        | 78.0  | 12168 | 1.4404          | 0.7617   |
| 0.1929        | 79.0  | 12324 | 1.4248          | 0.7653   |
| 0.1929        | 80.0  | 12480 | 1.4601          | 0.7617   |
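
Note that validation loss bottoms out at epoch 12 (0.9392) and accuracy peaks at epoch 48 (0.7726), while training loss keeps falling through epoch 80, a typical overfitting pattern. The card does not say whether best-checkpoint selection was used; below is a hedged sketch of how it could be added with the standard Trainer API. BoolQ is assumed purely for illustration, since the actual SuperGLUE task is not documented.

```python
import numpy as np
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

# Assumption: the card does not name the SuperGLUE task; BoolQ is illustrative.
raw = load_dataset("super_glue", "boolq")
tokenizer = AutoTokenizer.from_pretrained("bert-large-cased")

def tokenize(batch):
    return tokenizer(batch["question"], batch["passage"], truncation=True)

data = raw.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("bert-large-cased", num_labels=2)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",      # evaluate once per epoch, as in the table above
    save_strategy="epoch",            # checkpoint cadence must match evaluation
    load_best_model_at_end=True,      # reload the checkpoint with the best accuracy
    metric_for_best_model="accuracy",
    num_train_epochs=80.0,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],
)
trainer.train()
```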

Framework versions

  • Transformers 4.26.1
  • PyTorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3