20230822235943

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. After the final training epoch (epoch 60, step 9360), it achieves the following results on the evaluation set:

Loss: 0.9555
Accuracy: 0.7437

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.003
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
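
As a rough illustration of how the values above could be reproduced, the sketch below maps them onto keyword names accepted by `transformers.TrainingArguments` and shows the learning rate implied by a linear schedule. Several things here are assumptions rather than facts from this card: the `output_dir` value is hypothetical, the batch sizes are treated as per-device (single-device training), warmup is taken to be zero because no warmup setting is listed, and the total of 9360 steps is read from the last row of the training results table.

```python
# Values copied from the hyperparameter list in this card.
HYPERPARAMETERS = {
    "learning_rate": 0.003,
    "train_batch_size": 16,
    "eval_batch_size": 8,
    "seed": 11,
    "adam_betas": (0.9, 0.999),
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_epochs": 60.0,
}


def training_args_kwargs(hp, output_dir="outputs"):
    """Map the card's names onto transformers.TrainingArguments keyword names."""
    beta1, beta2 = hp["adam_betas"]
    return {
        "output_dir": output_dir,  # hypothetical path, not stated in the card
        "learning_rate": hp["learning_rate"],
        # Assumption: one device, so the reported sizes equal per-device sizes.
        "per_device_train_batch_size": hp["train_batch_size"],
        "per_device_eval_batch_size": hp["eval_batch_size"],
        "seed": hp["seed"],
        "adam_beta1": beta1,
        "adam_beta2": beta2,
        "adam_epsilon": hp["adam_epsilon"],
        "lr_scheduler_type": hp["lr_scheduler_type"],
        "num_train_epochs": hp["num_epochs"],
    }


def linear_lr(step, total_steps=9360, peak_lr=0.003, warmup_steps=0):
    """Learning rate at `step` under linear warmup followed by linear decay to zero.

    Assumption: warmup_steps=0, since the card lists no warmup setting.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)


kwargs = training_args_kwargs(HYPERPARAMETERS)
# With transformers installed, these could be passed on as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(**kwargs)
```

Under these assumptions the learning rate starts at 0.003, is halved by step 4680 (epoch 30), and reaches zero at step 9360.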

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	156	0.8690	0.4729
No log	2.0	312	0.7262	0.5271
No log	3.0	468	0.7646	0.4693
0.8294	4.0	624	0.7044	0.5884
0.8294	5.0	780	0.7099	0.5884
0.8294	6.0	936	0.6449	0.6245
0.785	7.0	1092	0.7755	0.6245
0.785	8.0	1248	0.6443	0.6606
0.785	9.0	1404	0.6349	0.6859
0.6665	10.0	1560	0.9544	0.6462
0.6665	11.0	1716	0.6008	0.7184
0.6665	12.0	1872	0.6503	0.7076
0.6276	13.0	2028	0.6269	0.7076
0.6276	14.0	2184	0.5788	0.7148
0.6276	15.0	2340	0.6645	0.7076
0.6276	16.0	2496	0.9684	0.6426
0.587	17.0	2652	0.6227	0.7184
0.587	18.0	2808	0.6449	0.7076
0.587	19.0	2964	0.6651	0.7365
0.5287	20.0	3120	1.1324	0.6498
0.5287	21.0	3276	0.7391	0.6895
0.5287	22.0	3432	1.0194	0.6643
0.5035	23.0	3588	0.7838	0.7040
0.5035	24.0	3744	0.8647	0.7184
0.5035	25.0	3900	1.0974	0.6715
0.4533	26.0	4056	0.5861	0.7292
0.4533	27.0	4212	0.6685	0.7437
0.4533	28.0	4368	0.6998	0.7256
0.4398	29.0	4524	0.7596	0.7329
0.4398	30.0	4680	0.6967	0.7437
0.4398	31.0	4836	0.7041	0.7473
0.4398	32.0	4992	0.7617	0.7329
0.3837	33.0	5148	0.7991	0.7329
0.3837	34.0	5304	0.8229	0.7473
0.3837	35.0	5460	0.7745	0.7401
0.3471	36.0	5616	0.7787	0.7437
0.3471	37.0	5772	0.7991	0.7365
0.3471	38.0	5928	1.0206	0.7256
0.3303	39.0	6084	0.8977	0.7292
0.3303	40.0	6240	0.7327	0.7220
0.3303	41.0	6396	0.8102	0.7292
0.2991	42.0	6552	0.7347	0.7473
0.2991	43.0	6708	0.8677	0.7473
0.2991	44.0	6864	0.9774	0.7365
0.275	45.0	7020	0.8557	0.7581
0.275	46.0	7176	0.9789	0.7437
0.275	47.0	7332	1.0015	0.7437
0.275	48.0	7488	0.8450	0.7401
0.2596	49.0	7644	0.8222	0.7581
0.2596	50.0	7800	0.8968	0.7401
0.2596	51.0	7956	0.8584	0.7437
0.2469	52.0	8112	0.9157	0.7401
0.2469	53.0	8268	0.9732	0.7365
0.2469	54.0	8424	1.0671	0.7401
0.2303	55.0	8580	0.9512	0.7473
0.2303	56.0	8736	0.8708	0.7473
0.2303	57.0	8892	0.9290	0.7437
0.2275	58.0	9048	0.8866	0.7401
0.2275	59.0	9204	0.9366	0.7365
0.2275	60.0	9360	0.9555	0.7437
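
Because the reported final-epoch numbers are not the best in the table, it can help to scan the rows programmatically. The following is a minimal sketch, assuming the table is available as tab-separated text in the column order shown above; the names `EXCERPT` and `parse_results` are illustrative, and only a handful of rows are copied in to keep the example short.

```python
# A few rows copied from the results table (tab-separated, same column order).
EXCERPT = """\
Training Loss\tEpoch\tStep\tValidation Loss\tAccuracy
No log\t1.0\t156\t0.8690\t0.4729
0.6276\t14.0\t2184\t0.5788\t0.7148
0.275\t45.0\t7020\t0.8557\t0.7581
0.2596\t49.0\t7644\t0.8222\t0.7581
0.2275\t60.0\t9360\t0.9555\t0.7437
"""


def parse_results(text):
    """Parse the tab-separated results table into a list of dicts."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        train_loss, epoch, step, val_loss, accuracy = line.split("\t")
        rows.append({
            # "No log" means no training loss had been logged yet.
            "train_loss": None if train_loss == "No log" else float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "accuracy": float(accuracy),
        })
    return rows


rows = parse_results(EXCERPT)
# max()/min() return the first matching row when several rows tie.
best_accuracy = max(rows, key=lambda r: r["accuracy"])
lowest_val_loss = min(rows, key=lambda r: r["val_loss"])
print(best_accuracy["epoch"], best_accuracy["accuracy"])
print(lowest_val_loss["epoch"], lowest_val_loss["val_loss"])
```

Applied to the full table, the same scan gives a peak accuracy of 0.7581 (epochs 45 and 49) and a lowest validation loss of 0.5788 (epoch 14), whereas the final epoch ends at 0.7437 accuracy with a validation loss of 0.9555.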

Framework versions

Transformers 4.26.1
PyTorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
