20230822173821

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3484
  • Accuracy: 0.6751

Model description

More information needed

Intended uses & limitations

More information needed
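
Pending fuller documentation, here is a minimal inference sketch for this checkpoint. The SuperGLUE subtask it was trained on is not stated on this card, so the sentence-pair input format below (as in RTE or CB) is an assumption:

```python
# Minimal sketch: load the checkpoint and classify one example.
# The sentence-pair format is an assumption; the card does not name the subtask.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dkqjrm/20230822173821")
model = AutoModelForSequenceClassification.from_pretrained("dkqjrm/20230822173821")

inputs = tokenizer(
    "The cat sat on the mat.",        # first sentence (e.g., premise)
    "There is a cat on the mat.",     # second sentence (e.g., hypothesis)
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())   # predicted label id
```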

Training and evaluation data

More information needed
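
The card names only super_glue as the training dataset. A minimal loading sketch with the datasets library follows; the "rte" configuration is an assumed example, since the actual SuperGLUE subtask is not documented:

```python
# Sketch of loading SuperGLUE with the datasets library.
# "rte" is an assumed config; the card does not say which one was used.
from datasets import load_dataset

dataset = load_dataset("super_glue", "rte")
print(dataset)                 # train/validation/test splits
print(dataset["train"][0])     # one example with its label
```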

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

  • learning_rate: 0.004
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
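
As a reproduction aid, this TrainingArguments sketch mirrors the values listed above. The output directory and evaluation strategy are assumptions not stated on the card (per-epoch evaluation is inferred from the results table below):

```python
# Sketch of TrainingArguments matching the listed hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                # assumed; not stated on the card
    learning_rate=4e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
    evaluation_strategy="epoch",     # assumed from the per-epoch results below
)
```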

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.5182          | 0.4729   |
| 0.543         | 2.0   | 624   | 0.3851          | 0.4801   |
| 0.543         | 3.0   | 936   | 0.5255          | 0.4729   |
| 0.4553        | 4.0   | 1248  | 0.5462          | 0.5271   |
| 0.4979        | 5.0   | 1560  | 0.4904          | 0.5415   |
| 0.4979        | 6.0   | 1872  | 0.3574          | 0.5271   |
| 0.4681        | 7.0   | 2184  | 0.3976          | 0.5487   |
| 0.4681        | 8.0   | 2496  | 0.3657          | 0.5343   |
| 0.4011        | 9.0   | 2808  | 0.3503          | 0.4946   |
| 0.384         | 10.0  | 3120  | 0.3703          | 0.5668   |
| 0.384         | 11.0  | 3432  | 0.3402          | 0.6029   |
| 0.3704        | 12.0  | 3744  | 0.3394          | 0.5668   |
| 0.3653        | 13.0  | 4056  | 0.3450          | 0.5451   |
| 0.3653        | 14.0  | 4368  | 0.3365          | 0.6282   |
| 0.3572        | 15.0  | 4680  | 0.3487          | 0.5921   |
| 0.3572        | 16.0  | 4992  | 0.3502          | 0.6462   |
| 0.3604        | 17.0  | 5304  | 0.3491          | 0.6137   |
| 0.3494        | 18.0  | 5616  | 0.3459          | 0.6318   |
| 0.3494        | 19.0  | 5928  | 0.3353          | 0.6498   |
| 0.3542        | 20.0  | 6240  | 0.3559          | 0.6209   |
| 0.3418        | 21.0  | 6552  | 0.3340          | 0.6462   |
| 0.3418        | 22.0  | 6864  | 0.3586          | 0.6426   |
| 0.3407        | 23.0  | 7176  | 0.3657          | 0.6245   |
| 0.3407        | 24.0  | 7488  | 0.3359          | 0.6606   |
| 0.3481        | 25.0  | 7800  | 0.3398          | 0.6462   |
| 0.3334        | 26.0  | 8112  | 0.3468          | 0.6318   |
| 0.3334        | 27.0  | 8424  | 0.3321          | 0.6498   |
| 0.3348        | 28.0  | 8736  | 0.3341          | 0.6787   |
| 0.3325        | 29.0  | 9048  | 0.3343          | 0.6534   |
| 0.3325        | 30.0  | 9360  | 0.3496          | 0.6354   |
| 0.3335        | 31.0  | 9672  | 0.3661          | 0.6354   |
| 0.3335        | 32.0  | 9984  | 0.3327          | 0.6643   |
| 0.3271        | 33.0  | 10296 | 0.3390          | 0.6823   |
| 0.3235        | 34.0  | 10608 | 0.3351          | 0.6643   |
| 0.3235        | 35.0  | 10920 | 0.3366          | 0.6679   |
| 0.3232        | 36.0  | 11232 | 0.3338          | 0.6606   |
| 0.3209        | 37.0  | 11544 | 0.3435          | 0.6534   |
| 0.3209        | 38.0  | 11856 | 0.3430          | 0.6426   |
| 0.3202        | 39.0  | 12168 | 0.3478          | 0.6570   |
| 0.3202        | 40.0  | 12480 | 0.3371          | 0.6606   |
| 0.3205        | 41.0  | 12792 | 0.3381          | 0.6643   |
| 0.3169        | 42.0  | 13104 | 0.3433          | 0.6679   |
| 0.3169        | 43.0  | 13416 | 0.3459          | 0.6643   |
| 0.316         | 44.0  | 13728 | 0.3551          | 0.6498   |
| 0.3139        | 45.0  | 14040 | 0.3449          | 0.6679   |
| 0.3139        | 46.0  | 14352 | 0.3482          | 0.6715   |
| 0.3123        | 47.0  | 14664 | 0.3455          | 0.6643   |
| 0.3123        | 48.0  | 14976 | 0.3541          | 0.6679   |
| 0.3108        | 49.0  | 15288 | 0.3562          | 0.6715   |
| 0.308         | 50.0  | 15600 | 0.3421          | 0.6679   |
| 0.308         | 51.0  | 15912 | 0.3376          | 0.6606   |
| 0.3104        | 52.0  | 16224 | 0.3390          | 0.6751   |
| 0.3078        | 53.0  | 16536 | 0.3515          | 0.6643   |
| 0.3078        | 54.0  | 16848 | 0.3561          | 0.6679   |
| 0.305         | 55.0  | 17160 | 0.3430          | 0.6643   |
| 0.305         | 56.0  | 17472 | 0.3541          | 0.6643   |
| 0.3067        | 57.0  | 17784 | 0.3468          | 0.6679   |
| 0.3025        | 58.0  | 18096 | 0.3472          | 0.6679   |
| 0.3025        | 59.0  | 18408 | 0.3492          | 0.6715   |
| 0.304         | 60.0  | 18720 | 0.3484          | 0.6751   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
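
To reproduce the training environment, a quick version check against the pins above:

```python
# Compare the installed versions with those this model was trained with.
import datasets
import tokenizers
import torch
import transformers

print(transformers.__version__)  # trained with 4.26.1
print(torch.__version__)         # trained with 2.0.1+cu118
print(datasets.__version__)      # trained with 2.12.0
print(tokenizers.__version__)    # trained with 0.13.3
```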