
20230822173808

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3493
  • Accuracy: 0.6968
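
The card does not state which SuperGLUE subtask the checkpoint targets, but given the accuracy metric a sequence-classification head is a reasonable assumption. The snippet below is a minimal loading sketch under that assumption; the paired premise/hypothesis input is only illustrative.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned checkpoint from the Hub.
tokenizer = AutoTokenizer.from_pretrained("dkqjrm/20230822173808")
model = AutoModelForSequenceClassification.from_pretrained("dkqjrm/20230822173808")

# Hypothetical paired-text input; the actual SuperGLUE task is not documented.
inputs = tokenizer("premise text", "hypothesis text", return_tensors="pt")
logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```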

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.004
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
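
Assuming the standard Hugging Face Trainer produced this card, the hyperparameters above map onto TrainingArguments roughly as in the sketch below; the output_dir value is hypothetical.

```python
from transformers import TrainingArguments

# Minimal reconstruction of the reported training configuration.
training_args = TrainingArguments(
    output_dir="20230822173808",   # hypothetical; not stated on the card
    learning_rate=4e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
)
```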

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.3774          | 0.5162   |
| 0.5343        | 2.0   | 624   | 0.3506          | 0.5018   |
| 0.5343        | 3.0   | 936   | 0.4575          | 0.4729   |
| 0.4659        | 4.0   | 1248  | 0.3759          | 0.5307   |
| 0.4691        | 5.0   | 1560  | 0.3500          | 0.5812   |
| 0.4691        | 6.0   | 1872  | 0.3457          | 0.5993   |
| 0.4442        | 7.0   | 2184  | 0.3500          | 0.6101   |
| 0.4442        | 8.0   | 2496  | 0.3403          | 0.6173   |
| 0.4366        | 9.0   | 2808  | 0.3840          | 0.5776   |
| 0.4097        | 10.0  | 3120  | 0.4391          | 0.5487   |
| 0.4097        | 11.0  | 3432  | 0.3584          | 0.6029   |
| 0.3922        | 12.0  | 3744  | 0.3356          | 0.6498   |
| 0.3564        | 13.0  | 4056  | 0.3275          | 0.6931   |
| 0.3564        | 14.0  | 4368  | 0.3283          | 0.7076   |
| 0.3343        | 15.0  | 4680  | 0.3377          | 0.6462   |
| 0.3343        | 16.0  | 4992  | 0.3550          | 0.6390   |
| 0.335         | 17.0  | 5304  | 0.3370          | 0.6895   |
| 0.3233        | 18.0  | 5616  | 0.3256          | 0.6787   |
| 0.3233        | 19.0  | 5928  | 0.3174          | 0.7112   |
| 0.3232        | 20.0  | 6240  | 0.3440          | 0.6643   |
| 0.3102        | 21.0  | 6552  | 0.3375          | 0.6895   |
| 0.3102        | 22.0  | 6864  | 0.3433          | 0.6787   |
| 0.3064        | 23.0  | 7176  | 0.3690          | 0.6715   |
| 0.3064        | 24.0  | 7488  | 0.3394          | 0.6931   |
| 0.3004        | 25.0  | 7800  | 0.3377          | 0.7256   |
| 0.2962        | 26.0  | 8112  | 0.3435          | 0.6751   |
| 0.2962        | 27.0  | 8424  | 0.3182          | 0.7329   |
| 0.2937        | 28.0  | 8736  | 0.3306          | 0.7112   |
| 0.2905        | 29.0  | 9048  | 0.3362          | 0.7148   |
| 0.2905        | 30.0  | 9360  | 0.3675          | 0.6751   |
| 0.2865        | 31.0  | 9672  | 0.3406          | 0.7076   |
| 0.2865        | 32.0  | 9984  | 0.3343          | 0.7040   |
| 0.2812        | 33.0  | 10296 | 0.3472          | 0.6859   |
| 0.2727        | 34.0  | 10608 | 0.3372          | 0.7292   |
| 0.2727        | 35.0  | 10920 | 0.3575          | 0.7076   |
| 0.2735        | 36.0  | 11232 | 0.3300          | 0.7076   |
| 0.2701        | 37.0  | 11544 | 0.3585          | 0.6968   |
| 0.2701        | 38.0  | 11856 | 0.3422          | 0.7148   |
| 0.2688        | 39.0  | 12168 | 0.3579          | 0.6931   |
| 0.2688        | 40.0  | 12480 | 0.3326          | 0.7148   |
| 0.2644        | 41.0  | 12792 | 0.3464          | 0.7256   |
| 0.2637        | 42.0  | 13104 | 0.3579          | 0.6931   |
| 0.2637        | 43.0  | 13416 | 0.3489          | 0.7040   |
| 0.26          | 44.0  | 13728 | 0.3439          | 0.7076   |
| 0.2582        | 45.0  | 14040 | 0.3585          | 0.7004   |
| 0.2582        | 46.0  | 14352 | 0.3535          | 0.7076   |
| 0.2533        | 47.0  | 14664 | 0.3440          | 0.7148   |
| 0.2533        | 48.0  | 14976 | 0.3506          | 0.7040   |
| 0.2535        | 49.0  | 15288 | 0.3519          | 0.7040   |
| 0.2498        | 50.0  | 15600 | 0.3457          | 0.6931   |
| 0.2498        | 51.0  | 15912 | 0.3494          | 0.7112   |
| 0.2504        | 52.0  | 16224 | 0.3431          | 0.7040   |
| 0.2499        | 53.0  | 16536 | 0.3450          | 0.7040   |
| 0.2499        | 54.0  | 16848 | 0.3485          | 0.6895   |
| 0.2488        | 55.0  | 17160 | 0.3437          | 0.7004   |
| 0.2488        | 56.0  | 17472 | 0.3465          | 0.7004   |
| 0.2479        | 57.0  | 17784 | 0.3479          | 0.6895   |
| 0.247         | 58.0  | 18096 | 0.3447          | 0.7004   |
| 0.247         | 59.0  | 18408 | 0.3521          | 0.7004   |
| 0.2468        | 60.0  | 18720 | 0.3493          | 0.6968   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
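
To reproduce the results it may help to match the versions above; a quick environment check:

```python
import datasets
import tokenizers
import torch
import transformers

# Print installed versions for comparison against the card.
print(transformers.__version__)  # expected: 4.26.1
print(torch.__version__)         # expected: 2.0.1+cu118
print(datasets.__version__)      # expected: 2.12.0
print(tokenizers.__version__)    # expected: 0.13.3
```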