20230822163753

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set, which match the final epoch (60) in the training results table:

Loss: 0.3363
Accuracy: 0.7256

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.003
train_batch_size: 8
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
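
As a rough sketch of how these values fit together, the snippet below collects them under the argument names used by `transformers.TrainingArguments` and computes the learning rate implied by a linear schedule. Several things here are assumptions, not facts from this card: that `train_batch_size` maps to `per_device_train_batch_size` (single device), that no warmup was used (none is listed), and the helper names `linear_lr` and `build_training_arguments`, which are illustrative only. The value of 312 steps per epoch is taken from the training results table.

```python
# Hyperparameters copied from the list above. The keys follow the argument
# names of transformers.TrainingArguments; mapping train_batch_size to
# per_device_train_batch_size assumes a single training device.
TRAINING_KWARGS = {
    "learning_rate": 3e-3,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 11,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 60.0,
}

STEPS_PER_EPOCH = 312  # step count after epoch 1 in the training results table
TOTAL_STEPS = int(STEPS_PER_EPOCH * TRAINING_KWARGS["num_train_epochs"])


def linear_lr(step, base_lr=TRAINING_KWARGS["learning_rate"],
              total_steps=TOTAL_STEPS, warmup_steps=0):
    """Linear warmup (assumed zero here) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def build_training_arguments(output_dir="out"):
    """Hypothetical helper: build TrainingArguments from the values above.

    The import is deferred so the rest of the sketch runs without
    transformers installed.
    """
    from transformers import TrainingArguments
    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Under these assumptions the schedule starts at the full learning rate, reaches half of it at step 9360 (the end of epoch 30), and decays to zero at step 18720.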

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	312	0.6253	0.5307
0.4958	2.0	624	0.3817	0.5415
0.4958	3.0	936	0.5426	0.4729
0.4406	4.0	1248	0.7363	0.5379
0.4205	5.0	1560	0.3395	0.6498
0.4205	6.0	1872	0.3422	0.6354
0.4134	7.0	2184	0.4093	0.5487
0.4134	8.0	2496	0.4435	0.5487
0.4124	9.0	2808	0.3364	0.6065
0.3904	10.0	3120	0.3570	0.6029
0.3904	11.0	3432	0.3988	0.5596
0.376	12.0	3744	0.3339	0.6751
0.3501	13.0	4056	0.3348	0.6606
0.3501	14.0	4368	0.3288	0.6715
0.3336	15.0	4680	0.3261	0.6823
0.3336	16.0	4992	0.3326	0.7040
0.333	17.0	5304	0.3264	0.7112
0.3259	18.0	5616	0.3259	0.6968
0.3259	19.0	5928	0.3253	0.6643
0.3281	20.0	6240	0.3261	0.7184
0.3191	21.0	6552	0.3227	0.7220
0.3191	22.0	6864	0.3371	0.6931
0.3164	23.0	7176	0.3522	0.6895
0.3164	24.0	7488	0.3275	0.7040
0.3133	25.0	7800	0.3234	0.7329
0.308	26.0	8112	0.3352	0.6931
0.308	27.0	8424	0.3167	0.7184
0.3075	28.0	8736	0.3378	0.6968
0.3064	29.0	9048	0.3370	0.7112
0.3064	30.0	9360	0.3432	0.7004
0.3021	31.0	9672	0.3305	0.7148
0.3021	32.0	9984	0.3218	0.7220
0.2983	33.0	10296	0.3349	0.7112
0.2933	34.0	10608	0.3208	0.7256
0.2933	35.0	10920	0.3243	0.7220
0.2931	36.0	11232	0.3206	0.7292
0.2903	37.0	11544	0.3643	0.6895
0.2903	38.0	11856	0.3254	0.7473
0.2895	39.0	12168	0.3350	0.7148
0.2895	40.0	12480	0.3325	0.7076
0.2852	41.0	12792	0.3289	0.7256
0.2857	42.0	13104	0.3281	0.7256
0.2857	43.0	13416	0.3373	0.7184
0.2805	44.0	13728	0.3414	0.7040
0.2806	45.0	14040	0.3346	0.7292
0.2806	46.0	14352	0.3383	0.7220
0.2777	47.0	14664	0.3285	0.7220
0.2777	48.0	14976	0.3385	0.7148
0.2768	49.0	15288	0.3403	0.7148
0.2732	50.0	15600	0.3336	0.7256
0.2732	51.0	15912	0.3306	0.7184
0.274	52.0	16224	0.3300	0.7292
0.272	53.0	16536	0.3318	0.7220
0.272	54.0	16848	0.3403	0.7220
0.2701	55.0	17160	0.3252	0.7292
0.2701	56.0	17472	0.3391	0.7220
0.2695	57.0	17784	0.3304	0.7292
0.2694	58.0	18096	0.3300	0.7220
0.2694	59.0	18408	0.3347	0.7292
0.2689	60.0	18720	0.3363	0.7256
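
The table above can be scanned programmatically to find the strongest checkpoints. The following is a minimal sketch that parses tab-separated rows and picks the epoch with the highest accuracy and the one with the lowest validation loss. Only four rows are embedded to keep it short; the function names and the `TABLE` constant are illustrative, not part of the original training code.

```python
import csv
import io

# A few rows copied from the table above (tab-separated). Paste the full
# table here to analyse the whole run.
TABLE = (
    "Training Loss\tEpoch\tStep\tValidation Loss\tAccuracy\n"
    "0.3133\t25.0\t7800\t0.3234\t0.7329\n"
    "0.308\t27.0\t8424\t0.3167\t0.7184\n"
    "0.2903\t38.0\t11856\t0.3254\t0.7473\n"
    "0.2689\t60.0\t18720\t0.3363\t0.7256\n"
)


def to_float_or_none(text):
    """Return a float, or None for entries such as 'No log'."""
    try:
        return float(text)
    except ValueError:
        return None


def load_rows(text):
    """Parse the tab-separated table into a list of dicts with numeric values."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        rows.append({
            "train_loss": to_float_or_none(record["Training Loss"]),
            "epoch": float(record["Epoch"]),
            "step": int(record["Step"]),
            "val_loss": float(record["Validation Loss"]),
            "accuracy": float(record["Accuracy"]),
        })
    return rows


rows = load_rows(TABLE)
best_by_accuracy = max(rows, key=lambda r: r["accuracy"])
best_by_val_loss = min(rows, key=lambda r: r["val_loss"])

print("highest accuracy:", best_by_accuracy["epoch"], best_by_accuracy["accuracy"])
print("lowest validation loss:", best_by_val_loss["epoch"], best_by_val_loss["val_loss"])
```

Run on the full table, the highest accuracy is 0.7473 at epoch 38 and the lowest validation loss is 0.3167 at epoch 27, while the final epoch (60) ends at 0.7256 accuracy and 0.3363 validation loss.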

Framework versions

Transformers 4.26.1
PyTorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
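
To check a local environment against the versions listed above, a small standard-library sketch like the one below may help. The PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) and the helper name `compare_versions` are assumptions for illustration. The exact string reported for PyTorch depends on the installed build, so a mismatch there does not necessarily indicate a problem.

```python
from importlib import metadata

# Versions copied from the list above, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def compare_versions(expected=EXPECTED):
    """Return {package: (expected, installed or None, matches)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, installed, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{package}: expected {wanted}, found {installed} ({status})")
```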

dkqjrm/20230822163753
