20230822105331

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

Loss: 0.3495
Accuracy: 0.4729

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.05
train_batch_size: 8
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
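
As a rough sketch, the values listed above could be passed to the `TrainingArguments` class from Transformers as shown below. Several details are assumptions rather than facts from this card: `train_batch_size` is mapped to `per_device_train_batch_size` on the assumption of single-device training, `output_dir` is a hypothetical placeholder, and zero warmup steps is assumed because none are listed. The helper `linear_lr` is illustrative only; it shows how a linear schedule without warmup would decay the learning rate over the 18720 steps reported in the training results.

```python
# Hyperparameters copied from the list above, keyed by the matching
# `transformers.TrainingArguments` argument names.
HYPERPARAMS = {
    "learning_rate": 0.05,
    "per_device_train_batch_size": 8,  # assumption: a single device
    "per_device_eval_batch_size": 8,   # assumption: a single device
    "seed": 11,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 60.0,
}

TOTAL_STEPS = 18720  # final step in the training results table


def build_training_arguments(output_dir="./outputs"):
    """Build TrainingArguments (needs transformers; output_dir is a placeholder)."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)


def linear_lr(step, base_lr=HYPERPARAMS["learning_rate"], total_steps=TOTAL_STEPS):
    """Learning rate under linear decay to zero, assuming no warmup steps."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps
```

Under these assumptions, `linear_lr(0)` returns the initial rate of 0.05 and `linear_lr(TOTAL_STEPS)` returns 0.0.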

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	312	7.6648	0.5271
3.587	2.0	624	0.6882	0.4729
3.587	3.0	936	1.0025	0.4729
2.3815	4.0	1248	2.3514	0.5271
2.2566	5.0	1560	2.2928	0.5271
2.2566	6.0	1872	1.7104	0.5271
2.15	7.0	2184	1.0133	0.5271
2.15	8.0	2496	2.0623	0.4729
1.9744	9.0	2808	1.7197	0.4729
2.0161	10.0	3120	2.4539	0.5271
2.0161	11.0	3432	0.3721	0.4729
1.9705	12.0	3744	1.6829	0.4729
1.9852	13.0	4056	1.6828	0.4729
1.9852	14.0	4368	0.4861	0.4729
1.8881	15.0	4680	0.9674	0.5271
1.8881	16.0	4992	0.4690	0.5271
1.6994	17.0	5304	1.8712	0.4729
1.6662	18.0	5616	1.5880	0.4729
1.6662	19.0	5928	0.8004	0.4729
1.6315	20.0	6240	1.1683	0.4729
1.5675	21.0	6552	0.7509	0.5271
1.5675	22.0	6864	0.4691	0.5271
1.6442	23.0	7176	0.5092	0.4729
1.6442	24.0	7488	0.3482	0.5271
1.4097	25.0	7800	1.3770	0.5271
1.3654	26.0	8112	0.9837	0.5271
1.3654	27.0	8424	1.5820	0.5271
1.3798	28.0	8736	2.0902	0.4729
1.2375	29.0	9048	0.3487	0.4729
1.2375	30.0	9360	1.7541	0.5271
1.1474	31.0	9672	0.6072	0.5271
1.1474	32.0	9984	0.6279	0.5271
1.1276	33.0	10296	0.3904	0.4729
1.0103	34.0	10608	0.3875	0.4729
1.0103	35.0	10920	0.6633	0.5271
1.0402	36.0	11232	0.3507	0.4729
0.9725	37.0	11544	0.4593	0.5271
0.9725	38.0	11856	0.4105	0.4729
0.8985	39.0	12168	0.3554	0.5271
0.8985	40.0	12480	1.4254	0.4729
0.93	41.0	12792	0.4509	0.4729
0.8076	42.0	13104	0.3815	0.5271
0.8076	43.0	13416	0.4002	0.4729
0.7373	44.0	13728	0.4687	0.4729
0.7011	45.0	14040	0.3481	0.5271
0.7011	46.0	14352	0.3538	0.4729
0.6638	47.0	14664	0.4579	0.5271
0.6638	48.0	14976	0.3623	0.4729
0.6146	49.0	15288	0.3498	0.4729
0.5636	50.0	15600	0.4416	0.5271
0.5636	51.0	15912	0.3922	0.4729
0.5368	52.0	16224	0.4049	0.5271
0.4917	53.0	16536	0.3605	0.4729
0.4917	54.0	16848	0.3491	0.5271
0.4658	55.0	17160	0.3615	0.4729
0.4658	56.0	17472	0.3505	0.5271
0.4389	57.0	17784	0.3542	0.4729
0.4097	58.0	18096	0.3499	0.4729
0.4097	59.0	18408	0.3565	0.5271
0.3867	60.0	18720	0.3495	0.4729
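
The table above supports two quick arithmetic checks, sketched here. First, the step counts imply 312 optimizer steps per epoch, which bounds the training set size if one assumes a single device, no gradient accumulation, and a retained final partial batch (none of which the card states). Second, the Accuracy column only ever takes two values, and they sum to 1. That pattern is consistent with a model that assigns every evaluation example to the same class and flips class between epochs, although this is an interpretation and not something the card reports.

```python
import math

NUM_EPOCHS = 60
FINAL_STEP = 18720
TRAIN_BATCH_SIZE = 8
ACCURACY_VALUES = (0.4729, 0.5271)  # the only two values in the Accuracy column

# Optimizer steps per epoch, from the last row of the table.
steps_per_epoch = FINAL_STEP // NUM_EPOCHS  # 312

# Assumption: one device, no gradient accumulation, final partial batch kept.
# Then N examples give ceil(N / batch_size) steps, which bounds N.
max_examples = steps_per_epoch * TRAIN_BATCH_SIZE            # 2496
min_examples = (steps_per_epoch - 1) * TRAIN_BATCH_SIZE + 1  # 2489

# The two accuracy values are complementary (up to rounding to 4 decimals).
accuracies_are_complementary = math.isclose(
    sum(ACCURACY_VALUES), 1.0, abs_tol=1e-4
)

print(steps_per_epoch, min_examples, max_examples, accuracies_are_complementary)
```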

Framework versions

Transformers 4.26.1
PyTorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
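
To compare a local environment against the versions listed above, a minimal sketch using only the standard library might look like this. The helper name `version_mismatches` is hypothetical. The mapping uses the PyPI distribution names, so PyTorch appears as `torch`.

```python
from importlib import metadata

# Versions listed in this card, keyed by PyPI distribution name.
PINNED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def version_mismatches(pinned=PINNED):
    """Return {package: (expected, installed)} for each package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches
```

An empty result from `version_mismatches()` means the installed packages match the listed versions exactly.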

Dataset used to train dkqjrm/20230822105331: super_glue