
20230823074620

This model is a fine-tuned version of bert-large-cased on the super_glue dataset (the specific SuperGLUE task configuration is not recorded on this card). It achieves the following results on the evaluation set, which correspond to the final epoch (60) in the training table below:

  • Loss: 0.0693
  • Accuracy: 0.5812
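
As a quick sanity check, the checkpoint can be loaded with the standard transformers sequence-classification API. This is a minimal sketch, not the card author's documented usage: the sequence-classification head and the sentence-pair input are assumptions based on the model being a BERT fine-tune evaluated with accuracy on SuperGLUE, and the example premise/hypothesis pair is invented.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230823074620"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# SuperGLUE tasks are typically sentence-pair classification, so the
# tokenizer is given a pair of texts here (an illustrative example only).
inputs = tokenizer(
    "The cat sat on the mat.",
    "There is a cat on the mat.",
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```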

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
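
The card does not state which SuperGLUE task or splits were used. Purely as an illustration of how a dataset of this shape is loaded with the datasets library pinned below, here is a sketch; the "rte" configuration is a placeholder assumption, not information from this card.

```python
from datasets import load_dataset

# "rte" is a placeholder: the card does not record the SuperGLUE task,
# so substitute the configuration that was actually used.
dataset = load_dataset("super_glue", "rte")
print(dataset["train"][0])           # fields such as premise / hypothesis / label
print(dataset["validation"].num_rows)
```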

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch is shown after this list):

  • learning_rate: 0.005
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
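
The listed values map directly onto transformers.TrainingArguments. The sketch below mirrors them; output_dir and anything not listed above (weight decay, warmup, gradient accumulation, and so on) are assumptions, not settings recorded on this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230823074620",   # assumed; not recorded on the card
    learning_rate=5e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,                # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
)
```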

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.1529          | 0.4729   |
| 0.2201        | 2.0   | 624   | 0.1022          | 0.5271   |
| 0.2201        | 3.0   | 936   | 0.2619          | 0.4729   |
| 0.1563        | 4.0   | 1248  | 0.0738          | 0.5199   |
| 0.0889        | 5.0   | 1560  | 0.0709          | 0.4982   |
| 0.0889        | 6.0   | 1872  | 0.0758          | 0.4729   |
| 0.0808        | 7.0   | 2184  | 0.0732          | 0.4729   |
| 0.0808        | 8.0   | 2496  | 0.0716          | 0.5596   |
| 0.0802        | 9.0   | 2808  | 0.0707          | 0.5307   |
| 0.0819        | 10.0  | 3120  | 0.0712          | 0.4729   |
| 0.0819        | 11.0  | 3432  | 0.0706          | 0.4765   |
| 0.0818        | 12.0  | 3744  | 0.0703          | 0.4621   |
| 0.08          | 13.0  | 4056  | 0.0737          | 0.4729   |
| 0.08          | 14.0  | 4368  | 0.0712          | 0.5307   |
| 0.0803        | 15.0  | 4680  | 0.0738          | 0.4729   |
| 0.0803        | 16.0  | 4992  | 0.0708          | 0.4729   |
| 0.0807        | 17.0  | 5304  | 0.0709          | 0.5487   |
| 0.082         | 18.0  | 5616  | 0.0720          | 0.5523   |
| 0.082         | 19.0  | 5928  | 0.0712          | 0.4729   |
| 0.0806        | 20.0  | 6240  | 0.0703          | 0.5090   |
| 0.0801        | 21.0  | 6552  | 0.0710          | 0.4729   |
| 0.0801        | 22.0  | 6864  | 0.0701          | 0.4874   |
| 0.0798        | 23.0  | 7176  | 0.0703          | 0.4874   |
| 0.0798        | 24.0  | 7488  | 0.0705          | 0.4765   |
| 0.0854        | 25.0  | 7800  | 0.0704          | 0.5523   |
| 0.0793        | 26.0  | 8112  | 0.0702          | 0.4910   |
| 0.0793        | 27.0  | 8424  | 0.0721          | 0.4729   |
| 0.0792        | 28.0  | 8736  | 0.0720          | 0.4729   |
| 0.0794        | 29.0  | 9048  | 0.0713          | 0.4765   |
| 0.0794        | 30.0  | 9360  | 0.0701          | 0.5632   |
| 0.0785        | 31.0  | 9672  | 0.0710          | 0.6101   |
| 0.0785        | 32.0  | 9984  | 0.0703          | 0.4801   |
| 0.0786        | 33.0  | 10296 | 0.0728          | 0.4729   |
| 0.0791        | 34.0  | 10608 | 0.0703          | 0.5054   |
| 0.0791        | 35.0  | 10920 | 0.0716          | 0.6173   |
| 0.0789        | 36.0  | 11232 | 0.0708          | 0.4765   |
| 0.0786        | 37.0  | 11544 | 0.0770          | 0.4729   |
| 0.0786        | 38.0  | 11856 | 0.0718          | 0.4729   |
| 0.0784        | 39.0  | 12168 | 0.0700          | 0.4838   |
| 0.0784        | 40.0  | 12480 | 0.0699          | 0.5235   |
| 0.0775        | 41.0  | 12792 | 0.0698          | 0.6137   |
| 0.0779        | 42.0  | 13104 | 0.0697          | 0.5199   |
| 0.0779        | 43.0  | 13416 | 0.0698          | 0.6534   |
| 0.0777        | 44.0  | 13728 | 0.0697          | 0.5848   |
| 0.0776        | 45.0  | 14040 | 0.0699          | 0.6426   |
| 0.0776        | 46.0  | 14352 | 0.0697          | 0.6029   |
| 0.0769        | 47.0  | 14664 | 0.0705          | 0.4874   |
| 0.0769        | 48.0  | 14976 | 0.0695          | 0.6209   |
| 0.077         | 49.0  | 15288 | 0.0695          | 0.5668   |
| 0.077         | 50.0  | 15600 | 0.0696          | 0.5018   |
| 0.077         | 51.0  | 15912 | 0.0700          | 0.4946   |
| 0.0774        | 52.0  | 16224 | 0.0701          | 0.4982   |
| 0.0767        | 53.0  | 16536 | 0.0694          | 0.5812   |
| 0.0767        | 54.0  | 16848 | 0.0701          | 0.6462   |
| 0.0761        | 55.0  | 17160 | 0.0706          | 0.4874   |
| 0.0761        | 56.0  | 17472 | 0.0695          | 0.6787   |
| 0.0762        | 57.0  | 17784 | 0.0693          | 0.6029   |
| 0.0763        | 58.0  | 18096 | 0.0696          | 0.5199   |
| 0.0763        | 59.0  | 18408 | 0.0693          | 0.5740   |
| 0.0763        | 60.0  | 18720 | 0.0693          | 0.5812   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
