20230823053830

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a usage sketch is given below the list):

  • Loss: 0.0703
  • Accuracy: 0.4729
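
This card does not state which SuperGLUE task the model was fine-tuned on, so the following is only a minimal inference sketch. It assumes the checkpoint at dkqjrm/20230823053830 exposes a standard sequence-classification head; the two-sentence input shown is a placeholder and should be adapted to the actual task's format.

```python
# Minimal inference sketch (assumptions: sequence-classification head,
# sentence-pair input; the exact SuperGLUE task is not documented here).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230823053830"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Many SuperGLUE tasks are sentence-pair problems, so a paired input
# is shown; replace with the task's real input format.
inputs = tokenizer("The cat sat on the mat.", "A cat is on a mat.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))
```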

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
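
For reproducibility, here is a minimal sketch of Hugging Face TrainingArguments matching the values above. The output_dir is a placeholder, and the actual training script, dataset preprocessing, and Trainer wiring are not documented on this card.

```python
# Sketch of TrainingArguments mirroring the listed hyperparameters
# (transformers 4.26.x API). output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230823053830",   # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,                # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
    evaluation_strategy="epoch",   # matches the per-epoch results table below
)
```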

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.0743          | 0.4729   |
| 0.0882        | 2.0   | 624   | 0.0731          | 0.4729   |
| 0.0882        | 3.0   | 936   | 0.0718          | 0.4729   |
| 0.0871        | 4.0   | 1248  | 0.0712          | 0.4838   |
| 0.0857        | 5.0   | 1560  | 0.0709          | 0.4765   |
| 0.0857        | 6.0   | 1872  | 0.0718          | 0.4729   |
| 0.084         | 7.0   | 2184  | 0.0709          | 0.4765   |
| 0.084         | 8.0   | 2496  | 0.0705          | 0.4729   |
| 0.0831        | 9.0   | 2808  | 0.0710          | 0.4729   |
| 0.0826        | 10.0  | 3120  | 0.0705          | 0.4729   |
| 0.0826        | 11.0  | 3432  | 0.0726          | 0.4729   |
| 0.0823        | 12.0  | 3744  | 0.0722          | 0.4729   |
| 0.0814        | 13.0  | 4056  | 0.0710          | 0.4729   |
| 0.0814        | 14.0  | 4368  | 0.0710          | 0.4585   |
| 0.0807        | 15.0  | 4680  | 0.0706          | 0.4729   |
| 0.0807        | 16.0  | 4992  | 0.0709          | 0.4729   |
| 0.0803        | 17.0  | 5304  | 0.0709          | 0.4693   |
| 0.0798        | 18.0  | 5616  | 0.0711          | 0.5307   |
| 0.0798        | 19.0  | 5928  | 0.0708          | 0.4729   |
| 0.0798        | 20.0  | 6240  | 0.0710          | 0.4801   |
| 0.0792        | 21.0  | 6552  | 0.0710          | 0.5307   |
| 0.0792        | 22.0  | 6864  | 0.0728          | 0.5379   |
| 0.0797        | 23.0  | 7176  | 0.0707          | 0.4657   |
| 0.0797        | 24.0  | 7488  | 0.0711          | 0.4729   |
| 0.0793        | 25.0  | 7800  | 0.0706          | 0.4729   |
| 0.0783        | 26.0  | 8112  | 0.0704          | 0.4729   |
| 0.0783        | 27.0  | 8424  | 0.0706          | 0.4729   |
| 0.0783        | 28.0  | 8736  | 0.0709          | 0.4729   |
| 0.0782        | 29.0  | 9048  | 0.0703          | 0.4729   |
| 0.0782        | 30.0  | 9360  | 0.0705          | 0.4765   |
| 0.0782        | 31.0  | 9672  | 0.0709          | 0.5054   |
| 0.0782        | 32.0  | 9984  | 0.0705          | 0.4729   |
| 0.0786        | 33.0  | 10296 | 0.0704          | 0.4729   |
| 0.0779        | 34.0  | 10608 | 0.0705          | 0.4729   |
| 0.0779        | 35.0  | 10920 | 0.0715          | 0.4729   |
| 0.0779        | 36.0  | 11232 | 0.0707          | 0.4765   |
| 0.0779        | 37.0  | 11544 | 0.0703          | 0.4729   |
| 0.0779        | 38.0  | 11856 | 0.0704          | 0.4765   |
| 0.0778        | 39.0  | 12168 | 0.0704          | 0.4729   |
| 0.0778        | 40.0  | 12480 | 0.0704          | 0.4693   |
| 0.0776        | 41.0  | 12792 | 0.0704          | 0.4729   |
| 0.0777        | 42.0  | 13104 | 0.0703          | 0.4729   |
| 0.0777        | 43.0  | 13416 | 0.0707          | 0.4585   |
| 0.0775        | 44.0  | 13728 | 0.0703          | 0.4729   |
| 0.0777        | 45.0  | 14040 | 0.0705          | 0.4729   |
| 0.0777        | 46.0  | 14352 | 0.0704          | 0.4729   |
| 0.0772        | 47.0  | 14664 | 0.0730          | 0.4729   |
| 0.0772        | 48.0  | 14976 | 0.0703          | 0.4729   |
| 0.0774        | 49.0  | 15288 | 0.0706          | 0.4549   |
| 0.0774        | 50.0  | 15600 | 0.0704          | 0.4729   |
| 0.0774        | 51.0  | 15912 | 0.0706          | 0.4729   |
| 0.0778        | 52.0  | 16224 | 0.0705          | 0.4729   |
| 0.0775        | 53.0  | 16536 | 0.0704          | 0.4729   |
| 0.0775        | 54.0  | 16848 | 0.0704          | 0.4765   |
| 0.0772        | 55.0  | 17160 | 0.0704          | 0.4729   |
| 0.0772        | 56.0  | 17472 | 0.0703          | 0.4729   |
| 0.077         | 57.0  | 17784 | 0.0703          | 0.4729   |
| 0.0774        | 58.0  | 18096 | 0.0706          | 0.4729   |
| 0.0774        | 59.0  | 18408 | 0.0704          | 0.4729   |
| 0.0776        | 60.0  | 18720 | 0.0703          | 0.4729   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
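
To verify a local environment against the versions listed above, a minimal check, assuming the standard `__version__` attributes on each package:

```python
# Quick environment check against the framework versions listed above.
import transformers, torch, datasets, tokenizers

expected = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}
for name, module in [("transformers", transformers), ("torch", torch),
                     ("datasets", datasets), ("tokenizers", tokenizers)]:
    status = "OK" if module.__version__ == expected[name] else "MISMATCH"
    print(f"{name}: found {module.__version__}, expected {expected[name]} [{status}]")
```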