20230823015121

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0704
  • Accuracy: 0.4729
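
As a minimal usage sketch (not part of the original card): the checkpoint can be loaded with the standard Transformers auto classes. The repo id dkqjrm/20230823015121 is the one shown on the Hub; the SuperGLUE subset, input format, and label names are not documented here, so the sentence pair below is purely illustrative.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Repo id as listed on the Hub page; the SuperGLUE subset this model was
# fine-tuned on is undocumented, so the inputs below are placeholders.
model_id = "dkqjrm/20230823015121"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Many SuperGLUE tasks are sentence-pair classification; this pairing is illustrative.
inputs = tokenizer(
    "The cat sat on the mat.",
    "There is a cat on the mat.",
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**inputs).logits

print(logits.argmax(dim=-1).item())  # predicted class index
```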

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.002
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
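
These values map one-to-one onto Transformers TrainingArguments fields. A minimal sketch, assuming the standard Trainer API was used; the actual training script is not published with this card, and the output_dir is hypothetical.

```python
from transformers import TrainingArguments

# Reconstruction of the hyperparameters listed above; a sketch only,
# since the original training script is not part of the card.
training_args = TrainingArguments(
    output_dir="20230823015121",  # hypothetical output directory
    learning_rate=2e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60.0,
)
```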

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 312   | 0.0750          | 0.4657   |
| 0.1248        | 2.0   | 624   | 0.0716          | 0.5343   |
| 0.1248        | 3.0   | 936   | 0.1400          | 0.4729   |
| 0.0956        | 4.0   | 1248  | 0.1243          | 0.5271   |
| 0.0927        | 5.0   | 1560  | 0.0735          | 0.5668   |
| 0.0927        | 6.0   | 1872  | 0.1167          | 0.5271   |
| 0.0924        | 7.0   | 2184  | 0.0803          | 0.4729   |
| 0.0924        | 8.0   | 2496  | 0.0714          | 0.4982   |
| 0.0913        | 9.0   | 2808  | 0.0730          | 0.4729   |
| 0.0885        | 10.0  | 3120  | 0.0708          | 0.5343   |
| 0.0885        | 11.0  | 3432  | 0.0702          | 0.4910   |
| 0.0817        | 12.0  | 3744  | 0.0703          | 0.5307   |
| 0.0789        | 13.0  | 4056  | 0.0723          | 0.4729   |
| 0.0789        | 14.0  | 4368  | 0.0700          | 0.4874   |
| 0.0785        | 15.0  | 4680  | 0.0700          | 0.4765   |
| 0.0785        | 16.0  | 4992  | 0.0701          | 0.4801   |
| 0.0788        | 17.0  | 5304  | 0.0733          | 0.4549   |
| 0.0926        | 18.0  | 5616  | 0.0858          | 0.5307   |
| 0.0926        | 19.0  | 5928  | 0.0739          | 0.4982   |
| 0.0845        | 20.0  | 6240  | 0.0944          | 0.5235   |
| 0.0826        | 21.0  | 6552  | 0.0717          | 0.4621   |
| 0.0826        | 22.0  | 6864  | 0.0710          | 0.4729   |
| 0.0818        | 23.0  | 7176  | 0.0712          | 0.4910   |
| 0.0818        | 24.0  | 7488  | 0.0714          | 0.4838   |
| 0.0809        | 25.0  | 7800  | 0.0745          | 0.5126   |
| 0.0805        | 26.0  | 8112  | 0.0714          | 0.4729   |
| 0.0805        | 27.0  | 8424  | 0.0738          | 0.4729   |
| 0.0805        | 28.0  | 8736  | 0.0709          | 0.4765   |
| 0.0809        | 29.0  | 9048  | 0.0737          | 0.4729   |
| 0.0809        | 30.0  | 9360  | 0.0729          | 0.5596   |
| 0.0797        | 31.0  | 9672  | 0.0736          | 0.5596   |
| 0.0797        | 32.0  | 9984  | 0.0705          | 0.4693   |
| 0.08          | 33.0  | 10296 | 0.0711          | 0.4657   |
| 0.0798        | 34.0  | 10608 | 0.0731          | 0.5199   |
| 0.0798        | 35.0  | 10920 | 0.0744          | 0.4729   |
| 0.0795        | 36.0  | 11232 | 0.0721          | 0.4729   |
| 0.0796        | 37.0  | 11544 | 0.0708          | 0.4765   |
| 0.0796        | 38.0  | 11856 | 0.0714          | 0.4729   |
| 0.0792        | 39.0  | 12168 | 0.0707          | 0.4729   |
| 0.0792        | 40.0  | 12480 | 0.0705          | 0.4693   |
| 0.0785        | 41.0  | 12792 | 0.0706          | 0.4729   |
| 0.0782        | 42.0  | 13104 | 0.0708          | 0.4765   |
| 0.0782        | 43.0  | 13416 | 0.0709          | 0.4765   |
| 0.0779        | 44.0  | 13728 | 0.0705          | 0.4729   |
| 0.078         | 45.0  | 14040 | 0.0705          | 0.4729   |
| 0.078         | 46.0  | 14352 | 0.0704          | 0.4729   |
| 0.0776        | 47.0  | 14664 | 0.0708          | 0.4729   |
| 0.0776        | 48.0  | 14976 | 0.0704          | 0.4729   |
| 0.0778        | 49.0  | 15288 | 0.0704          | 0.4729   |
| 0.0778        | 50.0  | 15600 | 0.0709          | 0.4729   |
| 0.0778        | 51.0  | 15912 | 0.0709          | 0.4729   |
| 0.078         | 52.0  | 16224 | 0.0712          | 0.4729   |
| 0.0776        | 53.0  | 16536 | 0.0704          | 0.4729   |
| 0.0776        | 54.0  | 16848 | 0.0708          | 0.4729   |
| 0.0772        | 55.0  | 17160 | 0.0717          | 0.4729   |
| 0.0772        | 56.0  | 17472 | 0.0705          | 0.4729   |
| 0.0772        | 57.0  | 17784 | 0.0703          | 0.4729   |
| 0.0774        | 58.0  | 18096 | 0.0710          | 0.4729   |
| 0.0774        | 59.0  | 18408 | 0.0704          | 0.4729   |
| 0.0774        | 60.0  | 18720 | 0.0704          | 0.4729   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
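
The environment can be approximated by pinning the versions above in a requirements.txt; a sketch, noting that the +cu118 PyTorch build is served from the PyTorch wheel index rather than PyPI, hence the extra index line.

```text
# Pinned to the framework versions listed above (sketch, not from the original card).
--extra-index-url https://download.pytorch.org/whl/cu118
transformers==4.26.1
torch==2.0.1+cu118
datasets==2.12.0
tokenizers==0.13.3
```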