
20230831011453

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4974
  • Accuracy: 0.5
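
The card does not state which SuperGLUE subset the model was fine-tuned on, so the paired-sentence input below is illustrative only. A minimal loading sketch with `transformers`, assuming the checkpoint (`dkqjrm/20230831011453`, per the repository id) carries a standard sequence-classification head:

```python
# Minimal inference sketch; the two-sentence input format is an assumption,
# since the card does not name the SuperGLUE task this checkpoint targets.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230831011453"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("A premise sentence.", "A hypothesis sentence.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities
```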

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):

  • learning_rate: 0.0007
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
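
A sketch of the corresponding `TrainingArguments` (Transformers 4.26.1 API). The dataset and task wiring are omitted because the card does not specify the SuperGLUE subset, and the evaluation strategy is an assumption based on the per-epoch results table below:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230831011453",
    learning_rate=7e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08, as listed above;
    # these values match the Trainer's default AdamW settings.
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",  # assumption: the table below reports one eval per epoch
)
```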

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 340 | 0.5155 | 0.5 |
| 0.5119 | 2.0 | 680 | 0.5111 | 0.5 |
| 0.51 | 3.0 | 1020 | 0.5526 | 0.5078 |
| 0.51 | 4.0 | 1360 | 0.5257 | 0.5 |
| 0.5162 | 5.0 | 1700 | 0.5071 | 0.5 |
| 0.5071 | 6.0 | 2040 | 0.4929 | 0.5 |
| 0.5071 | 7.0 | 2380 | 0.4955 | 0.5 |
| 0.509 | 8.0 | 2720 | 0.5280 | 0.5 |
| 0.5022 | 9.0 | 3060 | 0.4958 | 0.5 |
| 0.5022 | 10.0 | 3400 | 0.4944 | 0.5 |
| 0.5017 | 11.0 | 3740 | 0.4931 | 0.5 |
| 0.5011 | 12.0 | 4080 | 0.4944 | 0.5 |
| 0.5011 | 13.0 | 4420 | 0.5177 | 0.5 |
| 0.4978 | 14.0 | 4760 | 0.4933 | 0.5 |
| 0.5039 | 15.0 | 5100 | 0.5001 | 0.5 |
| 0.5039 | 16.0 | 5440 | 0.4929 | 0.5 |
| 0.5008 | 17.0 | 5780 | 0.4961 | 0.5 |
| 0.4986 | 18.0 | 6120 | 0.4948 | 0.5 |
| 0.4986 | 19.0 | 6460 | 0.4993 | 0.5 |
| 0.499 | 20.0 | 6800 | 0.4943 | 0.5 |
| 0.4981 | 21.0 | 7140 | 0.4930 | 0.5 |
| 0.4981 | 22.0 | 7480 | 0.5119 | 0.5 |
| 0.5013 | 23.0 | 7820 | 0.4972 | 0.5 |
| 0.498 | 24.0 | 8160 | 0.4938 | 0.5 |
| 0.5 | 25.0 | 8500 | 0.4946 | 0.5 |
| 0.5 | 26.0 | 8840 | 0.5212 | 0.5 |
| 0.4994 | 27.0 | 9180 | 0.5028 | 0.5 |
| 0.4978 | 28.0 | 9520 | 0.4929 | 0.5 |
| 0.4978 | 29.0 | 9860 | 0.4993 | 0.5 |
| 0.4991 | 30.0 | 10200 | 0.4925 | 0.5 |
| 0.4987 | 31.0 | 10540 | 0.4929 | 0.5 |
| 0.4987 | 32.0 | 10880 | 0.5076 | 0.5 |
| 0.4989 | 33.0 | 11220 | 0.4931 | 0.5 |
| 0.4982 | 34.0 | 11560 | 0.5071 | 0.5 |
| 0.4982 | 35.0 | 11900 | 0.4959 | 0.5 |
| 0.4978 | 36.0 | 12240 | 0.5013 | 0.5 |
| 0.4982 | 37.0 | 12580 | 0.4927 | 0.5 |
| 0.4982 | 38.0 | 12920 | 0.4938 | 0.5 |
| 0.4968 | 39.0 | 13260 | 0.5018 | 0.5 |
| 0.4961 | 40.0 | 13600 | 0.4958 | 0.5 |
| 0.4961 | 41.0 | 13940 | 0.4928 | 0.5 |
| 0.4969 | 42.0 | 14280 | 0.4950 | 0.5 |
| 0.4951 | 43.0 | 14620 | 0.4929 | 0.5 |
| 0.4951 | 44.0 | 14960 | 0.4928 | 0.5 |
| 0.4964 | 45.0 | 15300 | 0.4965 | 0.5 |
| 0.4943 | 46.0 | 15640 | 0.4943 | 0.5 |
| 0.4943 | 47.0 | 15980 | 0.4982 | 0.5 |
| 0.4965 | 48.0 | 16320 | 0.4926 | 0.5 |
| 0.497 | 49.0 | 16660 | 0.4969 | 0.5 |
| 0.4959 | 50.0 | 17000 | 0.4930 | 0.5 |
| 0.4959 | 51.0 | 17340 | 0.4928 | 0.5 |
| 0.4932 | 52.0 | 17680 | 0.4926 | 0.5 |
| 0.4969 | 53.0 | 18020 | 0.4961 | 0.5 |
| 0.4969 | 54.0 | 18360 | 0.4935 | 0.5 |
| 0.4937 | 55.0 | 18700 | 0.4926 | 0.5 |
| 0.4937 | 56.0 | 19040 | 0.4926 | 0.5 |
| 0.4937 | 57.0 | 19380 | 0.5036 | 0.5 |
| 0.4951 | 58.0 | 19720 | 0.4930 | 0.5 |
| 0.4939 | 59.0 | 20060 | 0.5071 | 0.5 |
| 0.4939 | 60.0 | 20400 | 0.4927 | 0.5 |
| 0.4929 | 61.0 | 20740 | 0.4928 | 0.5 |
| 0.4926 | 62.0 | 21080 | 0.4928 | 0.5 |
| 0.4926 | 63.0 | 21420 | 0.4936 | 0.5 |
| 0.4917 | 64.0 | 21760 | 0.4967 | 0.5 |
| 0.4951 | 65.0 | 22100 | 0.4941 | 0.5 |
| 0.4951 | 66.0 | 22440 | 0.5071 | 0.5 |
| 0.4895 | 67.0 | 22780 | 0.4932 | 0.5 |
| 0.4939 | 68.0 | 23120 | 0.4930 | 0.5 |
| 0.4939 | 69.0 | 23460 | 0.4938 | 0.5 |
| 0.4919 | 70.0 | 23800 | 0.4935 | 0.5 |
| 0.4915 | 71.0 | 24140 | 0.4934 | 0.5 |
| 0.4915 | 72.0 | 24480 | 0.4962 | 0.5 |
| 0.4898 | 73.0 | 24820 | 0.4958 | 0.5 |
| 0.4919 | 74.0 | 25160 | 0.4967 | 0.5 |
| 0.4905 | 75.0 | 25500 | 0.4961 | 0.5 |
| 0.4905 | 76.0 | 25840 | 0.4986 | 0.5 |
| 0.4908 | 77.0 | 26180 | 0.4958 | 0.5 |
| 0.4897 | 78.0 | 26520 | 0.4974 | 0.5 |
| 0.4897 | 79.0 | 26860 | 0.4992 | 0.5 |
| 0.4897 | 80.0 | 27200 | 0.4974 | 0.5 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
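
To reproduce the setup, it may help to match these pins; a small sketch that checks a local environment against the versions listed above:

```python
# Compare installed library versions against the versions pinned in this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "Transformers": (transformers, "4.26.1"),
    "PyTorch": (torch, "2.0.1+cu118"),
    "Datasets": (datasets, "2.12.0"),
    "Tokenizers": (tokenizers, "0.13.3"),
}
for name, (module, version) in expected.items():
    print(f"{name}: installed {module.__version__}, card {version}")
```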