
20230831092835

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4933
  • Accuracy: 0.5

Note that an accuracy of 0.5 is chance level for a balanced binary-choice task, so the fine-tuned model does not appear to outperform random guessing on this evaluation set.
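
The snippet below is a minimal usage sketch, not part of the original card: it assumes the checkpoint loads as a standard sequence-classification head on top of bert-large-cased. The specific SuperGLUE subtask and the meaning of the labels are not documented here.

```python
# Hypothetical usage sketch; the SuperGLUE subtask and label meanings
# are not documented in this card.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230831092835"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Pair-input SuperGLUE tasks (e.g. RTE, BoolQ) take two texts per example.
inputs = tokenizer("An example premise.", "An example hypothesis.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))
```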

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch of the corresponding TrainingArguments follows the list:

  • learning_rate: 0.0005
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
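
As a sketch only, these hyperparameters map onto the Transformers 4.26-era TrainingArguments roughly as follows; the task-specific data loading and the Trainer call are assumed and not part of the original card.

```python
# Sketch: the hyperparameters above expressed as TrainingArguments
# (Transformers 4.26 API). Dataset preprocessing and the Trainer setup
# are assumed; output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230831092835",   # placeholder
    learning_rate=5e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,                # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,             # epsilon=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",   # assumed: the card logs eval per epoch
)
```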

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 340   | 0.5114          | 0.5      |
| 0.5104        | 2.0   | 680   | 0.5011          | 0.5      |
| 0.5162        | 3.0   | 1020  | 0.5183          | 0.5      |
| 0.5162        | 4.0   | 1360  | 0.4985          | 0.5      |
| 0.5087        | 5.0   | 1700  | 0.5279          | 0.5      |
| 0.5026        | 6.0   | 2040  | 0.4974          | 0.5      |
| 0.5026        | 7.0   | 2380  | 0.4970          | 0.5      |
| 0.5035        | 8.0   | 2720  | 0.5153          | 0.5      |
| 0.4963        | 9.0   | 3060  | 0.4956          | 0.5      |
| 0.4963        | 10.0  | 3400  | 0.5024          | 0.5      |
| 0.4986        | 11.0  | 3740  | 0.4932          | 0.5      |
| 0.4975        | 12.0  | 4080  | 0.4948          | 0.5      |
| 0.4975        | 13.0  | 4420  | 0.5179          | 0.5      |
| 0.4951        | 14.0  | 4760  | 0.4950          | 0.5      |
| 0.4987        | 15.0  | 5100  | 0.4946          | 0.5      |
| 0.4987        | 16.0  | 5440  | 0.4961          | 0.5      |
| 0.4983        | 17.0  | 5780  | 0.4991          | 0.5      |
| 0.4947        | 18.0  | 6120  | 0.4941          | 0.5      |
| 0.4947        | 19.0  | 6460  | 0.4925          | 0.5      |
| 0.4957        | 20.0  | 6800  | 0.4976          | 0.5      |
| 0.4949        | 21.0  | 7140  | 0.4938          | 0.5      |
| 0.4949        | 22.0  | 7480  | 0.5070          | 0.5      |
| 0.497         | 23.0  | 7820  | 0.4950          | 0.5      |
| 0.4958        | 24.0  | 8160  | 0.4959          | 0.5      |
| 0.4962        | 25.0  | 8500  | 0.4925          | 0.5      |
| 0.4962        | 26.0  | 8840  | 0.5414          | 0.5      |
| 0.5006        | 27.0  | 9180  | 0.4947          | 0.5      |
| 0.4998        | 28.0  | 9520  | 0.4976          | 0.5      |
| 0.4998        | 29.0  | 9860  | 0.5053          | 0.5      |
| 0.4973        | 30.0  | 10200 | 0.4925          | 0.5      |
| 0.4972        | 31.0  | 10540 | 0.4929          | 0.5      |
| 0.4972        | 32.0  | 10880 | 0.5097          | 0.5      |
| 0.4974        | 33.0  | 11220 | 0.4925          | 0.5      |
| 0.4968        | 34.0  | 11560 | 0.4985          | 0.5      |
| 0.4968        | 35.0  | 11900 | 0.4975          | 0.5      |
| 0.4975        | 36.0  | 12240 | 0.4971          | 0.5      |
| 0.4966        | 37.0  | 12580 | 0.4925          | 0.5      |
| 0.4966        | 38.0  | 12920 | 0.4933          | 0.5      |
| 0.4961        | 39.0  | 13260 | 0.5030          | 0.5      |
| 0.4944        | 40.0  | 13600 | 0.4939          | 0.5      |
| 0.4944        | 41.0  | 13940 | 0.4926          | 0.5      |
| 0.4957        | 42.0  | 14280 | 0.4955          | 0.5      |
| 0.4933        | 43.0  | 14620 | 0.4937          | 0.5      |
| 0.4933        | 44.0  | 14960 | 0.4942          | 0.5      |
| 0.496         | 45.0  | 15300 | 0.5004          | 0.5      |
| 0.493         | 46.0  | 15640 | 0.4936          | 0.5      |
| 0.493         | 47.0  | 15980 | 0.4977          | 0.5      |
| 0.4953        | 48.0  | 16320 | 0.4927          | 0.5      |
| 0.4948        | 49.0  | 16660 | 0.4993          | 0.5      |
| 0.4939        | 50.0  | 17000 | 0.4928          | 0.5      |
| 0.4939        | 51.0  | 17340 | 0.4925          | 0.5      |
| 0.4927        | 52.0  | 17680 | 0.4934          | 0.5      |
| 0.4962        | 53.0  | 18020 | 0.4943          | 0.5      |
| 0.4962        | 54.0  | 18360 | 0.4928          | 0.5      |
| 0.493         | 55.0  | 18700 | 0.4926          | 0.5      |
| 0.4925        | 56.0  | 19040 | 0.4929          | 0.5      |
| 0.4925        | 57.0  | 19380 | 0.4926          | 0.5      |
| 0.493         | 58.0  | 19720 | 0.4931          | 0.5      |
| 0.4938        | 59.0  | 20060 | 0.5001          | 0.5      |
| 0.4938        | 60.0  | 20400 | 0.4925          | 0.5      |
| 0.4923        | 61.0  | 20740 | 0.4928          | 0.5      |
| 0.4924        | 62.0  | 21080 | 0.4927          | 0.5      |
| 0.4924        | 63.0  | 21420 | 0.4931          | 0.5      |
| 0.492         | 64.0  | 21760 | 0.4944          | 0.5      |
| 0.4945        | 65.0  | 22100 | 0.4928          | 0.5      |
| 0.4945        | 66.0  | 22440 | 0.4954          | 0.5      |
| 0.4892        | 67.0  | 22780 | 0.4925          | 0.5      |
| 0.4932        | 68.0  | 23120 | 0.4934          | 0.5      |
| 0.4932        | 69.0  | 23460 | 0.4932          | 0.5      |
| 0.4919        | 70.0  | 23800 | 0.4925          | 0.5      |
| 0.4916        | 71.0  | 24140 | 0.4930          | 0.5      |
| 0.4916        | 72.0  | 24480 | 0.4952          | 0.5      |
| 0.4904        | 73.0  | 24820 | 0.4936          | 0.5      |
| 0.4924        | 74.0  | 25160 | 0.4951          | 0.5      |
| 0.4913        | 75.0  | 25500 | 0.4934          | 0.5      |
| 0.4913        | 76.0  | 25840 | 0.4937          | 0.5      |
| 0.4921        | 77.0  | 26180 | 0.4927          | 0.5      |
| 0.4913        | 78.0  | 26520 | 0.4933          | 0.5      |
| 0.4913        | 79.0  | 26860 | 0.4933          | 0.5      |
| 0.4917        | 80.0  | 27200 | 0.4933          | 0.5      |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3