
20230824083011

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3090
  • Accuracy: 0.7401

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.003
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
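As a minimal sketch of how the `linear` scheduler above behaves (assuming zero warmup steps, since the card lists none), the learning rate decays linearly from 0.003 to 0 over the 37,380 total training steps shown in the results table:

```python
def linear_lr(step, total_steps=37380, base_lr=0.003):
    """Learning rate at a given step under a warmup-free linear decay schedule.

    total_steps and base_lr are taken from this card; the zero-warmup
    assumption matches the default when no warmup is configured.
    """
    return base_lr * max(0.0, (total_steps - step) / total_steps)

# At step 0 the rate is the full 0.003; at the midpoint it has halved;
# at the final step it reaches 0.
```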

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| 0.7501        | 1.0   | 623   | 0.9859          | 0.4729   |
| 0.6252        | 2.0   | 1246  | 0.4891          | 0.4801   |
| 0.5769        | 3.0   | 1869  | 1.1271          | 0.4729   |
| 0.5672        | 4.0   | 2492  | 0.4257          | 0.5632   |
| 0.5439        | 5.0   | 3115  | 0.5883          | 0.5415   |
| 0.5426        | 6.0   | 3738  | 0.3734          | 0.6245   |
| 0.61          | 7.0   | 4361  | 0.4410          | 0.5848   |
| 0.4937        | 8.0   | 4984  | 0.4091          | 0.5632   |
| 0.4293        | 9.0   | 5607  | 0.3712          | 0.6282   |
| 0.3897        | 10.0  | 6230  | 0.3441          | 0.6931   |
| 0.3759        | 11.0  | 6853  | 0.3400          | 0.7004   |
| 0.379         | 12.0  | 7476  | 0.3802          | 0.6787   |
| 0.3661        | 13.0  | 8099  | 0.3456          | 0.7184   |
| 0.374         | 14.0  | 8722  | 0.3545          | 0.6859   |
| 0.3441        | 15.0  | 9345  | 0.3219          | 0.7112   |
| 0.3339        | 16.0  | 9968  | 0.3192          | 0.7184   |
| 0.3324        | 17.0  | 10591 | 0.3290          | 0.7184   |
| 0.324         | 18.0  | 11214 | 0.3284          | 0.7112   |
| 0.3641        | 19.0  | 11837 | 0.3100          | 0.7292   |
| 0.3138        | 20.0  | 12460 | 0.3102          | 0.7365   |
| 0.3099        | 21.0  | 13083 | 0.3887          | 0.7076   |
| 0.3095        | 22.0  | 13706 | 0.3443          | 0.7004   |
| 0.3039        | 23.0  | 14329 | 0.3937          | 0.6895   |
| 0.287         | 24.0  | 14952 | 0.3071          | 0.7473   |
| 0.2718        | 25.0  | 15575 | 0.3097          | 0.7184   |
| 0.2711        | 26.0  | 16198 | 0.2888          | 0.7329   |
| 0.2738        | 27.0  | 16821 | 0.2920          | 0.7220   |
| 0.2697        | 28.0  | 17444 | 0.2986          | 0.7329   |
| 0.2589        | 29.0  | 18067 | 0.3092          | 0.7437   |
| 0.2536        | 30.0  | 18690 | 0.3141          | 0.7292   |
| 0.2564        | 31.0  | 19313 | 0.3134          | 0.7401   |
| 0.2493        | 32.0  | 19936 | 0.2962          | 0.7365   |
| 0.2428        | 33.0  | 20559 | 0.3358          | 0.7256   |
| 0.2425        | 34.0  | 21182 | 0.3155          | 0.7148   |
| 0.2342        | 35.0  | 21805 | 0.3000          | 0.7220   |
| 0.2394        | 36.0  | 22428 | 0.2955          | 0.7329   |
| 0.2257        | 37.0  | 23051 | 0.3070          | 0.7509   |
| 0.2272        | 38.0  | 23674 | 0.2959          | 0.7365   |
| 0.2197        | 39.0  | 24297 | 0.3100          | 0.7401   |
| 0.2144        | 40.0  | 24920 | 0.3009          | 0.7365   |
| 0.2164        | 41.0  | 25543 | 0.2957          | 0.7256   |
| 0.2129        | 42.0  | 26166 | 0.3133          | 0.7292   |
| 0.2106        | 43.0  | 26789 | 0.3110          | 0.7329   |
| 0.2069        | 44.0  | 27412 | 0.3072          | 0.7329   |
| 0.2051        | 45.0  | 28035 | 0.3300          | 0.7292   |
| 0.2064        | 46.0  | 28658 | 0.3106          | 0.7256   |
| 0.2039        | 47.0  | 29281 | 0.3114          | 0.7292   |
| 0.2106        | 48.0  | 29904 | 0.3180          | 0.7365   |
| 0.2008        | 49.0  | 30527 | 0.3099          | 0.7329   |
| 0.1945        | 50.0  | 31150 | 0.3066          | 0.7329   |
| 0.1958        | 51.0  | 31773 | 0.3124          | 0.7401   |
| 0.1939        | 52.0  | 32396 | 0.3230          | 0.7401   |
| 0.1942        | 53.0  | 33019 | 0.3105          | 0.7365   |
| 0.1887        | 54.0  | 33642 | 0.3014          | 0.7256   |
| 0.185         | 55.0  | 34265 | 0.3052          | 0.7365   |
| 0.1868        | 56.0  | 34888 | 0.3155          | 0.7365   |
| 0.1888        | 57.0  | 35511 | 0.3056          | 0.7256   |
| 0.1885        | 58.0  | 36134 | 0.3069          | 0.7329   |
| 0.192         | 59.0  | 36757 | 0.3076          | 0.7329   |
| 0.1807        | 60.0  | 37380 | 0.3090          | 0.7401   |
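The step counts in the table are consistent with the hyperparameters above. A quick sanity check (the inferred training-set size is an estimate; the card does not name the SuperGLUE subset used, though ~2,490 examples matches RTE's training split):

```python
total_steps = 37380        # final Step value in the table
num_epochs = 60            # from the hyperparameters
train_batch_size = 4       # from the hyperparameters

steps_per_epoch = total_steps // num_epochs
# Upper bound on training examples: the last batch of an epoch may be partial.
approx_train_examples = steps_per_epoch * train_batch_size
```

This gives 623 steps per epoch and at most 2,492 training examples.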

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3