
20230903121524

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9097
  • Accuracy: 0.6442
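
For illustration, a minimal loading sketch follows. The card does not state which SuperGLUE task the classification head was trained on or what the labels mean, so the two-label sequence-classification setup and the sentence pair below are assumptions.

```python
# Minimal inference sketch. Assumptions: the checkpoint exposes a two-label
# sequence-classification head, and the sentence pair is purely illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230903121524"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

inputs = tokenizer("An example first sentence.", "An example second sentence.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted label id (0 or 1)
```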

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
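
The training script itself is not published. As a reproduction aid, the sketch below maps the values listed above onto transformers.TrainingArguments (matching the 4.26.1 API noted under Framework versions); output_dir and the epoch-aligned evaluation strategy are assumptions, not taken from the card.

```python
# Hedged reproduction sketch: the listed hyperparameters expressed as
# transformers.TrainingArguments. output_dir and evaluation_strategy are
# assumptions; the eval rows in the results table align with epoch boundaries.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                  # assumption: not stated in the card
    learning_rate=2e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",       # assumption inferred from the table
)
```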

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 340   | 0.7286          | 0.5      |
| 0.7482        | 2.0   | 680   | 0.7273          | 0.5      |
| 0.7442        | 3.0   | 1020  | 0.7313          | 0.5      |
| 0.7442        | 4.0   | 1360  | 0.7599          | 0.5      |
| 0.7355        | 5.0   | 1700  | 0.7222          | 0.6113   |
| 0.6979        | 6.0   | 2040  | 0.7373          | 0.6160   |
| 0.6979        | 7.0   | 2380  | 0.6950          | 0.6583   |
| 0.6629        | 8.0   | 2720  | 0.6711          | 0.6740   |
| 0.6282        | 9.0   | 3060  | 0.7543          | 0.6599   |
| 0.6282        | 10.0  | 3400  | 0.7217          | 0.6520   |
| 0.6023        | 11.0  | 3740  | 0.7513          | 0.6426   |
| 0.5705        | 12.0  | 4080  | 0.6886          | 0.6693   |
| 0.5705        | 13.0  | 4420  | 0.6779          | 0.6755   |
| 0.5607        | 14.0  | 4760  | 0.7978          | 0.6489   |
| 0.527         | 15.0  | 5100  | 0.6722          | 0.6771   |
| 0.527         | 16.0  | 5440  | 0.8047          | 0.6317   |
| 0.5226        | 17.0  | 5780  | 0.7721          | 0.6740   |
| 0.5133        | 18.0  | 6120  | 0.7900          | 0.6552   |
| 0.5133        | 19.0  | 6460  | 0.7563          | 0.6599   |
| 0.5054        | 20.0  | 6800  | 0.8456          | 0.6411   |
| 0.4836        | 21.0  | 7140  | 0.8232          | 0.6426   |
| 0.4836        | 22.0  | 7480  | 0.7993          | 0.6270   |
| 0.4796        | 23.0  | 7820  | 0.8026          | 0.6426   |
| 0.4659        | 24.0  | 8160  | 0.8306          | 0.6254   |
| 0.4669        | 25.0  | 8500  | 0.8153          | 0.6505   |
| 0.4669        | 26.0  | 8840  | 0.8499          | 0.6489   |
| 0.4487        | 27.0  | 9180  | 0.8366          | 0.6332   |
| 0.4499        | 28.0  | 9520  | 0.7661          | 0.6567   |
| 0.4499        | 29.0  | 9860  | 0.7668          | 0.6630   |
| 0.4483        | 30.0  | 10200 | 0.8147          | 0.6520   |
| 0.4303        | 31.0  | 10540 | 0.8030          | 0.6442   |
| 0.4303        | 32.0  | 10880 | 0.8346          | 0.6285   |
| 0.4272        | 33.0  | 11220 | 0.7779          | 0.6489   |
| 0.43          | 34.0  | 11560 | 0.8193          | 0.6599   |
| 0.43          | 35.0  | 11900 | 0.8792          | 0.6411   |
| 0.4139        | 36.0  | 12240 | 0.8091          | 0.6332   |
| 0.4139        | 37.0  | 12580 | 0.7939          | 0.6458   |
| 0.4139        | 38.0  | 12920 | 0.8626          | 0.6505   |
| 0.4102        | 39.0  | 13260 | 0.8111          | 0.6442   |
| 0.4065        | 40.0  | 13600 | 0.8054          | 0.6583   |
| 0.4065        | 41.0  | 13940 | 0.8704          | 0.6520   |
| 0.4049        | 42.0  | 14280 | 0.8441          | 0.6348   |
| 0.3978        | 43.0  | 14620 | 0.8723          | 0.6411   |
| 0.3978        | 44.0  | 14960 | 0.8747          | 0.6552   |
| 0.4074        | 45.0  | 15300 | 0.8662          | 0.6505   |
| 0.3952        | 46.0  | 15640 | 0.8432          | 0.6442   |
| 0.3952        | 47.0  | 15980 | 0.8837          | 0.6552   |
| 0.3868        | 48.0  | 16320 | 0.8219          | 0.6583   |
| 0.3805        | 49.0  | 16660 | 0.7792          | 0.6536   |
| 0.386         | 50.0  | 17000 | 0.8385          | 0.6520   |
| 0.386         | 51.0  | 17340 | 0.8554          | 0.6505   |
| 0.3869        | 52.0  | 17680 | 0.8655          | 0.6583   |
| 0.3772        | 53.0  | 18020 | 0.8613          | 0.6552   |
| 0.3772        | 54.0  | 18360 | 0.9268          | 0.6364   |
| 0.3744        | 55.0  | 18700 | 0.8710          | 0.6473   |
| 0.378         | 56.0  | 19040 | 0.9222          | 0.6395   |
| 0.378         | 57.0  | 19380 | 0.8803          | 0.6536   |
| 0.3702        | 58.0  | 19720 | 0.9055          | 0.6364   |
| 0.3687        | 59.0  | 20060 | 0.8305          | 0.6630   |
| 0.3687        | 60.0  | 20400 | 0.9229          | 0.6395   |
| 0.3677        | 61.0  | 20740 | 0.9214          | 0.6301   |
| 0.3635        | 62.0  | 21080 | 0.9074          | 0.6458   |
| 0.3635        | 63.0  | 21420 | 0.8890          | 0.6520   |
| 0.3613        | 64.0  | 21760 | 0.8725          | 0.6426   |
| 0.3634        | 65.0  | 22100 | 0.8860          | 0.6489   |
| 0.3634        | 66.0  | 22440 | 0.8428          | 0.6614   |
| 0.3528        | 67.0  | 22780 | 0.8792          | 0.6458   |
| 0.3613        | 68.0  | 23120 | 0.8840          | 0.6254   |
| 0.3613        | 69.0  | 23460 | 0.8960          | 0.6489   |
| 0.3516        | 70.0  | 23800 | 0.8763          | 0.6567   |
| 0.348         | 71.0  | 24140 | 0.8935          | 0.6332   |
| 0.348         | 72.0  | 24480 | 0.9031          | 0.6442   |
| 0.3567        | 73.0  | 24820 | 0.9070          | 0.6458   |
| 0.3514        | 74.0  | 25160 | 0.8997          | 0.6426   |
| 0.3543        | 75.0  | 25500 | 0.9025          | 0.6458   |
| 0.3543        | 76.0  | 25840 | 0.9028          | 0.6379   |
| 0.3457        | 77.0  | 26180 | 0.9155          | 0.6364   |
| 0.3452        | 78.0  | 26520 | 0.8973          | 0.6426   |
| 0.3452        | 79.0  | 26860 | 0.9085          | 0.6458   |
| 0.3379        | 80.0  | 27200 | 0.9097          | 0.6442   |
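
Validation accuracy peaks at 0.6771 (epoch 15) while validation loss climbs through the later epochs, so the epoch-80 checkpoint summarized at the top of this card is likely past its best point. The card also never names the SuperGLUE task: the per-epoch step count (340 steps × batch size 16 ≈ 5,440 examples) is consistent with WiC's 5,428 training rows, but that identification is an assumption. Under that assumption, a minimal evaluation sketch:

```python
# Hedged evaluation sketch. The SuperGLUE task is not stated in the card;
# WiC is assumed here (340 steps/epoch * batch size 16 ~ 5,440 examples
# matches WiC's 5,428 training rows). Column names follow the WiC schema.
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230903121524"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

val = load_dataset("super_glue", "wic", split="validation")
correct = 0
for ex in val:
    inputs = tokenizer(ex["sentence1"], ex["sentence2"],
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(dim=-1).item()
    correct += int(pred == ex["label"])
print(f"accuracy: {correct / len(val):.4f}")
```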

Framework versions

  • Transformers 4.26.1
  • PyTorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
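
Because results can shift across library versions, a quick check of the installed environment against the pins above may be useful. This sketch only reads version strings and assumes the four packages are installed.

```python
# Environment sanity check: compare installed versions against the
# versions listed in this card (4.26.1 / 2.0.1+cu118 / 2.12.0 / 0.13.3).
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "OK" if installed[name] == want else f"differs (found {installed[name]})"
    print(f"{name}: expected {want} -> {status}")
```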