
20230824062849

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2256
  • Accuracy: 0.7473

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
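With the linear scheduler and no warmup configured, the learning rate decays from 0.003 to 0 over the run's 18720 optimizer steps (312 steps/epoch × 60 epochs, per the results table). A minimal sketch of that schedule, assuming the standard linear-with-warmup formula (the warmup parameter is shown for completeness; this run uses the zero-warmup default):

```python
def linear_lr(step, base_lr=0.003, total_steps=18720, warmup_steps=0):
    """Learning rate at a given optimizer step under linear decay,
    optionally preceded by a linear warmup phase."""
    if step < warmup_steps:
        # ramp up from 0 to base_lr during warmup
        return base_lr * step / max(1, warmup_steps)
    # then decay linearly from base_lr down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

print(linear_lr(0))      # base rate at the start
print(linear_lr(9360))   # half the base rate at the midpoint
print(linear_lr(18720))  # zero at the end of training
```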

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|---------------|-------|-------|-----------------|----------|
| No log        | 1.0   | 312   | 1.2170          | 0.5307   |
| 0.9844        | 2.0   | 624   | 0.7365          | 0.5090   |
| 0.9844        | 3.0   | 936   | 0.6978          | 0.5632   |
| 0.8956        | 4.0   | 1248  | 0.8855          | 0.4765   |
| 0.8957        | 5.0   | 1560  | 1.0223          | 0.5379   |
| 0.8957        | 6.0   | 1872  | 0.6873          | 0.6137   |
| 0.7665        | 7.0   | 2184  | 0.8629          | 0.6173   |
| 0.7665        | 8.0   | 2496  | 0.6861          | 0.6570   |
| 0.734         | 9.0   | 2808  | 0.6714          | 0.7076   |
| 0.7238        | 10.0  | 3120  | 0.6298          | 0.7184   |
| 0.7238        | 11.0  | 3432  | 0.5975          | 0.7184   |
| 0.6786        | 12.0  | 3744  | 0.8311          | 0.6968   |
| 0.6396        | 13.0  | 4056  | 0.7136          | 0.6751   |
| 0.6396        | 14.0  | 4368  | 0.7183          | 0.6859   |
| 0.6481        | 15.0  | 4680  | 0.6652          | 0.7076   |
| 0.6481        | 16.0  | 4992  | 1.0367          | 0.6823   |
| 0.6106        | 17.0  | 5304  | 0.7197          | 0.6895   |
| 0.6011        | 18.0  | 5616  | 0.6058          | 0.7292   |
| 0.6011        | 19.0  | 5928  | 0.7227          | 0.7112   |
| 0.5978        | 20.0  | 6240  | 1.1472          | 0.6570   |
| 0.5309        | 21.0  | 6552  | 0.6741          | 0.7256   |
| 0.5309        | 22.0  | 6864  | 0.9335          | 0.6787   |
| 0.5392        | 23.0  | 7176  | 0.8296          | 0.7365   |
| 0.5392        | 24.0  | 7488  | 0.9097          | 0.7040   |
| 0.5058        | 25.0  | 7800  | 0.8278          | 0.7292   |
| 0.4669        | 26.0  | 8112  | 1.0859          | 0.6498   |
| 0.4669        | 27.0  | 8424  | 0.9387          | 0.7184   |
| 0.462         | 28.0  | 8736  | 1.0893          | 0.7365   |
| 0.4757        | 29.0  | 9048  | 1.3568          | 0.6859   |
| 0.4757        | 30.0  | 9360  | 1.0252          | 0.7040   |
| 0.4237        | 31.0  | 9672  | 1.0489          | 0.7329   |
| 0.4237        | 32.0  | 9984  | 0.8661          | 0.7292   |
| 0.4275        | 33.0  | 10296 | 0.9781          | 0.7437   |
| 0.3722        | 34.0  | 10608 | 0.8879          | 0.7329   |
| 0.3722        | 35.0  | 10920 | 0.9932          | 0.7292   |
| 0.3741        | 36.0  | 11232 | 1.0509          | 0.7365   |
| 0.3358        | 37.0  | 11544 | 1.3875          | 0.7329   |
| 0.3358        | 38.0  | 11856 | 1.2366          | 0.7220   |
| 0.3415        | 39.0  | 12168 | 1.0563          | 0.7329   |
| 0.3415        | 40.0  | 12480 | 0.9688          | 0.7401   |
| 0.3357        | 41.0  | 12792 | 0.8598          | 0.7329   |
| 0.3094        | 42.0  | 13104 | 1.0506          | 0.7329   |
| 0.3094        | 43.0  | 13416 | 1.3257          | 0.7365   |
| 0.2947        | 44.0  | 13728 | 1.1759          | 0.7365   |
| 0.2832        | 45.0  | 14040 | 1.1699          | 0.7329   |
| 0.2832        | 46.0  | 14352 | 1.1070          | 0.7401   |
| 0.2808        | 47.0  | 14664 | 1.1519          | 0.7473   |
| 0.2808        | 48.0  | 14976 | 1.0674          | 0.7401   |
| 0.2715        | 49.0  | 15288 | 1.1491          | 0.7401   |
| 0.252         | 50.0  | 15600 | 1.0819          | 0.7473   |
| 0.252         | 51.0  | 15912 | 0.9650          | 0.7473   |
| 0.2577        | 52.0  | 16224 | 1.0753          | 0.7437   |
| 0.2579        | 53.0  | 16536 | 1.0896          | 0.7473   |
| 0.2579        | 54.0  | 16848 | 1.0579          | 0.7401   |
| 0.2395        | 55.0  | 17160 | 1.1172          | 0.7509   |
| 0.2395        | 56.0  | 17472 | 1.1540          | 0.7509   |
| 0.2392        | 57.0  | 17784 | 1.2162          | 0.7509   |
| 0.22          | 58.0  | 18096 | 1.1978          | 0.7509   |
| 0.22          | 59.0  | 18408 | 1.2381          | 0.7473   |
| 0.2242        | 60.0  | 18720 | 1.2256          | 0.7473   |
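Note that validation loss bottoms out early (0.5975 at epoch 11) while accuracy peaks much later (0.7509 around epochs 55–58), a common overfitting pattern. Which checkpoint counts as "best" therefore depends on the selection metric; a small sketch using a few rows copied from the table above:

```python
# (epoch, validation_loss, accuracy) for selected rows of the table above
rows = [
    (11, 0.5975, 0.7184),
    (18, 0.6058, 0.7292),
    (47, 1.1519, 0.7473),
    (55, 1.1172, 0.7509),
    (60, 1.2256, 0.7473),
]

best_by_loss = min(rows, key=lambda r: r[1])  # lowest validation loss
best_by_acc = max(rows, key=lambda r: r[2])   # highest accuracy

print(best_by_loss[0])  # epoch 11
print(best_by_acc[0])   # epoch 55
```

The Trainer's `load_best_model_at_end`/`metric_for_best_model` options make this choice explicit when a metric other than the final checkpoint is wanted.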

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
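To reproduce this environment, the listed versions can be pinned directly (a sketch; package names are the standard PyPI ones, and the `+cu118` PyTorch build is assumed to come from the CUDA 11.8 wheel index):

```shell
pip install transformers==4.26.1 datasets==2.12.0 tokenizers==0.13.3
pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
```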