20230824002455

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set after the final training epoch (epoch 60):

Validation loss: 0.7440
Accuracy: 0.7473

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.003
train_batch_size: 4
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
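Together, these hyperparameters determine the shape of the learning-rate curve. The sketch below assumes zero warmup steps because none is listed. Under that assumption the rate decays linearly from 0.003 to 0 over 60 × 623 = 37,380 optimizer steps, which is the same shape transformers' `get_linear_schedule_with_warmup` produces with `num_warmup_steps=0`. The figure of 623 steps per epoch comes from the Step column of the training results. The constant names and `linear_lr` are illustrative and are not taken from the original training code.

```python
# Illustrative reconstruction of the linear learning-rate schedule implied by the card.
# Assumption: no warmup (none is listed), linear decay to 0 over all training steps.
PEAK_LR = 0.003          # learning_rate from the card
STEPS_PER_EPOCH = 623    # read from the Step column of the training results
NUM_EPOCHS = 60
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 37,380, matching the last Step value


def linear_lr(step: int, peak: float = PEAK_LR, total: int = TOTAL_STEPS) -> float:
    """Learning rate after `step` optimizer updates under linear decay with no warmup."""
    if step >= total:
        return 0.0
    return peak * (total - step) / total


if __name__ == "__main__":
    for epoch in (0, 15, 30, 45, 60):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:>2} (step {step:>5}): lr = {linear_lr(step):.6f}")
```

Under this assumption the rate at the end of epoch 30 (step 18,690) is half the peak, 0.0015.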

Training results

Training Loss	Epoch	Step	Validation Loss	Validation Accuracy
1.0306	1.0	623	0.6949	0.4729
0.8552	2.0	1246	0.7454	0.5596
0.9623	3.0	1869	0.8165	0.4874
0.8291	4.0	2492	1.1894	0.5704
0.8201	5.0	3115	0.6677	0.6823
0.8297	6.0	3738	0.6379	0.7256
0.7792	7.0	4361	0.6572	0.6931
0.6925	8.0	4984	0.6975	0.6498
0.7243	9.0	5607	0.7871	0.6679
0.69	10.0	6230	0.7707	0.7148
0.6492	11.0	6853	0.7202	0.7004
0.6448	12.0	7476	0.6862	0.7329
0.6571	13.0	8099	0.6079	0.7256
0.6558	14.0	8722	0.8183	0.7329
0.5996	15.0	9345	0.5783	0.7256
0.5494	16.0	9968	0.5463	0.7473
0.4964	17.0	10591	0.7906	0.7040
0.4914	18.0	11214	0.5334	0.7220
0.4933	19.0	11837	0.6681	0.7329
0.4655	20.0	12460	0.8837	0.7401
0.4432	21.0	13083	0.7407	0.7473
0.4051	22.0	13706	0.7213	0.7509
0.4018	23.0	14329	0.8420	0.7365
0.3745	24.0	14952	0.6421	0.7365
0.3558	25.0	15575	0.5727	0.7437
0.3325	26.0	16198	0.6941	0.7545
0.3471	27.0	16821	0.8213	0.7545
0.3405	28.0	17444	0.7249	0.7292
0.3079	29.0	18067	0.5829	0.7545
0.3136	30.0	18690	0.7057	0.7617
0.3152	31.0	19313	0.7746	0.7509
0.2989	32.0	19936	0.6028	0.7617
0.2657	33.0	20559	0.8212	0.7509
0.2703	34.0	21182	0.7015	0.7401
0.2562	35.0	21805	0.5706	0.7581
0.2738	36.0	22428	0.7036	0.7690
0.2404	37.0	23051	0.6888	0.7545
0.2595	38.0	23674	0.7086	0.7437
0.245	39.0	24297	0.7283	0.7401
0.2279	40.0	24920	0.7231	0.7401
0.2288	41.0	25543	0.6915	0.7365
0.2166	42.0	26166	0.8110	0.7329
0.219	43.0	26789	0.7984	0.7437
0.1935	44.0	27412	0.8829	0.7401
0.2105	45.0	28035	0.7270	0.7545
0.2079	46.0	28658	0.8026	0.7365
0.1859	47.0	29281	0.6536	0.7617
0.2211	48.0	29904	0.7410	0.7401
0.1862	49.0	30527	0.8433	0.7401
0.2015	50.0	31150	0.6761	0.7437
0.1921	51.0	31773	0.7471	0.7545
0.1899	52.0	32396	0.8135	0.7437
0.188	53.0	33019	0.7556	0.7365
0.1771	54.0	33642	0.7566	0.7365
0.1697	55.0	34265	0.7515	0.7509
0.185	56.0	34888	0.7795	0.7437
0.177	57.0	35511	0.7455	0.7509
0.1663	58.0	36134	0.7345	0.7509
0.1722	59.0	36757	0.7430	0.7509
0.1696	60.0	37380	0.7440	0.7473
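The metrics at the top of this card match the last row of the table (epoch 60), but validation accuracy peaks earlier. The card does not say which checkpoint the published weights come from. The sketch below copies the Validation Loss and Accuracy columns from the table and locates the best epoch under each criterion. `best_epoch` is a hypothetical helper, and keeping the earliest epoch on ties is an assumption of this sketch.

```python
# Per-epoch validation metrics copied from the Training results table (epochs 1-60).
VAL_LOSS = [
    0.6949, 0.7454, 0.8165, 1.1894, 0.6677, 0.6379, 0.6572, 0.6975, 0.7871, 0.7707,
    0.7202, 0.6862, 0.6079, 0.8183, 0.5783, 0.5463, 0.7906, 0.5334, 0.6681, 0.8837,
    0.7407, 0.7213, 0.8420, 0.6421, 0.5727, 0.6941, 0.8213, 0.7249, 0.5829, 0.7057,
    0.7746, 0.6028, 0.8212, 0.7015, 0.5706, 0.7036, 0.6888, 0.7086, 0.7283, 0.7231,
    0.6915, 0.8110, 0.7984, 0.8829, 0.7270, 0.8026, 0.6536, 0.7410, 0.8433, 0.6761,
    0.7471, 0.8135, 0.7556, 0.7566, 0.7515, 0.7795, 0.7455, 0.7345, 0.7430, 0.7440,
]
ACCURACY = [
    0.4729, 0.5596, 0.4874, 0.5704, 0.6823, 0.7256, 0.6931, 0.6498, 0.6679, 0.7148,
    0.7004, 0.7329, 0.7256, 0.7329, 0.7256, 0.7473, 0.7040, 0.7220, 0.7329, 0.7401,
    0.7473, 0.7509, 0.7365, 0.7365, 0.7437, 0.7545, 0.7545, 0.7292, 0.7545, 0.7617,
    0.7509, 0.7617, 0.7509, 0.7401, 0.7581, 0.7690, 0.7545, 0.7437, 0.7401, 0.7401,
    0.7365, 0.7329, 0.7437, 0.7401, 0.7545, 0.7365, 0.7617, 0.7401, 0.7401, 0.7437,
    0.7545, 0.7437, 0.7365, 0.7365, 0.7509, 0.7437, 0.7509, 0.7509, 0.7509, 0.7473,
]
STEPS_PER_EPOCH = 623


def best_epoch(values, higher_is_better):
    """Return the 1-based epoch holding the best value (earliest epoch wins ties)."""
    best = max(values) if higher_is_better else min(values)
    return values.index(best) + 1


acc_epoch = best_epoch(ACCURACY, higher_is_better=True)
loss_epoch = best_epoch(VAL_LOSS, higher_is_better=False)

for label, epoch in (("best accuracy", acc_epoch),
                     ("lowest val loss", loss_epoch),
                     ("final", len(ACCURACY))):
    print(f"{label:>15}: epoch {epoch:>2}, step {epoch * STEPS_PER_EPOCH:>5}, "
          f"val_loss {VAL_LOSS[epoch - 1]:.4f}, accuracy {ACCURACY[epoch - 1]:.4f}")
```

Accuracy is highest at epoch 36 (0.7690), validation loss is lowest at epoch 18 (0.5334), and the reported figures come from epoch 60. The Hugging Face Trainer offers `load_best_model_at_end=True` together with `metric_for_best_model` to keep the best checkpoint automatically. The card does not say whether those options were used. Selecting a checkpoint on the same set used for reporting also makes the reported number optimistic.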

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
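To reproduce the training environment, it helps to compare installed packages against the versions listed above. The sketch below uses the standard-library `importlib.metadata` and assumes the PyPI distribution names `transformers`, `torch`, `datasets` and `tokenizers`. It compares only the public version, so the PEP 440 local label (`+cu118`) is ignored and a CPU build of torch 2.0.1 also counts as a match. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" (PyPI distribution names assumed).
EXPECTED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def public_version(v: str) -> str:
    """Drop a PEP 440 local label such as '+cu118' so CPU and CUDA builds compare equal."""
    return v.split("+", 1)[0]


def compare_versions(installed: dict) -> dict:
    """Map each package to (expected, found, matches) using public versions only.

    `installed` maps package name -> version string, or None if the package is missing.
    """
    report = {}
    for name, expected in EXPECTED.items():
        found = installed.get(name)
        ok = found is not None and public_version(found) == public_version(expected)
        report[name] = (expected, found, ok)
    return report


def installed_versions() -> dict:
    """Read installed versions from package metadata; None for absent packages."""
    result = {}
    for name in EXPECTED:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, (expected, found, ok) in compare_versions(installed_versions()).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:<12} expected {expected:<12} found {str(found):<12} {status}")
```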

Model repository: dkqjrm/20230824002455