20230824083855

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (these match the epoch 60 row of the training results):

Loss: 0.0821
Accuracy: 0.7473

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

Trained and evaluated on the super_glue dataset. More information needed (which SuperGLUE task, split sizes).

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.003
train_batch_size: 4
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
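A minimal, framework-free sketch of how the listed optimizer and linear scheduler combine. Assumptions: zero warmup steps (no warmup setting is listed), no weight decay, and the textbook Adam update applied to a single scalar parameter. The Hugging Face optimizer differs in small details (for example where epsilon enters the bias-corrected step), so treat this as illustrative only. STEPS_PER_EPOCH = 623 is read from the Step column of the training results (37380 / 60). The names `linear_lr` and `adam_step` are hypothetical.

```python
import math

PEAK_LR = 0.003                # learning_rate from the card
STEPS_PER_EPOCH = 623          # from the Step column of the training results
NUM_EPOCHS = 60
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 37380, the final Step value


def linear_lr(step: int, peak_lr: float = PEAK_LR,
              total_steps: int = TOTAL_STEPS, warmup_steps: int = 0) -> float:
    """Learning rate at an optimizer step under a linear schedule.

    Ramps up linearly during warmup (assumed 0 here), then decays
    linearly from peak_lr to 0 at total_steps.
    """
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


def adam_step(param: float, grad: float, m: float, v: float, t: int, lr: float,
              beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8):
    """One textbook Adam update for a scalar parameter; t is 1-based.

    Returns the updated (param, m, v). No weight decay is applied.
    """
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias corrections
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    for epoch in (0, 15, 30, 45, 60):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:>2} (step {step:>5}): lr = {linear_lr(step):.6f}")

    # First update of a toy parameter with gradient 0.5 at the peak learning rate.
    p, m, v = adam_step(1.0, 0.5, 0.0, 0.0, t=1, lr=linear_lr(0))
    print(f"param after one step: {p:.6f}")
```

On the first step the bias corrections cancel the (1 - beta) factors, so the update is roughly lr times the sign of the gradient; with a learning rate of 0.003 on BERT-large that is a comparatively large step.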

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
0.5366	1.0	623	0.8415	0.4729
0.3757	2.0	1246	0.3098	0.4693
0.3001	3.0	1869	0.5999	0.4729
0.3227	4.0	2492	0.2808	0.4729
0.3109	5.0	3115	0.2772	0.5487
0.3034	6.0	3738	0.1529	0.6029
0.2648	7.0	4361	0.1565	0.6029
0.2104	8.0	4984	0.1394	0.6245
0.1926	9.0	5607	0.1404	0.6390
0.175	10.0	6230	0.1292	0.6859
0.1634	11.0	6853	0.1174	0.7004
0.1618	12.0	7476	0.1228	0.6787
0.1555	13.0	8099	0.1287	0.6534
0.1534	14.0	8722	0.1461	0.6570
0.1523	15.0	9345	0.1356	0.6426
0.1448	16.0	9968	0.1065	0.6968
0.1402	17.0	10591	0.1011	0.7292
0.1342	18.0	11214	0.1112	0.6643
0.1388	19.0	11837	0.1255	0.6823
0.1281	20.0	12460	0.0965	0.7220
0.128	21.0	13083	0.0985	0.7040
0.1236	22.0	13706	0.1339	0.7040
0.1267	23.0	14329	0.1238	0.7365
0.1186	24.0	14952	0.0942	0.7292
0.1101	25.0	15575	0.0923	0.7220
0.1122	26.0	16198	0.0919	0.7401
0.1088	27.0	16821	0.0893	0.7292
0.1059	28.0	17444	0.0897	0.7401
0.106	29.0	18067	0.0878	0.7509
0.1019	30.0	18690	0.0945	0.7365
0.1047	31.0	19313	0.0900	0.7256
0.1011	32.0	19936	0.0884	0.7437
0.0962	33.0	20559	0.0874	0.7329
0.0971	34.0	21182	0.0933	0.7329
0.0914	35.0	21805	0.0845	0.7473
0.0965	36.0	22428	0.0914	0.7365
0.0914	37.0	23051	0.0855	0.7292
0.0894	38.0	23674	0.0867	0.7256
0.087	39.0	24297	0.0861	0.7329
0.0865	40.0	24920	0.0830	0.7329
0.0851	41.0	25543	0.0827	0.7473
0.0837	42.0	26166	0.0818	0.7365
0.0865	43.0	26789	0.0840	0.7401
0.0807	44.0	27412	0.0815	0.7292
0.0829	45.0	28035	0.0840	0.7365
0.0814	46.0	28658	0.0851	0.7401
0.0798	47.0	29281	0.0841	0.7401
0.0806	48.0	29904	0.0838	0.7473
0.0773	49.0	30527	0.0823	0.7401
0.0769	50.0	31150	0.0813	0.7329
0.0763	51.0	31773	0.0822	0.7509
0.0792	52.0	32396	0.0833	0.7365
0.0772	53.0	33019	0.0819	0.7365
0.0732	54.0	33642	0.0810	0.7365
0.0708	55.0	34265	0.0808	0.7365
0.0741	56.0	34888	0.0824	0.7509
0.0725	57.0	35511	0.0816	0.7437
0.072	58.0	36134	0.0812	0.7437
0.0712	59.0	36757	0.0827	0.7401
0.0707	60.0	37380	0.0821	0.7473
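The evaluation results reported at the top coincide with the epoch 60 row, so they appear to come from the final checkpoint rather than the best-scoring one. As an illustration, the sketch below parses the last eleven rows of the table (epochs 50 to 60, copied verbatim) in their tab-separated form and picks the best epoch by a chosen metric. The helper names are hypothetical. Note that selecting a checkpoint on the same evaluation set you report on gives an optimistic estimate.

```python
import csv
import io

# Epochs 50-60 of the training results table, tab-separated as in the card.
RESULTS_TSV = """Training Loss\tEpoch\tStep\tValidation Loss\tAccuracy
0.0769\t50.0\t31150\t0.0813\t0.7329
0.0763\t51.0\t31773\t0.0822\t0.7509
0.0792\t52.0\t32396\t0.0833\t0.7365
0.0772\t53.0\t33019\t0.0819\t0.7365
0.0732\t54.0\t33642\t0.0810\t0.7365
0.0708\t55.0\t34265\t0.0808\t0.7365
0.0741\t56.0\t34888\t0.0824\t0.7509
0.0725\t57.0\t35511\t0.0816\t0.7437
0.072\t58.0\t36134\t0.0812\t0.7437
0.0712\t59.0\t36757\t0.0827\t0.7401
0.0707\t60.0\t37380\t0.0821\t0.7473
"""


def load_rows(tsv: str) -> list:
    """Parse the TSV into a list of {column name: float} dicts."""
    reader = csv.DictReader(io.StringIO(tsv), delimiter="\t")
    return [{key: float(value) for key, value in row.items()} for row in reader]


def best_epoch(rows: list, metric: str, higher_is_better: bool):
    """Return (epoch, value) of the best row for `metric`.

    max()/min() return the first optimal row, so ties go to the earliest epoch.
    """
    pick = max if higher_is_better else min
    best = pick(rows, key=lambda r: r[metric])
    return int(best["Epoch"]), best[metric]


if __name__ == "__main__":
    rows = load_rows(RESULTS_TSV)
    print("best accuracy:", best_epoch(rows, "Accuracy", higher_is_better=True))
    print("lowest val loss:", best_epoch(rows, "Validation Loss", higher_is_better=False))
    print("final epoch accuracy:", rows[-1]["Accuracy"])
```

Within this window the two criteria disagree (highest accuracy at epoch 51, lowest validation loss at epoch 55), which is common when the loss keeps improving slightly while accuracy fluctuates on a small evaluation set.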

Framework versions

Transformers 4.26.1
PyTorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
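A small, hypothetical helper for checking a local environment against the versions above, using only the standard library's importlib.metadata. PyTorch is installed under the distribution name `torch`. Exact string comparison is an assumption of this sketch: a CPU-only or differently built PyTorch (without the +cu118 suffix) will be reported as differing even if the base version matches.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions"; PyTorch's distribution name is "torch".
EXPECTED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def compare_versions(expected: dict) -> dict:
    """Map each package to (expected, installed); installed is None if missing."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None
        report[pkg] = (want, have)
    return report


if __name__ == "__main__":
    for pkg, (want, have) in compare_versions(EXPECTED).items():
        status = "ok" if have == want else "differs"
        print(f"{pkg:<13} expected {want:<12} installed {have or 'missing':<12} {status}")
```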

Repository: dkqjrm/20230824083855