20230824023516

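This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set; these are the values logged at the final epoch (60) in the training results table, not the best values seen during training: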

Loss: 0.7658
Accuracy: 0.7401

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.003
train_batch_size: 8
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
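
To make the linear schedule concrete, the sketch below computes the learning rate at a given optimizer step from the values listed above. It assumes zero warmup steps (the card lists none) and takes the 312 steps per epoch from the training results table. The names `linear_lr`, `WARMUP_STEPS`, and `MAX_TRAIN_EXAMPLES` are illustrative and not part of the original training script.

```python
# Values taken from the hyperparameter list and the training results table.
BASE_LR = 0.003           # learning_rate
TRAIN_BATCH_SIZE = 8      # train_batch_size
NUM_EPOCHS = 60           # num_epochs
STEPS_PER_EPOCH = 312     # step count logged at epoch 1.0
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH  # 18720, the last logged step

# Assumption: no warmup, because the card does not list any warmup setting.
WARMUP_STEPS = 0

# Upper bound on training examples seen per epoch.
# Assumption: a single device and no gradient accumulation.
MAX_TRAIN_EXAMPLES = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE  # 2496


def linear_lr(step: int) -> float:
    """Learning rate after `step` updates: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    for epoch in (0, 15, 30, 45, 60):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:2d}  step {step:5d}  lr {linear_lr(step):.6f}")
```

Under these assumptions the rate starts at 0.003, is halved by epoch 30, and reaches 0 at step 18720.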

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	312	1.0164	0.5307
0.9117	2.0	624	0.7035	0.5090
0.9117	3.0	936	0.6456	0.5307
0.771	4.0	1248	0.6625	0.5487
0.7935	5.0	1560	0.9135	0.5487
0.7935	6.0	1872	0.7048	0.6426
0.7247	7.0	2184	0.7188	0.6570
0.7247	8.0	2496	0.7428	0.6570
0.6659	9.0	2808	0.5639	0.7076
0.6647	10.0	3120	0.8170	0.6426
0.6647	11.0	3432	0.5627	0.7076
0.6248	12.0	3744	0.7036	0.7040
0.5859	13.0	4056	0.5674	0.7112
0.5859	14.0	4368	0.6351	0.7112
0.599	15.0	4680	0.5921	0.7112
0.599	16.0	4992	0.9538	0.6643
0.5515	17.0	5304	0.6401	0.7004
0.5423	18.0	5616	0.5545	0.7256
0.5423	19.0	5928	0.5583	0.7365
0.5248	20.0	6240	0.8808	0.6534
0.4795	21.0	6552	0.5670	0.7292
0.4795	22.0	6864	0.6174	0.6968
0.4853	23.0	7176	0.8153	0.7112
0.4853	24.0	7488	0.6551	0.7256
0.4379	25.0	7800	0.7501	0.7292
0.4365	26.0	8112	0.8488	0.6895
0.4365	27.0	8424	0.7814	0.7112
0.4204	28.0	8736	0.7393	0.7220
0.434	29.0	9048	0.9116	0.6859
0.434	30.0	9360	0.8298	0.7076
0.4064	31.0	9672	0.7928	0.6968
0.4064	32.0	9984	0.6150	0.7329
0.3869	33.0	10296	0.8984	0.7256
0.3459	34.0	10608	0.6598	0.7401
0.3459	35.0	10920	0.6022	0.7401
0.352	36.0	11232	0.8833	0.7112
0.3268	37.0	11544	0.9331	0.7220
0.3268	38.0	11856	0.8233	0.7401
0.3108	39.0	12168	0.8361	0.7329
0.3108	40.0	12480	0.6123	0.7292
0.3038	41.0	12792	0.6187	0.7292
0.287	42.0	13104	0.7216	0.7401
0.287	43.0	13416	0.9118	0.7148
0.2802	44.0	13728	0.8249	0.7329
0.2756	45.0	14040	0.7843	0.7437
0.2756	46.0	14352	0.7272	0.7365
0.2735	47.0	14664	0.7253	0.7292
0.2735	48.0	14976	0.7766	0.7365
0.2552	49.0	15288	0.7906	0.7401
0.2449	50.0	15600	0.6664	0.7329
0.2449	51.0	15912	0.6854	0.7220
0.248	52.0	16224	0.7260	0.7256
0.2533	53.0	16536	0.7750	0.7329
0.2533	54.0	16848	0.7146	0.7401
0.238	55.0	17160	0.7802	0.7365
0.238	56.0	17472	0.7462	0.7365
0.2412	57.0	17784	0.7619	0.7473
0.2241	58.0	18096	0.6815	0.7437
0.2241	59.0	18408	0.7661	0.7401
0.2293	60.0	18720	0.7658	0.7401
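
Validation loss and accuracy fluctuate from epoch to epoch, so it can help to locate the best epochs programmatically. The following is a minimal sketch that copies the validation loss and accuracy columns from the table above and reports the epoch with the highest accuracy, the epoch with the lowest validation loss, and the final epoch. The names `RESULTS` and `summarize` are illustrative; the card does not state which checkpoint was actually saved.

```python
# (validation_loss, accuracy) for epochs 1..60, copied from the table above.
RESULTS = [
    (1.0164, 0.5307), (0.7035, 0.5090), (0.6456, 0.5307), (0.6625, 0.5487),
    (0.9135, 0.5487), (0.7048, 0.6426), (0.7188, 0.6570), (0.7428, 0.6570),
    (0.5639, 0.7076), (0.8170, 0.6426), (0.5627, 0.7076), (0.7036, 0.7040),
    (0.5674, 0.7112), (0.6351, 0.7112), (0.5921, 0.7112), (0.9538, 0.6643),
    (0.6401, 0.7004), (0.5545, 0.7256), (0.5583, 0.7365), (0.8808, 0.6534),
    (0.5670, 0.7292), (0.6174, 0.6968), (0.8153, 0.7112), (0.6551, 0.7256),
    (0.7501, 0.7292), (0.8488, 0.6895), (0.7814, 0.7112), (0.7393, 0.7220),
    (0.9116, 0.6859), (0.8298, 0.7076), (0.7928, 0.6968), (0.6150, 0.7329),
    (0.8984, 0.7256), (0.6598, 0.7401), (0.6022, 0.7401), (0.8833, 0.7112),
    (0.9331, 0.7220), (0.8233, 0.7401), (0.8361, 0.7329), (0.6123, 0.7292),
    (0.6187, 0.7292), (0.7216, 0.7401), (0.9118, 0.7148), (0.8249, 0.7329),
    (0.7843, 0.7437), (0.7272, 0.7365), (0.7253, 0.7292), (0.7766, 0.7365),
    (0.7906, 0.7401), (0.6664, 0.7329), (0.6854, 0.7220), (0.7260, 0.7256),
    (0.7750, 0.7329), (0.7146, 0.7401), (0.7802, 0.7365), (0.7462, 0.7365),
    (0.7619, 0.7473), (0.6815, 0.7437), (0.7661, 0.7401), (0.7658, 0.7401),
]


def summarize(results):
    """Return (epoch, val_loss, accuracy) for the best-accuracy, lowest-loss, and final epochs."""
    rows = [(epoch, loss, acc) for epoch, (loss, acc) in enumerate(results, start=1)]
    return {
        "best_accuracy": max(rows, key=lambda r: r[2]),
        "lowest_val_loss": min(rows, key=lambda r: r[1]),
        "final": rows[-1],
    }


if __name__ == "__main__":
    for name, (epoch, loss, acc) in summarize(RESULTS).items():
        print(f"{name:16s} epoch {epoch:2d}  val_loss {loss:.4f}  accuracy {acc:.4f}")
```

On this table the highest accuracy (0.7473) falls at epoch 57 and the lowest validation loss (0.5545) at epoch 18, while the headline numbers correspond to epoch 60.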

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3

Model repository: dkqjrm/20230824023516
