20230823213605

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (these are the final-epoch values, matching epoch 60 in the training results table):

Loss: 0.6579
Accuracy: 0.7365

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.003
train_batch_size: 8
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
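
As a rough illustration of what the linear scheduler implies for these settings, the sketch below computes the learning rate at a given optimizer step. The total of 18720 steps (60 epochs of 312 steps each) is taken from the training results table. The warmup length is not listed in this card, so `WARMUP_STEPS = 0` is an assumption, and the helper `linear_lr` is a hypothetical name used only for this example, not part of any library.

```python
# Values listed in this card.
LEARNING_RATE = 0.003
NUM_EPOCHS = 60
STEPS_PER_EPOCH = 312  # step count logged at epoch 1.0 in the results table

# Derived: total optimizer steps over the whole run.
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH

# Assumption: the card does not list any warmup, so none is modeled here.
WARMUP_STEPS = 0


def linear_lr(step: int,
              base_lr: float = LEARNING_RATE,
              total_steps: int = TOTAL_STEPS,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for epoch in (0, 15, 30, 45, 60):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:2d}  step {step:5d}  lr {linear_lr(step):.6f}")
```

Under these assumptions the rate starts at 0.003, is halved by the end of epoch 30, and reaches zero at the final step, which may help explain why the validation metrics change very little over the last few epochs.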

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	312	1.6256	0.5307
0.8748	2.0	624	0.7617	0.5523
0.8748	3.0	936	0.6603	0.5271
0.7596	4.0	1248	0.6103	0.6101
0.7685	5.0	1560	0.9349	0.5668
0.7685	6.0	1872	0.8351	0.6101
0.6585	7.0	2184	0.5995	0.6823
0.6585	8.0	2496	0.5553	0.7076
0.651	9.0	2808	0.5718	0.7040
0.629	10.0	3120	0.5922	0.7040
0.629	11.0	3432	0.5775	0.7148
0.6145	12.0	3744	0.5886	0.7292
0.595	13.0	4056	0.5959	0.7076
0.595	14.0	4368	0.5683	0.7040
0.5501	15.0	4680	0.5633	0.7329
0.5501	16.0	4992	0.6229	0.7184
0.5382	17.0	5304	0.8960	0.6643
0.4987	18.0	5616	0.5098	0.7076
0.4987	19.0	5928	0.6151	0.7184
0.5146	20.0	6240	0.6031	0.7329
0.4536	21.0	6552	0.7180	0.7329
0.4536	22.0	6864	0.7608	0.7184
0.45	23.0	7176	0.7551	0.7112
0.45	24.0	7488	0.7242	0.7148
0.4336	25.0	7800	0.7373	0.7292
0.396	26.0	8112	0.7001	0.7220
0.396	27.0	8424	0.6008	0.7365
0.3851	28.0	8736	0.5931	0.7148
0.3699	29.0	9048	0.6664	0.7329
0.3699	30.0	9360	0.6632	0.7473
0.3451	31.0	9672	0.6476	0.7437
0.3451	32.0	9984	0.5929	0.7292
0.3273	33.0	10296	0.7271	0.7292
0.3025	34.0	10608	0.6819	0.7292
0.3025	35.0	10920	0.5734	0.7329
0.2981	36.0	11232	0.7307	0.7256
0.2829	37.0	11544	0.8025	0.7329
0.2829	38.0	11856	0.5696	0.7545
0.2724	39.0	12168	0.6290	0.7401
0.2724	40.0	12480	0.6417	0.7292
0.2604	41.0	12792	0.5523	0.7401
0.253	42.0	13104	0.7210	0.7365
0.253	43.0	13416	0.6005	0.7365
0.2469	44.0	13728	0.6808	0.7473
0.2492	45.0	14040	0.6506	0.7509
0.2492	46.0	14352	0.6687	0.7437
0.2413	47.0	14664	0.6401	0.7329
0.2413	48.0	14976	0.6588	0.7329
0.2356	49.0	15288	0.6625	0.7401
0.2251	50.0	15600	0.6472	0.7292
0.2251	51.0	15912	0.6800	0.7401
0.2207	52.0	16224	0.6191	0.7473
0.2127	53.0	16536	0.6478	0.7365
0.2127	54.0	16848	0.6509	0.7329
0.2217	55.0	17160	0.6644	0.7365
0.2217	56.0	17472	0.6360	0.7365
0.2094	57.0	17784	0.6509	0.7365
0.2045	58.0	18096	0.6445	0.7365
0.2045	59.0	18408	0.6659	0.7365
0.2072	60.0	18720	0.6579	0.7365
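
The headline metrics in this card come from the last epoch, which is not necessarily the best one. A minimal sketch for locating the best rows is given below. It assumes the table is available as tab-separated text in the column order shown above; `TABLE_EXCERPT` holds only a few rows copied from the table, and `EpochLog` and `parse_row` are hypothetical names introduced for this example.

```python
from typing import NamedTuple, Optional


class EpochLog(NamedTuple):
    train_loss: Optional[float]  # None where the table shows "No log"
    epoch: float
    step: int
    val_loss: float
    accuracy: float


# A few rows copied from the table above (tab-separated).
# Paste every row here to analyze the complete run.
TABLE_EXCERPT = [
    "No log\t1.0\t312\t1.6256\t0.5307",
    "0.8748\t2.0\t624\t0.7617\t0.5523",
    "0.4987\t18.0\t5616\t0.5098\t0.7076",
    "0.2829\t38.0\t11856\t0.5696\t0.7545",
    "0.2492\t45.0\t14040\t0.6506\t0.7509",
    "0.2072\t60.0\t18720\t0.6579\t0.7365",
]


def parse_row(line: str) -> EpochLog:
    """Parse one tab-separated row of the training results table."""
    train_loss, epoch, step, val_loss, accuracy = line.split("\t")
    return EpochLog(
        None if train_loss == "No log" else float(train_loss),
        float(epoch),
        int(step),
        float(val_loss),
        float(accuracy),
    )


rows = [parse_row(line) for line in TABLE_EXCERPT]
best_accuracy = max(rows, key=lambda r: r.accuracy)
lowest_val_loss = min(rows, key=lambda r: r.val_loss)
final = rows[-1]

if __name__ == "__main__":
    print(f"best accuracy:   {best_accuracy.accuracy:.4f} at epoch {best_accuracy.epoch:.0f}")
    print(f"lowest val loss: {lowest_val_loss.val_loss:.4f} at epoch {lowest_val_loss.epoch:.0f}")
    print(f"final epoch:     {final.accuracy:.4f} accuracy, {final.val_loss:.4f} val loss")
```

For the rows included here, the highest accuracy (0.7545) is at epoch 38 and the lowest validation loss (0.5098) is at epoch 18, while the final epoch reports 0.7365 and 0.6579. The card does not say whether intermediate checkpoints were saved.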

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3

