
20230822202124

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4836
  • Accuracy: 0.7437
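The card does not state which SuperGLUE task was used. The step counts in the training log below (156 optimizer steps per epoch at batch size 16, i.e. roughly 2,500 training examples) are consistent with the RTE subset (2,490 train / 277 validation examples), but this is an inference, not a documented fact. Under that assumption, the reported accuracy corresponds to about 206 of 277 correct predictions:

```python
# Assumption (not stated in the card): the task is SuperGLUE RTE,
# whose validation split has 277 examples.
val_size = 277
accuracy = 0.7437                      # reported evaluation accuracy
correct = round(accuracy * val_size)   # implied number of correct predictions
print(correct, correct / val_size)     # 206 correct -> ~0.7437
```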

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
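With 156 optimizer steps per epoch (see the results table below) and 60 epochs, training runs for 9,360 steps in total. A minimal sketch of the linear learning-rate schedule these settings imply, matching the shape of `get_linear_schedule_with_warmup` in Transformers (no warmup is listed in the card, so `warmup_steps=0` is an assumption):

```python
def linear_lr(step, base_lr=3e-3, total_steps=156 * 60, warmup_steps=0):
    """Learning rate at a given optimizer step: linear warmup (if any),
    then linear decay to 0 over the remaining steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

print(linear_lr(0))      # 0.003 at the start
print(linear_lr(4680))   # 0.0015 halfway through
print(linear_lr(9360))   # 0.0 at the final step
```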

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 156  | 0.5548          | 0.4693   |
| No log        | 2.0   | 312  | 0.5565          | 0.4838   |
| No log        | 3.0   | 468  | 0.5531          | 0.4729   |
| 0.6259        | 4.0   | 624  | 0.5810          | 0.4729   |
| 0.6259        | 5.0   | 780  | 0.6010          | 0.5596   |
| 0.6259        | 6.0   | 936  | 0.4969          | 0.6462   |
| 0.5907        | 7.0   | 1092 | 0.7982          | 0.5487   |
| 0.5907        | 8.0   | 1248 | 0.4883          | 0.6318   |
| 0.5907        | 9.0   | 1404 | 0.4714          | 0.6931   |
| 0.5602        | 10.0  | 1560 | 0.9236          | 0.5560   |
| 0.5602        | 11.0  | 1716 | 0.4972          | 0.6968   |
| 0.5602        | 12.0  | 1872 | 0.5116          | 0.6895   |
| 0.5015        | 13.0  | 2028 | 0.4913          | 0.7076   |
| 0.5015        | 14.0  | 2184 | 0.4683          | 0.7112   |
| 0.5015        | 15.0  | 2340 | 0.5265          | 0.6895   |
| 0.5015        | 16.0  | 2496 | 0.4616          | 0.7040   |
| 0.4782        | 17.0  | 2652 | 0.5788          | 0.6679   |
| 0.4782        | 18.0  | 2808 | 0.4471          | 0.7292   |
| 0.4782        | 19.0  | 2964 | 0.4588          | 0.7545   |
| 0.4628        | 20.0  | 3120 | 0.6477          | 0.6426   |
| 0.4628        | 21.0  | 3276 | 0.5305          | 0.6968   |
| 0.4628        | 22.0  | 3432 | 0.4549          | 0.7292   |
| 0.4248        | 23.0  | 3588 | 0.5101          | 0.7256   |
| 0.4248        | 24.0  | 3744 | 0.4763          | 0.7184   |
| 0.4248        | 25.0  | 3900 | 0.5809          | 0.6895   |
| 0.4067        | 26.0  | 4056 | 0.4461          | 0.7473   |
| 0.4067        | 27.0  | 4212 | 0.4460          | 0.7473   |
| 0.4067        | 28.0  | 4368 | 0.4454          | 0.7509   |
| 0.3941        | 29.0  | 4524 | 0.4664          | 0.7365   |
| 0.3941        | 30.0  | 4680 | 0.5039          | 0.7292   |
| 0.3941        | 31.0  | 4836 | 0.4548          | 0.7473   |
| 0.3941        | 32.0  | 4992 | 0.4484          | 0.7437   |
| 0.3749        | 33.0  | 5148 | 0.4924          | 0.7473   |
| 0.3749        | 34.0  | 5304 | 0.4569          | 0.7473   |
| 0.3749        | 35.0  | 5460 | 0.4604          | 0.7617   |
| 0.3586        | 36.0  | 5616 | 0.4448          | 0.7653   |
| 0.3586        | 37.0  | 5772 | 0.4768          | 0.7365   |
| 0.3586        | 38.0  | 5928 | 0.5052          | 0.7473   |
| 0.3521        | 39.0  | 6084 | 0.5167          | 0.7329   |
| 0.3521        | 40.0  | 6240 | 0.4425          | 0.7509   |
| 0.3521        | 41.0  | 6396 | 0.4730          | 0.7545   |
| 0.3407        | 42.0  | 6552 | 0.4624          | 0.7509   |
| 0.3407        | 43.0  | 6708 | 0.4847          | 0.7509   |
| 0.3407        | 44.0  | 6864 | 0.5371          | 0.7329   |
| 0.3329        | 45.0  | 7020 | 0.4841          | 0.7545   |
| 0.3329        | 46.0  | 7176 | 0.4815          | 0.7365   |
| 0.3329        | 47.0  | 7332 | 0.4678          | 0.7509   |
| 0.3329        | 48.0  | 7488 | 0.4918          | 0.7473   |
| 0.3235        | 49.0  | 7644 | 0.4592          | 0.7581   |
| 0.3235        | 50.0  | 7800 | 0.5005          | 0.7437   |
| 0.3235        | 51.0  | 7956 | 0.4777          | 0.7545   |
| 0.3193        | 52.0  | 8112 | 0.4558          | 0.7545   |
| 0.3193        | 53.0  | 8268 | 0.4870          | 0.7437   |
| 0.3193        | 54.0  | 8424 | 0.4792          | 0.7437   |
| 0.3132        | 55.0  | 8580 | 0.4673          | 0.7437   |
| 0.3132        | 56.0  | 8736 | 0.4943          | 0.7437   |
| 0.3132        | 57.0  | 8892 | 0.4970          | 0.7437   |
| 0.311         | 58.0  | 9048 | 0.4914          | 0.7401   |
| 0.311         | 59.0  | 9204 | 0.4887          | 0.7437   |
| 0.311         | 60.0  | 9360 | 0.4836          | 0.7437   |
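Note that the final checkpoint (epoch 60, accuracy 0.7437) is not the strongest one in the log: epoch 36 reaches the highest accuracy (0.7653), and epoch 40 the lowest validation loss (0.4425). A small sketch of selecting the best epoch by accuracy, using a few rows copied from the table (variable names are illustrative, not from the card):

```python
# (epoch, validation_loss, accuracy) triples copied from the results table
log = [
    (19, 0.4588, 0.7545),
    (36, 0.4448, 0.7653),
    (40, 0.4425, 0.7509),
    (60, 0.4836, 0.7437),
]

# Pick the row with the highest accuracy (index 2 of each triple)
best_epoch, best_loss, best_acc = max(log, key=lambda row: row[2])
print(best_epoch, best_acc)  # 36 0.7653
```

In practice this is what `load_best_model_at_end=True` with `metric_for_best_model="accuracy"` would do in the Transformers `Trainer`; the card does not say whether that option was enabled.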

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
