4e-3_10_0.1

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set after the final training epoch (epoch 60):

Loss: 0.6572
Accuracy: 0.7545

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.004
train_batch_size: 8
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 60.0
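
As a rough illustration of what a linear scheduler implies for these settings, the sketch below computes the learning rate at a given optimizer step. It assumes zero warmup steps (no warmup value is listed above) and takes 312 steps per epoch from the training results table; the function name `linear_lr` is hypothetical and not part of any library.

```python
STEPS_PER_EPOCH = 312   # taken from the Step column of the training results table
NUM_EPOCHS = 60
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # equals the final Step value in the table


def linear_lr(step, base_lr=0.004, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate at `step` under linear warmup followed by linear decay to zero.

    warmup_steps=0 is an assumption; the card does not list a warmup setting.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the scheduled learning rate at the start of a few epochs.
    for epoch in (0, 15, 30, 45, 60):
        print(epoch, linear_lr(epoch * STEPS_PER_EPOCH))
```

Under these assumptions the rate starts at the configured 0.004, is halved by the midpoint of training, and reaches zero at the last step.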

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	312	1.1789	0.5271
0.9223	2.0	624	0.8795	0.4729
0.9223	3.0	936	0.6489	0.5668
0.816	4.0	1248	0.6147	0.5632
0.8543	5.0	1560	0.6493	0.6534
0.8543	6.0	1872	0.9731	0.6137
0.7269	7.0	2184	0.9628	0.6029
0.7269	8.0	2496	0.7051	0.6751
0.6757	9.0	2808	0.6159	0.7184
0.649	10.0	3120	0.9342	0.5993
0.649	11.0	3432	0.6097	0.6931
0.6568	12.0	3744	0.6755	0.7004
0.5909	13.0	4056	0.6391	0.7004
0.5909	14.0	4368	0.6791	0.7329
0.543	15.0	4680	0.5279	0.7076
0.543	16.0	4992	0.6385	0.6787
0.4908	17.0	5304	0.7443	0.6931
0.4347	18.0	5616	0.5453	0.7365
0.4347	19.0	5928	0.5740	0.7401
0.4282	20.0	6240	0.7645	0.7256
0.3796	21.0	6552	0.6200	0.7329
0.3796	22.0	6864	0.5916	0.7509
0.3584	23.0	7176	0.6890	0.7545
0.3584	24.0	7488	0.6155	0.7329
0.3471	25.0	7800	0.6455	0.7473
0.3148	26.0	8112	0.6069	0.7545
0.3148	27.0	8424	0.6410	0.7401
0.317	28.0	8736	0.6373	0.7473
0.2959	29.0	9048	0.5946	0.7545
0.2959	30.0	9360	0.6236	0.7545
0.2748	31.0	9672	0.6449	0.7473
0.2748	32.0	9984	0.5963	0.7473
0.2687	33.0	10296	0.6619	0.7401
0.2561	34.0	10608	0.7464	0.7473
0.2561	35.0	10920	0.6339	0.7581
0.2478	36.0	11232	0.6020	0.7509
0.2426	37.0	11544	0.7438	0.7329
0.2426	38.0	11856	0.5934	0.7581
0.2339	39.0	12168	0.6048	0.7581
0.2339	40.0	12480	0.6533	0.7545
0.2252	41.0	12792	0.6122	0.7617
0.2179	42.0	13104	0.6366	0.7762
0.2179	43.0	13416	0.6808	0.7256
0.2232	44.0	13728	0.6474	0.7581
0.214	45.0	14040	0.6993	0.7545
0.214	46.0	14352	0.6351	0.7545
0.2085	47.0	14664	0.6343	0.7509
0.2085	48.0	14976	0.5988	0.7726
0.2059	49.0	15288	0.6607	0.7581
0.2084	50.0	15600	0.6370	0.7581
0.2084	51.0	15912	0.6143	0.7653
0.2018	52.0	16224	0.6106	0.7545
0.2032	53.0	16536	0.6739	0.7473
0.2032	54.0	16848	0.6540	0.7545
0.1993	55.0	17160	0.6367	0.7545
0.1993	56.0	17472	0.6510	0.7545
0.1964	57.0	17784	0.6427	0.7617
0.1877	58.0	18096	0.6658	0.7581
0.1877	59.0	18408	0.6553	0.7581
0.1895	60.0	18720	0.6572	0.7545
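
The headline metrics at the top of this card match the epoch 60 row, which is not the best row in the table. A minimal sketch of how one might compare the final epoch against the best epochs is shown below; it uses four rows copied from the table, and the variable names are illustrative only. Whether a checkpoint from any earlier epoch was saved is not stated in this card.

```python
# (epoch, validation_loss, accuracy) copied from four rows of the table above
rows = [
    (15, 0.5279, 0.7076),  # lowest validation loss in the table
    (42, 0.6366, 0.7762),  # highest accuracy in the table
    (48, 0.5988, 0.7726),
    (60, 0.6572, 0.7545),  # final epoch, reported as the headline result
]

best_by_accuracy = max(rows, key=lambda r: r[2])
best_by_loss = min(rows, key=lambda r: r[1])
final = rows[-1]

# Accuracy gap between the best epoch and the final epoch.
accuracy_gap = best_by_accuracy[2] - final[2]

if __name__ == "__main__":
    print("best accuracy:", best_by_accuracy)
    print("lowest validation loss:", best_by_loss)
    print("final epoch:", final)
    print("accuracy gap vs. final:", round(accuracy_gap, 4))
```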

Framework versions

Transformers 4.30.0
PyTorch 2.0.1+cu117
Datasets 2.14.4
Tokenizers 0.13.3
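
To reproduce results in a similar environment, one option is to compare locally installed packages against the versions listed above. The sketch below uses only the standard library; the helper names `installed_version` and `version_mismatches` are hypothetical, and dropping the `+cu117` local build suffix before comparing is an assumption about what counts as a match.

```python
from importlib import metadata

# Versions listed in this card; the local build suffix (+cu117) is dropped.
EXPECTED = {
    "transformers": "4.30.0",
    "torch": "2.0.1",
    "datasets": "2.14.4",
    "tokenizers": "0.13.3",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to (expected, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Compare only the public part so "2.0.1+cu117" matches "2.0.1".
        public = found.split("+")[0] if found else None
        if public != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in version_mismatches(EXPECTED).items():
        print(f"{package}: card lists {wanted}, found {found}")
```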
