20230830203443

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7274
  • Accuracy: 0.5
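
The card does not state which SuperGLUE task the model was fine-tuned on, so the following is only a minimal usage sketch: it assumes a binary sequence-classification head, consistent with the single accuracy metric reported above, and uses the repository id dkqjrm/20230830203443 from the model page.

```python
# Minimal loading/inference sketch. Assumption: the checkpoint carries a
# sequence-classification head; the specific SuperGLUE task is not stated
# in this card, so the example inputs below are placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230830203443"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("Example premise.", "Example hypothesis.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities
```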

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0007
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
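
For reference, here is a sketch of how these values map onto transformers.TrainingArguments. The Adam betas and epsilon listed above are the library defaults, and the per-epoch evaluation setting is an assumption inferred from the one-row-per-epoch results table below; dataset loading and preprocessing are not described in this card, so this is illustrative only.

```python
# Illustrative mapping of the listed hyperparameters onto
# transformers.TrainingArguments (Transformers 4.26.x API). Data
# preparation and the Trainer setup are omitted because the card does
# not describe them.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230830203443",
    learning_rate=7e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    # Adam betas/epsilon are the library defaults, matching the card.
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    # Assumption: evaluation ran once per epoch, as the results table suggests.
    evaluation_strategy="epoch",
)
```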

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 340 | 0.7532 | 0.5 |
| 0.7604 | 2.0 | 680 | 0.7283 | 0.5 |
| 0.7635 | 3.0 | 1020 | 0.7745 | 0.5 |
| 0.7635 | 4.0 | 1360 | 0.8267 | 0.5 |
| 0.7685 | 5.0 | 1700 | 0.7674 | 0.5 |
| 0.7536 | 6.0 | 2040 | 0.7283 | 0.5 |
| 0.7536 | 7.0 | 2380 | 0.7315 | 0.5 |
| 0.7457 | 8.0 | 2720 | 0.7962 | 0.5 |
| 0.7462 | 9.0 | 3060 | 0.7287 | 0.5 |
| 0.7462 | 10.0 | 3400 | 0.7515 | 0.5 |
| 0.7445 | 11.0 | 3740 | 0.7305 | 0.5 |
| 0.7427 | 12.0 | 4080 | 0.7298 | 0.5 |
| 0.7427 | 13.0 | 4420 | 0.8376 | 0.5 |
| 0.7506 | 14.0 | 4760 | 0.7391 | 0.4749 |
| 0.7508 | 15.0 | 5100 | 0.7457 | 0.5 |
| 0.7508 | 16.0 | 5440 | 0.7366 | 0.5 |
| 0.7428 | 17.0 | 5780 | 0.7423 | 0.5 |
| 0.7418 | 18.0 | 6120 | 0.7331 | 0.5 |
| 0.7418 | 19.0 | 6460 | 0.7340 | 0.5 |
| 0.7443 | 20.0 | 6800 | 0.7566 | 0.5 |
| 0.7411 | 21.0 | 7140 | 0.7274 | 0.5 |
| 0.7411 | 22.0 | 7480 | 0.7503 | 0.5 |
| 0.7423 | 23.0 | 7820 | 0.7416 | 0.5 |
| 0.7428 | 24.0 | 8160 | 0.7274 | 0.5 |
| 0.7406 | 25.0 | 8500 | 0.7313 | 0.5 |
| 0.7406 | 26.0 | 8840 | 0.7513 | 0.5 |
| 0.7421 | 27.0 | 9180 | 0.7476 | 0.5 |
| 0.7423 | 28.0 | 9520 | 0.7274 | 0.5 |
| 0.7423 | 29.0 | 9860 | 0.7313 | 0.5 |
| 0.7381 | 30.0 | 10200 | 0.7274 | 0.5 |
| 0.739 | 31.0 | 10540 | 0.7276 | 0.5 |
| 0.739 | 32.0 | 10880 | 0.7727 | 0.5 |
| 0.7392 | 33.0 | 11220 | 0.7287 | 0.5 |
| 0.7389 | 34.0 | 11560 | 0.7376 | 0.5 |
| 0.7389 | 35.0 | 11900 | 0.7278 | 0.5 |
| 0.7391 | 36.0 | 12240 | 0.7296 | 0.5 |
| 0.7369 | 37.0 | 12580 | 0.7307 | 0.5 |
| 0.7369 | 38.0 | 12920 | 0.7304 | 0.5 |
| 0.7391 | 39.0 | 13260 | 0.7358 | 0.5 |
| 0.7366 | 40.0 | 13600 | 0.7298 | 0.5 |
| 0.7366 | 41.0 | 13940 | 0.7284 | 0.5 |
| 0.737 | 42.0 | 14280 | 0.7279 | 0.5 |
| 0.7343 | 43.0 | 14620 | 0.7334 | 0.5 |
| 0.7343 | 44.0 | 14960 | 0.7273 | 0.5 |
| 0.7358 | 45.0 | 15300 | 0.7468 | 0.5 |
| 0.7341 | 46.0 | 15640 | 0.7277 | 0.5 |
| 0.7341 | 47.0 | 15980 | 0.7327 | 0.5 |
| 0.7345 | 48.0 | 16320 | 0.7290 | 0.5 |
| 0.7357 | 49.0 | 16660 | 0.7518 | 0.5 |
| 0.7362 | 50.0 | 17000 | 0.7276 | 0.5 |
| 0.7362 | 51.0 | 17340 | 0.7275 | 0.5 |
| 0.7313 | 52.0 | 17680 | 0.7279 | 0.5 |
| 0.7357 | 53.0 | 18020 | 0.7307 | 0.5 |
| 0.7357 | 54.0 | 18360 | 0.7276 | 0.5 |
| 0.7323 | 55.0 | 18700 | 0.7294 | 0.5 |
| 0.7304 | 56.0 | 19040 | 0.7310 | 0.5 |
| 0.7304 | 57.0 | 19380 | 0.7278 | 0.5 |
| 0.7326 | 58.0 | 19720 | 0.7289 | 0.5 |
| 0.7314 | 59.0 | 20060 | 0.7461 | 0.5 |
| 0.7314 | 60.0 | 20400 | 0.7287 | 0.5 |
| 0.7319 | 61.0 | 20740 | 0.7337 | 0.5 |
| 0.7304 | 62.0 | 21080 | 0.7273 | 0.5 |
| 0.7304 | 63.0 | 21420 | 0.7288 | 0.5 |
| 0.7313 | 64.0 | 21760 | 0.7285 | 0.5 |
| 0.7317 | 65.0 | 22100 | 0.7285 | 0.5 |
| 0.7317 | 66.0 | 22440 | 0.7310 | 0.5 |
| 0.7294 | 67.0 | 22780 | 0.7274 | 0.5 |
| 0.7304 | 68.0 | 23120 | 0.7275 | 0.5 |
| 0.7304 | 69.0 | 23460 | 0.7281 | 0.5 |
| 0.7286 | 70.0 | 23800 | 0.7276 | 0.5 |
| 0.7295 | 71.0 | 24140 | 0.7277 | 0.5 |
| 0.7295 | 72.0 | 24480 | 0.7301 | 0.5 |
| 0.7292 | 73.0 | 24820 | 0.7277 | 0.5 |
| 0.7288 | 74.0 | 25160 | 0.7302 | 0.5 |
| 0.7276 | 75.0 | 25500 | 0.7280 | 0.5 |
| 0.7276 | 76.0 | 25840 | 0.7275 | 0.5 |
| 0.7281 | 77.0 | 26180 | 0.7274 | 0.5 |
| 0.727 | 78.0 | 26520 | 0.7275 | 0.5 |
| 0.727 | 79.0 | 26860 | 0.7275 | 0.5 |
| 0.7279 | 80.0 | 27200 | 0.7274 | 0.5 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
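
To check that a local environment matches these versions before loading the model, a small sketch follows; the version strings are copied from the list above, and the torch entry carries a CUDA build tag (+cu118) that a local install may not include.

```python
# Sketch: compare installed package versions against those listed in this
# card. Note the torch version includes a +cu118 build tag that may differ
# from a locally installed build.
import importlib.metadata as md

expected = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}
for pkg, want in expected.items():
    have = md.version(pkg)
    status = "OK" if have == want else "MISMATCH"
    print(f"{pkg}: installed {have}, card lists {want} -> {status}")
```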