20230830190813

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7333
  • Accuracy: 0.5141
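
As a usage sketch (not part of the original card): the checkpoint id below is taken from this page, but the specific SuperGLUE sub-task, input format, and label mapping are not documented here, so the paired-sentence input and the sequence-classification head are assumptions.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Sketch only: the SuperGLUE sub-task this checkpoint was tuned on is not
# documented, so the two-sentence input format and label meaning are assumptions.
model_id = "dkqjrm/20230830190813"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("First sentence.", "Second sentence.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```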

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
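
For reference, a minimal sketch of how these values map onto transformers.TrainingArguments. The Adam betas and epsilon listed above are the optimizer's defaults; output_dir and the per-epoch evaluation strategy are assumptions (the latter inferred from the per-epoch results table below), not stated in this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",          # assumption: not stated in this card
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the library defaults,
    # so no explicit adam_beta1/adam_beta2/adam_epsilon overrides are needed.
    evaluation_strategy="epoch",     # assumption: matches the per-epoch log below
)
```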

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 340   | 0.7313          | 0.5204   |
| 0.7523        | 2.0   | 680   | 0.7285          | 0.5      |
| 0.7461        | 3.0   | 1020  | 0.7229          | 0.5063   |
| 0.7461        | 4.0   | 1360  | 0.7062          | 0.5784   |
| 0.7318        | 5.0   | 1700  | 0.7796          | 0.6034   |
| 0.7057        | 6.0   | 2040  | 0.8194          | 0.5831   |
| 0.7057        | 7.0   | 2380  | 0.7297          | 0.5      |
| 0.7178        | 8.0   | 2720  | 0.7423          | 0.5      |
| 0.7417        | 9.0   | 3060  | 0.7280          | 0.5      |
| 0.7417        | 10.0  | 3400  | 0.7606          | 0.5016   |
| 0.7399        | 11.0  | 3740  | 0.7346          | 0.5172   |
| 0.7334        | 12.0  | 4080  | 0.7411          | 0.5      |
| 0.7334        | 13.0  | 4420  | 0.7588          | 0.5      |
| 0.7332        | 14.0  | 4760  | 0.7427          | 0.4718   |
| 0.7345        | 15.0  | 5100  | 0.7317          | 0.5047   |
| 0.7345        | 16.0  | 5440  | 0.7394          | 0.5031   |
| 0.7308        | 17.0  | 5780  | 0.7445          | 0.5      |
| 0.7295        | 18.0  | 6120  | 0.7517          | 0.4718   |
| 0.7295        | 19.0  | 6460  | 0.7323          | 0.5016   |
| 0.728         | 20.0  | 6800  | 0.7320          | 0.5157   |
| 0.73          | 21.0  | 7140  | 0.7309          | 0.5172   |
| 0.73          | 22.0  | 7480  | 0.7434          | 0.4984   |
| 0.7304        | 23.0  | 7820  | 0.7366          | 0.5094   |
| 0.7298        | 24.0  | 8160  | 0.7334          | 0.5      |
| 0.7283        | 25.0  | 8500  | 0.7342          | 0.5125   |
| 0.7283        | 26.0  | 8840  | 0.7311          | 0.5047   |
| 0.7291        | 27.0  | 9180  | 0.7565          | 0.4702   |
| 0.7292        | 28.0  | 9520  | 0.7282          | 0.5031   |
| 0.7292        | 29.0  | 9860  | 0.7333          | 0.5016   |
| 0.7261        | 30.0  | 10200 | 0.7328          | 0.5125   |
| 0.7279        | 31.0  | 10540 | 0.7349          | 0.5125   |
| 0.7279        | 32.0  | 10880 | 0.7592          | 0.4702   |
| 0.7252        | 33.0  | 11220 | 0.7393          | 0.5094   |
| 0.7263        | 34.0  | 11560 | 0.7394          | 0.5047   |
| 0.7263        | 35.0  | 11900 | 0.7465          | 0.5016   |
| 0.7269        | 36.0  | 12240 | 0.7349          | 0.5141   |
| 0.7263        | 37.0  | 12580 | 0.7295          | 0.5047   |
| 0.7263        | 38.0  | 12920 | 0.7329          | 0.5172   |
| 0.728         | 39.0  | 13260 | 0.7401          | 0.5      |
| 0.7254        | 40.0  | 13600 | 0.7331          | 0.5157   |
| 0.7254        | 41.0  | 13940 | 0.7308          | 0.5172   |
| 0.7265        | 42.0  | 14280 | 0.7312          | 0.5172   |
| 0.7234        | 43.0  | 14620 | 0.7393          | 0.5      |
| 0.7234        | 44.0  | 14960 | 0.7392          | 0.5      |
| 0.7254        | 45.0  | 15300 | 0.7389          | 0.5      |
| 0.7225        | 46.0  | 15640 | 0.7312          | 0.5157   |
| 0.7225        | 47.0  | 15980 | 0.7335          | 0.5      |
| 0.7268        | 48.0  | 16320 | 0.7363          | 0.5016   |
| 0.7258        | 49.0  | 16660 | 0.7393          | 0.5031   |
| 0.7253        | 50.0  | 17000 | 0.7306          | 0.5047   |
| 0.7253        | 51.0  | 17340 | 0.7372          | 0.5094   |
| 0.7247        | 52.0  | 17680 | 0.7402          | 0.5      |
| 0.7248        | 53.0  | 18020 | 0.7355          | 0.5141   |
| 0.7248        | 54.0  | 18360 | 0.7369          | 0.5157   |
| 0.7237        | 55.0  | 18700 | 0.7320          | 0.5141   |
| 0.7226        | 56.0  | 19040 | 0.7366          | 0.5172   |
| 0.7226        | 57.0  | 19380 | 0.7315          | 0.5172   |
| 0.7238        | 58.0  | 19720 | 0.7388          | 0.5016   |
| 0.7228        | 59.0  | 20060 | 0.7347          | 0.5047   |
| 0.7228        | 60.0  | 20400 | 0.7313          | 0.5141   |
| 0.7245        | 61.0  | 20740 | 0.7330          | 0.5141   |
| 0.7222        | 62.0  | 21080 | 0.7350          | 0.5141   |
| 0.7222        | 63.0  | 21420 | 0.7314          | 0.5157   |
| 0.724         | 64.0  | 21760 | 0.7327          | 0.5141   |
| 0.7236        | 65.0  | 22100 | 0.7306          | 0.5172   |
| 0.7236        | 66.0  | 22440 | 0.7351          | 0.5141   |
| 0.7205        | 67.0  | 22780 | 0.7343          | 0.5125   |
| 0.7236        | 68.0  | 23120 | 0.7313          | 0.5157   |
| 0.7236        | 69.0  | 23460 | 0.7338          | 0.5172   |
| 0.7221        | 70.0  | 23800 | 0.7317          | 0.5157   |
| 0.7226        | 71.0  | 24140 | 0.7344          | 0.5141   |
| 0.7226        | 72.0  | 24480 | 0.7342          | 0.5157   |
| 0.7209        | 73.0  | 24820 | 0.7333          | 0.5157   |
| 0.7229        | 74.0  | 25160 | 0.7358          | 0.5141   |
| 0.7204        | 75.0  | 25500 | 0.7342          | 0.5157   |
| 0.7204        | 76.0  | 25840 | 0.7329          | 0.5157   |
| 0.7213        | 77.0  | 26180 | 0.7334          | 0.5141   |
| 0.7208        | 78.0  | 26520 | 0.7335          | 0.5141   |
| 0.7208        | 79.0  | 26860 | 0.7330          | 0.5141   |
| 0.7203        | 80.0  | 27200 | 0.7333          | 0.5141   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
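
As a small reproducibility sketch (not from the original card), the snippet below checks that the installed library versions match those listed above before running evaluation or further fine-tuning:

```python
# Verify the local environment against the framework versions in this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}
actual = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "ok" if actual[name] == want else f"mismatch (got {actual[name]})"
    print(f"{name}=={want}: {status}")
```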