20230831234446

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. After the final epoch (80), it achieves the following results on the evaluation set:

Loss: 0.1400
Accuracy: 0.5

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
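
As a rough illustration of what lr_scheduler_type: linear implies for the learning rate, the sketch below decays it from 0.0003 to zero over the 27200 optimizer steps reported in the training results. The function name `linear_lr` is hypothetical, and the zero warmup is an assumption: the card does not list any warmup setting.

```python
# Sketch of a linear learning-rate decay matching the listed hyperparameters.
# WARMUP_STEPS = 0 is an assumption; the card does not report a warmup value.
BASE_LR = 0.0003      # learning_rate from the card
TOTAL_STEPS = 27200   # final step in the training results table
WARMUP_STEPS = 0      # assumed


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS,
              warmup_steps=WARMUP_STEPS):
    """Return the learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 6800, 13600, 20400, 27200):
        print(f"step {step:>5}: lr = {linear_lr(step):.6f}")
```

Under these assumptions the rate is halved by epoch 40 (step 13600) and reaches zero at the last step.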

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	340	0.1404	0.5
0.1505	2.0	680	0.1392	0.5
0.1421	3.0	1020	0.1408	0.5
0.1421	4.0	1360	0.1390	0.5
0.1395	5.0	1700	0.1398	0.5
0.1373	6.0	2040	0.1377	0.5
0.1373	7.0	2380	0.1365	0.5
0.1371	8.0	2720	0.1365	0.5
0.1343	9.0	3060	0.1394	0.5
0.1343	10.0	3400	0.1394	0.5
0.1349	11.0	3740	0.1383	0.5
0.1344	12.0	4080	0.1386	0.5
0.1344	13.0	4420	0.1368	0.5
0.1324	14.0	4760	0.1392	0.5
0.1334	15.0	5100	0.1380	0.5
0.1334	16.0	5440	0.1384	0.5
0.1334	17.0	5780	0.1371	0.5
0.1312	18.0	6120	0.1422	0.5
0.1312	19.0	6460	0.1376	0.5
0.1322	20.0	6800	0.1412	0.5
0.1313	21.0	7140	0.1411	0.5
0.1313	22.0	7480	0.1376	0.5
0.1318	23.0	7820	0.1390	0.5
0.1304	24.0	8160	0.1370	0.5
0.1313	25.0	8500	0.1382	0.5
0.1313	26.0	8840	0.1376	0.5
0.1302	27.0	9180	0.1403	0.5
0.1298	28.0	9520	0.1395	0.5
0.1298	29.0	9860	0.1371	0.5
0.1299	30.0	10200	0.1371	0.5
0.1295	31.0	10540	0.1400	0.5
0.1295	32.0	10880	0.1389	0.5
0.1293	33.0	11220	0.1391	0.5
0.129	34.0	11560	0.1407	0.5
0.129	35.0	11900	0.1388	0.5
0.1295	36.0	12240	0.1397	0.5
0.1289	37.0	12580	0.1391	0.5
0.1289	38.0	12920	0.1409	0.5
0.1282	39.0	13260	0.1382	0.5
0.1282	40.0	13600	0.1385	0.5
0.1282	41.0	13940	0.1388	0.5
0.1284	42.0	14280	0.1398	0.5
0.1276	43.0	14620	0.1386	0.5
0.1276	44.0	14960	0.1405	0.5
0.1285	45.0	15300	0.1391	0.5
0.1275	46.0	15640	0.1380	0.5
0.1275	47.0	15980	0.1379	0.5
0.1283	48.0	16320	0.1387	0.5
0.1274	49.0	16660	0.1392	0.5
0.1274	50.0	17000	0.1392	0.5
0.1274	51.0	17340	0.1400	0.5
0.1268	52.0	17680	0.1395	0.5
0.1278	53.0	18020	0.1388	0.5
0.1278	54.0	18360	0.1406	0.5
0.127	55.0	18700	0.1395	0.5
0.1272	56.0	19040	0.1403	0.5
0.1272	57.0	19380	0.1398	0.5
0.1268	58.0	19720	0.1399	0.5
0.1273	59.0	20060	0.1385	0.5
0.1273	60.0	20400	0.1407	0.5
0.1265	61.0	20740	0.1398	0.5
0.1266	62.0	21080	0.1398	0.5
0.1266	63.0	21420	0.1394	0.5
0.1261	64.0	21760	0.1394	0.5
0.1276	65.0	22100	0.1398	0.5
0.1276	66.0	22440	0.1391	0.5
0.1247	67.0	22780	0.1405	0.5
0.1274	68.0	23120	0.1410	0.5
0.1274	69.0	23460	0.1407	0.5
0.1266	70.0	23800	0.1403	0.5
0.126	71.0	24140	0.1406	0.5
0.126	72.0	24480	0.1395	0.5
0.1258	73.0	24820	0.1402	0.5
0.1264	74.0	25160	0.1397	0.5
0.1259	75.0	25500	0.1402	0.5
0.1259	76.0	25840	0.1399	0.5
0.1263	77.0	26180	0.1400	0.5
0.1259	78.0	26520	0.1399	0.5
0.1259	79.0	26860	0.1401	0.5
0.1259	80.0	27200	0.1400	0.5
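
The step counts in the table above allow a back-of-the-envelope estimate of the training set size. The sketch below assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; none of these are stated in the card.

```python
import math

# Values taken from the card.
TRAIN_BATCH_SIZE = 16
NUM_EPOCHS = 80
FINAL_STEP = 27200

# Optimizer steps per epoch (340 in the table).
steps_per_epoch = FINAL_STEP // NUM_EPOCHS

# Under the stated assumptions, steps_per_epoch == ceil(n / batch_size),
# which bounds the number of training examples n from both sides.
max_examples = steps_per_epoch * TRAIN_BATCH_SIZE
min_examples = (steps_per_epoch - 1) * TRAIN_BATCH_SIZE + 1

if __name__ == "__main__":
    print(f"steps per epoch: {steps_per_epoch}")
    print(f"training examples: between {min_examples} and {max_examples}")
```

If gradient accumulation or multiple devices were used, the effective batch size would be larger and these bounds would scale accordingly.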

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
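
To compare a local environment against the versions listed above, a minimal sketch using only the standard library follows. The pip distribution names (for example `torch` for Pytorch) and the helper name `compare_versions` are assumptions, not something the card specifies.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the card, keyed by assumed pip distribution names.
PINNED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def compare_versions(pinned=PINNED):
    """Map each package to (wanted, installed or None, exact match flag)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: wanted {wanted}, installed {installed} ({status})")
```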
