20230901021145

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. After the final training epoch (epoch 80, step 27200), it achieves the following results on the evaluation set:

Loss: 0.1391
Accuracy: 0.5

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
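
As a rough illustration of what lr_scheduler_type: linear implies for these settings, the sketch below computes the learning rate at a given optimizer step. The 340 steps per epoch are read off the training results table; warmup_steps=0 is an assumption, since the card does not report a warmup setting, and the function name linear_lr is hypothetical.

```python
# Values copied from the hyperparameter list and the training results table.
BASE_LR = 0.0005
NUM_EPOCHS = 80
STEPS_PER_EPOCH = 340   # the table reaches step 340 at epoch 1.0
TRAIN_BATCH_SIZE = 16

# 80 epochs * 340 steps = 27200, which matches the last row of the table.
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH


def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Learning rate after `step` updates: linear warmup, then linear decay to zero.

    warmup_steps=0 is an assumption; the card does not state a warmup value.
    """
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - warmup_steps)


# Upper bound on training examples seen per epoch, assuming a single device
# and no gradient accumulation (neither is reported in the card).
MAX_EXAMPLES_PER_EPOCH = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE

if __name__ == "__main__":
    for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, linear_lr(step))
    print("max examples per epoch:", MAX_EXAMPLES_PER_EPOCH)
```

Under these assumptions the rate starts at the configured 0.0005, is halved by the midpoint of training (step 13600), and reaches zero at step 27200.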

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	340	0.1414	0.5
0.1482	2.0	680	0.1390	0.5
0.142	3.0	1020	0.1382	0.5
0.142	4.0	1360	0.1373	0.5
0.1386	5.0	1700	0.1375	0.5
0.1375	6.0	2040	0.1394	0.5
0.1375	7.0	2380	0.1433	0.5
0.1394	8.0	2720	0.1410	0.5
0.1398	9.0	3060	0.1393	0.5
0.1398	10.0	3400	0.1396	0.5
0.1411	11.0	3740	0.1393	0.5
0.1404	12.0	4080	0.1389	0.5
0.1404	13.0	4420	0.1402	0.5
0.1394	14.0	4760	0.1388	0.5
0.1412	15.0	5100	0.1392	0.5
0.1412	16.0	5440	0.1390	0.5
0.141	17.0	5780	0.1401	0.5
0.1388	18.0	6120	0.1388	0.5
0.1388	19.0	6460	0.1395	0.5
0.14	20.0	6800	0.1394	0.5
0.1397	21.0	7140	0.1388	0.5
0.1397	22.0	7480	0.1399	0.5
0.1402	23.0	7820	0.1403	0.5
0.1373	24.0	8160	0.1373	0.5
0.1358	25.0	8500	0.1381	0.5
0.1358	26.0	8840	0.1373	0.5
0.1341	27.0	9180	0.1395	0.5
0.1333	28.0	9520	0.1393	0.5
0.1333	29.0	9860	0.1376	0.5
0.1333	30.0	10200	0.1374	0.5
0.1326	31.0	10540	0.1388	0.5
0.1326	32.0	10880	0.1386	0.5
0.1323	33.0	11220	0.1406	0.5
0.1321	34.0	11560	0.1391	0.5
0.1321	35.0	11900	0.1381	0.5
0.132	36.0	12240	0.1376	0.5
0.1317	37.0	12580	0.1402	0.5
0.1317	38.0	12920	0.1387	0.5
0.1307	39.0	13260	0.1374	0.5
0.1303	40.0	13600	0.1377	0.5
0.1303	41.0	13940	0.1399	0.5
0.1307	42.0	14280	0.1403	0.5
0.1296	43.0	14620	0.1394	0.5
0.1296	44.0	14960	0.1413	0.5
0.1306	45.0	15300	0.1395	0.5
0.1294	46.0	15640	0.1384	0.5
0.1294	47.0	15980	0.1373	0.5
0.1301	48.0	16320	0.1392	0.5
0.1292	49.0	16660	0.1384	0.5
0.1291	50.0	17000	0.1399	0.5
0.1291	51.0	17340	0.1402	0.5
0.1286	52.0	17680	0.1399	0.5
0.1297	53.0	18020	0.1387	0.5
0.1297	54.0	18360	0.1384	0.5
0.1286	55.0	18700	0.1390	0.5
0.1285	56.0	19040	0.1401	0.5
0.1285	57.0	19380	0.1389	0.5
0.1283	58.0	19720	0.1391	0.5
0.1287	59.0	20060	0.1382	0.5
0.1287	60.0	20400	0.1406	0.5
0.1278	61.0	20740	0.1402	0.5
0.1278	62.0	21080	0.1384	0.5
0.1278	63.0	21420	0.1385	0.5
0.1275	64.0	21760	0.1387	0.5
0.1289	65.0	22100	0.1389	0.5
0.1289	66.0	22440	0.1386	0.5
0.1259	67.0	22780	0.1389	0.5
0.1285	68.0	23120	0.1399	0.5
0.1285	69.0	23460	0.1404	0.5
0.1277	70.0	23800	0.1390	0.5
0.1271	71.0	24140	0.1394	0.5
0.1271	72.0	24480	0.1388	0.5
0.1267	73.0	24820	0.1388	0.5
0.1276	74.0	25160	0.1394	0.5
0.1267	75.0	25500	0.1395	0.5
0.1267	76.0	25840	0.1392	0.5
0.1272	77.0	26180	0.1395	0.5
0.1269	78.0	26520	0.1390	0.5
0.1269	79.0	26860	0.1391	0.5
0.1268	80.0	27200	0.1391	0.5
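
In the table above, accuracy is 0.5 at every epoch, while validation loss stays between 0.1373 and 0.1433. A minimal sketch for checking this kind of pattern programmatically is shown below. It parses tab-separated rows in the same layout as the table; only four rows are copied here for brevity, and the helper name parse_rows is hypothetical.

```python
import csv
import io

# Four rows copied from the results table (tab-separated).
# Paste the full table here to analyze all 80 epochs.
RAW = "\n".join([
    "Training Loss\tEpoch\tStep\tValidation Loss\tAccuracy",
    "No log\t1.0\t340\t0.1414\t0.5",
    "0.142\t4.0\t1360\t0.1373\t0.5",
    "0.1375\t7.0\t2380\t0.1433\t0.5",
    "0.1268\t80.0\t27200\t0.1391\t0.5",
])


def parse_rows(raw: str) -> list:
    """Parse the tab-separated results table into a list of dicts."""
    rows = []
    for r in csv.DictReader(io.StringIO(raw), delimiter="\t"):
        train_loss = r["Training Loss"]
        rows.append({
            # "No log" means no training loss had been logged yet.
            "train_loss": None if train_loss == "No log" else float(train_loss),
            "epoch": float(r["Epoch"]),
            "step": int(r["Step"]),
            "val_loss": float(r["Validation Loss"]),
            "accuracy": float(r["Accuracy"]),
        })
    return rows


rows = parse_rows(RAW)

# Row with the lowest validation loss among the parsed rows.
best = min(rows, key=lambda r: r["val_loss"])

# True when every parsed epoch reports the same accuracy.
accuracy_is_flat = len({r["accuracy"] for r in rows}) == 1

if __name__ == "__main__":
    print("best epoch by validation loss:", best["epoch"], best["val_loss"])
    print("accuracy identical across rows:", accuracy_is_flat)
```

An accuracy that never moves from 0.5 is worth investigating before using the checkpoint, for example by inspecting the distribution of predicted labels on the evaluation set.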

Framework versions

Transformers 4.26.1
PyTorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
