20230901000318

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

Loss: 0.1391
Accuracy: 0.5
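Accuracy is the fraction of evaluation examples whose predicted label matches the reference label. The value of exactly 0.5 appears in every row of the training results. A classifier that always predicts the same label would also score 0.5 on an evaluation set split evenly between two classes. The card does not report the number of classes or the label distribution, so this is only one possible reading. The minimal sketch below computes accuracy and checks for that pattern. The helper names and the toy labels are hypothetical and are not the actual evaluation data.

```python
from collections import Counter
from typing import Sequence


def accuracy(predictions: Sequence[int], references: Sequence[int]) -> float:
    """Fraction of positions where the predicted label equals the reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty evaluation set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def is_constant(predictions: Sequence[int]) -> bool:
    """True if every prediction is the same label (a collapsed classifier)."""
    return len(set(predictions)) <= 1


def majority_baseline(references: Sequence[int]) -> float:
    """Accuracy obtained by always predicting the most frequent reference label."""
    if not references:
        raise ValueError("cannot compute a baseline on an empty evaluation set")
    most_common_count = Counter(references).most_common(1)[0][1]
    return most_common_count / len(references)


if __name__ == "__main__":
    # Hypothetical balanced binary evaluation set: four 0s and four 1s.
    refs = [0, 1, 0, 1, 1, 0, 1, 0]
    # A hypothetical model that always predicts label 1.
    preds = [1] * len(refs)
    print(accuracy(preds, refs))
    print(is_constant(preds))
    print(majority_baseline(refs))
```

If the real predictions are constant and the accuracy equals the majority baseline, the model has not learned to separate the classes. Note that this holds regardless of how the validation loss moves.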

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0007
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
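With lr_scheduler_type linear, the learning rate decays from its initial value to zero over the whole run. The card lists no warmup steps and no weight decay, so the sketch below assumes both are zero. The total of 27,200 optimizer steps comes from the last row of the training results (80 epochs × 340 steps). The sketch also shows one bias-corrected Adam update for a single scalar parameter, using the listed betas and epsilon. This is a plain-Python illustration under those assumptions, not the PyTorch optimizer that was actually used in training.

```python
import math
from dataclasses import dataclass

# Values copied from the hyperparameter list above.
BASE_LR = 0.0007
BETA1, BETA2 = 0.9, 0.999
EPS = 1e-08
# 80 epochs x 340 steps per epoch, taken from the training results table.
TOTAL_STEPS = 80 * 340
# Assumption: the card lists no warmup, so none is applied here.
WARMUP_STEPS = 0


def linear_lr(step: int, base_lr: float = BASE_LR, total_steps: int = TOTAL_STEPS,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at a given optimizer step under linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


@dataclass
class AdamState:
    """First and second moment estimates for one scalar parameter."""
    m: float = 0.0
    v: float = 0.0
    t: int = 0


def adam_step(param: float, grad: float, lr: float, state: AdamState) -> float:
    """Apply one bias-corrected Adam update and return the new parameter value."""
    state.t += 1
    state.m = BETA1 * state.m + (1 - BETA1) * grad
    state.v = BETA2 * state.v + (1 - BETA2) * grad * grad
    m_hat = state.m / (1 - BETA1 ** state.t)
    v_hat = state.v / (1 - BETA2 ** state.t)
    return param - lr * m_hat / (math.sqrt(v_hat) + EPS)


if __name__ == "__main__":
    # Learning rate at the start, halfway point, and end of training.
    for step in (0, 13600, 27200):
        print(step, linear_lr(step))
    state = AdamState()
    print(adam_step(1.0, 0.5, linear_lr(0), state))
```

On its first step, Adam moves a parameter by roughly the full learning rate, whatever the gradient's scale. This is why the base rate of 0.0007 sets the size of the early updates directly.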

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	340	0.1391	0.5
0.1491	2.0	680	0.1390	0.5
0.1436	3.0	1020	0.1392	0.5
0.1436	4.0	1360	0.1396	0.5
0.1421	5.0	1700	0.1444	0.5
0.1411	6.0	2040	0.1388	0.5
0.1411	7.0	2380	0.1390	0.5
0.142	8.0	2720	0.1388	0.5
0.1402	9.0	3060	0.1392	0.5
0.1402	10.0	3400	0.1396	0.5
0.1414	11.0	3740	0.1389	0.5
0.141	12.0	4080	0.1390	0.5
0.141	13.0	4420	0.1396	0.5
0.1407	14.0	4760	0.1421	0.5
0.1425	15.0	5100	0.1411	0.5
0.1425	16.0	5440	0.1397	0.5
0.1417	17.0	5780	0.1388	0.5
0.1393	18.0	6120	0.1397	0.5
0.1393	19.0	6460	0.1409	0.5
0.1406	20.0	6800	0.1389	0.5
0.1404	21.0	7140	0.1391	0.5
0.1404	22.0	7480	0.1404	0.5
0.1406	23.0	7820	0.1398	0.5
0.1399	24.0	8160	0.1389	0.5
0.1411	25.0	8500	0.1388	0.5
0.1411	26.0	8840	0.1398	0.5
0.1405	27.0	9180	0.1388	0.5
0.1399	28.0	9520	0.1398	0.5
0.1399	29.0	9860	0.1421	0.5
0.1406	30.0	10200	0.1407	0.5
0.14	31.0	10540	0.1388	0.5
0.14	32.0	10880	0.1408	0.5
0.1402	33.0	11220	0.1402	0.5
0.1418	34.0	11560	0.1386	0.5
0.1418	35.0	11900	0.1385	0.5
0.139	36.0	12240	0.1374	0.5
0.1371	37.0	12580	0.1408	0.5
0.1371	38.0	12920	0.1427	0.5
0.1353	39.0	13260	0.1379	0.5
0.1346	40.0	13600	0.1398	0.5
0.1346	41.0	13940	0.1412	0.5
0.1343	42.0	14280	0.1373	0.5
0.1329	43.0	14620	0.1386	0.5
0.1329	44.0	14960	0.1374	0.5
0.1335	45.0	15300	0.1387	0.5
0.1319	46.0	15640	0.1366	0.5
0.1319	47.0	15980	0.1371	0.5
0.1326	48.0	16320	0.1395	0.5
0.1313	49.0	16660	0.1379	0.5
0.131	50.0	17000	0.1401	0.5
0.131	51.0	17340	0.1417	0.5
0.1302	52.0	17680	0.1390	0.5
0.1313	53.0	18020	0.1367	0.5
0.1313	54.0	18360	0.1392	0.5
0.13	55.0	18700	0.1381	0.5
0.1299	56.0	19040	0.1397	0.5
0.1299	57.0	19380	0.1381	0.5
0.1293	58.0	19720	0.1376	0.5
0.13	59.0	20060	0.1376	0.5
0.13	60.0	20400	0.1395	0.5
0.1291	61.0	20740	0.1385	0.5
0.129	62.0	21080	0.1385	0.5
0.129	63.0	21420	0.1377	0.5
0.1282	64.0	21760	0.1390	0.5
0.1297	65.0	22100	0.1389	0.5
0.1297	66.0	22440	0.1369	0.5
0.1267	67.0	22780	0.1395	0.5
0.129	68.0	23120	0.1403	0.5
0.129	69.0	23460	0.1390	0.5
0.1282	70.0	23800	0.1393	0.5
0.1277	71.0	24140	0.1396	0.5
0.1277	72.0	24480	0.1391	0.5
0.1273	73.0	24820	0.1389	0.5
0.1279	74.0	25160	0.1398	0.5
0.1272	75.0	25500	0.1393	0.5
0.1272	76.0	25840	0.1392	0.5
0.1277	77.0	26180	0.1397	0.5
0.1271	78.0	26520	0.1386	0.5
0.1271	79.0	26860	0.1394	0.5
0.127	80.0	27200	0.1391	0.5
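The results table is tab-separated. The card does not list logging_steps. However, the Training Loss column is consistent with the Trainer's default logging interval of 500 steps, where each row shows the most recently logged training loss. That would explain why values repeat across epochs and why epoch 1 (step 340) shows "No log". The minimal sketch below parses a few rows copied from the table, finds the lowest validation loss, and checks that each epoch spans 340 optimizer steps. At train_batch_size 16, that step count corresponds to 5,425 to 5,440 training examples. This assumes a single device, no gradient accumulation, and a kept final partial batch, none of which the card states. The function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A few rows copied verbatim (tab-separated) from the training results table.
TABLE = """Training Loss\tEpoch\tStep\tValidation Loss\tAccuracy
No log\t1.0\t340\t0.1391\t0.5
0.1491\t2.0\t680\t0.1390\t0.5
0.1319\t46.0\t15640\t0.1366\t0.5
0.127\t80.0\t27200\t0.1391\t0.5"""


@dataclass
class Row:
    train_loss: Optional[float]  # None where the table says "No log"
    epoch: float
    step: int
    val_loss: float
    accuracy: float


def parse_results(text: str) -> List[Row]:
    """Parse the tab-separated results table, skipping the header line."""
    rows = []
    for line in text.strip().splitlines()[1:]:
        train, epoch, step, val, acc = line.split("\t")
        rows.append(Row(
            train_loss=None if train == "No log" else float(train),
            epoch=float(epoch),
            step=int(step),
            val_loss=float(val),
            accuracy=float(acc),
        ))
    return rows


def dataset_size_range(steps_per_epoch: int, batch_size: int) -> Tuple[int, int]:
    """Smallest and largest N with ceil(N / batch_size) == steps_per_epoch."""
    return (steps_per_epoch - 1) * batch_size + 1, steps_per_epoch * batch_size


if __name__ == "__main__":
    rows = parse_results(TABLE)
    best = min(rows, key=lambda r: r.val_loss)
    print(best.epoch, best.val_loss)
    print({r.accuracy for r in rows})
    # Every row should sit at epoch * 340 optimizer steps.
    print(all(r.step == int(r.epoch) * 340 for r in rows))
    print(dataset_size_range(340, 16))
```

Run over the full table, the same approach finds the lowest validation loss, 0.1366, at epoch 46. The accuracy stays at 0.5 in every row.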

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
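To reproduce the training environment, the minimal sketch below compares locally installed package versions against the ones listed above. It uses the standard-library importlib.metadata module. Mapping "Pytorch" to the torch distribution name is an assumption about how PyTorch was installed. The +cu118 suffix is the local version tag of the CUDA 11.8 wheels, so a CPU-only or differently built torch will be reported as a mismatch. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Versions listed under "Framework versions"; keys are pip distribution names.
EXPECTED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def installed_version(name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def mismatches(expected: Dict[str, str],
               lookup: Callable[[str], Optional[str]] = installed_version
               ) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from the expected one to what was found."""
    found = {name: lookup(name) for name in expected}
    return {name: got for name, got in found.items() if got != expected[name]}


if __name__ == "__main__":
    for name, got in mismatches(EXPECTED).items():
        print(f"{name}: expected {EXPECTED[name]}, found {got or 'not installed'}")
```

The lookup parameter lets the comparison be tested without touching the real environment.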

dkqjrm/20230901000318