20230901065829

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set after the final epoch (epoch 80):

Loss: 0.1567
Accuracy: 0.5

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
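
The linear schedule listed above can be sketched in plain Python. This is a minimal sketch, not the training code itself: it assumes zero warmup steps (the card lists none) and takes the total of 27200 optimizer steps from the last row of the training results. The name `linear_lr` is a hypothetical helper introduced only for illustration.

```python
# Hyperparameters copied from the list above.
HPARAMS = {
    "learning_rate": 5e-4,
    "train_batch_size": 16,
    "eval_batch_size": 8,
    "seed": 11,
    "num_epochs": 80,
}

TOTAL_STEPS = 27200  # assumption: final step reported in the training results
WARMUP_STEPS = 0     # assumption: the card does not list any warmup


def linear_lr(step, base_lr=HPARAMS["learning_rate"],
              total_steps=TOTAL_STEPS, warmup_steps=WARMUP_STEPS):
    """Learning rate at a given optimizer step under linear decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 6800, 13600, 20400, 27200):
        print(step, linear_lr(step))
```

Under these assumptions the rate starts at the configured 0.0005, is halved at the midpoint of training, and reaches zero at the final step.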

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	340	0.1589	0.5
0.1842	2.0	680	0.1722	0.5
0.1701	3.0	1020	0.1582	0.5
0.1701	4.0	1360	0.1563	0.5
0.1657	5.0	1700	0.1575	0.5
0.163	6.0	2040	0.1586	0.5
0.163	7.0	2380	0.1568	0.5
0.1627	8.0	2720	0.1596	0.5
0.1588	9.0	3060	0.1578	0.5
0.1588	10.0	3400	0.1597	0.5
0.1598	11.0	3740	0.1567	0.5
0.159	12.0	4080	0.1583	0.5
0.159	13.0	4420	0.1567	0.5
0.157	14.0	4760	0.1640	0.5
0.1588	15.0	5100	0.1564	0.5
0.1588	16.0	5440	0.1555	0.5
0.1595	17.0	5780	0.1556	0.5
0.1566	18.0	6120	0.1562	0.5
0.1566	19.0	6460	0.1562	0.5
0.1578	20.0	6800	0.1559	0.5
0.1573	21.0	7140	0.1605	0.5
0.1573	22.0	7480	0.1802	0.5
0.1629	23.0	7820	0.1601	0.5
0.1669	24.0	8160	0.1598	0.5
0.1678	25.0	8500	0.1600	0.5
0.1678	26.0	8840	0.1604	0.5
0.1659	27.0	9180	0.1600	0.5
0.1653	28.0	9520	0.1565	0.5
0.1653	29.0	9860	0.1561	0.5
0.1593	30.0	10200	0.1555	0.5
0.1573	31.0	10540	0.1601	0.5
0.1573	32.0	10880	0.1568	0.5
0.157	33.0	11220	0.1621	0.5
0.1569	34.0	11560	0.1580	0.5
0.1569	35.0	11900	0.1565	0.5
0.1575	36.0	12240	0.1565	0.5
0.1566	37.0	12580	0.1592	0.5
0.1566	38.0	12920	0.1584	0.5
0.1557	39.0	13260	0.1572	0.5
0.156	40.0	13600	0.1580	0.5
0.156	41.0	13940	0.1587	0.5
0.1566	42.0	14280	0.1573	0.5
0.1553	43.0	14620	0.1565	0.5
0.1553	44.0	14960	0.1621	0.5
0.1567	45.0	15300	0.1576	0.5
0.1557	46.0	15640	0.1574	0.5
0.1557	47.0	15980	0.1558	0.5
0.1571	48.0	16320	0.1557	0.5
0.1558	49.0	16660	0.1556	0.5
0.1559	50.0	17000	0.1569	0.5
0.1559	51.0	17340	0.1558	0.5
0.1549	52.0	17680	0.1561	0.5
0.1566	53.0	18020	0.1557	0.5
0.1566	54.0	18360	0.1563	0.5
0.1557	55.0	18700	0.1562	0.5
0.1557	56.0	19040	0.1568	0.5
0.1557	57.0	19380	0.1558	0.5
0.1553	58.0	19720	0.1557	0.5
0.1561	59.0	20060	0.1551	0.5
0.1561	60.0	20400	0.1575	0.5
0.1551	61.0	20740	0.1570	0.5
0.155	62.0	21080	0.1559	0.5
0.155	63.0	21420	0.1558	0.5
0.1544	64.0	21760	0.1577	0.5
0.1566	65.0	22100	0.1565	0.5
0.1566	66.0	22440	0.1554	0.5
0.153	67.0	22780	0.1561	0.5
0.1565	68.0	23120	0.1574	0.5
0.1565	69.0	23460	0.1574	0.5
0.1552	70.0	23800	0.1571	0.5
0.1548	71.0	24140	0.1572	0.5
0.1548	72.0	24480	0.1563	0.5
0.1546	73.0	24820	0.1563	0.5
0.1547	74.0	25160	0.1570	0.5
0.1542	75.0	25500	0.1563	0.5
0.1542	76.0	25840	0.1571	0.5
0.155	77.0	26180	0.1571	0.5
0.1545	78.0	26520	0.1561	0.5
0.1545	79.0	26860	0.1570	0.5
0.1544	80.0	27200	0.1567	0.5
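
The step counts in the table allow a rough consistency check on the size of the training split. The sketch below assumes a single device, no gradient accumulation, and that the last partial batch is kept; none of these is stated in the card, so the result is an estimate only. The helper `steps_for` is hypothetical.

```python
import math

train_batch_size = 16   # from the hyperparameter list
num_epochs = 80         # from the hyperparameter list
final_step = 27200      # last row of the results table

# Optimizer steps per epoch implied by the table.
steps_per_epoch = final_step // num_epochs

# Range of training-set sizes that would produce this many steps per epoch
# (assumption: one device, no gradient accumulation, last partial batch kept).
max_examples = steps_per_epoch * train_batch_size
min_examples = (steps_per_epoch - 1) * train_batch_size + 1


def steps_for(num_examples, batch_size=train_batch_size):
    """Steps per epoch for a given number of training examples."""
    return math.ceil(num_examples / batch_size)


if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch)
    print("implied training examples:", min_examples, "to", max_examples)
```

The implied steps per epoch match the first row of the table, where step 340 is logged at epoch 1.0.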

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
