20230901120149

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

Loss: 0.1576
Accuracy: 0.5

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
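
The listed values are enough to sketch the learning-rate schedule and a rough bound on the training-set size. The sketch below is illustrative only and rests on assumptions the card does not state: zero warmup steps, a single device, no gradient accumulation, and a final partial batch that is kept. `linear_lr` and `train_set_size_bounds` are hypothetical helper names, not part of any library.

```python
# Hyperparameters copied from the list above.
LEARNING_RATE = 0.0005
TRAIN_BATCH_SIZE = 16
NUM_EPOCHS = 80
# Read off the training results table: the step counter advances by 340 per epoch.
STEPS_PER_EPOCH = 340
# Assumption: no warmup, since the card lists none.
WARMUP_STEPS = 0

TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 27200, the step of the last table row


def linear_lr(step):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


def train_set_size_bounds():
    """Smallest and largest example counts n with ceil(n / batch_size) == steps per epoch.

    Assumes one device, no gradient accumulation, and that the last
    partial batch is not dropped.
    """
    low = (STEPS_PER_EPOCH - 1) * TRAIN_BATCH_SIZE + 1
    high = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE
    return low, high


if __name__ == "__main__":
    for epoch in (0, 20, 40, 60, 80):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:2d} step {step:5d} lr {linear_lr(step):.6f}")
    print("training examples between", train_set_size_bounds())
```

Under these assumptions the learning rate falls linearly from 0.0005 to zero over the run, so the final epochs train with a rate close to zero.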

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	340	0.1594	0.5
0.1863	2.0	680	0.1639	0.5
0.1705	3.0	1020	0.1604	0.5
0.1705	4.0	1360	0.1572	0.5
0.1659	5.0	1700	0.1604	0.5
0.1635	6.0	2040	0.1674	0.5
0.1635	7.0	2380	0.1568	0.5
0.1633	8.0	2720	0.1633	0.5
0.1599	9.0	3060	0.1611	0.5
0.1599	10.0	3400	0.1636	0.5
0.1615	11.0	3740	0.1574	0.5
0.1606	12.0	4080	0.1632	0.5
0.1606	13.0	4420	0.1579	0.5
0.1594	14.0	4760	0.1623	0.5
0.1698	15.0	5100	0.1623	0.5
0.1698	16.0	5440	0.1614	0.5
0.168	17.0	5780	0.1579	0.5
0.1626	18.0	6120	0.1586	0.5
0.1626	19.0	6460	0.1565	0.5
0.1604	20.0	6800	0.1574	0.5
0.1595	21.0	7140	0.1601	0.5
0.1595	22.0	7480	0.1675	0.5
0.1615	23.0	7820	0.1602	0.5
0.1669	24.0	8160	0.1604	0.5
0.1677	25.0	8500	0.1635	0.5
0.1677	26.0	8840	0.1603	0.5
0.1666	27.0	9180	0.1614	0.5
0.1656	28.0	9520	0.1609	0.5
0.1656	29.0	9860	0.1625	0.5
0.1668	30.0	10200	0.1624	0.5
0.1658	31.0	10540	0.1702	0.5
0.1658	32.0	10880	0.1606	0.5
0.166	33.0	11220	0.1657	0.5
0.1674	34.0	11560	0.1619	0.5
0.1674	35.0	11900	0.1585	0.5
0.1636	36.0	12240	0.1592	0.5
0.1612	37.0	12580	0.1568	0.5
0.1612	38.0	12920	0.1607	0.5
0.159	39.0	13260	0.1577	0.5
0.1586	40.0	13600	0.1566	0.5
0.1586	41.0	13940	0.1584	0.5
0.1587	42.0	14280	0.1620	0.5
0.1577	43.0	14620	0.1571	0.5
0.1577	44.0	14960	0.1610	0.5
0.1587	45.0	15300	0.1576	0.5
0.1578	46.0	15640	0.1577	0.5
0.1578	47.0	15980	0.1570	0.5
0.1592	48.0	16320	0.1578	0.5
0.1578	49.0	16660	0.1565	0.5
0.1582	50.0	17000	0.1581	0.5
0.1582	51.0	17340	0.1571	0.5
0.1569	52.0	17680	0.1585	0.5
0.1586	53.0	18020	0.1566	0.5
0.1586	54.0	18360	0.1579	0.5
0.1576	55.0	18700	0.1578	0.5
0.1577	56.0	19040	0.1581	0.5
0.1577	57.0	19380	0.1566	0.5
0.1571	58.0	19720	0.1572	0.5
0.1578	59.0	20060	0.1562	0.5
0.1578	60.0	20400	0.1579	0.5
0.157	61.0	20740	0.1578	0.5
0.157	62.0	21080	0.1566	0.5
0.157	63.0	21420	0.1572	0.5
0.1562	64.0	21760	0.1594	0.5
0.1584	65.0	22100	0.1582	0.5
0.1584	66.0	22440	0.1566	0.5
0.1549	67.0	22780	0.1579	0.5
0.1582	68.0	23120	0.1587	0.5
0.1582	69.0	23460	0.1580	0.5
0.157	70.0	23800	0.1580	0.5
0.1563	71.0	24140	0.1585	0.5
0.1563	72.0	24480	0.1576	0.5
0.1562	73.0	24820	0.1570	0.5
0.1566	74.0	25160	0.1576	0.5
0.156	75.0	25500	0.1570	0.5
0.156	76.0	25840	0.1575	0.5
0.1566	77.0	26180	0.1584	0.5
0.1561	78.0	26520	0.1572	0.5
0.1561	79.0	26860	0.1580	0.5
0.1561	80.0	27200	0.1576	0.5
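
The table can be inspected programmatically, for example to find the epoch with the lowest validation loss and to confirm that the accuracy column never changes. The following is a minimal sketch that uses only a handful of rows copied from the table; `parse_rows` is a hypothetical helper, and the full table would need to be pasted into `TABLE` to cover all 80 epochs.

```python
import csv
import io

# A few rows copied verbatim from the table above (tab-separated).
# Paste the full table here to analyze every epoch.
TABLE = (
    "Training Loss\tEpoch\tStep\tValidation Loss\tAccuracy\n"
    "No log\t1.0\t340\t0.1594\t0.5\n"
    "0.1863\t2.0\t680\t0.1639\t0.5\n"
    "0.1578\t59.0\t20060\t0.1562\t0.5\n"
    "0.1561\t80.0\t27200\t0.1576\t0.5\n"
)


def parse_rows(text):
    """Parse the tab-separated results table into a list of typed dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for rec in reader:
        train = rec["Training Loss"]
        rows.append({
            "epoch": int(float(rec["Epoch"])),
            "step": int(rec["Step"]),
            # "No log" means no training loss had been logged yet.
            "train_loss": None if train == "No log" else float(train),
            "val_loss": float(rec["Validation Loss"]),
            "accuracy": float(rec["Accuracy"]),
        })
    return rows


rows = parse_rows(TABLE)
best = min(rows, key=lambda r: r["val_loss"])
accuracies = {r["accuracy"] for r in rows}

print(f"lowest validation loss: {best['val_loss']} at epoch {best['epoch']}")
print(f"distinct accuracy values: {sorted(accuracies)}")
```

An accuracy that stays at exactly 0.5 while the loss barely moves is what a constant prediction on a balanced binary evaluation set would produce. The card does not say whether that is what happened here, so treat this as something to check rather than a conclusion.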

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
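
To reproduce the run it may help to compare a local environment against the versions listed above. The sketch below uses only the standard library. It assumes that "Pytorch" corresponds to the `torch` distribution on PyPI; `compare_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Versions copied from the list above, keyed by PyPI distribution name
# (assumption: "Pytorch" is the "torch" distribution).
PINNED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def compare_versions(pinned):
    """Return {name: (card_version, installed_version_or_None, exact_match)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


report = compare_versions(PINNED)
for name, (wanted, installed, ok) in report.items():
    status = "match" if ok else "differs"
    print(f"{name}: card={wanted} installed={installed} ({status})")
```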

dkqjrm/20230901120149
