GUE_tf_2-seqsight_8192_512_30M-L32_all

This model is a fine-tuned version of mahdibaghbanzadeh/seqsight_8192_512_30M on the mahdibaghbanzadeh/GUE_tf_2 dataset. It achieves the following results on the evaluation set:

Loss: 0.6338
F1 Score: 0.6884
Accuracy: 0.689
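The card reports an F1 score alongside accuracy but does not say how F1 is averaged across classes. The sketch below shows one way to compute both metrics from reference and predicted labels. Macro averaging (the unweighted mean of per-class F1) is an assumption here, and `accuracy` and `macro_f1` are hypothetical helper names, not part of the original training code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never appears.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy binary example (made-up labels, not model outputs).
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(accuracy(y_true, y_pred))            # 0.75
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

If the reported F1 was computed with a different averaging (binary or weighted), the second function would need to change accordingly.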

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 1536
eval_batch_size: 1536
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
training_steps: 10000
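With `lr_scheduler_type: linear`, the learning rate decays from 0.0005 toward zero over the 10000 training steps. No warmup is listed, so the sketch below assumes zero warmup steps. `linear_lr` is a hypothetical helper that reproduces the shape of a linear decay schedule (the kind Transformers builds with `get_linear_schedule_with_warmup`); it is not the library code itself.

```python
def linear_lr(step, base_lr=5e-4, total_steps=10_000, warmup_steps=0):
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase (not used by default, since the card lists no warmup).
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale base_lr by the fraction of post-warmup steps remaining.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at a few points of the 10000-step run (assuming no warmup).
for step in (0, 2_500, 5_000, 7_500, 10_000):
    print(step, linear_lr(step))
```

Under this assumption the rate is halved by step 5000 and reaches zero at step 10000.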

Training results

Training Loss	Epoch	Step	Validation Loss	F1 Score	Accuracy
0.63	15.38	200	0.6302	0.6461	0.647
0.5337	30.77	400	0.6630	0.6424	0.644
0.4737	46.15	600	0.6934	0.6516	0.654
0.4271	61.54	800	0.7213	0.6735	0.674
0.3909	76.92	1000	0.7608	0.6702	0.671
0.362	92.31	1200	0.7714	0.6624	0.663
0.3431	107.69	1400	0.8214	0.6710	0.671
0.3246	123.08	1600	0.8769	0.6568	0.657
0.3089	138.46	1800	0.8430	0.6725	0.673
0.2939	153.85	2000	0.9266	0.6689	0.669
0.2794	169.23	2200	0.9087	0.6697	0.67
0.2673	184.62	2400	0.9141	0.6609	0.661
0.2546	200.0	2600	0.9812	0.6516	0.652
0.245	215.38	2800	0.9577	0.6570	0.657
0.2333	230.77	3000	0.9936	0.6489	0.649
0.2256	246.15	3200	0.9704	0.6550	0.655
0.2166	261.54	3400	1.0434	0.6478	0.648
0.208	276.92	3600	1.0574	0.664	0.664
0.1987	292.31	3800	1.1171	0.6540	0.654
0.191	307.69	4000	1.0810	0.6529	0.653
0.1841	323.08	4200	1.0971	0.6434	0.645
0.1783	338.46	4400	1.1030	0.6538	0.654
0.1729	353.85	4600	1.0723	0.6549	0.655
0.1663	369.23	4800	1.1525	0.6540	0.654
0.1611	384.62	5000	1.1418	0.6589	0.659
0.156	400.0	5200	1.1778	0.6520	0.652
0.1516	415.38	5400	1.1558	0.6560	0.656
0.1481	430.77	5600	1.1824	0.6470	0.647
0.1441	446.15	5800	1.1839	0.6510	0.651
0.1399	461.54	6000	1.1635	0.6460	0.646
0.1354	476.92	6200	1.2265	0.6527	0.653
0.1324	492.31	6400	1.2001	0.6590	0.659
0.1304	507.69	6600	1.2135	0.6508	0.651
0.1257	523.08	6800	1.2496	0.6550	0.655
0.1236	538.46	7000	1.2449	0.6470	0.647
0.1205	553.85	7200	1.2688	0.6550	0.655
0.1188	569.23	7400	1.2710	0.6639	0.664
0.1157	584.62	7600	1.2893	0.6540	0.654
0.1135	600.0	7800	1.2557	0.6520	0.652
0.1117	615.38	8000	1.2621	0.6490	0.649
0.1097	630.77	8200	1.2867	0.6460	0.646
0.1081	646.15	8400	1.2929	0.6510	0.651
0.1077	661.54	8600	1.2848	0.6598	0.66
0.1061	676.92	8800	1.2900	0.6479	0.648
0.1043	692.31	9000	1.2882	0.648	0.648
0.1062	707.69	9200	1.2893	0.6560	0.656
0.1035	723.08	9400	1.3024	0.6560	0.656
0.1025	738.46	9600	1.2972	0.6620	0.662
0.1017	753.85	9800	1.3034	0.6580	0.658
0.1013	769.23	10000	1.3126	0.6560	0.656
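Two patterns in the table are easy to check programmatically. The Epoch column is consistent with 13 optimizer steps per epoch (200 / 13 ≈ 15.38 and 10000 / 13 ≈ 769.23). The validation loss is lowest at the first logged step (0.6302 at step 200) and F1 peaks at step 800 (0.6735), while training loss falls from 0.63 to 0.1013 over the run. The sketch below copies the Step, Validation Loss and F1 Score columns into a list and locates both rows; `ROWS` and `best_rows` are illustrative names, not part of the original training code.

```python
# (step, validation_loss, f1) copied from the training-results table above.
ROWS = [
    (200, 0.6302, 0.6461), (400, 0.6630, 0.6424), (600, 0.6934, 0.6516),
    (800, 0.7213, 0.6735), (1000, 0.7608, 0.6702), (1200, 0.7714, 0.6624),
    (1400, 0.8214, 0.6710), (1600, 0.8769, 0.6568), (1800, 0.8430, 0.6725),
    (2000, 0.9266, 0.6689), (2200, 0.9087, 0.6697), (2400, 0.9141, 0.6609),
    (2600, 0.9812, 0.6516), (2800, 0.9577, 0.6570), (3000, 0.9936, 0.6489),
    (3200, 0.9704, 0.6550), (3400, 1.0434, 0.6478), (3600, 1.0574, 0.6640),
    (3800, 1.1171, 0.6540), (4000, 1.0810, 0.6529), (4200, 1.0971, 0.6434),
    (4400, 1.1030, 0.6538), (4600, 1.0723, 0.6549), (4800, 1.1525, 0.6540),
    (5000, 1.1418, 0.6589), (5200, 1.1778, 0.6520), (5400, 1.1558, 0.6560),
    (5600, 1.1824, 0.6470), (5800, 1.1839, 0.6510), (6000, 1.1635, 0.6460),
    (6200, 1.2265, 0.6527), (6400, 1.2001, 0.6590), (6600, 1.2135, 0.6508),
    (6800, 1.2496, 0.6550), (7000, 1.2449, 0.6470), (7200, 1.2688, 0.6550),
    (7400, 1.2710, 0.6639), (7600, 1.2893, 0.6540), (7800, 1.2557, 0.6520),
    (8000, 1.2621, 0.6490), (8200, 1.2867, 0.6460), (8400, 1.2929, 0.6510),
    (8600, 1.2848, 0.6598), (8800, 1.2900, 0.6479), (9000, 1.2882, 0.6480),
    (9200, 1.2893, 0.6560), (9400, 1.3024, 0.6560), (9600, 1.2972, 0.6620),
    (9800, 1.3034, 0.6580), (10000, 1.3126, 0.6560),
]


def best_rows(rows):
    """Return the row with the lowest validation loss and the row with the highest F1."""
    lowest_loss = min(rows, key=lambda r: r[1])
    highest_f1 = max(rows, key=lambda r: r[2])
    return lowest_loss, highest_f1


# First logged row: step 200 at epoch 15.38.
steps_per_epoch = round(200 / 15.38)
lowest_loss, highest_f1 = best_rows(ROWS)
print(steps_per_epoch)  # 13
print(lowest_loss)      # (200, 0.6302, 0.6461)
print(highest_f1)       # (800, 0.7213, 0.6735)
```

Selecting a checkpoint by validation loss or by F1 gives different answers here, so which metric to track is a choice the reader has to make.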

Framework versions

PEFT 0.9.0
Transformers 4.38.2
Pytorch 2.2.0+cu121
Datasets 2.17.1
Tokenizers 0.15.2
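To reproduce the environment, it can help to confirm which of these versions are installed. The sketch below uses the standard-library `importlib.metadata.version` to look up each package by its pip distribution name (an assumption for PyTorch, whose distribution is named `torch`) and reports any mismatches. `EXPECTED` and `version_mismatches` are hypothetical names. The comparison is an exact string match, so a PyTorch build reported as `2.2.0` without the `+cu121` local suffix is also flagged.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions", keyed by pip distribution name.
EXPECTED = {
    "peft": "0.9.0",
    "transformers": "4.38.2",
    "torch": "2.2.0+cu121",
    "datasets": "2.17.1",
    "tokenizers": "0.15.2",
}


def version_mismatches(expected=EXPECTED, lookup=version):
    """Map each package whose installed version differs from `expected` to the
    installed version string, or to None if the package is not installed."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, installed in version_mismatches().items():
        print(f"{name}: expected {EXPECTED[name]}, found {installed}")
```

The `lookup` parameter exists only so the check can be exercised without the real packages installed.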
