sft-fsi

This model is a fine-tuned version of dynamofl/dynamo-1.6B-v0.4-mosaic-dynamoDPO-iter0-2978 on the dynamofl/train-default-FSI-PersonalFinancialAdvice-input-formatted-chatml dataset. It achieves the following results on the evaluation set:

Loss: 0.5351

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-07
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
num_epochs: 100

Training results

Training Loss	Epoch	Step	Validation Loss
17.4572	1.0	15	17.4173
16.0	2.0	30	15.7598
13.4471	3.0	45	12.5953
12.2511	4.0	60	11.4095
11.6039	5.0	75	10.9817
10.2763	6.0	90	9.5031
9.0617	7.0	105	8.6753
8.4037	8.0	120	7.9506
7.9582	9.0	135	7.1144
6.9666	10.0	150	6.4290
6.4408	11.0	165	5.7028
5.3708	12.0	180	4.4972
4.8165	13.0	195	3.8790
4.0134	14.0	210	2.8529
3.3501	15.0	225	2.3522
3.1818	16.0	240	1.9746
2.338	17.0	255	1.7629
2.0088	18.0	270	1.6484
2.293	19.0	285	1.4006
1.82	20.0	300	1.3265
1.8957	21.0	315	1.2599
1.5477	22.0	330	1.1908
1.3785	23.0	345	1.1995
1.5653	24.0	360	1.1229
1.5203	25.0	375	1.1563
1.7603	26.0	390	1.1781
1.3828	27.0	405	0.9678
1.1726	28.0	420	1.0369
1.4392	29.0	435	1.0777
1.1965	30.0	450	0.9542
1.1961	31.0	465	0.9490
1.1002	32.0	480	0.8936
1.3295	33.0	495	1.0326
1.0566	34.0	510	1.0263
1.1966	35.0	525	0.9777
1.1547	36.0	540	0.8877
1.2921	37.0	555	0.8489
1.0368	38.0	570	0.8568
1.0894	39.0	585	0.9249
1.182	40.0	600	0.9296
1.1232	41.0	615	0.9656
1.034	42.0	630	0.8042
1.1033	43.0	645	0.8467
1.0659	44.0	660	0.8005
0.9365	45.0	675	0.8196
0.9452	46.0	690	0.7149
0.9357	47.0	705	0.7847
0.9167	48.0	720	0.6707
0.884	49.0	735	0.6987
0.9829	50.0	750	0.7260
0.7688	51.0	765	0.7078
0.9165	52.0	780	0.6694
1.074	53.0	795	0.7018
0.9647	54.0	810	0.6790
0.9155	55.0	825	0.6542
0.8819	56.0	840	0.6652
0.7332	57.0	855	0.6124
0.8385	58.0	870	0.6184
0.7709	59.0	885	0.6434
0.9069	60.0	900	0.6387
0.8426	61.0	915	0.5717
0.8469	62.0	930	0.6204
0.7304	63.0	945	0.6720
0.7256	64.0	960	0.5895
0.6442	65.0	975	0.6164
0.744	66.0	990	0.5816
0.7043	67.0	1005	0.6566
0.8757	68.0	1020	0.6042
0.7355	69.0	1035	0.5842
0.7304	70.0	1050	0.5986
0.8012	71.0	1065	0.6174
0.7211	72.0	1080	0.5787
0.7411	73.0	1095	0.5619
0.8447	74.0	1110	0.5611
0.7919	75.0	1125	0.6355
0.6498	76.0	1140	0.5658
0.682	77.0	1155	0.5776
0.7562	78.0	1170	0.6282
0.7869	79.0	1185	0.5271
0.7478	80.0	1200	0.5542
0.7653	81.0	1215	0.5682
0.7067	82.0	1230	0.6346
0.691	83.0	1245	0.5932
0.7489	84.0	1260	0.5724
0.694	85.0	1275	0.5307
0.7985	86.0	1290	0.6010
0.7029	87.0	1305	0.5514
0.7678	88.0	1320	0.5660
0.7885	89.0	1335	0.5434
0.6703	90.0	1350	0.5838
0.7028	91.0	1365	0.5275
0.7731	92.0	1380	0.5433
0.6815	93.0	1395	0.5619
0.5923	94.0	1410	0.5609
0.7039	95.0	1425	0.5246
0.7842	96.0	1440	0.5473
0.7001	97.0	1455	0.5467
0.7169	98.0	1470	0.5881
0.6552	99.0	1485	0.5636
0.6765	100.0	1500	0.5571

Framework versions

Transformers 4.40.1
Pytorch 2.3.0+cu121
Datasets 2.19.0
Tokenizers 0.19.1

jamesoneill12
/

sft-fsi

sft-fsi

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Evaluation results