
sft-fsi-2

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 6.0965
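The card does not state the model type or training objective. If this loss is the mean per-token cross-entropy in nats, which is what a causal language-modeling head in Transformers reports, the corresponding evaluation perplexity is exp(loss). That interpretation is an assumption. A minimal sketch of the conversion:

```python
import math

# Final evaluation loss reported on this card.
EVAL_LOSS = 6.0965


def perplexity(mean_ce_nats: float) -> float:
    """Convert a token-averaged cross-entropy (natural log) to perplexity.

    Assumption: the reported loss is a mean per-token cross-entropy in nats,
    as produced by PyTorch's CrossEntropyLoss; the card does not confirm this.
    """
    return math.exp(mean_ce_nats)


print(f"perplexity ~ {perplexity(EVAL_LOSS):.1f}")
```

Under that assumption the final evaluation perplexity is roughly 444.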

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 20
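The batch-size entries are related. No gradient accumulation is listed, so assuming it is 1, the total batch size is the per-device batch size times the number of devices: 4 × 8 = 32. The sketch below reproduces that arithmetic and evaluates a cosine learning-rate schedule with the listed peak rate of 2e-07 over the 300 optimizer steps shown in the results table. It assumes zero warmup steps, since none are listed, and the half-cosine decay to zero that Transformers' cosine scheduler uses with its default settings. The helper `cosine_lr` is illustrative, not a library function.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 2e-07
TRAIN_BATCH_SIZE = 4      # per device
NUM_DEVICES = 8
NUM_EPOCHS = 20
# Assumption: no gradient accumulation is listed, so treat it as 1.
GRAD_ACCUM_STEPS = 1
# From the training results table: 15 optimizer steps per epoch.
STEPS_PER_EPOCH = 15

total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS


def cosine_lr(step: int, total: int, peak: float, warmup: int = 0) -> float:
    """Hypothetical helper: linear warmup, then half-cosine decay from peak to 0.

    The zero-warmup default is an assumption; the card lists no warmup.
    """
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_train_batch_size, total_steps)  # 32 300
for step in (0, 75, 150, 225, 300):
    print(step, f"{cosine_lr(step, total_steps, LEARNING_RATE):.3e}")
```

With these assumptions the rate starts at 2e-07, passes 1e-07 at the midpoint (step 150), and reaches zero at step 300.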

Training results

Training Loss   Epoch   Step   Validation Loss
22.8999         1.0     15     23.7511
22.0567         2.0     30     22.3950
18.1671         3.0     45     18.1863
16.7541         4.0     60     16.6264
13.8558         5.0     75     13.4130
11.5737         6.0     90     11.3170
10.5806         7.0     105    10.1712
9.6941          8.0     120    9.4425
9.4024          9.0     135    8.8622
8.8898          10.0    150    8.2339
7.9189          11.0    165    7.6595
7.7419          12.0    180    7.2258
7.0499          13.0    195    6.8765
6.8429          14.0    210    6.6033
6.6684          15.0    225    6.3930
6.3101          16.0    240    6.2497
6.4896          17.0    255    6.1646
6.5154          18.0    270    6.1160
6.1707          19.0    285    6.0990
6.409           20.0    300    6.0965
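Two things can be read off this table. First, 15 optimizer steps per epoch at a total batch size of 32 implies a training split of between 449 and 480 examples. This assumes no gradient accumulation and that the final partial batch is kept (the Trainer default, `dataloader_drop_last=False`). Second, validation loss decreases at every epoch, while training loss fluctuates over the last five epochs. The sketch below checks both from the logged values. `implied_train_size_range` is a hypothetical helper written for this illustration.

```python
# (epoch, step, training_loss, validation_loss), copied from the table above.
RESULTS = [
    (1, 15, 22.8999, 23.7511), (2, 30, 22.0567, 22.3950),
    (3, 45, 18.1671, 18.1863), (4, 60, 16.7541, 16.6264),
    (5, 75, 13.8558, 13.4130), (6, 90, 11.5737, 11.3170),
    (7, 105, 10.5806, 10.1712), (8, 120, 9.6941, 9.4425),
    (9, 135, 9.4024, 8.8622), (10, 150, 8.8898, 8.2339),
    (11, 165, 7.9189, 7.6595), (12, 180, 7.7419, 7.2258),
    (13, 195, 7.0499, 6.8765), (14, 210, 6.8429, 6.6033),
    (15, 225, 6.6684, 6.3930), (16, 240, 6.3101, 6.2497),
    (17, 255, 6.4896, 6.1646), (18, 270, 6.5154, 6.1160),
    (19, 285, 6.1707, 6.0990), (20, 300, 6.409, 6.0965),
]

TOTAL_TRAIN_BATCH_SIZE = 32  # from the hyperparameter list


def implied_train_size_range(steps_per_epoch: int, batch: int) -> tuple[int, int]:
    """Smallest and largest N with ceil(N / batch) == steps_per_epoch.

    Assumes no gradient accumulation and that the last partial batch is kept.
    """
    return (steps_per_epoch - 1) * batch + 1, steps_per_epoch * batch


steps_per_epoch = RESULTS[0][1]
# Every logged step should be a whole number of epochs.
steps_consistent = all(step == epoch * steps_per_epoch for epoch, step, _, _ in RESULTS)

val = [row[3] for row in RESULTS]
val_strictly_decreasing = all(b < a for a, b in zip(val, val[1:]))

print(implied_train_size_range(steps_per_epoch, TOTAL_TRAIN_BATCH_SIZE))  # (449, 480)
print("steps consistent:", steps_consistent)
print("validation loss strictly decreasing:", val_strictly_decreasing)
print(f"validation loss reduced by {1 - val[-1] / val[0]:.1%}")
```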

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1
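To compare a local environment with the versions listed above, here is a minimal sketch using only the standard library's `importlib.metadata`. It assumes the usual PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`). The `+cu121` suffix on the PyTorch version is a local build tag marking the CUDA 12.1 wheel. The comparison is an exact string match, so a build that reports plain `2.3.0` will be flagged. `version_mismatches` is a hypothetical helper.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Versions listed on this card, keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "transformers": "4.40.1",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Return {name: (expected, installed)} for every package that differs.

    `installed` is None when the distribution is not installed at all.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in version_mismatches(EXPECTED).items():
        print(f"{name}: card lists {want}, found {have or 'not installed'}")
```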

Safetensors

  • Model size: 1.63B params
  • Tensor type: BF16
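As a rough sizing aid, storing the weights in BF16 takes 2 bytes per parameter. The sketch below multiplies the card's rounded parameter count by that width. It covers the weights alone, not activations, KV cache, or optimizer state, and the result is approximate because 1.63B is itself rounded. `weight_memory_bytes` is a hypothetical helper.

```python
# Bytes per parameter for the storage dtypes considered here.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}


def weight_memory_bytes(num_params: float, dtype: str = "bf16") -> float:
    """Bytes needed to hold the weights only (no activations or optimizer state)."""
    return num_params * BYTES_PER_PARAM[dtype]


n = 1.63e9  # rounded parameter count from the card
b = weight_memory_bytes(n)
print(f"{b / 1e9:.2f} GB, {b / 2**30:.2f} GiB")  # 3.26 GB, 3.04 GiB
```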