Llama-31-8B_task-2_180-samples_config-3

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the GaetanMichelet/chat-60_ft_task-2, GaetanMichelet/chat-120_ft_task-2, and GaetanMichelet/chat-180_ft_task-2 datasets. It achieves the following results on the evaluation set:

  • Loss: 0.7140
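
The card does not include a usage example. As a minimal sketch, assuming this repository holds a PEFT adapter to be applied on top of the base model named above (access to the gated Meta-Llama-3.1 weights is required), the model could be loaded roughly as follows:

```python
# Minimal sketch: load the base model and attach this adapter with PEFT.
# Assumes access to meta-llama/Meta-Llama-3.1-8B-Instruct and a GPU with enough memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "GaetanMichelet/Llama-31-8B_task-2_180-samples_config-3"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # apply the fine-tuned adapter

# Simple chat-style generation using the base model's chat template.
messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```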

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them to transformers TrainingArguments follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 150
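
For reference only, the settings above map onto transformers TrainingArguments roughly as sketched below. The actual training script is not included in this card, and the output directory name is a placeholder.

```python
# Sketch only: how the listed hyperparameters correspond to TrainingArguments fields.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-31-8b_task-2_config-3",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,   # 1 per device x 8 steps = total train batch size 8
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=150,            # training ended after 26 epochs (see results below)
)
```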

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.0365        | 1.0   | 17   | 1.1316          |
| 1.1746        | 2.0   | 34   | 1.1196          |
| 1.0933        | 3.0   | 51   | 1.0957          |
| 0.985         | 4.0   | 68   | 1.0540          |
| 0.9741        | 5.0   | 85   | 0.9950          |
| 1.0008        | 6.0   | 102  | 0.9377          |
| 0.8935        | 7.0   | 119  | 0.8939          |
| 0.8862        | 8.0   | 136  | 0.8579          |
| 0.8266        | 9.0   | 153  | 0.8294          |
| 0.7797        | 10.0  | 170  | 0.8075          |
| 0.8158        | 11.0  | 187  | 0.7903          |
| 0.6845        | 12.0  | 204  | 0.7742          |
| 0.6819        | 13.0  | 221  | 0.7598          |
| 0.7241        | 14.0  | 238  | 0.7472          |
| 0.695         | 15.0  | 255  | 0.7365          |
| 0.6982        | 16.0  | 272  | 0.7272          |
| 0.622         | 17.0  | 289  | 0.7215          |
| 0.5905        | 18.0  | 306  | 0.7156          |
| 0.6121        | 19.0  | 323  | 0.7140          |
| 0.567         | 20.0  | 340  | 0.7166          |
| 0.5471        | 21.0  | 357  | 0.7172          |
| 0.4761        | 22.0  | 374  | 0.7234          |
| 0.4967        | 23.0  | 391  | 0.7358          |
| 0.4833        | 24.0  | 408  | 0.7644          |
| 0.4071        | 25.0  | 425  | 0.8012          |
| 0.3567        | 26.0  | 442  | 0.8289          |
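
The reported evaluation loss of 0.7140 corresponds to the best checkpoint at epoch 19; validation loss rises from epoch 20 onward, and training ended after 26 of the 150 configured epochs, which suggests an early-stopping criterion was applied.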

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.1.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1