
MSc_llama2_finetuned_model_secondData

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1306

Model description

More information needed

Intended uses & limitations

More information needed
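
Although usage is not yet documented here, the framework versions below indicate this is a PEFT adapter on top of meta-llama/Llama-2-7b-chat-hf. A minimal loading sketch, assuming the adapter is published under this repo name (the `adapter_id` below is hypothetical; replace it with the actual repo id or local path):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-chat-hf"
adapter_id = "MSc_llama2_finetuned_model_secondData"  # hypothetical repo id or local path

# Load the base chat model, then attach the fine-tuned PEFT adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```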

Training and evaluation data

More information needed

Training procedure

The following bitsandbytes quantization config was used during training (a code sketch reconstructing it follows the list):

  • quant_method: bitsandbytes
  • _load_in_8bit: False
  • _load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: bfloat16
  • load_in_4bit: True
  • load_in_8bit: False
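
As a sketch, the same config expressed with the transformers API (values copied from the list above; the underscore-prefixed `_load_in_*` keys are internal serialization fields mirrored by the public `load_in_*` arguments):

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization and bfloat16 compute,
# matching the listed config.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    load_in_8bit=False,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```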

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after the list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • training_steps: 250
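
A hedged sketch of these hyperparameters as transformers TrainingArguments, assuming a standard Trainer setup (the `output_dir` name is hypothetical; the exact training script is not included in this card):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="MSc_llama2_finetuned_model_secondData",  # hypothetical
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # 16 per device * 4 steps = 64 total train batch size
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    max_steps=250,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-8 matches the default optimizer settings.
)
```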

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 3.7467        | 1.33  | 10   | 3.0367          |
| 2.2464        | 2.67  | 20   | 1.6408          |
| 1.2865        | 4.0   | 30   | 0.8977          |
| 0.8091        | 5.33  | 40   | 0.7854          |
| 0.6926        | 6.67  | 50   | 0.7160          |
| 0.6008        | 8.0   | 60   | 0.6834          |
| 0.5244        | 9.33  | 70   | 0.6721          |
| 0.4661        | 10.67 | 80   | 0.6794          |
| 0.4179        | 12.0  | 90   | 0.6977          |
| 0.3680        | 13.33 | 100  | 0.7334          |
| 0.3276        | 14.67 | 110  | 0.7796          |
| 0.2989        | 16.0  | 120  | 0.8142          |
| 0.2692        | 17.33 | 130  | 0.8650          |
| 0.2468        | 18.67 | 140  | 0.9280          |
| 0.2356        | 20.0  | 150  | 0.9482          |
| 0.2172        | 21.33 | 160  | 0.9970          |
| 0.2093        | 22.67 | 170  | 1.0435          |
| 0.2031        | 24.0  | 180  | 1.0563          |
| 0.1933        | 25.33 | 190  | 1.0916          |
| 0.1906        | 26.67 | 200  | 1.1033          |
| 0.1864        | 28.0  | 210  | 1.1115          |
| 0.1822        | 29.33 | 220  | 1.1225          |
| 0.1821        | 30.67 | 230  | 1.1291          |
| 0.1803        | 32.0  | 240  | 1.1308          |
| 0.1799        | 33.33 | 250  | 1.1306          |

Framework versions

  • PEFT 0.4.0
  • Transformers 4.38.2
  • PyTorch 2.3.1+cu121
  • Datasets 2.13.1
  • Tokenizers 0.15.2