
MSc_llama2_finetuned_model_secondData6

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6856

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

The following bitsandbytes quantization config was used during training (a loading sketch follows the list):

  • quant_method: bitsandbytes
  • _load_in_8bit: False
  • _load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: bfloat16
  • load_in_4bit: True
  • load_in_8bit: False
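
For reference, here is a minimal sketch of reconstructing that config in code and loading the base model with it, assuming the transformers/bitsandbytes versions pinned under "Framework versions" below. This is an illustration, not the original training script; `device_map="auto"` is an assumption not stated on the card.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Values mirror the quantization config listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    load_in_8bit=False,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the base model with 4-bit NF4 quantization applied.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",  # assumption; requires the accelerate package
)
```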

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 3e-05
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • training_steps: 250
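
A minimal TrainingArguments sketch mirroring these values; `output_dir` is a placeholder and `bf16` is an assumption inferred from the bfloat16 compute dtype above, neither is stated on the card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",            # placeholder, not from the card
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # 32 * 2 = total train batch size 64
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    max_steps=250,                  # training_steps: 250
    optim="adamw_torch",            # Adam with betas=(0.9, 0.999), eps=1e-08
    bf16=True,                      # assumption: matches bfloat16 compute dtype
)
```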

Training results

Training Loss  Epoch  Step  Validation Loss
3.9863         1.33   10    3.6593
3.3725         2.67   20    2.9649
2.6441         4.0    30    2.1968
1.9553         5.33   40    1.7116
1.6093         6.67   50    1.4445
1.317          8.0    60    1.1217
0.9709         9.33   70    0.8562
0.8196         10.67  80    0.7974
0.7604         12.0   90    0.7608
0.7056         13.33  100   0.7340
0.6698         14.67  110   0.7142
0.6319         16.0   120   0.7030
0.6102         17.33  130   0.6942
0.5813         18.67  140   0.6916
0.572          20.0   150   0.6906
0.5581         21.33  160   0.6842
0.5377         22.67  170   0.6850
0.535          24.0   180   0.6862
0.5263         25.33  190   0.6841
0.5182         26.67  200   0.6861
0.5204         28.0   210   0.6857
0.5161         29.33  220   0.6855
0.5084         30.67  230   0.6858
0.5144         32.0   240   0.6863
0.5104         33.33  250   0.6856

Framework versions

  • PEFT 0.4.0
  • Transformers 4.38.2
  • PyTorch 2.3.1+cu121
  • Datasets 2.13.1
  • Tokenizers 0.15.2

Adapter for meta-llama/Llama-2-7b-chat-hf
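
Since this repository holds a PEFT adapter rather than a full model, a hedged usage sketch follows. The adapter repo id is assumed from the card title and may differ from the actual Hub path.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE = "meta-llama/Llama-2-7b-chat-hf"
# NOTE: assumed Hub path built from the card title; replace with the real repo id.
ADAPTER = "MSc_llama2_finetuned_model_secondData6"

# Load the base model in 4-bit NF4, matching the training-time quantization.
base_model = AutoModelForCausalLM.from_pretrained(
    BASE,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)

# Attach the PEFT adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base_model, ADAPTER)
tokenizer = AutoTokenizer.from_pretrained(BASE)
```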