
MSc_llama2_finetuned_model_updatePara

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4329

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

The following bitsandbytes quantization config was used during training (an equivalent configuration sketch follows the list):

  • quant_method: bitsandbytes
  • _load_in_8bit: False
  • _load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: bfloat16
  • load_in_4bit: True
  • load_in_8bit: False
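
For reference, the listed settings correspond to a standard transformers `BitsAndBytesConfig`. The snippet below is a minimal sketch assuming the base model named above; it is not the original training script.

```python
# Minimal sketch: reproduce the 4-bit NF4 quantization config listed above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load_in_4bit: True, load_in_8bit: False
    bnb_4bit_quant_type="nf4",              # bnb_4bit_quant_type: nf4
    bnb_4bit_use_double_quant=True,         # bnb_4bit_use_double_quant: True
    bnb_4bit_compute_dtype=torch.bfloat16,  # bnb_4bit_compute_dtype: bfloat16
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
```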

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 400
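
These values map directly onto `transformers.TrainingArguments`. The sketch below is illustrative only: the `output_dir` name is a placeholder, and the actual training script is not part of this card. Note that Trainer's default AdamW optimizer already uses betas=(0.9, 0.999) and epsilon=1e-08 as listed.

```python
# Illustrative sketch: the hyperparameters above expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="MSc_llama2_finetuned_model_updatePara",  # placeholder output path
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 16 * 4 = total train batch size of 64
    max_steps=400,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # Optimizer: Trainer's default AdamW with betas=(0.9, 0.999) and eps=1e-08,
    # matching the values listed above.
)
```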

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 16.5659       | 1.21  | 10   | 1.7444          |
| 1.1543        | 2.42  | 20   | 0.8871          |
| 0.7821        | 3.64  | 30   | 0.6674          |
| 0.6295        | 4.85  | 40   | 0.5773          |
| 0.5642        | 6.06  | 50   | 0.5303          |
| 0.5251        | 7.27  | 60   | 0.4983          |
| 0.4969        | 8.48  | 70   | 0.4813          |
| 0.4805        | 9.7   | 80   | 0.4689          |
| 0.4702        | 10.91 | 90   | 0.4585          |
| 0.4651        | 12.12 | 100  | 0.4512          |
| 0.4472        | 13.33 | 110  | 0.4462          |
| 0.4466        | 14.55 | 120  | 0.4415          |
| 0.4413        | 15.76 | 130  | 0.4385          |
| 0.4398        | 16.97 | 140  | 0.4398          |
| 0.4314        | 18.18 | 150  | 0.4382          |
| 0.4313        | 19.39 | 160  | 0.4313          |
| 0.4314        | 20.61 | 170  | 0.4294          |
| 0.4189        | 21.82 | 180  | 0.4279          |
| 0.4203        | 23.03 | 190  | 0.4285          |
| 0.4164        | 24.24 | 200  | 0.4279          |
| 0.4183        | 25.45 | 210  | 0.4268          |
| 0.4107        | 26.67 | 220  | 0.4268          |
| 0.4068        | 27.88 | 230  | 0.4252          |
| 0.4082        | 29.09 | 240  | 0.4266          |
| 0.4074        | 30.3  | 250  | 0.4299          |
| 0.4025        | 31.52 | 260  | 0.4266          |
| 0.4038        | 32.73 | 270  | 0.4264          |
| 0.4008        | 33.94 | 280  | 0.4284          |
| 0.4002        | 35.15 | 290  | 0.4279          |
| 0.3937        | 36.36 | 300  | 0.4307          |
| 0.4033        | 37.58 | 310  | 0.4299          |
| 0.3974        | 38.79 | 320  | 0.4307          |
| 0.3951        | 40.0  | 330  | 0.4312          |
| 0.3948        | 41.21 | 340  | 0.4327          |
| 0.3971        | 42.42 | 350  | 0.4321          |
| 0.3958        | 43.64 | 360  | 0.4327          |
| 0.3906        | 44.85 | 370  | 0.4333          |
| 0.3977        | 46.06 | 380  | 0.4328          |
| 0.3956        | 47.27 | 390  | 0.4330          |
| 0.3956        | 48.48 | 400  | 0.4329          |

Framework versions

  • PEFT 0.4.0
  • Transformers 4.38.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.13.1
  • Tokenizers 0.15.2
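
Since this repository holds a PEFT adapter rather than full model weights, it is loaded on top of the quantized base model. The snippet below is a hedged sketch: the adapter repository id is a placeholder for wherever these weights are hosted, and the prompt format follows the usual Llama-2-chat `[INST]` convention.

```python
# Sketch: load the base model in 4-bit, attach this PEFT adapter, and generate.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "meta-llama/Llama-2-7b-chat-hf"
adapter_id = "<namespace>/MSc_llama2_finetuned_model_updatePara"  # placeholder repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the adapter weights
tokenizer = AutoTokenizer.from_pretrained(base_id)

prompt = "[INST] Summarise the purpose of this model card. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```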