---
license: apache-2.0
library_name: peft
tags:
  - axolotl
  - generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
model-index:
  - name: hc-mistral-alpaca
    results: []
---

Built with [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`

```yaml
base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

lora_fan_in_fan_out: false
data_seed: 49
seed: 49

datasets:
  - path: sample_data/alpaca_synth_queries.jsonl
    type: sharegpt
    conversation: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.1
output_dir: ./qlora-alpaca-out
hub_model_id: caldana/hc-mistral-alpaca

adapter: qlora
lora_model_dir:

sequence_len: 896
sample_packing: false
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

wandb_project: 
wandb_entity: 

gradient_accumulation_steps: 4
micro_batch_size: 16
eval_batch_size: 16
num_epochs: 100
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
max_grad_norm: 1.0
adam_beta2: 0.95
adam_epsilon: 0.00001
save_total_limit: 12

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

warmup_steps: 20
evals_per_epoch: 3
eval_table_size:
eval_table_max_new_tokens: 128
saves_per_epoch: 6
debug:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
save_safetensors: true
```

</details>

# hc-mistral-alpaca

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on a synthetic Alpaca-style dataset (`sample_data/alpaca_synth_queries.jsonl`). It achieves the following results on the evaluation set:

- Loss: 0.3648

## Model description

This repository contains a QLoRA adapter (4-bit base model, LoRA rank 32, alpha 16, dropout 0.05 on all linear projection modules) for `mistralai/Mistral-7B-v0.1`, trained with Axolotl 0.4.0 using the configuration shown above.
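
For reference, the quantization and LoRA settings in the config above map roughly onto the following PEFT/Transformers objects. This is an illustrative sketch only; Axolotl builds the equivalent objects internally, and the compute dtype is an assumption based on `bf16: true`.

```python
# Sketch only: approximates the adapter settings from the axolotl config above.
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit base-model loading (load_in_4bit: true); compute dtype assumed from bf16: true.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter settings (lora_r, lora_alpha, lora_dropout, lora_target_modules).
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["gate_proj", "down_proj", "up_proj", "q_proj", "v_proj", "k_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```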

## Intended uses & limitations

More information needed
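
As a starting point, a minimal inference sketch is shown below. It assumes the adapter published at `caldana/hc-mistral-alpaca` (the `hub_model_id` in the config) and a GPU with enough memory for the 4-bit base model; the prompt string is a placeholder, since the exact prompt template is not documented on this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "caldana/hc-mistral-alpaca"  # hub_model_id from the config above

# Load the base model in 4-bit, matching the QLoRA training setup.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb_config, device_map="auto")

# Attach the LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Hypothetical prompt; the training data used an Alpaca-style conversation template.
prompt = "Below is an instruction that describes a task.\n\n### Instruction:\nExplain what QLoRA is.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```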

## Training and evaluation data

Training used `sample_data/alpaca_synth_queries.jsonl`, loaded with Axolotl's `sharegpt` dataset type and the `alpaca` conversation template; 10% of the data was held out for evaluation (`val_set_size: 0.1`).
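
The dataset itself is not included in this card. As a rough illustration, Axolotl's `sharegpt` loader expects one JSON object per line with a `conversations` list of `from`/`value` turns; the record below is invented for illustration only.

```python
import json
import os

# Hypothetical record; the real dataset contents are not shown on this card.
record = {
    "conversations": [
        {"from": "human", "value": "Example synthetic user query."},
        {"from": "gpt", "value": "Example assistant response."},
    ]
}

os.makedirs("sample_data", exist_ok=True)
with open("sample_data/alpaca_synth_queries.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```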

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto optimizer and scheduler objects follows the list):

- learning_rate: 0.0002
- train_batch_size: 16
- eval_batch_size: 16
- seed: 49
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: 8-bit AdamW (bitsandbytes, `adamw_bnb_8bit`) with betas=(0.9,0.95) and epsilon=1e-05
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 20
- num_epochs: 100
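
A sketch of how these values translate into code; Axolotl and the Hugging Face Trainer wire this up internally, and the parameter list here is a stand-in so the snippet runs on its own.

```python
import torch
import bitsandbytes as bnb  # backend for the adamw_bnb_8bit optimizer
from transformers import get_cosine_schedule_with_warmup

# Effective batch size: 16 (micro) x 4 (gradient accumulation) x 1 GPU (assumed) = 64.
micro_batch_size, grad_accum, num_gpus = 16, 4, 1
total_train_batch_size = micro_batch_size * grad_accum * num_gpus

# Stand-in for the trainable LoRA parameters.
params = [torch.nn.Parameter(torch.zeros(16))]

# adamw_bnb_8bit with adam_beta2=0.95, adam_epsilon=1e-5, weight_decay=0.0, lr=2e-4.
optimizer = bnb.optim.AdamW8bit(params, lr=2e-4, betas=(0.9, 0.95), eps=1e-5, weight_decay=0.0)

# Cosine schedule with 20 warmup steps; the logged run took 100 optimizer steps in total.
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=20, num_training_steps=100)
```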

### Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 1.334         | 0.6667  | 1    | 1.2849          |
| 1.3476        | 1.3333  | 2    | 1.2780          |
| 1.2981        | 2.0     | 3    | 1.2487          |
| 1.3157        | 2.6667  | 4    | 1.1840          |
| 1.1757        | 3.3333  | 5    | 1.0690          |
| 1.1376        | 4.0     | 6    | 0.9086          |
| 0.9395        | 4.6667  | 7    | 0.7184          |
| 0.7385        | 5.3333  | 8    | 0.5617          |
| 0.5541        | 6.0     | 9    | 0.4307          |
| 0.4056        | 6.6667  | 10   | 0.3257          |
| 0.2791        | 7.3333  | 11   | 0.2866          |
| 0.2198        | 8.0     | 12   | 0.2453          |
| 0.1746        | 8.6667  | 13   | 0.2167          |
| 0.1582        | 9.3333  | 14   | 0.2104          |
| 0.1515        | 10.0    | 15   | 0.1699          |
| 0.1168        | 10.6667 | 16   | 0.1502          |
| 0.087         | 11.3333 | 17   | 0.1415          |
| 0.1           | 12.0    | 18   | 0.1574          |
| 0.0832        | 12.6667 | 19   | 0.1699          |
| 0.0765        | 13.3333 | 20   | 0.1601          |
| 0.0697        | 14.0    | 21   | 0.1544          |
| 0.0625        | 14.6667 | 22   | 0.1653          |
| 0.0583        | 15.3333 | 23   | 0.1628          |
| 0.047         | 16.0    | 24   | 0.1463          |
| 0.0366        | 16.6667 | 25   | 0.1637          |
| 0.0342        | 17.3333 | 26   | 0.2020          |
| 0.0398        | 18.0    | 27   | 0.1801          |
| 0.0319        | 18.6667 | 28   | 0.1835          |
| 0.0229        | 19.3333 | 29   | 0.1957          |
| 0.0286        | 20.0    | 30   | 0.2024          |
| 0.0166        | 20.6667 | 31   | 0.2519          |
| 0.0184        | 21.3333 | 32   | 0.2699          |
| 0.0129        | 22.0    | 33   | 0.2813          |
| 0.0109        | 22.6667 | 34   | 0.2950          |
| 0.0105        | 23.3333 | 35   | 0.3037          |
| 0.0111        | 24.0    | 36   | 0.3161          |
| 0.0071        | 24.6667 | 37   | 0.3310          |
| 0.0115        | 25.3333 | 38   | 0.3375          |
| 0.0051        | 26.0    | 39   | 0.3456          |
| 0.004         | 26.6667 | 40   | 0.3488          |
| 0.0077        | 27.3333 | 41   | 0.3599          |
| 0.0028        | 28.0    | 42   | 0.3706          |
| 0.0021        | 28.6667 | 43   | 0.3737          |
| 0.002         | 29.3333 | 44   | 0.3729          |
| 0.0017        | 30.0    | 45   | 0.3742          |
| 0.0013        | 30.6667 | 46   | 0.3757          |
| 0.0004        | 31.3333 | 47   | 0.3755          |
| 0.0006        | 32.0    | 48   | 0.3764          |
| 0.0002        | 32.6667 | 49   | 0.3750          |
| 0.0011        | 33.3333 | 50   | 0.3646          |
| 0.0005        | 34.0    | 51   | 0.3586          |
| 0.0013        | 34.6667 | 52   | 0.3617          |
| 0.0005        | 35.3333 | 53   | 0.3638          |
| 0.0011        | 36.0    | 54   | 0.3657          |
| 0.0003        | 36.6667 | 55   | 0.3710          |
| 0.0002        | 37.3333 | 56   | 0.3711          |
| 0.0004        | 38.0    | 57   | 0.3736          |
| 0.0003        | 38.6667 | 58   | 0.3784          |
| 0.0001        | 39.3333 | 59   | 0.3795          |
| 0.0007        | 40.0    | 60   | 0.3737          |
| 0.0001        | 40.6667 | 61   | 0.3730          |
| 0.0003        | 41.3333 | 62   | 0.3729          |
| 0.0002        | 42.0    | 63   | 0.3714          |
| 0.0001        | 42.6667 | 64   | 0.3698          |
| 0.0001        | 43.3333 | 65   | 0.3704          |
| 0.0001        | 44.0    | 66   | 0.3704          |
| 0.0001        | 44.6667 | 67   | 0.3705          |
| 0.0001        | 45.3333 | 68   | 0.3655          |
| 0.0002        | 46.0    | 69   | 0.3672          |
| 0.0002        | 46.6667 | 70   | 0.3682          |
| 0.0002        | 47.3333 | 71   | 0.3656          |
| 0.0001        | 48.0    | 72   | 0.3663          |
| 0.0001        | 48.6667 | 73   | 0.3668          |
| 0.0001        | 49.3333 | 74   | 0.3673          |
| 0.0001        | 50.0    | 75   | 0.3638          |
| 0.0001        | 50.6667 | 76   | 0.3640          |
| 0.0001        | 51.3333 | 77   | 0.3643          |
| 0.0001        | 52.0    | 78   | 0.3640          |
| 0.0001        | 52.6667 | 79   | 0.3648          |
| 0.0001        | 53.3333 | 80   | 0.3629          |
| 0.0001        | 54.0    | 81   | 0.3648          |
| 0.0001        | 54.6667 | 82   | 0.3617          |
| 0.0001        | 55.3333 | 83   | 0.3632          |
| 0.0001        | 56.0    | 84   | 0.3650          |
| 0.0001        | 56.6667 | 85   | 0.3636          |
| 0.0001        | 57.3333 | 86   | 0.3633          |
| 0.0001        | 58.0    | 87   | 0.3673          |
| 0.0001        | 58.6667 | 88   | 0.3663          |
| 0.0001        | 59.3333 | 89   | 0.3618          |
| 0.0001        | 60.0    | 90   | 0.3635          |
| 0.0001        | 60.6667 | 91   | 0.3605          |
| 0.0001        | 61.3333 | 92   | 0.3654          |
| 0.0001        | 62.0    | 93   | 0.3647          |
| 0.0001        | 62.6667 | 94   | 0.3586          |
| 0.0001        | 63.3333 | 95   | 0.3601          |
| 0.0001        | 64.0    | 96   | 0.3631          |
| 0.0001        | 64.6667 | 97   | 0.3629          |
| 0.0001        | 65.3333 | 98   | 0.3652          |
| 0.0001        | 66.0    | 99   | 0.3645          |
| 0.0001        | 66.6667 | 100  | 0.3648          |

### Framework versions

- PEFT 0.10.0
- Transformers 4.40.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1