| Training Techniques | QLoRA, Packing | QLoRA, Flash Attention 1, Packing | QLoRA, Flash Attention 2, Packing | QLoRA, Unsloth, Packing |
| --- | --- | --- | --- | --- |
| Dataset | `databricks/databricks-dolly-15k` | `databricks/databricks-dolly-15k` | `databricks/databricks-dolly-15k` | `databricks/databricks-dolly-15k` |
| Model | `NousResearch/Llama-2-7b-hf` | `NousResearch/Llama-2-7b-hf` | `NousResearch/Llama-2-7b-hf` | `unsloth/llama-2-7b` |
| Dependencies (Flash Attention / Unsloth) | NA | `!pip install -U optimum` | `!pip install -U flash-attn` | |
| Model Loading | `model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, use_cache=True, device_map="auto")` | `model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, use_cache=True, device_map="auto")` | `model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, use_cache=True, device_map="auto", use_flash_attention_2=True)` | |
| LoRA | | | | |
| Model Training Setup | `trainer.train()` | | `trainer.train()` | `trainer.train()` |
| Trainable Params | 67,108,864 | 67,108,864 | 67,108,864 | 67,108,864 |
| Total Params | 3,567,521,792 | 3,567,521,792 | 3,567,521,792 | 3,567,521,792 |
| Trainable Percentage (%) | 1.881 | 1.881 | 1.881 | 1.881 |
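The identical parameter counts across all four runs are expected: Flash Attention and Unsloth change the attention kernels, not the model architecture or the adapters. The table does not state the LoRA configuration, but the counts can be cross-checked with a short, dependency-free calculation. The sketch below assumes Llama-2-7b's published shapes (32 decoder layers, hidden size 4096) and a rank-64 LoRA on the four attention projections, which is an inference from the numbers, not something the table specifies:

```python
# Hypothetical reconstruction: the table does not give the LoRA config,
# but rank-64 adapters on the q/k/v/o attention projections reproduce
# its trainable-parameter count exactly.
hidden_size = 4096   # Llama-2-7b hidden dimension
num_layers = 32      # Llama-2-7b decoder layers
r = 64               # assumed LoRA rank

# A LoRA pair on a (4096 -> 4096) projection adds r * (d_in + d_out) params.
per_projection = r * (hidden_size + hidden_size)   # 524,288
per_layer = 4 * per_projection                     # q, k, v, o projections
trainable = num_layers * per_layer
print(trainable)                                   # 67108864, as in the table

total = 3_567_521_792  # total params reported in the table
print(f"{100 * trainable / total:.3f}%")           # 1.881%
```

This also explains why the "Trainable Percentage" row is constant: it depends only on the adapter shapes and the base model, neither of which varies between the four setups.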