| Feature / Case | 🤗 + Standard Attention (Baseline) | 🤗 + Flash Attention 1 | 🤗 + Flash Attention 2 | 🤗 + Unsloth |
|---|---|---|---|---|
| Dataset | databricks/databricks-dolly-15k | databricks/databricks-dolly-15k | databricks/databricks-dolly-15k | databricks/databricks-dolly-15k |
| Model | NousResearch/Llama-2-7b-hf | NousResearch/Llama-2-7b-hf | NousResearch/Llama-2-7b-hf | unsloth/llama-2-7b |
| Training optimization techniques | QLoRA, Packing | QLoRA, Flash Attention 1, Packing | QLoRA, Flash Attention 2, Packing | QLoRA, Unsloth, Packing |
| Extra dependencies (FA / Unsloth) | NA | `!pip install -U optimum` | `!pip install -U flash-attn` | (image) |
| Model loading | `model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, use_cache=True, device_map="auto")` | same as baseline | same as baseline, plus `use_flash_attention_2=True` | (image) |
| LoRA | (image) | (image) | (image) | (image) |
| Model training setup | `trainer.train()` | (image) | `trainer.train()` | `trainer.train()` |
| Trainable params | 67,108,864 | 67,108,864 | 67,108,864 | 67,108,864 |
| Total params | 3,567,521,792 | 3,567,521,792 | 3,567,521,792 | 3,567,521,792 |
| Trainable percentage (%) | 1.881 | 1.881 | 1.881 | 1.881 |

Code sketches for the model-loading, LoRA, Unsloth, and trainer rows follow below.
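
The model-loading row references a `bnb_config` that is not defined on this page; the sketch below fills it in with a common 4-bit NF4 QLoRA configuration, which is an assumption rather than the author's exact settings. The `from_pretrained` calls themselves mirror the table.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "NousResearch/Llama-2-7b-hf"

# Assumed QLoRA quantization config: the table uses `bnb_config` without
# defining it; 4-bit NF4 with bf16 compute is a typical choice.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Baseline and Flash Attention 1 columns use the identical call:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    use_cache=True,
    device_map="auto",
)
# For the Flash Attention 1 path, optimum's BetterTransformer can then be
# enabled with: model = model.to_bettertransformer()

# Flash Attention 2 adds one flag (deprecated in newer transformers in
# favor of attn_implementation="flash_attention_2"):
model_fa2 = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    use_cache=True,
    device_map="auto",
    use_flash_attention_2=True,
)
```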
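The LoRA row is shown only as an image on the original page, so the exact config is not recoverable. However, the reported 67,108,864 trainable parameters match r=64 LoRA on the four attention projections of Llama-2-7b exactly (4 modules × 2 matrices × 4096 × 64 × 32 layers = 67,108,864), which is what this sketch assumes; `lora_alpha` and `lora_dropout` are placeholders.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Assumed LoRA setup: r=64 on the attention projections reproduces the
# page's reported trainable-parameter count exactly.
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,      # assumption; not recoverable from the page
    lora_dropout=0.1,   # assumption
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
# Page reports: trainable 67,108,864 | total 3,567,521,792 | 1.881 %
```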
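The Unsloth column's loading and LoRA cells are also images. The sketch below uses Unsloth's standard `FastLanguageModel` API with the `unsloth/llama-2-7b` checkpoint from the table; `max_seq_length`, `load_in_4bit`, and the LoRA hyperparameters are assumptions chosen to match the other columns.

```python
from unsloth import FastLanguageModel

# Unsloth replaces the transformers loading call:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-2-7b",
    max_seq_length=2048,   # assumption
    load_in_4bit=True,     # assumption, to mirror the QLoRA columns
)

# Unsloth's own PEFT wrapper replaces peft.get_peft_model; r=64 on the
# attention projections mirrors the assumed config above.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
    lora_dropout=0,   # Unsloth recommends 0 for its optimized path
)
```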
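The training row shows only `trainer.train()`; the trainer construction is not on the page. Given the "Packing" entries and the dolly-15k dataset, a trl `SFTTrainer` with `packing=True` is the likely shape, sketched below against the older trl API (where `packing` and `max_seq_length` were trainer kwargs; newer trl moves these into `SFTConfig`). The prompt format and all hyperparameters are placeholders, not the author's values.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

def format_dolly(example):
    # dolly-15k rows have instruction / context / response fields;
    # this prompt template is an assumption.
    ctx = f"\n\nContext:\n{example['context']}" if example["context"] else ""
    return (f"### Instruction:\n{example['instruction']}{ctx}"
            f"\n\n### Response:\n{example['response']}")

training_args = TrainingArguments(
    output_dir="llama2-dolly-qlora",   # hypothetical path
    per_device_train_batch_size=4,     # placeholder hyperparameters
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=1,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    formatting_func=format_dolly,
    max_seq_length=2048,
    packing=True,   # concatenates samples into fixed-length sequences
)

trainer.train()
```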