---
library_name: peft
base_model: TheBloke/Llama-2-7b-Chat-GPTQ
pipeline_tag: text-generation
inference: false
license: openrail
language:
- en
datasets:
- flytech/python-codes-25k
tags:
- text2code
- LoRA
- GPTQ
- Llama-2-7B-Chat
- text2python
- instruction2code
---

# Llama-2-7b-Chat-GPTQ fine-tuned on PYTHON-CODES-25K

Generates Python code that accomplishes the instructed task.

LoRA Adapter Head

## Description
Parameter-Efficient Fine-Tuning (PEFT) of the 4-bit quantized Llama-2-7b-Chat from TheBloke/Llama-2-7b-Chat-GPTQ on the flytech/python-codes-25k dataset.

- Language(s) (NLP): English
- License: openrail
- Quantization: GPTQ 4-bit
- PEFT: LoRA
- Fine-tuned from model: TheBloke/Llama-2-7b-Chat-GPTQ
- Dataset: flytech/python-codes-25k
## Intended uses & limitations

Addresses the efficacy of quantization and PEFT. Implemented as a personal project.
## How to use

The quantized model was fine-tuned with PEFT, so only the trained adapter is stored in this repository.
Merging a LoRA adapter into a GPTQ-quantized model is not yet supported, so instead of loading a single
fine-tuned model, we load the base model and apply the fine-tuned adapter on top.
instruction = """model_input = "Help me set up my daily to-do list!""""
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM
config = PeftConfig.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")
model = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ")
model = PeftModel.from_pretrained(model, "SwastikM/Llama-2-7B-Chat-text2code")
tokenizer = AutoTokenizer.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")
inputs = tokenizer(instruction, return_tensors="pt").input_ids.to('cuda')
outputs = model.generate(inputs, max_new_tokens=500, do_sample=False, num_beams=1)
code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(code)
## Size Comparison

The table compares the VRAM required for loading and training the FP16 base model
versus the 4-bit GPTQ-quantized model with PEFT. The values for the base model are
taken from the [Model Memory Calculator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator)
from Hugging Face.

Model | Total Size | Training Using Adam |
---|---|---|
Base Model (FP16) | 12.37 GB | 49.48 GB |
4-bit Quantized + PEFT | 3.90 GB | 11 GB |
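
A common rule of thumb behind the "Training Using Adam" column is roughly four times the weight memory (weights, gradients, and two Adam optimizer states per parameter). A minimal sanity-check sketch, assuming that 4x rule:

```python
# Sanity check of the base-model row, assuming the common "4x weights" rule
# for Adam training (weights + gradients + 2 optimizer states per parameter).
fp16_total_gb = 12.37                   # "Total Size" column for the FP16 base model
adam_training_gb = 4 * fp16_total_gb    # -> 49.48 GB, matching the table
print(f"Estimated training memory with Adam: {adam_training_gb:.2f} GB")
```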
## Training Details

### Training Data

flytech/python-codes-25k

Train set size: 20,000 examples, randomly shuffled.
### Training Procedure

Hugging Face Accelerate with a custom training loop (a minimal sketch of the loop is shown after the hyperparameter list below).
#### Training Hyperparameters

- Optimizer: AdamW
- lr: 2e-5
- decay: linear
- batch_size: 4
- gradient_accumulation_steps: 8
- global_step: 625
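
A minimal sketch of such a training loop with Hugging Face Accelerate, plugging in the hyperparameters above. `model` and `train_dataloader` are placeholders assumed to be built elsewhere, and the warmup step count is an assumption not stated in this card.

```python
# Minimal Accelerate training-loop sketch using the listed hyperparameters.
# `model` and `train_dataloader` are placeholders, not taken from the card.
import torch
from accelerate import Accelerator
from transformers import get_scheduler

accelerator = Accelerator(gradient_accumulation_steps=8)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
lr_scheduler = get_scheduler("linear", optimizer=optimizer,
                             num_warmup_steps=0,          # warmup not stated in the card; 0 assumed
                             num_training_steps=625)

model, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, lr_scheduler
)

model.train()
for batch in train_dataloader:
    with accelerator.accumulate(model):   # handles the 8-step gradient accumulation
        loss = model(**batch).loss
        accelerator.backward(loss)
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
```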
#### LoraConfig

- r: 8
- lora_alpha: 32
- target_modules: ["k_proj","o_proj","q_proj","v_proj"]
- lora_dropout: 0.05
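
A sketch of how this adapter configuration is expressed with `peft`; the `task_type` and the `prepare_model_for_kbit_training` call are assumptions, not taken from the card.

```python
# Sketch: attaching the LoRA adapter to the GPTQ-quantized base model with peft.
# task_type and prepare_model_for_kbit_training are assumptions, not from the card.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ", device_map="auto")
base = prepare_model_for_kbit_training(base)

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["k_proj", "o_proj", "q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()   # only the LoRA weights are trainable
```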
#### Hardware

- GPU: P100
## Additional Information

- GitHub: Repository
- Intro to quantization: Blog
- Emergent Features: Academic
- GPTQ Paper: GPTQ
- BITSANDBYTES and further LLM.int8()
## Acknowledgment

Thanks to @Merve Noyan for the precise intro. Thanks to the @HuggingFace team for the notebook on GPTQ.

## Model Card Authors

Swastik Maiti