---
library_name: peft
base_model: TheBloke/Llama-2-7b-Chat-GPTQ
pipeline_tag: text-generation
inference: false
license: openrail
language:
  - en
datasets:
  - flytech/python-codes-25k
tags:
  - text2code
  - LoRA
  - GPTQ
  - Llama-2-7B-Chat
  - text2python
  - instruction2code
---

Llama-2-7b-Chat-GPTQ fine-tuned on PYTHON-CODES-25K

Generates Python code that accomplishes the instructed task.

LoRA Adapter Head

Description

Parameter-Efficient Fine-Tuning (PEFT) of the 4-bit GPTQ-quantized Llama-2-7b-Chat from TheBloke/Llama-2-7b-Chat-GPTQ on the flytech/python-codes-25k dataset.

Intended uses & limitations

Addresses the efficacy of combining quantization with PEFT. Implemented as a personal project.

How to use

The quantized model was fine-tuned with PEFT, so what is published here is the trained LoRA adapter.
Merging a LoRA adapter into a GPTQ-quantized model is not yet supported,
so instead of loading a single fine-tuned model, load the base model and
attach the fine-tuned adapter on top.
instruction = """model_input = "Help me set up my daily to-do list!""""
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM

config = PeftConfig.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")
model = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ")
model = PeftModel.from_pretrained(model, "SwastikM/Llama-2-7B-Chat-text2code")
tokenizer = AutoTokenizer.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")

inputs = tokenizer(instruction, return_tensors="pt").input_ids.to('cuda')
outputs = model.generate(inputs, max_new_tokens=500, do_sample=False, num_beams=1)
code = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(code)

Size Comparison

The table below compares the VRAM requirements for loading and training the FP16 base model and the 4-bit GPTQ-quantized model with PEFT.
The values for the base model are taken from the Hugging Face [Model Memory Calculator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator).

| Model                  | Total Size | Training Using Adam |
|------------------------|------------|---------------------|
| Base Model (FP16)      | 12.37 GB   | 49.48 GB            |
| 4-bit Quantized + PEFT | 3.90 GB    | 11 GB               |
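
For intuition, these numbers roughly match a back-of-the-envelope calculation: FP16 stores about 2 bytes per weight, 4-bit GPTQ about 0.5 bytes, and the Model Memory Calculator estimates Adam training at roughly 4x the load size. The sketch below is illustrative only; the parameter count and overheads are approximations, not figures from this card.

```python
# Rough VRAM arithmetic for a ~6.7B-parameter model (illustrative only:
# real numbers also depend on the exact parameter count, GPTQ scales/zeros,
# activations, and optimizer layout).
n_params = 6.7e9
fp16_load_gb = n_params * 2 / 1024**3    # ~2 bytes per weight   -> ~12.5 GB
int4_load_gb = n_params * 0.5 / 1024**3  # ~0.5 bytes per weight -> ~3.1 GB
fp16_adam_gb = fp16_load_gb * 4          # weights + grads + 2 Adam states -> ~50 GB
print(f"FP16 load ~{fp16_load_gb:.1f} GB, 4-bit load ~{int4_load_gb:.1f} GB, "
      f"FP16 + Adam training ~{fp16_adam_gb:.1f} GB")
```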

Training Details

Training Data

flytech/python-codes-25k

Train Set Size: 20,000 examples, shuffled randomly

Training Procedure

Custom training loop written with Hugging Face Accelerate (a sketch follows the hyperparameter list below).

Training Hyperparameters

  • Optimizer: AdamW
  • lr: 2e-5
  • decay: linear
  • batch_size: 4
  • gradient_accumulation_steps: 8
  • global_step: 625
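
Below is a minimal sketch of how these hyperparameters map onto an Accelerate training loop. It is illustrative rather than the exact script used: `model` (the PEFT-wrapped GPTQ model), `train_dataloader` (yielding tokenized batches of size 4), and the warmup step count (assumed 0) are placeholders or assumptions.

```python
import torch
from accelerate import Accelerator
from transformers import get_linear_schedule_with_warmup

# Illustrative skeleton, not the exact training script for this adapter.
# `model` and `train_dataloader` are assumed to be defined already.
accelerator = Accelerator(gradient_accumulation_steps=8)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=625  # warmup=0 is an assumption
)

model, optimizer, train_dataloader, scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, scheduler
)

model.train()
for batch in train_dataloader:
    with accelerator.accumulate(model):  # accumulate gradients over 8 steps
        loss = model(**batch).loss
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```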

LoraConfig

  • r: 8
  • lora_alpha: 32
  • target_modules: ["k_proj","o_proj","q_proj","v_proj"]
  • lora_dropout: 0.05
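
The same values expressed as a `peft.LoraConfig`, for reference; `task_type` is an assumption (causal language modeling) and is not stated above.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["k_proj", "o_proj", "q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",  # assumption: not specified in the card
)
```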

Hardware

  • GPU: P100

Additional Information

Acknowledgment

Thanks to Merve Noyan for the precise introduction. Thanks to the Hugging Face team for the notebook on GPTQ.

Model Card Authors

Swastik Maiti