	---
	library_name: peft
	base_model: TheBloke/Llama-2-7b-Chat-GPTQ
	pipeline_tag: text-generation
	inference: false
	license: openrail
	language:
	- en
	datasets:
	- flytech/python-codes-25k
	co2_eq_emissions:
	  emissions: 1190
	  source: >-
	    Quantifying the Carbon Emissions of Machine Learning
	    https://mlco2.github.io/impact#compute
	  training_type: finetuning
	  hardware_used: 1 P100 16GB GPU
	tags:
	- text2code
	- LoRA
	- GPTQ
	- Llama-2-7B-Chat
	- text2python
	- instruction2code
	---

	# Llama-2-7b-Chat-GPTQ fine-tuned on PYTHON-CODES-25K

	Generates Python code that accomplishes the task described in a natural-language instruction.


	## LoRA Adapter Head

	### Description

	Parameter-Efficient Fine-Tuning (PEFT) of a 4-bit quantized Llama-2-7b-Chat from TheBloke/Llama-2-7b-Chat-GPTQ on the flytech/python-codes-25k dataset.

	- Language(s) (NLP): English
	- License: openrail
	- Quantization: GPTQ 4-bit
	- PEFT: LoRA
	- Finetuned from model [TheBloke/Llama-2-7b-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ)
	- Dataset: [flytech/python-codes-25k](https://huggingface.co/datasets/flytech/python-codes-25k)

	## Intended uses & limitations

	Explores the efficacy of quantization combined with PEFT. Implemented as a personal project.

	### How to use

	```
	The quantized model is fine-tuned with PEFT, so this repository holds the trained adapter.
	Merging a LoRA adapter into a GPTQ-quantized model is not yet supported.
	So instead of loading a single fine-tuned model, we need to load the base
	model and attach the fine-tuned adapter on top.
	```

	```python
	instruction = "Help me set up my daily to-do list!"
	```
	```python
	from peft import PeftModel, PeftConfig
	from transformers import AutoModelForCausalLM, AutoTokenizer

	adapter_id = "SwastikM/Llama-2-7B-Chat-text2code"

	config = PeftConfig.from_pretrained(adapter_id)  # PEFT config of the trained adapter
	# Load the GPTQ base model; this needs a CUDA GPU and a GPTQ backend installed
	model = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ", device_map="auto")
	model = PeftModel.from_pretrained(model, adapter_id)  # Attach the trained adapter to the base model
	tokenizer = AutoTokenizer.from_pretrained(adapter_id)

	# Tokenize the instruction and move the input IDs to the GPU
	inputs = tokenizer(instruction, return_tensors="pt").input_ids.to("cuda")
	# Greedy decoding; input_ids is passed by keyword, which PeftModel.generate expects
	outputs = model.generate(input_ids=inputs, max_new_tokens=500, do_sample=False, num_beams=1)
	code = tokenizer.decode(outputs[0], skip_special_tokens=True)

	print(code)
	```
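For a decoder-only model, `generate` returns the prompt tokens followed by the completion, so the decoded string begins with the instruction itself. The following is a minimal post-processing sketch, assuming the completion may wrap its code in a Markdown fence. `strip_prompt` and `extract_code` are hypothetical helper names, not part of this repository, and the sample string stands in for real model output.

```python
import re

FENCE = "`" * 3  # Markdown code fence, built at runtime to avoid nesting fences here


def strip_prompt(generated: str, prompt: str) -> str:
    """Drop the echoed prompt from the start of the decoded text, if present."""
    text = generated.strip()
    prompt = prompt.strip()
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


def extract_code(text: str) -> str:
    """Return the first fenced code block, or the whole text if no fence is found."""
    pattern = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)
    match = pattern.search(text)
    return match.group(1).strip() if match else text.strip()


# Stand-in for decoded model output (illustrative only, not a real generation)
prompt = "Print hello"
generated = prompt + "\n" + FENCE + "python\nprint('hello')\n" + FENCE

print(extract_code(strip_prompt(generated, prompt)))
```

In the snippet above, `code` would play the role of `generated` and `instruction` the role of `prompt`.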

	### Size Comparison

	The table compares the VRAM required to load and train the FP16 base model
	against the 4-bit GPTQ-quantized model with PEFT.
	The base-model values are taken from the Hugging Face
	[Model Memory Calculator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator).




	| Model | Total Size | Training Using Adam |
	| ---------------------- | ---------- | ------------------- |
	| Base Model (FP16) | 12.37 GB | 49.48 GB |
	| 4-bit Quantized + PEFT | 3.90 GB | 11 GB |
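The base-model training figure is exactly four times its loading size, which matches the common rule of thumb for full fine-tuning with Adam (weights, gradients, and two optimizer moment buffers). The sketch below checks the table's numbers under that assumed 4x factor. The 11 GB figure for the quantized model is the value reported in the table, not something derived here.

```python
# Values taken from the size comparison table (GB)
base_load, base_train = 12.37, 49.48
quant_load, quant_train = 3.90, 11.0

# Assumed rule of thumb: weights + gradients + two Adam moment buffers
ADAM_FACTOR = 4


def adam_training_estimate(load_gb: float, factor: int = ADAM_FACTOR) -> float:
    """Rough training-memory estimate for full fine-tuning with Adam."""
    return round(load_gb * factor, 2)


load_reduction = round(base_load / quant_load, 2)     # how much less VRAM to load
train_reduction = round(base_train / quant_train, 2)  # how much less VRAM to train

print(adam_training_estimate(base_load))  # 49.48
print(load_reduction, train_reduction)
```

By these figures, quantization plus PEFT cuts loading memory by roughly 3x and training memory by roughly 4.5x.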


	## Training Details

	### Training Data

	**Dataset:** [flytech/python-codes-25k](https://huggingface.co/datasets/flytech/python-codes-25k)

	Trained on the `instruction` column of 20,000 randomly shuffled samples.

	### Training Procedure

	Custom training loop built with Hugging Face Accelerate.


	#### Training Hyperparameters

	- Optimizer: AdamW
	- lr: 2e-5
	- decay: linear
	- batch_size: 4
	- gradient_accumulation_steps: 8
	- global_step: 625
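These values are consistent with a single pass over the 20,000 training samples, since each optimizer step consumes `batch_size * gradient_accumulation_steps` samples. The check below assumes one epoch on a single GPU and a linear decay to zero with no warmup; none of these three details is stated in this card, and `linear_lr` is a hypothetical helper.

```python
num_samples = 20_000
batch_size = 4
gradient_accumulation_steps = 8
base_lr = 2e-5

# Samples consumed per optimizer step
effective_batch_size = batch_size * gradient_accumulation_steps

# Optimizer steps in one pass over the data (assumes a single epoch)
steps_per_epoch = num_samples // effective_batch_size


def linear_lr(step: int, total_steps: int = steps_per_epoch, lr: float = base_lr) -> float:
    """Learning rate under linear decay to zero, assuming no warmup."""
    remaining = max(0, total_steps - step)
    return lr * remaining / total_steps


print(effective_batch_size)  # 32
print(steps_per_epoch)       # 625
print(linear_lr(0), linear_lr(steps_per_epoch))
```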

	LoraConfig
	- *r:* 8
	- *lora_alpha:* 32
	- *target_modules:* ["k_proj","o_proj","q_proj","v_proj"]
	- *lora_dropout:* 0.05
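To give a sense of how small the adapter is, the trainable parameter count can be estimated from these settings. This sketch assumes the standard Llama-2-7B shape (32 decoder layers, hidden size 4096, square attention projections), which is not stated in this card. Each adapted `d_in x d_out` projection adds `r * (d_in + d_out)` parameters.

```python
# LoRA settings from the list above; the keys match peft.LoraConfig arguments
lora_kwargs = dict(
    r=8,
    lora_alpha=32,
    target_modules=["k_proj", "o_proj", "q_proj", "v_proj"],
    lora_dropout=0.05,
)

# Assumed Llama-2-7B architecture (not stated in this card)
NUM_LAYERS = 32
HIDDEN_SIZE = 4096


def lora_param_count(r, num_modules, num_layers=NUM_LAYERS,
                     d_in=HIDDEN_SIZE, d_out=HIDDEN_SIZE):
    """Count LoRA parameters: each adapted projection gets A (r x d_in) and B (d_out x r)."""
    return num_layers * num_modules * r * (d_in + d_out)


trainable = lora_param_count(lora_kwargs["r"], len(lora_kwargs["target_modules"]))
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]  # factor applied to the LoRA update

print(trainable)  # 8388608
print(scaling)    # 4.0
```

Under these assumptions the adapter holds about 8.4M trainable parameters, a small fraction of the roughly 7B in the base model. The same dictionary could be passed as `LoraConfig(**lora_kwargs)`, typically together with `task_type="CAUSAL_LM"`.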


	#### Hardware

	- GPU: P100


	## Additional Information

	- *Github:* [Repository]()
	- *Intro to quantization:* [Blog](https://huggingface.co/blog/merve/quantization)
	- *Emergent Feature:* [Academic](https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features)
	- *GPTQ Paper:* [GPTQ](https://arxiv.org/pdf/2210.17323)
	- *bitsandbytes and further:* [LLM.int8()](https://arxiv.org/pdf/2208.07339)

	## Acknowledgment

	Thanks to [@Merve Noyan](https://huggingface.co/blog/merve/quantization) for the precise intro.
	Thanks to the [@HuggingFace Team](https://colab.research.google.com/drive/1_TIrmuKOFhuRRiTWN94iLKUFu6ZX4ceb?usp=sharing#scrollTo=vT0XjNc2jYKy) for the notebook on GPTQ.


	## Model Card Authors

	Swastik Maiti