---
tags:
- code
- coding
- python
- llama-2
- gptq
model-index:
- name: Llama-2-7b-4bit-python-coder
  results: []
license: llama2
language:
- code
datasets:
- iamtarun/python_code_instructions_18k_alpaca
pipeline_tag: text-generation
---

# Llama 2 7B 4-bit Python Coder (GPTQ) 👩‍💻

**Llama 2 7B** fine-tuned on the **python_code_instructions_18k_alpaca** code instructions dataset using **QLoRA** in 4-bit with the [PEFT](https://github.com/huggingface/peft) library.

## Pretrained description

[Llama-2](https://huggingface.co/meta-llama/Llama-2-7b)

Meta developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters.

**Model architecture:** Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety.

## Training data

[python_code_instructions_18k_alpaca](https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca)

The dataset contains problem descriptions and the corresponding Python code. It is taken from sahil2801/code_instructions_120k, with a prompt column added in Alpaca style. An illustrative sketch of loading and formatting this data is included at the end of this card.

### Framework versions

- PEFT 0.4.0

An illustrative sketch of the QLoRA/PEFT setup is also included at the end of this card.

### Example of usage

```py
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the quantized model and tokenizer.
tokenizer = AutoTokenizer.from_pretrained("NurtureAI/llama-2-7b-int4-gptq-python")
model = AutoModelForCausalLM.from_pretrained(
    "NurtureAI/llama-2-7b-int4-gptq-python",
    load_in_4bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Prepare the prompt.
instruction = "Write a Python function to display the first and last elements of a list."

prompt = f"""### Instruction:
Use the Task below and the Input given to write the Response, which is a programming code that can solve the Task.

### Task:
{instruction}

### Input:

### Response:
"""

# Generate the response.
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()
with torch.inference_mode():
    outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=True, top_p=0.9, temperature=0.5)

print(f"Prompt:\n{prompt}\n")
print(f"Generated response:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}")
```

### Citation

```
@misc{NurtureAI,
  author    = { Raymond Hernandez },
  title     = { NurtureAI/llama-2-7b-int4-gptq-python },
  year      = { 2023 },
  url       = { https://huggingface.co/NurtureAI/llama-2-7b-int4-gptq-python },
  publisher = { Hugging Face }
}
```
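
### Loading and formatting the training data (illustrative)

The sketch below shows how the Alpaca-style prompts mentioned in the *Training data* section could be loaded and assembled for fine-tuning. The column names (`instruction`, `input`, `output`) and the prompt template are assumptions based on the dataset card and the usage example above, not the exact preprocessing used for this model.

```py
from datasets import load_dataset

# Load the instruction dataset referenced in the "Training data" section.
dataset = load_dataset("iamtarun/python_code_instructions_18k_alpaca", split="train")

# Assumed columns: "instruction", "input", "output" (the dataset also ships a
# ready-made Alpaca-style "prompt" column).
def build_prompt(example):
    return {
        "text": (
            "### Instruction:\n"
            "Use the Task below and the Input given to write the Response, "
            "which is a programming code that can solve the Task.\n\n"
            f"### Task:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    }

dataset = dataset.map(build_prompt)
print(dataset[0]["text"][:500])
```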
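
### QLoRA / PEFT setup (illustrative)

The following is a minimal sketch of the 4-bit QLoRA setup described at the top of this card, using PEFT and bitsandbytes. The base checkpoint ID, LoRA hyperparameters (`r`, `lora_alpha`, `lora_dropout`, `target_modules`), and compute dtype are placeholders, not the exact values used to train this model.

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint

# 4-bit NF4 quantization, the usual QLoRA configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters on the attention projections (hyperparameters are illustrative).
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
```

Adapters produced this way can then be trained on the formatted prompts with the standard `transformers` `Trainer` (or TRL's `SFTTrainer`) and, if desired, merged into the base model before quantization.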