---
license: other
language:
- en
pipeline_tag: text2text-generation
tags:
- alpaca
- llama
- chat
- gpt4
inference: false
---
# GPT4 Alpaca LoRA 30B - 4bit GGML
This is a 4-bit GGML version of the Chansung GPT4 Alpaca 30B LoRA model.

It was created by merging the LoRA provided in the above repo with the original Llama 30B model, producing the unquantised model GPT4-Alpaca-LoRA-30B-HF.

The files in this repo were then quantised to 4-bit for use with llama.cpp, using the new 4-bit quantisation methods being worked on in PR #896.
## Provided files
Two files are provided. One is quantised using the Q4_0 method, the other using Q4_1.

The Q4_1 file requires more RAM and may run a little slower. It may give slightly better results, but this is not proven.
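For reference, GGML files like these are typically produced with llama.cpp's `quantize` tool, after first converting the merged HF model to an f16 GGML file. A minimal sketch, assuming a hypothetical f16 input filename and the numeric type IDs (2 = q4_0, 3 = q4_1) accepted by the quantize tool of this era:

```
# Hypothetical input file; one output per quantisation method.
# Type 2 selects q4_0, type 3 selects q4_1.
./quantize ./models/gpt4-alpaca-lora-30B.GGML.f16.bin ./models/gpt4-alpaca-lora-30B.GGML.q4_0.bin 2
./quantize ./models/gpt4-alpaca-lora-30B.GGML.f16.bin ./models/gpt4-alpaca-lora-30B.GGML.q4_1.bin 3
```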
## How to run in llama.cpp
I use the following command line; adjust for your tastes and needs:
```
./main -t 18 -m gpt4-alpaca-lora-30B.GGML.q4_1.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Write a story about llamas
### Response:"
```
Change `-t 18` to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use `-t 8`.

If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.
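For example, a full interactive invocation might look like this, keeping the other settings from the command above (the thread count assumes the 8-core system mentioned):

```
# -i -ins replaces the one-shot -p prompt for a chat-style session.
./main -t 8 -m gpt4-alpaca-lora-30B.GGML.q4_1.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -i -ins
```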
# Original GPT4 Alpaca Lora model card
This repository comes with a LoRA checkpoint to make LLaMA into a chatbot-like language model. The checkpoint is the output of an instruction-following fine-tuning process with the following settings on an 8xA100 (40G) DGX system.
- Training script: borrowed from the official Alpaca-LoRA implementation (a setup sketch follows the command below)
- Training command:
```
python finetune.py \
    --base_model='decapoda-research/llama-30b-hf' \
    --data_path='alpaca_data_gpt4.json' \
    --num_epochs=10 \
    --cutoff_len=512 \
    --group_by_length \
    --output_dir='./gpt4-alpaca-lora-30b' \
    --lora_target_modules='[q_proj,k_proj,v_proj,o_proj]' \
    --lora_r=16 \
    --batch_size=... \
    --micro_batch_size=...
```
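As a rough guide, the environment for that command can be reproduced from the Alpaca-LoRA repo. A sketch assuming the tloen/alpaca-lora GitHub repository (the exact commit used is not stated here):

```
# A sketch, not the author's exact steps.
git clone https://github.com/tloen/alpaca-lora
cd alpaca-lora
pip install -r requirements.txt
# alpaca_data_gpt4.json is the GPT-4-generated Alpaca dataset; obtain it
# separately if your checkout does not include it.
```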
You can find out how the training went in the W&B report here.