---
license: mit
datasets:
- monsoon-nlp/asknyc-chatassistant-format
language:
- en
tags:
- reddit
- asknyc
- nyc
- llama2
---
# nyc-savvy-llama2-7b
Essentials:
- Based on LLaMa2-7b-hf (version 2, 7B params)
- Used [QLoRA](https://github.com/artidoro/qlora/blob/main/qlora.py) to fine-tune on [13k rows of /r/AskNYC](https://huggingface.co/datasets/monsoon-nlp/asknyc-chatassistant-format) formatted as Human/Assistant exchanges
- Released [the adapter weights](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b)
- Merged LLaMa2 and the adapter weights for this full-sized model
## Prompt options
Here is the template used in training. It starts with "### Human: " (with a trailing space), then the post title and content joined by " - ", then "### Assistant: " (no space before it, one space after).
`### Human: Post title - post content### Assistant: `
For example:
`### Human: Where can I find a good bagel? - We are in Brooklyn### Assistant: Anywhere with fresh-baked bagels and lots of cream cheese options.`
From [QLoRA's Gradio example](https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing), it looks helpful to add a more assistant-like prompt, especially if you follow their lead for a chat format:
```
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
```
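The template and the optional system prompt can be combined with a small helper. This is a minimal sketch, not code from the repo: `build_prompt` and `SYSTEM_PROMPT` are hypothetical names, and joining the system prompt with a newline is an assumption based on the test snippet later in this card.

```python
from typing import Optional

# System prompt text copied from the QLoRA Gradio example quoted above
SYSTEM_PROMPT = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(title: str, content: str, system: Optional[str] = None) -> str:
    """Format a post as '### Human: title - content### Assistant: '."""
    # No space before "### Assistant:", one space after it, as in training
    prompt = f"### Human: {title} - {content}### Assistant: "
    if system:
        # Assumption: the system prompt goes on its own line before the exchange
        prompt = f"{system}\n{prompt}"
    return prompt


print(build_prompt("Where can I find a good bagel?", "We are in Brooklyn"))
```

Keeping the trailing space after "### Assistant:" matters because the model saw it on every training row.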
## Training data
- Collected one month of posts to /r/AskNYC from each year 2015-2019 (no content after July 2019)
- Downloaded from Pushshift; kept only comments with an upvote score >= 3
- Originally collected for my GPT-NYC model in spring 2021: https://mapmeld.medium.com/gpt-nyc-part-1-9cb698b2e3d
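The score filter and the Human/Assistant formatting could look roughly like the sketch below. It is not the original collection code: the field names (`title`, `selftext`, `body`, `score`) are assumed to follow Reddit-style dumps, the `text` row key is an assumption about what the `oasst1` dataset format expects, and the sample post is made up.

```python
import json
from typing import Iterable, Iterator

MIN_SCORE = 3  # threshold stated in the list above


def to_training_rows(post: dict, comments: Iterable[dict]) -> Iterator[dict]:
    """Yield one {'text': ...} row per comment whose score is >= MIN_SCORE."""
    question = f"{post['title']} - {post['selftext']}"
    for comment in comments:
        if comment.get("score", 0) >= MIN_SCORE:
            yield {"text": f"### Human: {question}### Assistant: {comment['body']}"}


# Made-up sample data, only to show the shape of the output
sample_post = {"title": "Where can I find a good bagel?", "selftext": "We are in Brooklyn"}
sample_comments = [
    {"body": "Anywhere with fresh-baked bagels.", "score": 5},
    {"body": "Just google it.", "score": 1},  # dropped: score below 3
]
for row in to_training_rows(sample_post, sample_comments):
    print(json.dumps(row))  # one JSONL line per kept comment
```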
## Training script
Takes about 2 hours on Colab once you get it right. QLoRA only lets you set `max_steps`, but I wanted to stop at 1 epoch, so `--max_steps 760` below is set to land at roughly one pass over the data.
```
git clone https://github.com/artidoro/qlora
cd qlora
pip3 install -r requirements.txt --quiet
python3 qlora.py \
--model_name_or_path ../llama-2-7b-hf \
--use_auth \
--output_dir ../nyc-savvy-llama2-7b \
--logging_steps 10 \
--save_strategy steps \
--data_seed 42 \
--save_steps 500 \
--save_total_limit 40 \
--dataloader_num_workers 1 \
--group_by_length False \
--logging_strategy steps \
--remove_unused_columns False \
--do_train \
--num_train_epochs 1 \
--lora_r 64 \
--lora_alpha 16 \
--lora_modules all \
--double_quant \
--quant_type nf4 \
--bf16 \
--bits 4 \
--warmup_ratio 0.03 \
--lr_scheduler_type constant \
--gradient_checkpointing \
--dataset /content/gpt_nyc.jsonl \
--dataset_format oasst1 \
--source_max_len 16 \
--target_max_len 512 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 16 \
--max_steps 760 \
--learning_rate 0.0002 \
--adam_beta2 0.999 \
--max_grad_norm 0.3 \
--lora_dropout 0.1 \
--weight_decay 0.0 \
--seed 0
```
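To pick `--max_steps` for a given number of epochs, divide the row count by the effective batch size. With `--per_device_train_batch_size 1` and `--gradient_accumulation_steps 16`, each optimizer step covers 16 rows, so 760 steps cover 760 × 16 = 12,160 rows. A small sketch follows; `steps_for_epochs` is a hypothetical helper, the exact row count is not stated in this card (only "13k"), and the trainer's own rounding may differ by a step.

```python
import math


def steps_for_epochs(
    num_rows: int,
    per_device_batch_size: int,
    grad_accum_steps: int,
    epochs: float = 1.0,
    num_devices: int = 1,
) -> int:
    """Estimate optimizer steps needed to see num_rows * epochs examples."""
    # Rows consumed per optimizer step across all devices
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    return math.ceil(num_rows * epochs / effective_batch)


# 12,160 rows is the count implied by 760 steps at an effective batch of 16
print(steps_for_epochs(12_160, per_device_batch_size=1, grad_accum_steps=16))
```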
## Merging it back
What you get in the `output_dir` is an adapter model, not a full model. [Here's ours](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter/). Cool, but not as easy to drop into QLoRA's example script.
The `peftmerger.py` script applies the adapter and saves the model like this:
```python
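import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Placeholder paths (assumed): point these at your base model and adapter directory
model_name = "../llama-2-7b-hf"
adapters_name = "../nyc-savvy-llama2-7b"
m = AutoModelForCausalLM.from_pretrained(
    model_name,
    #load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    #device_map={"": 0},
)
# Apply the LoRA adapter, then fold its weights into the base model
m = PeftModel.from_pretrained(m, adapters_name)
m = m.merge_and_unload()
m.save_pretrained("nyc-savvy")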
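For intuition about what merging does: for a plain LoRA linear layer, folding the adapter in amounts to adding the scaled low-rank product to the base weight, `W + (lora_alpha / r) * (B @ A)`. The toy sketch below uses pure Python lists and made-up numbers; it illustrates the arithmetic only and is not PEFT's actual implementation. With the training settings above (`--lora_r 64`, `--lora_alpha 16`) the scaling factor would be 0.25.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(W, A, B, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A) for one linear layer."""
    scaling = lora_alpha / r
    delta = matmul(B, A)  # shape (out, r) @ (r, in) -> (out, in)
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


# Toy values (made up): 2x2 base weight with a rank-1 adapter
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # r x in
B = [[1.0], [3.0]]  # out x r
merged = merge_lora(W, A, B, lora_alpha=2, r=1)

# The merged layer gives the same output as base plus the scaled adapter path
x = [[1.0], [1.0]]
base_plus_adapter = [
    [b[0] + 2.0 * a[0]] for b, a in zip(matmul(W, x), matmul(B, matmul(A, x)))
]
assert matmul(merged, x) == base_plus_adapter
```

After merging, inference needs only the single weight matrix, which is why the merged model loads like any other full-sized checkpoint.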
## Testing that the model is NYC-savvy
You might wonder whether the model actually learned anything about NYC or is just the same old LLaMa2. Using a prompt that gives no NYC clues, try this from the `pefttester.py` script in this repo:
```python
messages = "A chat between a curious human and an assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
messages += "### Human: What museums should I visit? - My kids are aged 12 and 5"
messages += "### Assistant: "
input_ids = tok(messages, return_tensors="pt").input_ids
# ...
temperature = 0.7
top_p = 0.9
top_k = 0
repetition_penalty = 1.1
op = m.generate(
    input_ids=input_ids,
    max_new_tokens=100,
    temperature=temperature,
    do_sample=temperature > 0.0,
    top_p=top_p,
    top_k=top_k,
    repetition_penalty=repetition_penalty,
    stopping_criteria=StoppingCriteriaList([stop]),
)
for line in op:
    print(tok.decode(line))
```
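Because `generate` returns the prompt tokens followed by the new tokens, `tok.decode(line)` prints the whole exchange. To look only at the model's answer, the decoded string can be trimmed afterwards. The helper below is a sketch and not part of `pefttester.py`; `extract_reply` is a hypothetical name, and the special-token strings assume LLaMA's `<s>` and `</s>`.

```python
HUMAN_MARK = "### Human:"
ASSISTANT_MARK = "### Assistant:"


def extract_reply(decoded: str) -> str:
    """Return only the assistant's first reply from a decoded generation."""
    # Drop the prompt: keep what follows the first assistant marker
    _, found, reply = decoded.partition(ASSISTANT_MARK)
    if not found:
        reply = decoded
    # Cut off any further turn the model started writing
    reply = reply.split(HUMAN_MARK, 1)[0]
    # Remove special tokens (assumed to be LLaMA's "<s>" and "</s>")
    for token in ("<s>", "</s>"):
        reply = reply.replace(token, "")
    return reply.strip()
```

In the loop above this would be used as `print(extract_reply(tok.decode(line)))`.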