---
license: mit
datasets:
- monsoon-nlp/asknyc-chatassistant-format
language:
- en
tags:
- reddit
- asknyc
- nyc
- llama2
widget:
- text: "### Human: where can I find a good BEC?### Assistant: "
  example_title: "Basic prompt"
- text: "A chat between a curious human and an assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n### Human: What museums should we visit? - My kids are aged 12 and 5. They love fish### Assistant: "
  example_title: "Assistant prompt"
---

# nyc-savvy-llama2-7b

Essentials:
- Based on LLaMa2-7b-hf (version 2, 7B params)
- Used [QLoRA](https://github.com/artidoro/qlora/blob/main/qlora.py) to fine-tune on [13k rows of /r/AskNYC](https://huggingface.co/datasets/monsoon-nlp/asknyc-chatassistant-format) formatted as Human/Assistant exchanges
- Released [the adapter weights](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter)
- Merged [quantized-then-dequantized LLaMa2](https://gist.github.com/ChrisHayduk/1a53463331f52dca205e55982baf9930) and the adapter weights to produce this full-sized model
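
To use the merged model directly, load it like any Hugging Face causal LM. A minimal sketch, not code from this repo; it assumes `transformers` and `accelerate` are installed:

```python
# Illustrative only: load the merged, full-sized model and its tokenizer.
from transformers import AutoModelForCausalLM, LlamaTokenizer

model_name = "monsoon-nlp/nyc-savvy-llama2-7b"
tok = LlamaTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")  # device_map needs accelerate
```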

## Prompt options

Here is the template used in training. Note that it starts with "### Human: " (with a trailing space), followed by the post title and content, then "### Assistant: " (no space before the "###", but a trailing space after the colon).

`### Human: Post title - post content### Assistant: `

For example:

`### Human: Where can I find a good bagel? - We are in Brooklyn### Assistant: Anywhere with fresh-baked bagels and lots of cream cheese options.`

From [QLoRA's Gradio example](https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing), it looks helpful to add a more assistant-like prompt, especially if you follow their lead for a chat format:

```
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
```
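
Putting the template and the preamble together, a prompt builder might look like the sketch below (the helper name `build_prompt` and its handling of the title/content split are illustrative assumptions, not code from this repo):

```python
# Illustrative prompt builder for the "### Human: title - content### Assistant: " template.
PREAMBLE = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
)

def build_prompt(title: str, content: str, with_preamble: bool = True) -> str:
    # Trailing space after "Human:" and "Assistant:"; no space before "### Assistant:".
    body = f"### Human: {title} - {content}### Assistant: "
    return (PREAMBLE + body) if with_preamble else body

print(build_prompt("Where can I find a good bagel?", "We are in Brooklyn"))
```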

## Training data

- Collected one month of posts to /r/AskNYC from each year 2015-2019 (no content after July 2019)
- Downloaded from PushShift; kept comments only if their upvote score was >= 3 (see the formatting sketch after this list)
- Originally collected for my GPT-NYC model in spring 2021 - [model](https://huggingface.co/monsoon-nlp/gpt-nyc) / [blog](https://mapmeld.medium.com/gpt-nyc-part-1-9cb698b2e3d)
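
A rough sketch of the kind of filtering and formatting this implies (illustrative only: the field names follow the usual PushShift/Reddit schema, and it assumes the jsonl rows carry a single `text` field in the Human/Assistant format for QLoRA's `oasst1` loader):

```python
# Illustrative only: turn a PushShift post + comment pair into one training row.
import json

def to_training_row(post, comment):
    if comment.get("score", 0) < 3:      # keep only comments with upvote score >= 3
        return None
    prompt = f"### Human: {post['title']} - {post.get('selftext', '')}"
    return json.dumps({"text": prompt + f"### Assistant: {comment['body']}"})
```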

## Training script

Training takes about 2 hours on Colab once everything is set up correctly. The QLoRA script is controlled by `max_steps` rather than epochs, and I wanted to stop at 1 epoch: 760 steps at an effective batch size of 16 (`per_device_train_batch_size` 1 × `gradient_accumulation_steps` 16) works out to roughly one pass over the ~13k-row dataset.

```bash
git clone https://github.com/artidoro/qlora
cd qlora

pip3 install -r requirements.txt --quiet

python3 qlora.py \
    --model_name_or_path ../llama-2-7b-hf \
    --use_auth \
    --output_dir ../nyc-savvy-llama2-7b \
    --logging_steps 10 \
    --save_strategy steps \
    --data_seed 42 \
    --save_steps 500 \
    --save_total_limit 40 \
    --dataloader_num_workers 1 \
    --group_by_length False \
    --logging_strategy steps \
    --remove_unused_columns False \
    --do_train \
    --num_train_epochs 1 \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_modules all \
    --double_quant \
    --quant_type nf4 \
    --bf16 \
    --bits 4 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type constant \
    --gradient_checkpointing \
    --dataset /content/gpt_nyc.jsonl \
    --dataset_format oasst1 \
    --source_max_len 16 \
    --target_max_len 512 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --max_steps 760 \
    --learning_rate 0.0002 \
    --adam_beta2 0.999 \
    --max_grad_norm 0.3 \
    --lora_dropout 0.1 \
    --weight_decay 0.0 \
    --seed 0 \
```

## Merging it back

What you get in the `output_dir` is an adapter model. [Here's ours](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter/). Useful, but an adapter on its own is not as easy to drop into an inference script as a merged, full-sized model.

Two options for merging:
- The included `peftmerger.py` script merges the adapter and saves the model (a rough sketch of this approach follows this list).
- Chris Hayduk produced a script to [quantize then de-quantize](https://gist.github.com/ChrisHayduk/1a53463331f52dca205e55982baf9930) the base model before merging a QLoRA adapter. This requires bitsandbytes and a GPU.
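
For the first option, the merge boils down to something like the sketch below (in the spirit of `peftmerger.py`, not necessarily its exact code; paths are placeholders):

```python
# Sketch of a plain fp16 merge with PEFT (peftmerger.py may differ in details).
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("../llama-2-7b-hf", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter")
merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights

merged.save_pretrained("../nyc-savvy-llama2-7b")
LlamaTokenizer.from_pretrained("../llama-2-7b-hf").save_pretrained("../nyc-savvy-llama2-7b")
```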

## Testing that the model is NYC-savvy

You might wonder whether the model actually learned anything about NYC or is just the same old LLaMa2. Without adding any NYC-specific clues to your prompt, try this, adapted from the `pefttester.py` script in this repo:

```python
from transformers import AutoModelForCausalLM, LlamaTokenizer, StoppingCriteriaList

model_name = "monsoon-nlp/nyc-savvy-llama2-7b"

m = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tok = LlamaTokenizer.from_pretrained(model_name)

messages = "A chat between a curious human and an assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
messages += "### Human: What museums should I visit? - My kids are aged 12 and 5"
messages += "### Assistant: "

input_ids = tok(messages, return_tensors="pt").input_ids.to(m.device)

# ... (the elided portion of pefttester.py defines `stop`, the StoppingCriteria used below)

temperature = 0.7
top_p = 0.9
top_k = 0
repetition_penalty = 1.1

op = m.generate(
    input_ids=input_ids,
    max_new_tokens=100,
    temperature=temperature,
    do_sample=temperature > 0.0,
    top_p=top_p,
    top_k=top_k,
    repetition_penalty=repetition_penalty,
    stopping_criteria=StoppingCriteriaList([stop]),
)
for line in op:
    print(tok.decode(line))
```