Nick Doiron committed
Commit 60aabf1
1 Parent(s): 0800fac
Files changed (1): README.md +139 -0

README.md CHANGED
@@ -1,3 +1,142 @@
---
license: mit
datasets:
- monsoon-nlp/asknyc-chatassistant-format
language:
- en
tags:
- reddit
- asknyc
- nyc
- llama2
---

# nyc-savvy-llama2-7b

Essentials:
- Based on LLaMa2-7b-hf (version 2, 7B params)
- Used [QLoRA](https://github.com/artidoro/qlora/blob/main/qlora.py) to fine-tune on [13k rows of /r/AskNYC](https://huggingface.co/datasets/monsoon-nlp/asknyc-chatassistant-format) formatted as Human/Assistant exchanges
- Released [the adapter weights](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter/) separately
- Merged LLaMa2 and the adapter weights to produce this full-sized model
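
If you just want to run the merged model, it should load like any other Hugging Face causal LM (a minimal sketch; add your usual dtype/device settings):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged, full-size model from this repo
tok = AutoTokenizer.from_pretrained("monsoon-nlp/nyc-savvy-llama2-7b")
m = AutoModelForCausalLM.from_pretrained("monsoon-nlp/nyc-savvy-llama2-7b")
```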

## Prompt options

Here is the template used in training. Note that it starts with "### Human: " (with a trailing space), then the post title and content, then "### Assistant: " (no space before it, one space after the colon).

`### Human: Post title - post content### Assistant: `

For example:

`### Human: Where can I find a good bagel? - We are in Brooklyn### Assistant: Anywhere with fresh-baked bagels and lots of cream cheese options.`

From [QLoRA's Gradio example](https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing), it looks helpful to prepend a more assistant-like system prompt, especially if you follow their lead for a chat format:

```
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
```
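
Getting the whitespace right is fiddly, so a small helper can assemble the full prompt. This is an illustrative sketch (`build_prompt` is not part of any released script), combining the chat preamble above with the training template:

```python
PREAMBLE = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
)

def build_prompt(title: str, body: str) -> str:
    # Matches the training template: a space after "### Human:",
    # no space before "### Assistant:", and one space after its colon.
    return f"{PREAMBLE}### Human: {title} - {body}### Assistant: "

print(build_prompt("Where can I find a good bagel?", "We are in Brooklyn"))
```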

## Training data

- Collected one month of posts to /r/AskNYC from each year 2015-2019 (no content after July 2019)
- Downloaded from Pushshift; accepted comments only if their upvote score was >= 3 (see the formatting sketch below)
- Originally collected for my GPT-NYC model in spring 2021: https://mapmeld.medium.com/gpt-nyc-part-1-9cb698b2e3d
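
For reference, here is a minimal sketch of that filter-and-format step. The `pairs` variable and the Pushshift field names (`score`, `selftext`, `body`) are assumptions for illustration; the `text` field holds the full exchange in the Human/Assistant template, which is what qlora.py's `oasst1` dataset format reads:

```python
import json

def keep(comment):
    # Mirror the filter above: accept a comment only if its score is >= 3
    return comment.get("score", 0) >= 3

def to_row(title, body, answer):
    # One training example in the Human/Assistant template
    return {"text": f"### Human: {title} - {body}### Assistant: {answer}"}

# `pairs` = (post, top comment) tuples collected from Pushshift (hypothetical)
with open("gpt_nyc.jsonl", "w") as f:
    for post, comment in pairs:
        if keep(comment):
            row = to_row(post["title"], post["selftext"], comment["body"])
            f.write(json.dumps(row) + "\n")
```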

## Training script

Takes about 2 hours on Colab once you get it right. QLoRA's script is controlled by `max_steps` rather than epochs, but I wanted to stop at 1 epoch, so I worked out the equivalent step count (see the note after the script).

```bash
git clone https://github.com/artidoro/qlora
cd qlora

pip3 install -r requirements.txt --quiet

python3 qlora.py \
    --model_name_or_path ../llama-2-7b-hf \
    --use_auth \
    --output_dir ../nyc-savvy-llama2-7b \
    --logging_steps 10 \
    --save_strategy steps \
    --data_seed 42 \
    --save_steps 500 \
    --save_total_limit 40 \
    --dataloader_num_workers 1 \
    --group_by_length False \
    --logging_strategy steps \
    --remove_unused_columns False \
    --do_train \
    --num_train_epochs 1 \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_modules all \
    --double_quant \
    --quant_type nf4 \
    --bf16 \
    --bits 4 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type constant \
    --gradient_checkpointing \
    --dataset /content/gpt_nyc.jsonl \
    --dataset_format oasst1 \
    --source_max_len 16 \
    --target_max_len 512 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --max_steps 760 \
    --learning_rate 0.0002 \
    --adam_beta2 0.999 \
    --max_grad_norm 0.3 \
    --lora_dropout 0.1 \
    --weight_decay 0.0 \
    --seed 0
```
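
Where `--max_steps 760` comes from, roughly (back-of-envelope arithmetic, not from the original script):

```python
# Each optimizer step consumes per_device_train_batch_size (1) examples
# times gradient_accumulation_steps (16) = 16 rows of the dataset
rows_per_step = 1 * 16
max_steps = 760
# 760 steps * 16 rows/step = 12,160 rows -- about one pass over the
# ~13k-row dataset (the exact figure depends on the train split)
print(max_steps * rows_per_step)  # 12160
```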

## Merging it back

What you get in the `output_dir` is an adapter model. [Here's ours](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter/). Cool, but an adapter alone is not as easy to drop into an inference script as a full model.

The `peftmerger.py` script applies the adapter and saves the merged model like this:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

model_name = "meta-llama/Llama-2-7b-hf"  # or your local llama-2-7b-hf path
adapters_name = "monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter"

# Load the base model in bf16 (uncomment the flags for 4-bit, single-GPU loading)
m = AutoModelForCausalLM.from_pretrained(
    model_name,
    #load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    #device_map={"": 0},
)
# Apply the LoRA adapter, then fold its weights into the base model
m = PeftModel.from_pretrained(m, adapters_name)
m = m.merge_and_unload()
m.save_pretrained("nyc-savvy")
```
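
One follow-up worth noting (not part of `peftmerger.py`): the snippet above only saves model weights, so save the base tokenizer alongside them to make the `nyc-savvy` directory loadable on its own:

```python
from transformers import AutoTokenizer

# Save the base LLaMa2 tokenizer next to the merged weights
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tok.save_pretrained("nyc-savvy")
```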

## Testing that the model is NYC-savvy

You might wonder whether the model actually learned anything about NYC, or is the same old LLaMa2. Without your prompt adding any NYC clues, try this from the `pefttester.py` script in this repo:

```python
messages = "A chat between a curious human and an assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
messages += "### Human: What museums should I visit? - My kids are aged 12 and 5"
messages += "### Assistant: "

input_ids = tok(messages, return_tensors="pt").input_ids

# ... (elided: loading the tokenizer `tok` and the merged model `m`, and
# defining the `stop` criterion -- see the sketch after this block)

temperature = 0.7
top_p = 0.9
top_k = 0
repetition_penalty = 1.1

op = m.generate(
    input_ids=input_ids,
    max_new_tokens=100,
    temperature=temperature,
    do_sample=temperature > 0.0,
    top_p=top_p,
    top_k=top_k,
    repetition_penalty=repetition_penalty,
    stopping_criteria=StoppingCriteriaList([stop]),
)
for line in op:
    print(tok.decode(line))
```
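
The `stop` object above comes from the elided part of `pefttester.py`. One plausible implementation (a sketch, not necessarily what the script does) halts generation once the model starts a new `### Human:` turn:

```python
from transformers import StoppingCriteria

class StopOnMarker(StoppingCriteria):
    """Halt generation once `marker` appears in the newly generated text."""
    def __init__(self, tokenizer, prompt_len: int, marker: str = "### Human:"):
        self.tokenizer = tokenizer
        self.prompt_len = prompt_len  # number of prompt tokens to skip
        self.marker = marker

    def __call__(self, input_ids, scores, **kwargs) -> bool:
        # Only inspect tokens generated after the prompt, since the
        # prompt itself contains a "### Human:" turn
        generated = self.tokenizer.decode(input_ids[0][self.prompt_len:])
        return self.marker in generated

stop = StopOnMarker(tok, input_ids.shape[1])
```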