Nick Doiron committed
Commit c57ec09
1 Parent(s): 53eff1c

quantize-then-dequantize

.gitignore ADDED
@@ -0,0 +1 @@
+ .DS_Store
README.md CHANGED
@@ -22,7 +22,7 @@ Essentials:
  - Based on LLaMa2-7b-hf (version 2, 7B params)
  - Used [QLoRA](https://github.com/artidoro/qlora/blob/main/qlora.py) to fine-tune on [13k rows of /r/AskNYC](https://huggingface.co/datasets/monsoon-nlp/asknyc-chatassistant-format) formatted as Human/Assistant exchanges
  - Released [the adapter weights](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b)
- - Merged LLaMa2 and the adapter weights for this full-sized model
+ - Merged [quantized-then-dequantized LLaMa2](https://gist.github.com/ChrisHayduk/1a53463331f52dca205e55982baf9930) and the adapter weights to produce this full-sized model
 
  ## Prompt options
 
@@ -100,19 +100,9 @@ python3 qlora.py \
 
  What you get in the `output_dir` is an adapter model. [Here's ours](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter/). Cool, but not as easy to drop into their script.
 
- The `peftmerger.py` script applies the adapter and saves the model like this:
-
- ```python
- m = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     #load_in_4bit=True,
-     torch_dtype=torch.bfloat16,
-     #device_map={"": 0},
- )
- m = PeftModel.from_pretrained(m, adapters_name)
- m = m.merge_and_unload()
- m.save_pretrained("nyc-savvy")
- ```
+ Two options for merging:
+ - The included `peftmerger.py` script merges the adapter and saves the model.
+ - Chris Hayduk produced a script to [quantize then de-quantize](https://gist.github.com/ChrisHayduk/1a53463331f52dca205e55982baf9930) the base model before merging a QLoRA adapter. This requires bitsandbytes and a GPU.
 
  ## Testing that the model is NYC-savvy
 
 
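For context on the commit title: the merge added in the README above follows the quantize-then-dequantize idea from Chris Hayduk's gist. Below is a minimal sketch of that flow, assuming mid-2023 versions of transformers, peft, and bitsandbytes plus a CUDA GPU. The `dequantize` helper is our own illustration of the idea, not the gist verbatim; `model_name` and `adapters_name` are placeholders, and attribute names like `quant_state` and `is_loaded_in_4bit` may shift across library versions.

```python
# Sketch only: follows the approach of Chris Hayduk's gist, not a verbatim copy.
import torch
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base model id
adapters_name = "monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter"

# 1. Load the base model quantized to NF4, the same way qlora.py saw it in training.
base = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map={"": 0},
)

# 2. De-quantize: swap each 4-bit linear layer for a plain nn.Linear holding
#    the weights decoded back to bfloat16.
def dequantize(model, dtype=torch.bfloat16):
    for name, module in list(model.named_modules()):
        if isinstance(module, bnb.nn.Linear4bit):
            weight = bnb.functional.dequantize_4bit(
                module.weight.data, module.weight.quant_state
            ).to(dtype)
            fp = torch.nn.Linear(
                module.in_features,
                module.out_features,
                bias=module.bias is not None,
                dtype=dtype,
                device=weight.device,
            )
            fp.weight = torch.nn.Parameter(weight, requires_grad=False)
            if module.bias is not None:
                fp.bias = torch.nn.Parameter(module.bias.data.to(dtype))
            parent_name, _, child_name = name.rpartition(".")
            parent = model.get_submodule(parent_name) if parent_name else model
            setattr(parent, child_name, fp)
    # Tell transformers the model is no longer 4-bit so save_pretrained()
    # writes full-precision weights (attribute name as of mid-2023 versions).
    model.is_loaded_in_4bit = False
    return model

base = dequantize(base)

# 3. Merge the LoRA adapter into the de-quantized weights and save full-size shards.
m = PeftModel.from_pretrained(base, adapters_name)
m = m.merge_and_unload()
m.save_pretrained("nyc-savvy")
```

The round-trip matters because the adapter was trained against the NF4-quantized weights; merging into those same de-quantized values, rather than the untouched fp16 weights, keeps the merged model consistent with what QLoRA actually fine-tuned.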
pytorch_model-00001-of-00002.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:54668da8619fcca38e9bf7b133ce1d445f127a6ac02c7b61ccb6c9803833f14a
- size 9976620122
+ oid sha256:6875060db94711a55e3aefe325355c28b260fb3bd5795add8707cfe8fe8340b8
+ size 9976623130
pytorch_model-00002-of-00002.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0c64d556b8cf743c04926f35c9673ca38a7b4dee2fa3fc3f972fbe04b3a1ee1d
- size 3500310787
+ oid sha256:35f9ab7de991127d8aee80f8f6fea00e73385f303121ac995c1afd51fd2551ba
+ size 3500311811