Update README.md

This is an experimental <a href="https://github.com/mobiusml/hqq/">HQQ</a> 1-bit quantized (<b>binary weights</b>) <a href="https://huggingface.co/meta-llama/Llama-2-7b-chat-hf">Llama2-7B-chat model</a> using a LoRA adapter to improve its performance (referred to as HQQ+).
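
A minimal loading sketch is below. It assumes the <a href="https://github.com/mobiusml/hqq/">hqq</a> library is installed and that its Hugging Face engine exposes `HQQModelForCausalLM.from_quantized`; the exact entry point and arguments may differ across hqq versions, and the repo id is a placeholder for this model's id. From there, generation follows the usual `transformers` API.

```Python
# Loading sketch (assumptions: hqq is installed and its HF engine exposes
# HQQModelForCausalLM.from_quantized; check the hqq repo for the exact API of your version).
from transformers import AutoTokenizer
from hqq.engine.hf import HQQModelForCausalLM

model_id  = "<this-model-repo-id>"  # placeholder: replace with this model's Hugging Face id
model     = HQQModelForCausalLM.from_quantized(model_id)  # 1-bit base weights + LoRA adapter
tokenizer = AutoTokenizer.from_pretrained(model_id)
```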

Quantizing small models at extremely low bit-widths is a challenging task. The purpose of this model is to show the community what to expect when fine-tuning such models.

We notice that 1-bit quantization doesn't work well when applied directly to small models such as Llama2-7B. However, when fine-tuned, the model's output improves significantly. In fact, the 1-bit base model outperforms Quip# 2-bit after fine-tuning on ~2.9K samples.

Note that the weights here are unsigned 1-bit (0 or 1), <a href="https://arxiv.org/abs/2402.17764">not ternary like the recent 1.58-bit work </a>. This is a more challenging task since we lose the sign of the weights and only fine-tune a small fraction of the parameters (~94MB worth of weights).
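
For intuition, here is a toy sketch of unsigned 1-bit quantization with a per-group scale and zero-point. It is illustrative only: HQQ additionally optimizes the scale/zero parameters with its own solver, and the group size below is an arbitrary choice, not this model's setting.

```Python
import torch

# Toy unsigned 1-bit (0/1) quantizer with a per-group scale and zero-point.
# Illustrative sketch only; HQQ's actual solver and this model's group size differ.
def quantize_1bit(W, group_size=8):
    Wg    = W.reshape(-1, group_size)                  # split weights into groups
    w_min = Wg.min(dim=1, keepdim=True).values
    w_max = Wg.max(dim=1, keepdim=True).values
    scale = (w_max - w_min).clamp(min=1e-8)            # one step between the two levels
    zero  = -w_min / scale
    W_q   = (Wg / scale + zero).round().clamp(0, 1)    # binary weights: 0 or 1
    return W_q, scale, zero

def dequantize(W_q, scale, zero, shape):
    return ((W_q - zero) * scale).reshape(shape)       # W ≈ (W_q - zero) * scale

W = torch.randn(4, 8)
W_q, scale, zero = quantize_1bit(W)
W_hat = dequantize(W_q, scale, zero, W.shape)
```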

HQQ's dequantization step can be rewritten as a 1-bit matmul, which could potentially require only additions, plus a very low-rank matmul that is fast to compute.
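
A small sketch of that identity, assuming for simplicity one (scale, zero) pair per output row (this model uses per-group parameters, but the idea is the same); the low-rank part here combines the rank-1 zero-point correction with the LoRA factors, written as `B @ A`:

```Python
import torch

# Sketch of the identity behind the claim above, assuming one (scale, zero) pair per
# output row for simplicity. W ≈ (W_q - z) * s, plus a LoRA adapter B @ A.
out_dim, in_dim, rank = 16, 32, 4
W_q  = torch.randint(0, 2, (out_dim, in_dim)).float()         # binary weights (0/1)
s    = torch.randn(out_dim, 1)                                # per-row scale
z    = torch.randn(out_dim, 1)                                # per-row zero-point
A, B = torch.randn(rank, in_dim), torch.randn(out_dim, rank)  # LoRA factors
x    = torch.randn(in_dim, 1)

# Naive path: dequantize the full matrix, then matmul, then add the LoRA output.
W_hat   = (W_q - z) * s
y_naive = W_hat @ x + B @ (A @ x)

# Rewritten path: a binary matmul (additions only), a rank-1 zero-point correction,
# and the low-rank LoRA matmul.
y_fast = s * (W_q @ x) - (s * z) * x.sum() + B @ (A @ x)

assert torch.allclose(y_naive, y_fast, atol=1e-4)
```
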
## Datasets
The adapter was trained via SFT on random subsets of the following: