---
license: llama2
train: false
inference: false
pipeline_tag: text-generation
---

## Llama-2-13b-hf-4bit_g64-HQQ
This is a version of the Llama2-13B model quantized to 4-bit via Half-Quadratic Quantization (HQQ): https://mobiusml.github.io/hqq/
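
The `4bit_g64` in the model name indicates 4-bit weights with a quantization group size of 64. As a rough illustration only, the sketch below shows plain round-to-nearest group-wise 4-bit quantization in NumPy; HQQ itself optimizes the zero-point and scale with a half-quadratic solver rather than using this naive mapping.

```python
import numpy as np

def quantize_4bit_groupwise(w, group_size=64):
    # Split the flat weight vector into groups of `group_size` values.
    w = w.reshape(-1, group_size)
    # Per-group affine mapping of [min, max] onto the 4-bit range [0, 15].
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0
    q = np.round((w - w_min) / scale).clip(0, 15).astype(np.uint8)
    return q, scale, w_min

def dequantize(q, scale, w_min):
    # Recover an approximation of the original weights.
    return q.astype(np.float32) * scale + w_min

w = np.random.randn(1024).astype(np.float32)
q, scale, zero = quantize_4bit_groupwise(w)
w_hat = dequantize(q, scale, zero)
```

Smaller group sizes give each group its own scale and zero-point, which bounds the per-weight rounding error at the cost of storing more metadata.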

To run the model, first install the HQQ library from https://github.com/mobiusml/hqq/tree/main/code, then use it as follows:
```python
from hqq.models.llama import LlamaHQQ
import transformers

model_id = 'mobiuslabsgmbh/Llama-2-13b-hf-4bit_g64-HQQ'
# Load the tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
# Load the quantized model
model = LlamaHQQ.from_quantized(model_id)
```

*Limitations*: <br>
- Only supports a single GPU runtime.<br>
- Not compatible with HuggingFace's PEFT.<br>