---
license: llama2
---

## Experimental llama2-7b-4bit-awq quantized model for llama2.c
Source model: llama2-7b-chat + [AWQ](https://github.com/mit-han-lab/llm-awq) scales: https://huggingface.co/datasets/mit-han-lab/awq-model-zoo/blob/main/llama-2-7b-chat-w4-g128.pt

Export script: https://github.com/atamurad/llama2.c/blob/int4-avx2/export_awq.py
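The `w4-g128` checkpoint name indicates 4-bit weights quantized in groups of 128. As a rough illustration only (not the actual export format, and omitting AWQ's activation-aware per-channel scaling, which is what the scales file above stores), group-wise 4-bit quantization with a per-group scale and zero-point can be sketched as:

```python
import numpy as np

def quantize_g128(w, group_size=128):
    # Asymmetric 4-bit quantization: each group of 128 weights
    # gets its own scale and zero-point.
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0            # 16 levels for 4 bits
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero, 0, 15).astype(np.uint8)
    return q, scale, zero

def dequantize_g128(q, scale, zero):
    # Reconstruct approximate float weights from the 4-bit codes.
    return (q.astype(np.float32) - zero) * scale

np.random.seed(0)
w = np.random.randn(4096).astype(np.float32)
q, scale, zero = quantize_g128(w)
w_hat = dequantize_g128(q, scale, zero).reshape(-1)
print(np.allclose(w, w_hat, atol=float(scale.max())))  # True
```

The reconstruction error per weight is bounded by half a quantization step of its group, which is why smaller group sizes trade memory for accuracy.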
Known issue: generation currently works only for the first ~20 tokens; the model will be fixed/updated soon.

Inference code: https://github.com/atamurad/llama2.c/tree/int4-avx2
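The int4-avx2 branch's exact weight layout isn't documented here; as a generic illustration of int4 storage, two 4-bit codes are typically packed into each byte:

```python
import numpy as np

def pack_nibbles(q):
    # Pack pairs of 4-bit values (0..15) into single bytes, low nibble first.
    q = q.astype(np.uint8)
    return (q[0::2] | (q[1::2] << 4)).astype(np.uint8)

def unpack_nibbles(b):
    # Inverse of pack_nibbles: recover the original 4-bit values.
    out = np.empty(b.size * 2, dtype=np.uint8)
    out[0::2] = b & 0x0F
    out[1::2] = b >> 4
    return out

q = np.array([1, 15, 0, 7], dtype=np.uint8)
packed = pack_nibbles(q)                          # 2 bytes hold 4 weights
print(np.array_equal(unpack_nibbles(packed), q))  # True
```

Packed this way, 7B weights at 4 bits fit in roughly 3.5 GB plus the per-group scale/zero metadata, versus ~13 GB at fp16.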
## Sample usage/prompt format:
```
./run llama2-7b-4bit-awq/llama2-7b-chat.awq -i "[INST]say hi[/INST]"
[INST]say hi[/INST] Hello! It's nice to meet you! How are you today?
```
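The `[INST]`/`[/INST]` markers come from the Llama 2 chat prompt template. A minimal helper for building single-turn prompts (`format_prompt` is a hypothetical name, not part of llama2.c; the official template also wraps an optional system prompt in `<<SYS>>` tags, and spacing around the markers varies — the run example above omits the spaces):

```python
def format_prompt(user_msg, system_msg=None):
    # Single-turn Llama 2 chat prompt; the system prompt, if any,
    # goes inside a <<SYS>> block at the start of the first turn.
    if system_msg is not None:
        return f"[INST] <<SYS>>\n{system_msg}\n<</SYS>>\n\n{user_msg} [/INST]"
    return f"[INST] {user_msg} [/INST]"

print(format_prompt("say hi"))  # [INST] say hi [/INST]
```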