Commit 282a686 by atamurad
1 Parent(s): a80b535

Create README.md

Files changed (1)
  1. README.md +19 -0
README.md ADDED
@@ -0,0 +1,19 @@
+ ---
+ license: llama2
+ ---
+
+ ## Experimental llama2-7b-4bit-awq quantized model for llama2.c
+
+ Source model: llama2-7b-chat with [AWQ](https://github.com/mit-han-lab/llm-awq) scales from https://huggingface.co/datasets/mit-han-lab/awq-model-zoo/blob/main/llama-2-7b-chat-w4-g128.pt
+
+ Export script: https://github.com/atamurad/llama2.c/blob/int4-avx2/export_awq.py
+
+ Known issue: the model currently works for only about 20 tokens. It will be fixed and updated soon.
+
+ Inference code: https://github.com/atamurad/llama2.c/tree/int4-avx2
+
+ ## Sample usage/prompt format:
+ ```
+ ./run llama2-7b-4bit-awq/llama2-7b-chat.awq -i "[INST]say hi[/INST]"
+ [INST]say hi[/INST] Hello! It's nice to meet you! How are you today?
+ ```
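
The sample invocation above wraps the user message in `[INST]` and `[/INST]` tags with no surrounding spaces. A minimal Python sketch of assembling that command line follows. The helper names `build_prompt` and `build_command` are hypothetical and not part of the llama2.c fork. The binary path, checkpoint path, and `-i` flag are copied from the sample, and the tag spacing mirrors the sample rather than any official prompt template.

```python
import shlex


def build_prompt(user_message: str) -> str:
    """Wrap a user message in [INST]...[/INST], exactly as in the README sample."""
    return f"[INST]{user_message}[/INST]"


def build_command(
    user_message: str,
    checkpoint: str = "llama2-7b-4bit-awq/llama2-7b-chat.awq",  # path from the sample
    binary: str = "./run",  # binary name from the sample
) -> list[str]:
    """Return the argv list for the sample invocation (hypothetical helper)."""
    return [binary, checkpoint, "-i", build_prompt(user_message)]


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(build_command("say hi")))
```

Passing the list returned by `build_command` to `subprocess.run` avoids shell-quoting problems with the square brackets in the prompt. Given the known issue noted in the README, only short replies should be expected.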