---
license: llama2
---

## Experimental llama2-7b-4bit-awq quantized model for llama2.c
Source model: llama2-7b-chat + [AWQ](https://github.com/mit-han-lab/llm-awq) scales: https://huggingface.co/datasets/mit-han-lab/awq-model-zoo/blob/main/llama-2-7b-chat-w4-g128.pt

Export script: https://github.com/atamurad/llama2.c/blob/int4-avx2/export_awq.py
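The `w4-g128` checkpoint name indicates 4-bit weights quantized in groups of 128. As a rough illustration only (not the actual export format, and omitting AWQ's activation-aware per-channel scaling, which is what the scales file above stores), group-wise 4-bit quantization with a per-group scale and zero-point can be sketched as:

```python
import numpy as np

def quantize_g128(w, group_size=128):
    # Asymmetric 4-bit quantization: each group of 128 weights
    # gets its own scale and zero-point.
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0            # 16 levels for 4 bits
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero, 0, 15).astype(np.uint8)
    return q, scale, zero

def dequantize_g128(q, scale, zero):
    # Reconstruct approximate float weights from the 4-bit codes.
    return (q.astype(np.float32) - zero) * scale

np.random.seed(0)
w = np.random.randn(4096).astype(np.float32)
q, scale, zero = quantize_g128(w)
w_hat = dequantize_g128(q, scale, zero).reshape(-1)
print(np.allclose(w, w_hat, atol=float(scale.max())))  # True
```

The reconstruction error per weight is bounded by half a quantization step of its group, which is why smaller group sizes trade memory for accuracy.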
Known issue: generation currently works only for the first ~20 tokens; the model will be fixed/updated soon.

Inference code: https://github.com/atamurad/llama2.c/tree/int4-avx2
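The int4-avx2 branch's exact weight layout isn't documented here; as a generic illustration of int4 storage, two 4-bit codes are typically packed into each byte:

```python
import numpy as np

def pack_nibbles(q):
    # Pack pairs of 4-bit values (0..15) into single bytes, low nibble first.
    q = q.astype(np.uint8)
    return (q[0::2] | (q[1::2] << 4)).astype(np.uint8)

def unpack_nibbles(b):
    # Inverse of pack_nibbles: recover the original 4-bit values.
    out = np.empty(b.size * 2, dtype=np.uint8)
    out[0::2] = b & 0x0F
    out[1::2] = b >> 4
    return out

q = np.array([1, 15, 0, 7], dtype=np.uint8)
packed = pack_nibbles(q)                          # 2 bytes hold 4 weights
print(np.array_equal(unpack_nibbles(packed), q))  # True
```

Packed this way, 7B weights at 4 bits fit in roughly 3.5 GB plus the per-group scale/zero metadata, versus ~13 GB at fp16.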
## Sample usage/prompt format:
```
./run llama2-7b-4bit-awq/llama2-7b-chat.awq -i "[INST]say hi[/INST]"
[INST]say hi[/INST] Hello! It's nice to meet you! How are you today?
```
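The `[INST]`/`[/INST]` markers come from the Llama 2 chat prompt template. A minimal helper for building single-turn prompts (`format_prompt` is a hypothetical name, not part of llama2.c; the official template also wraps an optional system prompt in `<<SYS>>` tags, and spacing around the markers varies — the run example above omits the spaces):

```python
def format_prompt(user_msg, system_msg=None):
    # Single-turn Llama 2 chat prompt; the system prompt, if any,
    # goes inside a <<SYS>> block at the start of the first turn.
    if system_msg is not None:
        return f"[INST] <<SYS>>\n{system_msg}\n<</SYS>>\n\n{user_msg} [/INST]"
    return f"[INST] {user_msg} [/INST]"

print(format_prompt("say hi"))  # [INST] say hi [/INST]
```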