llama2-7b-4bit-awq / README.md
---
license: llama2
---

llama2-7b-4bit-awq: an experimental 4-bit AWQ-quantized llama2-7b model for llama2.c

Source model: llama2-7b-chat, with AWQ scales from: https://huggingface.co/datasets/mit-han-lab/awq-model-zoo/blob/main/llama-2-7b-chat-w4-g128.pt

Export script: https://github.com/atamurad/llama2.c/blob/int4-avx2/export_awq.py
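The `w4-g128` in the checkpoint name denotes 4-bit weights with a quantization group size of 128. As a rough illustration of what group-wise 4-bit quantization means (a hedged sketch only; the function names are hypothetical and this is not the actual `export_awq.py` logic, which also applies AWQ's activation-aware scales):

```python
GROUP_SIZE = 128  # one floating-point scale per 128 weights, per "g128"

def quantize_group(weights):
    # Symmetric 4-bit: map each group to integers in [-8, 7] with one scale.
    scale = max(abs(w) for w in weights) / 7.0 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    # Recover approximate weights; error is bounded by about half the scale.
    return [v * scale for v in q]
```

Storing only a 4-bit integer per weight plus one scale per group is what brings the 7B model down to roughly a quarter of its fp16 size.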

Known issue: generation currently works only for the first ~20 tokens; the model will be fixed/updated soon.

Inference code: https://github.com/atamurad/llama2.c/tree/int4-avx2

Sample usage/prompt format:

```
./run llama2-7b-4bit-awq/llama2-7b-chat.awq -i "[INST]say hi[/INST]"
[INST]say hi[/INST]  Hello! It's nice to meet you! How are you today?
```
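The `[INST] ... [/INST]` wrapper is the Llama-2 chat turn format, which this checkpoint expects since it was quantized from llama2-7b-chat. A small helper for building such prompts might look like the sketch below (`format_llama2_prompt` is a hypothetical convenience, not part of the repo; the `<<SYS>>` variant follows the general Llama-2 chat template, while the sample above uses only the bare form):

```python
def format_llama2_prompt(user, system=None):
    # Wrap a single user turn in Llama-2 chat instruction tags.
    # An optional system prompt goes inside <<SYS>> markers.
    if system:
        return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"
    return f"[INST]{user}[/INST]"

print(format_llama2_prompt("say hi"))  # [INST]say hi[/INST]
```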