---
license: llama2
---

## experimental llama2-7b-4bit-awq quantized model for llama2.c

Source model: llama2-7b-chat + [AWQ](https://github.com/mit-han-lab/llm-awq) scales: https://huggingface.co/datasets/mit-han-lab/awq-model-zoo/blob/main/llama-2-7b-chat-w4-g128.pt

Export script: https://github.com/atamurad/llama2.c/blob/int4-avx2/export_awq.py

Known issue: works only for ~20 tokens; the model will be fixed/updated soon.

Inference code: https://github.com/atamurad/llama2.c/tree/int4-avx2

## Sample usage/prompt format:

```
./run llama2-7b-4bit-awq/llama2-7b-chat.awq -i "[INST]say hi[/INST]"
[INST]say hi[/INST] Hello! It's nice to meet you! How are you today?
```
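
If you want to script the prompt format above, here is a minimal Python sketch. It only wraps a message in `[INST]...[/INST]` and assembles the same command line as the sample. The helper names `format_prompt` and `build_command` are hypothetical and not part of llama2.c. The binary path, model path, and `-i` flag are copied from the sample command and may need adjusting for your checkout.

```python
import shlex
from typing import List

# Paths copied from the sample command above; adjust to your local checkout.
RUN_BINARY = "./run"
MODEL_PATH = "llama2-7b-4bit-awq/llama2-7b-chat.awq"


def format_prompt(user_message: str) -> str:
    """Wrap one user message in [INST]...[/INST] tags (hypothetical helper)."""
    return f"[INST]{user_message}[/INST]"


def build_command(user_message: str) -> List[str]:
    """Return the argv list for one prompt, using the -i flag from the sample."""
    return [RUN_BINARY, MODEL_PATH, "-i", format_prompt(user_message)]


if __name__ == "__main__":
    # Print a shell-quoted command line. To execute it instead, pass the
    # list from build_command() to subprocess.run().
    print(shlex.join(build_command("say hi")))
```

Building the command as a list rather than a single string avoids shell-quoting problems with the square brackets in the prompt.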