Llama2-7B 4-bit quantized model for llama2.c (experimental)

Source model: llama2-7b-chat, AWQ-quantized using the precomputed scales from: https://huggingface.co/datasets/mit-han-lab/awq-model-zoo/blob/main/llama-2-7b-chat-w4-g128.pt

Script used to export the model: https://github.com/atamurad/llama2.c/blob/int4-avx2/export_awq.py

Inference code: https://github.com/atamurad/llama2.c/tree/int4-avx2
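
The "w4-g128" in the scales file name suggests 4-bit weights quantized in groups of 128. The Python sketch below is a generic illustration of group-wise 4-bit quantization, not the method or file layout used by export_awq.py: it leaves out AWQ's activation-aware scaling entirely, and the function names, the per-group (scale, minimum) storage, and the nibble order are all assumptions made for illustration.

```python
import random

GROUP_SIZE = 128  # assumed from the "g128" in the scales file name
QMAX = 15         # largest unsigned 4-bit value (the "w4" part)


def quantize_group(weights):
    """Quantize one group of floats to 4-bit codes (hypothetical helper).

    Returns (codes, scale, lo), where each code is an int in 0..15 and
    a weight is approximated by lo + code * scale.
    """
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / QMAX
    if scale == 0.0:
        scale = 1.0  # constant group: every code becomes 0
    codes = [min(QMAX, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map 4-bit codes back to approximate float weights."""
    return [lo + c * scale for c in codes]


def pack_nibbles(codes):
    """Pack pairs of 4-bit codes into bytes, low nibble first (assumed order)."""
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_nibbles(packed):
    """Inverse of pack_nibbles."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)
        codes.append(byte >> 4)
    return codes


if __name__ == "__main__":
    rng = random.Random(0)
    group = [rng.gauss(0.0, 0.02) for _ in range(GROUP_SIZE)]
    codes, scale, lo = quantize_group(group)
    packed = pack_nibbles(codes)
    restored = dequantize_group(unpack_nibbles(packed), scale, lo)
    max_err = max(abs(a - b) for a, b in zip(group, restored))
    print(len(packed), "bytes for", GROUP_SIZE, "weights, max error", max_err)
```

In this scheme each group of 128 weights takes 64 bytes of packed codes plus one scale and one minimum, and the rounding error per weight is at most half the group's scale.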

Sample usage/prompt format:

Sample #1 (Hello World):

Command / Prompt:

./run llama2-7b-4bit-awq/llama2-7b-chat.awq -p 0.0 -i "[INST] say hi [/INST]"

Output:

  Sure, I'd be happy to say hi to you! *smiling face* How are you today? Is there anything you'd like to chat about or ask me? I'm here to help with any questions you may have.

Feel free to start a conversation or ask me anything, I'm here to assist you! *hi five*

Sample #2:

Command / Prompt:

./run llama2-7b-4bit-awq/llama2-7b-chat.awq -p 0.9 -t 0.0 -i "[INST] write a poem about math [/INST]"

Output:

  Sure! Here's a poem about math:
 nobody knows the secrets I hold
In numbers and formulas, I'm told
From pi to infinity, I'm the key
To unlocking the mysteries of the universe

I'm the language of logic, the voice of reason
The rhythm of numbers, the beat of creation
From geometry to calculus, I'm the way
To unravel the mysteries of the universe today

I'm the bridge between the known and unknown
The bridge that connects the infinite and the one
I'm the math that makes the world go round
The math that keeps the universe profound

So here's to math, the language of the mind
The language that makes the universe design
I'll keep on solving, keep on exploring
For math is the key to unlocking forever.
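
Both sample commands wrap the user message in [INST] ... [/INST] tags. As a minimal sketch, the Python helpers below build that single-turn prompt and a shell-quoted ./run command line. The helper names are hypothetical, only the -i flag from the samples is used, and any sampling flags have to be passed in by the caller.

```python
import shlex


def build_prompt(user_message):
    """Wrap a single user turn in the [INST] tags used by the samples."""
    return f"[INST] {user_message.strip()} [/INST]"


def build_command(checkpoint, user_message, extra_args=()):
    """Return a shell-quoted ./run invocation (hypothetical helper).

    extra_args is passed through untouched, so sampling flags such as the
    ones in the samples can be supplied by the caller.
    """
    args = ["./run", checkpoint, *extra_args, "-i", build_prompt(user_message)]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # shlex.quote uses single quotes; the shell treats them the same as the
    # double quotes in the sample commands for this prompt.
    print(build_command("llama2-7b-4bit-awq/llama2-7b-chat.awq", "say hi"))
```

This covers only the single-turn form shown in the samples; multi-turn conversations and system prompts are out of scope for the sketch.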