Introduction

This repository provides the 1B NexaQuant model weights, along with code to run and evaluate the model.

Run the Models

First, clone the repository:

git clone https://huggingface.co/nexa-enterprise-a/nexaQuant

You can run the model through either the Nexa SDK or llama.cpp. To use the Nexa SDK, follow the installation instructions at Nexa SDK to install our toolkit, then run the model:

nexa run --local_path Llama3.2-1B-llamacpp_q4_0_gs128.gguf 

Alternatively, you can run it through llama.cpp, following the installation instructions in the llama.cpp repository:

git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake ..
make -j32

Then run the model with llama.cpp:

./llama.cpp/build/bin/llama-cli -m Llama3.2-1B-llamacpp_q4_0_gs128.gguf

Evaluation

We use the widely adopted public toolkit lm-evaluation-harness to benchmark the performance of our models. An example evaluation command for ifeval is:

pip install lm-eval
lm_eval --model hf \
    --model_args pretrained=Llama3.2-1B-llamacpp_q4_0_gs128,add_bos_token=True \
    --tasks ifeval \
    --device cuda:0 \
    --batch_size 136

For the 1B model, evaluation shows that quality relative to BF16 is restored from only 86.45% (standard Q4_0) to 100.34% after Nexa AI's 4-bit quantization. For more information, please check the NexaQuant blog.
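As background on what `gs=128` (group size 128) means: in group-wise 4-bit quantization, every 128 consecutive weights share one floating-point scale. The sketch below is a minimal, illustrative round-to-nearest scheme in plain Python, loosely in the spirit of llama.cpp's Q4_0 blocks; it is not Nexa AI's actual method, which recovers far more of the original quality.

```python
import math

def quantize_q4_groups(values, group_size=128):
    """Round-to-nearest 4-bit group quantization (illustrative only).

    Every `group_size` consecutive values share one float scale;
    each value maps to a signed 4-bit integer in [-8, 7].
    """
    quants, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        scale = max(abs(v) for v in group) / 7.0 or 1.0  # avoid div-by-zero
        quants.append([max(-8, min(7, round(v / scale))) for v in group])
        scales.append(scale)
    return quants, scales

def dequantize(quants, scales):
    # Rebuild approximate floats: one shared scale per group of ints.
    return [q * s for group, s in zip(quants, scales) for q in group]

# Quantize 256 synthetic weights with group size 128 (two groups).
weights = [math.sin(i * 0.1) for i in range(256)]
q, s = quantize_q4_groups(weights, group_size=128)
restored = dequantize(q, s)
# Round-to-nearest error is bounded by half a quantization step per group.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Larger groups amortize the per-group scale over more weights (smaller files) at the cost of coarser quantization, which is the trade-off behind the `gs=128` setting in the model filename.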

| Benchmark | BF16 | Q4_0 (gs=128) | Nexa Q4_0 (gs=128) | Degraded after Q4_0 (gs=128), % | Restoration, % | Improvement, % |
|---|---|---|---|---|---|---|
| IFEVAL | 51.34 | 41.59 | 50.59 | 81.01% | 98.54% | 21.64% |
| MMLU (5-shot) | 46.05 | 40.00 | 44.68 | 86.86% | 97.02% | 11.70% |
| Hellaswag | 45.64 | 43.29 | 46.27 | 94.85% | 101.38% | 6.88% |
| arc_challenge | 34.73 | 31.06 | 35.84 | 89.43% | 103.20% | 15.39% |
| arc_easy | 68.98 | 63.59 | 69.61 | 92.19% | 100.91% | 9.47% |
| piqa | 74.65 | 71.00 | 73.78 | 95.11% | 98.83% | 3.92% |
| openbook qa | 26.60 | 25.40 | 28.40 | 95.49% | 106.77% | 11.81% |
| gsm8k | 32.57 | 18.46 | 31.46 | 56.68% | 96.59% | 70.42% |
| Total | | | | 86.45% | 100.34% | 18.90% |
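The percentage columns can be reproduced from the raw scores: "Degraded" and "Restoration" compare each quantized score to BF16, and "Improvement" compares Nexa Q4_0 to plain Q4_0. A small helper (the function name is our own, not part of any toolkit):

```python
def restoration_stats(bf16, q4, nexa):
    """Recompute the table's percentage columns from raw benchmark scores."""
    degraded = q4 / bf16 * 100            # quality kept by plain Q4_0
    restoration = nexa / bf16 * 100       # quality kept by Nexa Q4_0
    improvement = (nexa - q4) / q4 * 100  # Nexa Q4_0 vs plain Q4_0
    return round(degraded, 2), round(restoration, 2), round(improvement, 2)

# IFEVAL row: BF16 51.34, Q4_0 41.59, Nexa Q4_0 50.59
print(restoration_stats(51.34, 41.59, 50.59))  # → (81.01, 98.54, 21.64)
```

Note that gsm8k shows the largest gap: plain Q4_0 keeps only 56.68% of BF16 quality there, which is why the improvement column peaks at 70.42%.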