Introduction

This repository provides the 1B NexaQuant model weights, along with code to run and evaluate the model.

Run the Models

First, clone the repository:

git clone https://huggingface.co/nexa-enterprise-a/nexaQuant

You can run the model through either the Nexa SDK or llama.cpp. To use the Nexa SDK, follow the installation instructions at Nexa SDK to install our toolkit, then run the model:

nexa run --local_path Llama3.2-1B-llamacpp_q4_0_gs128.gguf 

Alternatively, you can run it through llama.cpp, following the installation instructions in the llama.cpp repository:

git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake ..
make -j32

Then run the model with llama.cpp:

./llama.cpp/build/bin/llama-cli -m Llama3.2-1B-llamacpp_q4_0_gs128.gguf

Evaluation

We use the widely adopted public toolkit lm-evaluation-harness to benchmark the performance of our models. An example evaluation command for ifeval is:

pip install lm-eval
lm_eval --model hf \
    --model_args pretrained=Llama3.2-1B-llamacpp_q4_0_gs128,add_bos_token=True \
    --tasks ifeval \
    --device cuda:0 \
    --batch_size 136

For the 1B model, evaluation shows that quality relative to BF16 is restored from only 86.45% (standard Q4_0) to 100.34% after Nexa AI's 4-bit quantization. For more information, please check the NexaQuant blog.
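As background on what `gs=128` (group size 128) means: in group-wise 4-bit quantization, every 128 consecutive weights share one floating-point scale. The sketch below is a minimal, illustrative round-to-nearest scheme in plain Python, loosely in the spirit of llama.cpp's Q4_0 blocks; it is not Nexa AI's actual method, which recovers far more of the original quality.

```python
import math

def quantize_q4_groups(values, group_size=128):
    """Round-to-nearest 4-bit group quantization (illustrative only).

    Every `group_size` consecutive values share one float scale;
    each value maps to a signed 4-bit integer in [-8, 7].
    """
    quants, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        scale = max(abs(v) for v in group) / 7.0 or 1.0  # avoid div-by-zero
        quants.append([max(-8, min(7, round(v / scale))) for v in group])
        scales.append(scale)
    return quants, scales

def dequantize(quants, scales):
    # Rebuild approximate floats: one shared scale per group of ints.
    return [q * s for group, s in zip(quants, scales) for q in group]

# Quantize 256 synthetic weights with group size 128 (two groups).
weights = [math.sin(i * 0.1) for i in range(256)]
q, s = quantize_q4_groups(weights, group_size=128)
restored = dequantize(q, s)
# Round-to-nearest error is bounded by half a quantization step per group.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Larger groups amortize the per-group scale over more weights (smaller files) at the cost of coarser quantization, which is the trade-off behind the `gs=128` setting in the model filename.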

| Benchmark | BF16 | Q4_0 (gs=128) | Nexa Q4_0 (gs=128) | Degraded after Q4_0 (gs=128), % | Restoration, % | Improvement, % |
|---|---|---|---|---|---|---|
| IFEVAL | 51.34 | 41.59 | 50.59 | 81.01% | 98.54% | 21.64% |
| MMLU (5-shot) | 46.05 | 40.00 | 44.68 | 86.86% | 97.02% | 11.70% |
| Hellaswag | 45.64 | 43.29 | 46.27 | 94.85% | 101.38% | 6.88% |
| arc_challenge | 34.73 | 31.06 | 35.84 | 89.43% | 103.20% | 15.39% |
| arc_easy | 68.98 | 63.59 | 69.61 | 92.19% | 100.91% | 9.47% |
| piqa | 74.65 | 71.00 | 73.78 | 95.11% | 98.83% | 3.92% |
| openbook qa | 26.60 | 25.40 | 28.40 | 95.49% | 106.77% | 11.81% |
| gsm8k | 32.57 | 18.46 | 31.46 | 56.68% | 96.59% | 70.42% |
| Total | | | | 86.45% | 100.34% | 18.90% |
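The percentage columns can be reproduced from the raw scores: "Degraded" and "Restoration" compare each quantized score to BF16, and "Improvement" compares Nexa Q4_0 to plain Q4_0. A small helper (the function name is our own, not part of any toolkit):

```python
def restoration_stats(bf16, q4, nexa):
    """Recompute the table's percentage columns from raw benchmark scores."""
    degraded = q4 / bf16 * 100            # quality kept by plain Q4_0
    restoration = nexa / bf16 * 100       # quality kept by Nexa Q4_0
    improvement = (nexa - q4) / q4 * 100  # Nexa Q4_0 vs plain Q4_0
    return round(degraded, 2), round(restoration, 2), round(improvement, 2)

# IFEVAL row: BF16 51.34, Q4_0 41.59, Nexa Q4_0 50.59
print(restoration_stats(51.34, 41.59, 50.59))  # → (81.01, 98.54, 21.64)
```

Note that gsm8k shows the largest gap: plain Q4_0 keeps only 56.68% of BF16 quality there, which is why the improvement column peaks at 70.42%.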