## Introduction
This repository provides the 1B NexaQuant model weights, along with code to run and evaluate the model.
## Run the Models
First, clone the repo:

```shell
git clone https://huggingface.co/nexa-enterprise-a/nexaQuant
```
You can run the model through either the Nexa SDK or llama.cpp. To use the Nexa SDK, follow the installation instructions at Nexa SDK to install the toolkit, then run:

```shell
nexa run --local_path Llama3.2-1B-llamacpp_q4_0_gs128.gguf
```
Alternatively, you can run it through llama.cpp. Follow the installation instructions at llama.cpp:

```shell
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake ..
make -j32
```

Then run the model with llama.cpp:

```shell
./llama.cpp/build/bin/llama-cli -m Llama3.2-1B-llamacpp_q4_0_gs128.gguf
```
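The `q4_0_gs128` suffix in the file name denotes 4-bit quantization with a group size of 128: weights are split into groups of 128 values, each group stored as 4-bit integers sharing one scale. A simplified NumPy sketch of symmetric 4-bit group quantization (illustrative only — this is not llama.cpp's actual Q4_0 storage layout, and the helper names are hypothetical):

```python
import numpy as np

def quantize_q4_group(w, group_size=128):
    """Symmetric 4-bit group quantization, loosely in the spirit of Q4_0.

    Each group of `group_size` weights shares one float scale; the
    quantized values are integers in [-8, 7]."""
    w = w.reshape(-1, group_size)
    # Scale chosen so the max-magnitude weight in each group maps to -8.
    max_vals = w[np.arange(len(w)), np.abs(w).argmax(axis=1)]
    scale = max_vals / -8.0
    scale[scale == 0] = 1.0  # guard against all-zero groups
    q = np.clip(np.round(w / scale[:, None]), -8, 7).astype(np.int8)
    return q, scale

def dequantize_q4_group(q, scale):
    return (q.astype(np.float32) * scale[:, None]).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)  # two groups of 128
q, scale = quantize_q4_group(w)
w_hat = dequantize_q4_group(q, scale)
err = np.abs(w - w_hat).max()  # reconstruction error, bounded by ~scale/2
```

This rounding error is what degrades benchmark scores under plain Q4_0; NexaQuant's recovery techniques aim to restore the lost quality while keeping the 4-bit footprint.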
## Evaluation
We use the widely adopted public toolkit lm-evaluation-harness to benchmark the performance of our models. An example evaluation command for ifeval:

```shell
pip install lm-eval
lm_eval --model hf \
    --model_args pretrained=Llama3.2-1B-llamacpp_q4_0_gs128,add_bos_token=True \
    --tasks ifeval \
    --device cuda:0 \
    --batch_size 136
```
For the 1B model, Nexa AI's 4-bit quantization restores the average benchmark score from 86.45% of the BF16 baseline (under standard Q4_0, gs=128) to 100.34%. For more information, please check the NexaQuant blog.
| Benchmark | BF16 | Q4_0 (gs=128) | Nexa Q4_0 (gs=128) | Retained after Q4_0 (gs=128) | Restoration (Nexa vs. BF16) | Improvement (Nexa vs. Q4_0) |
|---|---|---|---|---|---|---|
| IFEVAL | 51.34 | 41.59 | 50.59 | 81.01% | 98.54% | 21.64% |
| MMLU (5-shot) | 46.05 | 40.00 | 44.68 | 86.86% | 97.02% | 11.70% |
| Hellaswag | 45.64 | 43.29 | 46.27 | 94.85% | 101.38% | 6.88% |
| arc_challenge | 34.73 | 31.06 | 35.84 | 89.43% | 103.20% | 15.39% |
| arc_easy | 68.98 | 63.59 | 69.61 | 92.19% | 100.91% | 9.47% |
| piqa | 74.65 | 71.00 | 73.78 | 95.11% | 98.83% | 3.92% |
| openbook qa | 26.60 | 25.40 | 28.40 | 95.49% | 106.77% | 11.81% |
| gsm8k | 32.57 | 18.46 | 31.46 | 56.68% | 96.59% | 70.42% |
| Average | | | | 86.45% | 100.34% | 18.90% |
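The percentage columns follow directly from the raw scores: retention and restoration express the quantized score as a fraction of BF16, and improvement is Nexa Q4_0 relative to plain Q4_0. Using the IFEVAL row as a check:

```python
# Raw IFEVAL scores from the table above
bf16, q4, nexa = 51.34, 41.59, 50.59

retention = 100 * q4 / bf16            # quality kept after plain Q4_0
restoration = 100 * nexa / bf16        # quality kept after Nexa Q4_0
improvement = 100 * (nexa - q4) / q4   # Nexa Q4_0 relative to plain Q4_0

print(round(retention, 2), round(restoration, 2), round(improvement, 2))
# 81.01 98.54 21.64
```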
Model tree for nexa-enterprise-a/nexaQuant
- Base model: meta-llama/Llama-3.2-1B-Instruct