---
license: mit
---
# Quantized BitNet-B1-58-3B
This repository contains a quantized version of the [1bitLLM/bitnet_b1_58-3B](https://huggingface.co/1bitLLM/bitnet_b1_58-3B) model.
While the original repository showcases impressive validation results, it only emulates BitNet's Linear layers, so its memory usage is similar to that of an fp16 model. By leveraging the QuantLinear module from [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ), this repository exports and runs a genuinely 2-bit quantized model.
The quantized model offers significant advantages in model size and memory consumption. At just 1 GB on disk, the quantized 3B model can run inference with a context size of 2048 while consuming only about 4.5 GB of VRAM. Furthermore, since the weights used at execution time are identical to those in the original repository, the perplexity (PPL) results are unchanged.
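As a quick sanity check, the snippet below sketches how the quantized checkpoint can be loaded and used for generation. It assumes the custom BitNet/QuantLinear modules ship with this repository and are picked up via `trust_remote_code=True`; the prompt, device placement, and generation settings are arbitrary examples, not part of the repository's API.
```
# Minimal inference sketch (assumption: the quantized checkpoint loads through
# transformers' AutoModelForCausalLM with trust_remote_code=True; adjust the
# path and device to your setup).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./"  # or "./bitnet_b1_58-3B_quantized" if you quantized locally
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,   # custom BitNet/QuantLinear modules live in this repo
    torch_dtype=torch.float16,
).cuda()

inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```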
## Install
```
pip install -r requirements.txt
```
## Quantization
The quantized model is already provided in this repository. If you wish to quantize the model yourself, the following command loads 1bitLLM/bitnet_b1_58-3B and saves the 2-bit quantized model to ./bitnet_b1_58-3B_quantized:
```
python quantization.py
```
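For intuition on why 2 bits per weight are enough, here is a standalone sketch of BitNet b1.58-style ternary quantization and bit packing. It is illustrative only: `quantization.py` relies on AutoGPTQ's QuantLinear for the actual packing, and the helper name `pack_ternary` below is hypothetical.
```
# Conceptual sketch: BitNet b1.58 weights are ternary {-1, 0, +1} plus a
# per-tensor scale, so each weight fits in 2 bits. This is NOT quantization.py;
# the real script packs weights via AutoGPTQ's QuantLinear.
import torch

def pack_ternary(w: torch.Tensor):
    """Map ternary weights {-1, 0, +1} to 2-bit codes {0, 1, 2} and pack
    16 codes per int32 word, returning the packed words and the scale."""
    scale = w.abs().mean().clamp(min=1e-5)          # per-tensor scale, as in BitNet b1.58
    ternary = (w / scale).round().clamp(-1, 1)      # values in {-1, 0, +1}
    codes = (ternary + 1).to(torch.int32).flatten() # shift to {0, 1, 2}
    packed = torch.zeros(codes.numel() // 16, dtype=torch.int32)
    for i in range(16):                             # 16 x 2-bit codes per 32-bit word
        packed |= codes[i::16] << (2 * i)
    return packed, scale

w = torch.randn(64, 64)
packed, scale = pack_ternary(w)
print(packed.shape, scale)  # 4096 weights -> 256 int32 words (~2 bits/weight)
```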
## Evaluation
```
python eval_ppl.py --hf_path ./ --seqlen 2048 --max_dataset_size 1000
```
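For reference, a windowed perplexity evaluation in the spirit of `eval_ppl.py` looks roughly like the sketch below. The dataset (wikitext-2) and the use of non-overlapping windows of `--seqlen` tokens are assumptions; consult the script itself for the exact setup.
```
# Condensed PPL-evaluation sketch (assumptions: wikitext-2 test split,
# non-overlapping 2048-token windows; eval_ppl.py may differ in details).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

seqlen = 2048
tokenizer = AutoTokenizer.from_pretrained("./")
model = AutoModelForCausalLM.from_pretrained(
    "./", trust_remote_code=True, torch_dtype=torch.float16
).cuda().eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids.cuda()

nlls = []
for i in range(0, ids.size(1) - seqlen, seqlen):
    chunk = ids[:, i : i + seqlen]
    with torch.no_grad():
        loss = model(chunk, labels=chunk).loss   # mean NLL over the window
    nlls.append(loss)

ppl = torch.exp(torch.stack(nlls).mean())
print(f"PPL: {ppl.item():.2f}")
```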
```
python eval_task.py --hf_path ./ \
    --batch_size 1 \
    --tasks \
    --output_path result.json \
    --num_fewshot 0 \
    --ctx_size 2048
```