---
license: apache-2.0
tags:
- text2text-generation
pipeline_tag: text2text-generation
language:
- zh
- en
---

# GPTQ-for-Bloom

## Welcome

If you find this model helpful, please *like* this model and star us on https://github.com/LianjiaTech/BELLE!

## Model description

8-bit and 4-bit quantizations of [Bloom](https://arxiv.org/pdf/2211.05100.pdf) using [GPTQ](https://arxiv.org/abs/2210.17323).

GPTQ is a state-of-the-art one-shot weight quantization method. The inference code can be found in our GitHub project repository: https://github.com/LianjiaTech/BELLE/gptq (see also the minimal generation sketch at the end of this card). In general, 8-bit quantization with a group size of 128 is recommended.

**This code is based on [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).**

## Model list

| model name                | file size | GPU memory usage |
| ------------------------- | --------- | ---------------- |
| base                      | 27 GB     | ~28.2 GB         |
| bloom7b-2m-8bit-128g.pt   | 9.7 GB    | ~11.4 GB         |
| bloom7b-2m-4bit-128g.pt   | 6.9 GB    | ~8.4 GB          |
| bloom7b-0.2m-8bit-128g.pt | 9.7 GB    | ~11.4 GB         |
| bloom7b-0.2m-4bit-128g.pt | 6.9 GB    | ~8.4 GB          |

## Citation

Please cite us when using our code, data, or model.

```
@misc{BELLE,
  author = {Yunjie Ji, Yong Deng, Yan Gong, Yiping Peng, Qiang Niu, Baochang Ma, Xiangang Li},
  title = {BELLE: Bloom-Enhanced Large Language model Engine},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LianjiaTech/BELLE}},
}
```

Please also cite the original BLOOM, Stanford Alpaca, and Self-Instruct papers.
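
## Inference example

For reference, here is a minimal generation sketch using Hugging Face `transformers`. It loads a full-precision BELLE Bloom checkpoint; the quantized `*.pt` checkpoints listed above cannot be loaded this way and require the GPTQ inference code from https://github.com/LianjiaTech/BELLE/gptq. The model id, prompt format, and generation parameters below are illustrative assumptions, not guarantees about this repository.

```python
# Minimal sketch, assuming a full-precision BELLE Bloom checkpoint on the
# Hugging Face Hub. The quantized .pt files above are NOT loadable this way;
# they require the GPTQ inference code in the BELLE repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BelleGroup/BELLE-7B-2M"  # assumed repo id; substitute your own

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 roughly halves memory vs. fp32
    device_map="auto",          # requires `accelerate`; places weights on GPU(s)
)

# Assumed BELLE-style instruction prompt format.
prompt = "Human: Write a short poem about spring.\n\nAssistant: "
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        top_p=0.85,
        temperature=0.7,
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

As the table above shows, the 4-bit 128-group checkpoints run in roughly 8.4 GB of GPU memory versus ~28.2 GB for the full-precision base model, which is the main practical reason to prefer the quantized files on a single consumer GPU.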