---
license: apache-2.0
tags:
- text2text-generation
pipeline_tag: text2text-generation
language:
- zh
- en
---
# GPTQ-for-Bloom

## Welcome
If you find this model helpful, please like it and star us at https://github.com/LianjiaTech/BELLE !
## Model description

8-bit quantization of BLOOM using GPTQ.

GPTQ is a state-of-the-art one-shot weight quantization method.

The inference code can be found in our GitHub project repository: https://github.com/LianjiaTech/BELLE/gptq.

In general, 8-bit quantization with a group size of 128 is recommended.

This code is based on GPTQ-for-LLaMa.
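To make the "8-bit, group size 128" recommendation concrete, here is a minimal NumPy sketch of symmetric round-to-nearest quantization with one scale per 128-weight group. This only illustrates the storage format; GPTQ itself is more sophisticated (it corrects quantization error layer by layer using second-order information), and every name below is illustrative rather than taken from the BELLE codebase.

```python
import numpy as np

def quantize_groupwise(w, bits=8, group_size=128):
    """Symmetric round-to-nearest quantization, one scale per group.
    Illustrative only: GPTQ uses error correction, not plain rounding."""
    qmax = 2 ** (bits - 1) - 1
    w = w.reshape(-1, group_size)                      # one row per group
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(1024,)).astype(np.float32)        # toy weight vector
q, scale = quantize_groupwise(w, bits=8, group_size=128)
w_hat = dequantize(q, scale).reshape(-1)
print(f"max reconstruction error: {np.abs(w - w_hat).max():.4f}")
```

A smaller group size means more scales (slightly larger files) but a tighter fit to the local weight range; group size 128 is a common middle ground.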
## Model list

| model name | file size | GPU memory usage |
| --- | --- | --- |
| base | 27G | ~28.2G |
| bloom7b-2m-8bit-128g.pt | 9.7G | ~11.4G |
| bloom7b-2m-4bit-128g.pt | 6.9G | ~8.4G |
| bloom7b-0.2m-8bit-128g.pt | 9.7G | ~11.4G |
| bloom7b-0.2m-4bit-128g.pt | 6.9G | ~8.4G |
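As a rough sanity check on the file sizes above: assuming BLOOM-7B has about 7.1B parameters (a figure from the upstream BLOOM release, not this card) and that 8-bit/group-128 storage costs one byte per weight plus one fp16 scale per group, the quantized weights alone come to roughly 7.2 GB. The published 9.7G file is larger, plausibly because some tensors (e.g., embeddings) are kept at higher precision; this accounting is an assumption, not a description of the actual checkpoint layout.

```python
# Back-of-envelope size estimate; the parameter count and storage
# layout are assumptions, not taken from the model card.
params = 7.1e9        # approx. BLOOM-7B parameter count (assumed)
bits = 8              # quantized weight width
group_size = 128      # number of weights sharing one scale

weight_bytes = params * bits / 8
scale_bytes = params / group_size * 2   # one fp16 scale per group (assumed)
total_gb = (weight_bytes + scale_bytes) / 1e9
print(f"estimated quantized weight size: {total_gb:.1f} GB")
```

The same arithmetic with `bits = 4` halves the weight term, which matches the direction (though not the exact size) of the 4-bit files in the table.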
## Citation

Please cite us when using our code, data, or model.
```bibtex
@misc{BELLE,
  author = {Yunjie Ji and Yong Deng and Yan Gong and Yiping Peng and Qiang Niu and Baochang Ma and Xiangang Li},
  title = {BELLE: Bloom-Enhanced Large Language Model Engine},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LianjiaTech/BELLE}},
}
```
Please also cite the original BLOOM, Stanford Alpaca, and Self-Instruct papers.