---
license: apache-2.0
tags:
- text2text-generation
pipeline_tag: text2text-generation
language:
- zh
- en
---
GPTQ-for-Bloom
4-bit quantization of Bloom using GPTQ.
GPTQ is a state-of-the-art one-shot weight quantization method.
The inference code can be found in our GitHub project repository: https://github.com/LianjiaTech/BELLE/gptq.
This code is based on GPTQ-for-LLaMa.
Model list
model name | file size | GPU memory |
---|---|---|
bloom7b-2m-8bit-128g.pt | 9.7G | 11G |
bloom7b-2m-4bit-128g.pt | 6.9G | 8G |
bloom7b-2m-3bit-128g.pt | 6.2G | 7.7G |
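
The file names above appear to encode a bit width (`8bit`, `4bit`, `3bit`) and, assuming `128g` means a group size of 128, the number of weights that share one set of quantization parameters. The sketch below illustrates that idea with plain round-to-nearest quantization in pure Python. It is not GPTQ itself (GPTQ additionally adjusts the not-yet-quantized weights using approximate second-order information to compensate for the rounding error), and it is not the code from the repository; all function names here are hypothetical.

```python
import random


def quantize_group(weights, bits):
    """Round-to-nearest quantization of one group (illustrative, not GPTQ)."""
    levels = (1 << bits) - 1                 # e.g. 15 distinct steps for 4 bits
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0  # fall back to 1.0 for a constant group
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min


def dequantize_group(codes, scale, w_min):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale + w_min for c in codes]


def quantize_row(row, bits=4, group_size=128):
    """Quantize and restore a row, with one (scale, minimum) pair per group."""
    restored = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        codes, scale, w_min = quantize_group(group, bits)
        restored.extend(dequantize_group(codes, scale, w_min))
    return restored


def max_abs_error(row, bits, group_size):
    """Largest absolute difference between original and restored weights."""
    restored = quantize_row(row, bits, group_size)
    return max(abs(a - b) for a, b in zip(row, restored))


if __name__ == "__main__":
    random.seed(0)
    # Synthetic weights, used only for demonstration.
    row = [random.gauss(0.0, 0.02) for _ in range(512)]
    for bits in (8, 4, 3):
        print(f"{bits}-bit, group size 128: max abs error = "
              f"{max_abs_error(row, bits, 128):.6f}")
```

Each group stores its own scale and minimum, so a smaller group size tracks the local weight range more closely at the cost of more stored parameters. The rounding error within a group is bounded by half of that group's scale.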