---
license: apache-2.0
tags:
  - text2text-generation
pipeline_tag: text2text-generation
language:
  - zh
  - en
---

GPTQ-for-Bloom

4-bit quantization of Bloom using GPTQ (8-bit and 3-bit checkpoints are also provided).

GPTQ is a state-of-the-art one-shot weight quantization method.
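The `Nbit-128g` suffixes in the file names below plausibly mean N-bit weights quantized in groups of 128 (the group-size convention used by GPTQ-for-LLaMa; this reading is an assumption). The minimal Python sketch below shows what group-wise quantization does: each weight row is split into groups, and each group gets its own scale and zero point. Note that this is plain round-to-nearest (RTN) quantization, not GPTQ itself. GPTQ additionally uses second-order information from a small calibration set to quantize weights column by column and update the not-yet-quantized weights to compensate for rounding error. All function names here are hypothetical and not part of the BELLE code.

```python
from typing import List, Tuple

# Hypothetical helpers: plain round-to-nearest group quantization, NOT GPTQ.
Group = Tuple[List[int], float, int]  # (integer codes, scale, zero point)


def quantize_group(values: List[float], bits: int) -> Group:
    """Asymmetric min-max round-to-nearest quantization of one group."""
    qmax = (1 << bits) - 1  # largest code, e.g. 15 for 4 bits
    # Include 0.0 in the range so the zero point is always a valid code.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = round(-lo / scale)
    codes = [min(max(round(v / scale) + zero, 0), qmax) for v in values]
    return codes, scale, zero


def dequantize_group(codes: List[int], scale: float, zero: int) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [(c - zero) * scale for c in codes]


def quantize_rowwise(row: List[float], bits: int = 4, group_size: int = 128) -> List[Group]:
    """Split one weight row into groups and quantize each with its own scale/zero."""
    return [
        quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


def dequantize_rowwise(groups: List[Group]) -> List[float]:
    """Concatenate the dequantized groups back into one row."""
    out: List[float] = []
    for codes, scale, zero in groups:
        out.extend(dequantize_group(codes, scale, zero))
    return out


if __name__ == "__main__":
    row = [((i * 37) % 101 - 50) / 17.0 for i in range(300)]  # toy weights
    groups = quantize_rowwise(row, bits=4, group_size=128)
    recon = dequantize_rowwise(groups)
    print(len(groups), max(abs(a - b) for a, b in zip(row, recon)))
```

With 4 bits each group uses codes 0 to 15, and the per-weight rounding error is at most half of that group's scale. Smaller groups track the local weight range more closely, at the cost of storing more scales and zero points.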

The inference code is available in our GitHub project repository: https://github.com/LianjiaTech/BELLE/gptq

This code is based on GPTQ-for-LLaMa.

Model list

| Model name | File size | GPU memory |
|---|---|---|
| bloom7b-2m-8bit-128g.pt | 9.7 GB | 11 GB |
| bloom7b-2m-4bit-128g.pt | 6.9 GB | 8 GB |
| bloom7b-2m-3bit-128g.pt | 6.2 GB | 7.7 GB |
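The model list suggests a simple selection rule: use the highest-precision checkpoint whose listed GPU memory fits your card. The sketch below is a hypothetical helper (`pick_checkpoint` and `Checkpoint` are not part of BELLE); the memory figures are copied from the table, and actual usage may be higher for longer sequences or larger batches.

```python
from typing import NamedTuple, Optional


class Checkpoint(NamedTuple):
    file_name: str
    bits: int
    gpu_memory_gb: float  # GPU memory as listed in the model list


# Values copied from the model list; hypothetical helper, not part of BELLE.
CHECKPOINTS = [
    Checkpoint("bloom7b-2m-8bit-128g.pt", 8, 11.0),
    Checkpoint("bloom7b-2m-4bit-128g.pt", 4, 8.0),
    Checkpoint("bloom7b-2m-3bit-128g.pt", 3, 7.7),
]


def pick_checkpoint(available_gb: float) -> Optional[str]:
    """Return the highest-precision checkpoint that fits, or None if none fits."""
    fitting = [c for c in CHECKPOINTS if c.gpu_memory_gb <= available_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda c: c.bits).file_name


print(pick_checkpoint(8.0))  # → bloom7b-2m-4bit-128g.pt
```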