---
license: apache-2.0
tags:
- text2text-generation
pipeline_tag: text2text-generation
language:
- zh
- en
---

# GPTQ-for-Bloom

## Welcome
If you find this model helpful, please *like* this model and star us at https://github.com/LianjiaTech/BELLE !

## Model description
8-bit quantization of [BELLE-7B-2M](https://huggingface.co/BelleGroup/BELLE-7B-2M) and [BELLE-7B-0.2M](https://huggingface.co/BelleGroup/BELLE-7B-0.2M) using [GPTQ](https://arxiv.org/abs/2210.17323).

GPTQ is a state-of-the-art one-shot weight quantization method.

The inference code can be found in our GitHub repository: https://github.com/LianjiaTech/BELLE/tree/main/gptq.

In general, 8-bit quantization with a group size of 128 is recommended.
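"8-bit with group size 128" means each contiguous group of 128 weights shares one quantization scale and zero-point. Below is a minimal NumPy sketch of plain group-wise round-to-nearest quantization, as an illustration of what those two parameters control; it is not the BELLE code, and GPTQ itself goes further by compensating quantization error with second-order (Hessian-based) weight updates:

```python
# Illustrative group-wise 8-bit quantization (round-to-nearest), NOT GPTQ itself:
# every group of `group_size` consecutive weights gets its own scale/zero-point.
import numpy as np

def quantize_groupwise(w, bits=8, group_size=128):
    """Quantize a 1-D float weight vector group by group.
    Returns integer codes plus per-group scales and zero-points."""
    n_groups = len(w) // group_size
    w = w[: n_groups * group_size].reshape(n_groups, group_size)
    qmax = 2 ** bits - 1
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / qmax
    scale[scale == 0] = 1.0  # guard against constant groups
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale + zero), 0, qmax).astype(np.uint8)
    return q, scale, zero

def dequantize_groupwise(q, scale, zero):
    # Reconstruct approximate float weights from codes + group parameters.
    return (q.astype(np.float32) - zero) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)          # toy weight vector
q, s, z = quantize_groupwise(w, bits=8, group_size=128)
w_hat = dequantize_groupwise(q, s, z).reshape(-1)
err = np.abs(w - w_hat).max()                          # worst-case reconstruction error
```

Smaller groups track the local weight range more tightly (lower error) at the cost of storing more scales and zero-points, which is the trade-off behind the recommended group size of 128.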

**This code is based on [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa), adapted for the [Bloom](https://arxiv.org/pdf/2211.05100.pdf) model.**

## Model list

| model name                 | file size | GPU memory usage |
| -------------------------- | --------- | ---------------- |
| base                       | 27G       | ~28.2G           |
| bloom7b-2m-8bit-128g.pt    | 9.7G      | ~11.4G           |
| bloom7b-2m-4bit-128g.pt    | 6.9G      | ~8.4G            |
| bloom7b-0.2m-8bit-128g.pt  | 9.7G      | ~11.4G           |
| bloom7b-0.2m-4bit-128g.pt  | 6.9G      | ~8.4G            |

## Limitations
There still exist a few issues with the model trained on the current base model and data:

1. The model might generate factual errors when asked to follow instructions related to facts.

2. The model occasionally generates harmful responses, since it still struggles to identify potentially harmful instructions.

3. The model's reasoning and coding abilities still need improvement.

Since the model still has these limitations, we require that developers use the open-sourced code, data, model, and any other artifacts generated via this project for research purposes only. Commercial use and other potentially harmful use cases are not allowed.

## Citation

Please cite us when using our code, data or model.

```
@misc{BELLE,
  author = {Yunjie Ji, Yong Deng, Yan Gong, Yiping Peng, Qiang Niu, Baochang Ma, Xiangang Li},
  title = {BELLE: Bloom-Enhanced Large Language Model Engine},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LianjiaTech/BELLE}},
}
```

Please also cite the original BLOOM, Stanford Alpaca, and Self-Instruct papers!

***

# GPTQ-for-Bloom

## Welcome
If you find this model helpful, please *like* this model and star us at https://github.com/LianjiaTech/BELLE !

## Model description
8-bit quantization of [BELLE-7B-2M](https://huggingface.co/BelleGroup/BELLE-7B-2M) and [BELLE-7B-0.2M](https://huggingface.co/BelleGroup/BELLE-7B-0.2M).

GPTQ is currently the state-of-the-art one-shot weight quantization method.

The inference code for this model can be found at https://github.com/LianjiaTech/BELLE/tree/main/models/gptq .

In general, 8-bit quantization with a group size of 128 is recommended.

**The inference code for running the [Bloom](https://arxiv.org/pdf/2211.05100.pdf) model with [GPTQ](https://arxiv.org/abs/2210.17323) is based on [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).**

## Model list

| model name                 | file size | GPU memory usage |
| -------------------------- | --------- | ---------------- |
| base                       | 27G       | ~28.2G           |
| bloom7b-2m-8bit-128g.pt    | 9.7G      | ~11.4G           |
| bloom7b-2m-4bit-128g.pt    | 6.9G      | ~8.4G            |
| bloom7b-0.2m-8bit-128g.pt  | 9.7G      | ~11.4G           |
| bloom7b-0.2m-4bit-128g.pt  | 6.9G      | ~8.4G            |

## Limitations and usage restrictions
The SFT model trained on the current data and base model still has the following issues:

1. It may produce answers that contradict the facts when given fact-related instructions.

2. It cannot reliably identify harmful instructions, and may therefore produce harmful responses.

3. Its capabilities in scenarios involving reasoning, coding, etc. still need improvement.

Given these model limitations, we require that developers use the open-sourced code, data, model, and any derivatives later generated from this project for research purposes only; they must not be used commercially or for any other purpose that could harm society.

## Citation
Please cite this project if you use its code, data, or model.
```
@misc{BELLE,
  author = {Yunjie Ji, Yong Deng, Yan Gong, Yiping Peng, Qiang Niu, Baochang Ma, Xiangang Li},
  title = {BELLE: Bloom-Enhanced Large Language Model Engine},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LianjiaTech/BELLE}},
}
```
Please also cite the original BLOOM, Stanford Alpaca, and Self-Instruct papers.