Update README.md

OpenSourceRonin committed · Commit c7f64f9 · 1 parent: 8453bfa
README.md CHANGED
````diff
@@ -36,21 +36,6 @@ Scaling model size significantly challenges the deployment and inference of Large Language Models (LLMs).
 
 Read tech report at [**Tech Report**](https://github.com/microsoft/VPTQ/blob/main/VPTQ_tech_report.pdf) and [**arXiv Paper**](https://arxiv.org/pdf/2409.17066)
 
-### Early Results from Tech Report
-VPTQ achieves better accuracy and higher throughput with lower quantization overhead across models of different sizes. The following experimental results are for reference only; VPTQ can achieve better outcomes under reasonable parameters, especially in terms of model accuracy and inference speed.
-
-<img src="assets/vptq.png" width="500">
-
-| Model       | bitwidth | W2↓  | C4↓  | AvgQA↑ | tok/s↑ | mem(GB) | cost/h↓ |
-| ----------- | -------- | ---- | ---- | ------ | ------ | ------- | ------- |
-| LLaMA-2 7B  | 2.02     | 6.13 | 8.07 | 58.2   | 39.9   | 2.28    | 2       |
-|             | 2.26     | 5.95 | 7.87 | 59.4   | 35.7   | 2.48    | 3.1     |
-| LLaMA-2 13B | 2.02     | 5.32 | 7.15 | 62.4   | 26.9   | 4.03    | 3.2     |
-|             | 2.18     | 5.28 | 7.04 | 63.1   | 18.5   | 4.31    | 3.6     |
-| LLaMA-2 70B | 2.07     | 3.93 | 5.72 | 68.6   | 9.7    | 19.54   | 19      |
-|             | 2.11     | 3.92 | 5.71 | 68.7   | 9.7    | 20.01   | 19      |
-
----
 
 ## Installation
 
````
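The mem(GB) column in the removed table is roughly consistent with a params × bitwidth estimate. The sketch below is a back-of-envelope check only, not part of the VPTQ codebase; the parameter counts are approximate, and the reported figures come out somewhat higher because they also cover unquantized parts (embeddings, lm_head), codebooks, and runtime buffers.

```python
# Back-of-envelope check (not part of VPTQ): estimate quantized weight storage
# from parameter count x effective bitwidth and compare with the table's mem(GB).

APPROX_PARAMS = {          # approximate parameter counts (assumption for illustration)
    "LLaMA-2 7B": 6.7e9,
    "LLaMA-2 13B": 13.0e9,
    "LLaMA-2 70B": 69.0e9,
}

def est_weight_gb(n_params: float, bitwidth: float) -> float:
    """Quantized weight storage in GB: params * bits / 8 bytes per byte."""
    return n_params * bitwidth / 8 / 1e9

for model, bits, reported in [("LLaMA-2 7B", 2.02, 2.28),
                              ("LLaMA-2 13B", 2.02, 4.03),
                              ("LLaMA-2 70B", 2.07, 19.54)]:
    est = est_weight_gb(APPROX_PARAMS[model], bits)
    print(f"{model}: ~{est:.2f} GB of weights at {bits} bits "
          f"(table reports {reported} GB total)")
```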
````diff
@@ -148,72 +133,4 @@ An environment variable is available to control whether a share link is created.
 `export SHARE_LINK=1`
 ```
 python -m vptq.app
-```
-
----
-
-## Road Map
-- [ ] Merge the quantization algorithm into the public repository.
-- [ ] Submit the VPTQ method to various inference frameworks (e.g., vLLM, llama.cpp).
-- [ ] Improve the implementation of the inference kernel.
-- [ ] **TBC**
-
-## Project main members:
-* Yifei Liu (@lyf-00)
-* Jicheng Wen (@wejoncy)
-* Yang Wang (@YangWang92)
-
-## Acknowledgement
-
-* We thank **James Hensman** for his crucial insights into the error analysis of Vector Quantization (VQ); his comments on LLM evaluation were invaluable to this research.
-* We are deeply grateful for the inspiration provided by the papers QUIP, QUIP#, GPTVQ, AQLM, WoodFisher, GPTQ, and OBC.
-
-## Publication
-
-EMNLP 2024 Main
-```bibtex
-@inproceedings{
-  vptq,
-  title={VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models},
-  author={Yifei Liu and
-          Jicheng Wen and
-          Yang Wang and
-          Shengyu Ye and
-          Li Lyna Zhang and
-          Ting Cao and
-          Cheng Li and
-          Mao Yang},
-  booktitle={The 2024 Conference on Empirical Methods in Natural Language Processing},
-  year={2024},
-}
-```
-
----
-
-## Limitation of VPTQ
-* ⚠️ VPTQ should only be used for research and experimental purposes. Further testing and validation are needed before you use it.
-* ⚠️ The repository only provides the model quantization algorithm. The open-source community may provide models based on the technical report and the quantization algorithm, but the repository cannot guarantee the performance of those models.
-* ⚠️ VPTQ has not been tested across all potential applications and domains, and we cannot guarantee its accuracy and effectiveness in other tasks or scenarios.
-* ⚠️ Our tests are all based on English texts; other languages are not included in the current testing.
-
-## Contributing
-
-This project welcomes contributions and suggestions. Most contributions require you to agree to a
-Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
-the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
-
-When you submit a pull request, a CLA bot will automatically determine whether you need to provide
-a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
-provided by the bot. You will only need to do this once across all repos using our CLA.
-
-This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
-For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
-contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
-
-## Trademarks
-
-This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
-trademarks or logos is subject to and must follow
-[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
-Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
-Any use of third-party trademarks or logos is subject to those third parties' policies.
+```
````
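On the `SHARE_LINK` variable kept in the context lines above: the snippet below is an illustrative sketch only, not the actual `vptq.app` implementation, showing how such an environment variable could toggle a public Gradio share link. The `echo` handler and the `gr.Interface` layout are assumptions made for the example.

```python
# Illustrative sketch -- not the real vptq.app. It only demonstrates how an
# environment variable like SHARE_LINK could control Gradio's share link.
import os

import gradio as gr


def echo(text: str) -> str:
    """Placeholder handler standing in for the real demo logic."""
    return text


if __name__ == "__main__":
    # `export SHARE_LINK=1` before launching enables a public share link.
    share = os.environ.get("SHARE_LINK", "0") == "1"
    demo = gr.Interface(fn=echo, inputs="text", outputs="text")
    demo.launch(share=share)  # share=True asks Gradio to create a public URL
```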