This is a 2-bit quantization of Qwen/Qwen1.5-72B-Chat using QuIP#.
Random samples from RedPajama and SkyPile (for Chinese) are used as calibration data.
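To give a rough sense of what 2-bit weights mean for storage, here is a back-of-the-envelope sketch. It assumes a nominal 72e9 parameters (read off the "72B" in the model name) and counts only the bits per weight; it ignores codebooks, any layers kept in higher precision, activations, and the KV cache, so real usage will be higher. The function name `weight_memory_gib` is made up for this illustration.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: nominal parameter count taken from the "72B" in the model name.
NOMINAL_PARAMS = 72e9

fp16_gib = weight_memory_gib(NOMINAL_PARAMS, 16)
two_bit_gib = weight_memory_gib(NOMINAL_PARAMS, 2)

print(f"fp16 weights : {fp16_gib:.1f} GiB")
print(f"2-bit weights: {two_bit_gib:.1f} GiB")
```

Under these assumptions the 2-bit weights take one eighth of the fp16 footprint.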
Model loading
Please follow the instructions in QuIP-for-all for usage.
As an alternative, you can use Aphrodite engine or my vLLM branch for faster inference. If you have problems installing fast-hadamard-transform
from pip, you can also install it from source.
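Before loading the model, it can help to confirm that the dependency is actually importable. The sketch below uses only the standard library; it assumes the package's import name is `fast_hadamard_transform`, and the helper `has_module` is a hypothetical name for this illustration.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


# Assumption: the pip package fast-hadamard-transform is imported
# under the name `fast_hadamard_transform`.
if has_module("fast_hadamard_transform"):
    print("fast-hadamard-transform is available")
else:
    print("fast-hadamard-transform is missing: install it from pip, "
          "or build it from source if the pip install fails")
```

The check only looks the module up and does not import it, so it runs quickly and has no side effects.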