
This is a 2-bit quantization of Qwen/Qwen1.5-72B-Chat, produced with QuIP#.

Random samples from RedPajama and SkyPile (for Chinese) were used as calibration data.
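
For a rough sense of what 2-bit weights mean in practice, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate, not a measurement: the parameter count of 72 billion is assumed from the model name, and the helper `estimate_weight_gb` is hypothetical. It ignores any layers left unquantized (such as embeddings), codebook overhead, and the KV cache, so the real footprint will be somewhat higher.

```python
def estimate_weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    total_bits = num_params * bits_per_weight
    return total_bits / 8 / 1e9


# Assumption: roughly 72 billion parameters, taken from the model name.
NUM_PARAMS = 72e9

sixteen_bit_gb = estimate_weight_gb(NUM_PARAMS, 16)  # original 16-bit weights
two_bit_gb = estimate_weight_gb(NUM_PARAMS, 2)       # 2-bit quantized weights

print(f"16-bit: ~{sixteen_bit_gb:.0f} GB, 2-bit: ~{two_bit_gb:.0f} GB")
```

Under these assumptions the quantized weights take about one eighth of the 16-bit storage.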

Model loading

Please follow the instructions in QuIP-for-all for usage.

As an alternative, you can use the Aphrodite engine or my vLLM branch for faster inference. If you have problems installing fast-hadamard-transform from pip, you can also install it from source.
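
Before loading the model, it can help to confirm that the fast-hadamard-transform package is actually importable in the current environment. The following is a minimal sketch using only the Python standard library. The import name `fast_hadamard_transform` is an assumption based on the package name, and `is_installed` is a hypothetical helper, not part of any of the projects mentioned above.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the pip package fast-hadamard-transform is imported under this name.
MODULE = "fast_hadamard_transform"

if is_installed(MODULE):
    print(f"{MODULE} is available.")
else:
    print(f"{MODULE} was not found; try installing it from pip or from source.")
```

Checking with `find_spec` avoids importing the module, so the check itself does not require a working GPU setup.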
