KnutJaegersberg posted an update Jan 7
QuIP# ecosystem is growing :)

I saw a QuIP# 2-bit Qwen-72B-Chat model on the Hub today, which shows there is support for vLLM inference.
This will speed up inference and make high-performing 2-bit models more practical. I'm considering quipping MoMo now, since otherwise I can only use a brief context window with Qwen-72B on my system, even with bnb double quantization.
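For anyone wondering what a 2-bit budget actually means: each weight gets one of only four codes. Below is a toy uniform round-to-nearest 2-bit quantizer just to illustrate the bit budget; QuIP# itself is much smarter (incoherence processing plus lattice codebooks), so treat this purely as a sketch, not the QuIP# algorithm.

```python
# Toy 2-bit uniform quantizer: each weight is mapped to one of 4 levels.
# Illustrative only -- QuIP# uses incoherence processing and lattice
# codebooks, not plain uniform rounding like this.

def quantize_2bit(weights):
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 3  # 2 bits -> 4 levels -> 3 steps between them
    codes = [round((w - lo) / scale) for w in weights]  # integers in 0..3
    return codes, scale, lo

def dequantize_2bit(codes, scale, lo):
    # Reconstruct approximate weights from the 2-bit codes.
    return [lo + c * scale for c in codes]

weights = [-0.9, -0.2, 0.1, 0.8]
codes, scale, lo = quantize_2bit(weights)
recon = dequantize_2bit(codes, scale, lo)
```

Even this naive version keeps the reconstruction error within half a quantization step; the QuIP# papers below are about getting far better accuracy out of the same four codes per weight.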

keyfan/Qwen-72B-Chat-2bit

Also note the easier-to-use QuIP-for-all library :)

https://github.com/chu-tianxiang/QuIP-for-all

The papers behind QuIP# are fascinating; here are some links in case people weren't aware of them:

Good to know that there are still people doing real maths in this field :D


Might be of interest to @merve and @osanseviero too.