How is the performance of the model with 2 bits only?

#1
by DrNicefellow

Has anyone tested it?

Mobius Labs GmbH org

You can find numbers for the base model and a comparison with bitsandbytes below:

Wikitext2 PPL / memory for Mixtral-8x7B-v0.1, HQQ vs. bitsandbytes (BNB):

| Bits | Group size | Method | Wikitext2 PPL | Memory |
|---|---|---|---|---|
| 8-bit | 128 | BNB | 3.64 | 54.5 GB |
| 8-bit | 128 | HQQ | 3.63 | 47 GB |
| 4-bit | 64 | BNB | 3.97 | 27 GB |
| 4-bit | 64 | HQQ | 3.79 | 26 GB |
| 3-bit | 128 | HQQ | 4.76 | 21.8 GB |
| 2-bit | 16 (scale 8-bit/g128, zero 8-bit) | HQQ | 5.90 | 18 GB |
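For reference, Wikitext2 perplexity is usually measured by concatenating the test split and scoring fixed-length windows. Here is a minimal sketch of that standard procedure; the 2048-token window size is an assumption on my part, not necessarily the exact setting behind the numbers above:

```python
# Standard Wikitext2 perplexity sketch: non-overlapping fixed-length
# windows over the concatenated test split. seq_len=2048 is an assumption.
import torch
from datasets import load_dataset

@torch.no_grad()
def wikitext2_ppl(model, tokenizer, seq_len=2048, device="cuda"):
    model.eval()
    test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
    ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids.to(device)

    nlls = []
    for i in range(0, ids.size(1) - seq_len, seq_len):
        chunk = ids[:, i : i + seq_len]
        # labels == input_ids -> HF computes the shifted cross-entropy loss
        out = model(chunk, labels=chunk)
        nlls.append(out.loss * seq_len)  # mean loss -> approx. total NLL of the window
    return torch.exp(torch.stack(nlls).sum() / (len(nlls) * seq_len)).item()
```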

I wouldn't recommend using the 2-bit model in production; use the 4-bit version instead. But we wanted to give the community a model that can run on a single 24 GB card, so people can play with it and see how it feels compared to other models. I have personally played with it, and the instruct model works surprisingly well with these 2-bit settings.
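If you want to try reproducing the 2-bit setting from the table yourself, here is a minimal sketch using the hqq package; the keyword names follow hqq's `BaseQuantizeConfig` API as I know it, so treat them as assumptions if your hqq version differs:

```python
# Minimal sketch: 2-bit HQQ quantization of Mixtral-8x7B-v0.1 (assumes
# `pip install hqq` and enough RAM/VRAM to hold the model).
from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer
from hqq.core.quantize import BaseQuantizeConfig

model_id = "mistralai/Mixtral-8x7B-v0.1"

# 2-bit weights with group_size=16; the scales and zero-points are
# themselves quantized (scale at 8-bit/g128, zero at 8-bit), matching
# the 2-bit row in the table above.
quant_config = BaseQuantizeConfig(
    nbits=2,
    group_size=16,
    quant_scale=True,  # quantize the per-group scales
    quant_zero=True,   # quantize the per-group zero-points
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = HQQModelForCausalLM.from_pretrained(model_id)
model.quantize_model(quant_config=quant_config)
```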
