Loading model in 8-bit

#37
by abhi24 - opened

Does loading the model in 8-bit always lead to poorer quality, at least compared to the original model?
Can someone briefly describe what happens when we load it in 8-bit?

Databricks org

Instead of working with 16-bit floating-point numbers for the weights, you work with 8-bit integers. These have a much smaller range and precision, so the arithmetic done in 8-bit is less accurate. It's not necessarily faster either, but it takes half the memory. It doesn't necessarily make the results much worse; people have even experimented with 4-bit math. For example, the Dolly 12B model runs on an A10 in 8-bit, and the results seem pretty fine to me.
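
For illustration, a minimal sketch of what this looks like with `transformers` (assuming `accelerate` and `bitsandbytes` are also installed; the `load_in_8bit` flag shown here was the standard way to do this at the time of this discussion):

```python
# Minimal sketch: load Dolly 12B with 8-bit quantized weights.
# Requires `transformers`, `accelerate`, and `bitsandbytes`.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "databricks/dolly-v2-12b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # place layers on available GPU(s)/CPU
    load_in_8bit=True,   # quantize weights to int8 (half the memory of fp16)
)
```

At roughly one byte per parameter instead of two, the 12B model's weights drop from about 24 GB in fp16 to about 12 GB in int8, which is what lets it fit on a single A10.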

Thank you for the insightful reply.

abhi24 changed discussion status to closed
