Transformers bitsandbytes 4-bit quantization does not work well; QLoRA also fails

#4
by Yhyu13

Hi,

I am not sure whether you have tried Transformers inference, but InternLM2 does not seem to work properly under bitsandbytes 4-bit quantization. It constantly generates self Q&As without stopping.
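For reference, this is roughly the load path I mean; a minimal sketch, where the model ID, prompt, and generation settings are placeholders rather than my exact setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "internlm/internlm2-chat-7b"  # placeholder model ID

# Standard Transformers + bitsandbytes 4-bit (NF4) quantized load.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,  # InternLM2 ships custom modeling code
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```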

QLoRA fine-tuning with Accelerate also fails on InternLM2, erroring out because some tensor has no grad.
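My QLoRA setup was along these lines; a sketch only, and the LoRA `target_modules` names are my assumption based on InternLM2's attention layer naming, not something confirmed by the maintainers. `prepare_model_for_kbit_training` is the usual fix for "tensor has no grad" with gradient checkpointing, so the failure may sit elsewhere:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "internlm/internlm2-chat-7b"  # placeholder, as above

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    ),
    device_map="auto",
    trust_remote_code=True,
)

# Enables input grads and casts layer norms to float32; normally this is
# what prevents the "no grad" error when training a k-bit quantized model.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Assumed InternLM2 attention projection names; verify against the
    # model's named_modules() before training.
    target_modules=["wqkv", "wo"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```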

float16 works in all cases.
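For comparison, the unquantized baseline that works for me (same placeholder model ID):

```python
import torch
from transformers import AutoModelForCausalLM

# Plain float16 load, no quantization; this path behaves correctly.
model = AutoModelForCausalLM.from_pretrained(
    "internlm/internlm2-chat-7b",  # placeholder model ID
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
```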

It is not a big deal, since InternLM2 mainly supports the LMDeploy framework, which has its own 4-bit quantization, rather than Transformers.
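For anyone landing here, the LMDeploy route looks roughly like this; a sketch, where the pre-quantized 4-bit model ID and the `awq` model format are my assumptions:

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# Assumed: a pre-quantized 4-bit (AWQ) InternLM2 checkpoint served
# through the TurboMind backend.
pipe = pipeline(
    "internlm/internlm2-chat-7b-4bits",  # assumed 4-bit model ID
    backend_config=TurbomindEngineConfig(model_format="awq"),
)
print(pipe(["Hello, please introduce yourself."]))
```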

Thanks!

InternLM org

Hi @Yhyu13,
Could you please refer to this PR https://github.com/InternLM/InternLM/pull/636 and see if it solves your issue?
