HF/bitsandbytes load_in_4bit is now apparently live! (in PEFT)

#24 by 2themaxx - opened

Was looking at various llm quant options and came across: https://github.com/artidoro/qlora

It mentions load_in_4bit in the README.md, and I hadn't heard of that being available. Apparently they built a new data type for it (not sure how performant it is). After a bit of looking around, it seems it's now part of the Hugging Face PEFT library. Tim Dettmers (who works on bitsandbytes) is also a contributor to the qlora repo (it's likely part of his research). It looks like you can now load models straight into frozen 4-bit, at least for training, and one would assume it should work for inference as well just by following the instructions in the README; a sketch of the loading code is below.
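For reference, here's a minimal sketch of what the 4-bit load looks like, based on the qlora README and the transformers quantization docs (the model name is just a placeholder, and this assumes recent builds of transformers, accelerate, and bitsandbytes):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 ("NormalFloat4") is the new 4-bit data type from the QLoRA work
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Weights are loaded frozen in 4-bit; "huggyllama/llama-7b" is a placeholder
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",
    quantization_config=bnb_config,
    device_map="auto",
)
```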

I stumbled onto a closed conversation here where TheBloke mentioned "when it is released it will look like... load_in_4bit" or some such, and thought I'd post this in case it's useful to anyone.

Wonder if this could also be used on the CPU easily, and how that might perform 🤔

Thanks. It's not fully released yet. The code is in PEFT and transformers (or will be soon), but the actual 4-bit bitsandbytes library is not yet released. I'm sure it'll be out very soon.

I wouldn't get your hopes up for good performance. 8-bit bitsandbytes performs much worse than other methods, and early indications are that the same may be true for 4-bit.

The huge benefit bitsandbytes has is how easy it is to use. You can download any HF model and, with one parameter, load it in 8-bit instead (and soon 4-bit as well); see the snippet below. But that convenience comes with a cost to performance. I wouldn't expect it to be usable on CPU at all; for that you want GGML q4 or q5.
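For comparison, the one-parameter 8-bit load looks like this (model name again a placeholder):

```python
from transformers import AutoModelForCausalLM

# A single flag switches the load to 8-bit bitsandbytes quantization
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",
    load_in_8bit=True,
    device_map="auto",
)
```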

Not claiming anything about performance, but it looks like it's at least an alpha release, given the Hugging Face blog post on it today.

https://huggingface.co/blog/4bit-transformers-bitsandbytes

They claim it works in inference.
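If that holds, inference would presumably just be the usual generate() call on a 4-bit-loaded model. A minimal sketch, reusing the model and placeholder name from the earlier snippet (the prompt is arbitrary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)

# Standard generation; the frozen 4-bit weights are dequantized on the fly
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```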

Yeah, it's out for training, but not yet ready for inference: https://twitter.com/Tim_Dettmers/status/1661617478865395712?s=20

