Issue loading a 4-bit quantized model on Apple M1 Pro

#45 opened by waxsum8

I have been facing an issue when loading the gemma-2b-it model with a 4-bit quantization config on an Apple M1 Pro.

Code:

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

model_id = "google/gemma-2b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)

Error:

ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`

I have tried installing the latest versions of accelerate, bitsandbytes, and transformers, but I am still facing the same error when loading the quantized model.
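
For reference, here is a minimal check on my side (a sketch, assuming Python 3.8+ and a recent PyTorch build) to confirm which package versions the script actually sees and whether a CUDA or MPS device is visible; as far as I understand, the bitsandbytes quantization kernels are built for CUDA GPUs, so this also shows what hardware the library could use at all.

# Sketch: print the installed versions and the available accelerator backends.
import importlib.metadata
import torch

for pkg in ("transformers", "accelerate", "bitsandbytes"):
    try:
        print(pkg, importlib.metadata.version(pkg))
    except importlib.metadata.PackageNotFoundError:
        print(pkg, "not installed")

# On an M1 Pro, CUDA is not available; only the MPS backend is exposed.
print("CUDA available:", torch.cuda.is_available())
print("MPS available:", torch.backends.mps.is_available())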
