Running finetuned inference on CPU - accelerate ImportError
I have successfully fine-tuned gemma-2b for text classification using LoRA and merged the adapter into the base model with merge_and_unload(). I then saved it to my local path using model.save_pretrained(f"{LOCAL_MODEL_PATH}", safe_serialization=False).
This was done on a GPU machine.
I am trying to load the above model on a CPU-only device for inference using the following script:
model = AutoModelForSequenceClassification.from_pretrained(LOCAL_MODEL_PATH, num_labels=2)
However, I see the following error:
ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`
I do have accelerate installed; these are the libraries I installed before loading the model:
tokenizers==0.15.2
transformers==4.39.3
torch==2.2.2
bitsandbytes==0.43.0
accelerate==0.28.0
peft==0.10.0
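To rule out an environment mix-up, I also confirmed which versions the inference interpreter actually sees with this small stdlib-only snippet (the package names are the ones listed above):

```python
# Sanity check: print the package versions visible to the current
# interpreter. A mismatch with `pip list` output can mean a different
# environment is active than the one the packages were installed into.
from importlib import metadata

def installed_version(package: str):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

for pkg in ("tokenizers", "transformers", "torch", "bitsandbytes", "accelerate", "peft"):
    print(f"{pkg}: {installed_version(pkg)}")
```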
I see the same error even if I run the commands in a Linux terminal.
I would appreciate help resolving this issue so I can run the model on a CPU-only machine for inference.
Hi @saikrishna6491, BitsAndBytes 8-bit quantization requires the Accelerate library; this error usually means it is not installed or an outdated version is being used. Ensure that accelerate is installed with pip install accelerate, and update bitsandbytes to the latest version with pip install -i https://pypi.org/simple/ bitsandbytes. Once these steps are completed, you should be able to load your fine-tuned model on a CPU-only machine for inference. Kindly try this and let me know if the issue still persists. Thank you.