Unable to load the model in 8 bits

#30
by ARahul2003 - opened

Hi all, I have been trying to use this model on a laptop without any GPU for one of my course projects. Naturally, I need to load the model in 8-bit quantized form. However, whenever I try to load it quantized, I get an error stating that the accelerate and bitsandbytes libraries are not installed. I made sure to install those libraries in my virtual environment, yet the error persists. Please help me.

Huggingface 2.png

huggingface 1.png

Here is the code that I have written:

from transformers import AutoModelForCausalLM, AutoTokenizer
import accelerate
import bitsandbytes
import gradio as gr
import torch

title = "AI ChatBot"
description = "Quantised version of the Phi 1.5 LLM released by Microsoft research"
examples = [["How are you?"]]

# torch_dtype only applies to the model weights, so it is not passed to the tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True, torch_dtype="auto", load_in_8bit=True)

def predict(input, history=[]):
    # tokenize the new input sentence
    new_user_input_ids = tokenizer.encode(
        input + tokenizer.eos_token, return_tensors="pt"
    )

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([torch.LongTensor(history), new_user_input_ids], dim=-1)

    # generate a response
    history = model.generate(
        bot_input_ids, max_length=4000, pad_token_id=tokenizer.eos_token_id
    ).tolist()

    # convert the tokens to text, and then split the responses into lines
    response = tokenizer.decode(history[0]).split("<|endoftext|>")
    # print('decoded_response-->>'+str(response))
    response = [
        (response[i], response[i + 1]) for i in range(0, len(response) - 1, 2)
    ]  # convert to a list of tuples
    # print('response-->>'+str(response))
    return response, history

gr.Interface(
    fn=predict,
    title=title,
    description=description,
    examples=examples,
    inputs=["text", "state"],
    outputs=["chatbot", "state"],
    theme="finlaymacklon/boxy_violet",
).launch()

Microsoft org

Hello @ARahul2003 !

Your screenshot still shows an ImportError, which could be caused by an incomplete installation of either accelerate or bitsandbytes. However, please note that we haven't tested 8-bit loading with Phi-based models, so I am unsure how it will behave.
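One way to narrow down an ImportError like this is to check what the interpreter that runs the script can actually see, since a package installed into a different environment will not be found. The snippet below is a minimal sketch, not something tested against this model: `check_packages` and `cuda_available` are hypothetical helper names, and the CUDA check is included on the assumption (not confirmed in this thread) that the bitsandbytes 8-bit path may expect a CUDA GPU, which a CPU-only laptop would not have.

```python
import importlib.util
import sys
from importlib import metadata


def check_packages(names=("accelerate", "bitsandbytes", "transformers", "torch")):
    """Map each package name to (importable, installed_version_or_None)."""
    report = {}
    for name in names:
        # find_spec locates a top-level module without importing it
        found = importlib.util.find_spec(name) is not None
        try:
            version = metadata.version(name) if found else None
        except metadata.PackageNotFoundError:
            # importable, but no distribution metadata under this name
            version = None
        report[name] = (found, version)
    return report


def cuda_available():
    """Return True/False if torch is installed, or None if it is not."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    return torch.cuda.is_available()


if __name__ == "__main__":
    # The interpreter path shows which environment is really in use
    print("interpreter:", sys.executable)
    for name, (found, version) in check_packages().items():
        print(f"{name:15s} importable={found} version={version}")
    print("CUDA available:", cuda_available())
```

If the interpreter path is not the one inside the virtual environment, or a package shows `importable=False`, the installation went somewhere else and reinstalling from the activated environment is the first thing to try.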

Hello @ARahul2003 ,

You need to use an older version of transformers.

!pip install -qU trl datasets accelerate loralib einops xformers bitsandbytes
!pip install transformers==4.30
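After running the commands above, it may be worth confirming that the pin actually took effect in the environment the script runs in (the `!pip` prefix is notebook syntax; in a terminal the leading `!` is dropped). This is only a rough sketch: `parse_version` and `matches_pin` are hypothetical helpers, and the check accepts any 4.30.x release, which is slightly looser than the exact pin.

```python
from importlib import metadata


def parse_version(text):
    """Turn '4.30.2' into (4, 30, 2); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def matches_pin(installed, pin="4.30"):
    """True if the installed version belongs to the pinned release series."""
    pinned = parse_version(pin)
    return parse_version(installed)[: len(pinned)] == pinned


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        installed = None

    if installed is None:
        print("transformers is not installed in this environment")
    elif matches_pin(installed):
        print(f"transformers {installed} matches the 4.30 pin")
    else:
        print(f"transformers {installed} does not match the 4.30 pin")
```

In a notebook, restarting the kernel after the install is usually needed before the new version is picked up.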

gugarosa changed discussion status to closed
