AUTOGPTQ Error in Google Colab

by echogit - opened

When trying to load the model in google colab, I get the error:

ImportError: Loading a GPTQ quantized model requires optimum (pip install optimum) and auto-gptq library (pip install auto-gptq)

My code has the following:

!pip install -q -U transformers peft accelerate optimum
!pip install auto-gptq

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ")
model = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ") #ERROR HAPPENS HERE
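One thing worth checking on Colab (a sketch, not from the original post): packages installed with pip after the runtime has started are sometimes not visible to the already-running interpreter until the runtime is restarted. A quick way to confirm that both libraries transformers looks for are actually importable:

```python
import importlib.util

def gptq_deps_available() -> bool:
    # transformers raises the ImportError above when either of these
    # two packages cannot be found by the running interpreter.
    return all(importlib.util.find_spec(pkg) is not None
               for pkg in ("optimum", "auto_gptq"))

# If this prints False even though the pip installs succeeded, restart
# the Colab runtime (Runtime -> Restart runtime) and re-run the cell.
print(gptq_deps_available())
```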

If I try a different 7B GPTQ model it doesn't give the error, for example:
model = AutoModelForCausalLM.from_pretrained("edumunozsala/llama-2-7b-int4-python-code-20k")

Not sure why it's working with that other model and not this one. But please try installing AutoGPTQ as follows:

!pip install auto-gptq --extra-index-url 

As I understand it, the error might be because the transformers library cannot check for the existence of the auto-gptq library.
The _is_package_available function in transformers uses this code: "package_exists = importlib.util.find_spec(pkg_name) is not None"
The problem might be in the importlib library; I can't find a "util" module in it (Python 3.10.12, Kaggle notebook).
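On the importlib point: importlib.util is a submodule, so it can look "missing" when you inspect importlib interactively, because a bare "import importlib" is not guaranteed to load it. transformers imports it explicitly, so its check works regardless. A small demonstration of the same check (a sketch; the helper name is mine, not from transformers):

```python
# importlib.util must be imported explicitly; "import importlib" alone
# does not guarantee the util submodule is loaded.
import importlib.util

def is_package_available(pkg_name: str) -> bool:
    # Mirrors the check in transformers' _is_package_available:
    # find_spec returns None when the package cannot be located.
    return importlib.util.find_spec(pkg_name) is not None

print(is_package_available("json"))                    # stdlib, importable
print(is_package_available("definitely_not_a_real_pkg"))
```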

Yeah, I am facing the same error. But I got it working by using langchain's CTransformers: CTransformers(model="TheBloke/Llama-2-7b-Chat-GPTQ"). Still, I want to download this model using from_pretrained and then use it locally on my own hardware.

Is there any solution?

@TheBloke solution worked for me

Please help with this error.

from transformers import AutoTokenizer, pipeline, logging, AutoModelForCausalLM
#from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/Llama-2-7b-Chat-GPTQ"
model_basename = "model"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
