AUTOGPTQ Error in Google Colab
When trying to load the model in Google Colab, I get the error:
ImportError: Loading a GPTQ quantized model requires optimum (pip install optimum) and auto-gptq library (pip install auto-gptq)
My code is as follows:
!pip install -q -U transformers peft accelerate optimum
!pip install auto-gptq
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ")
model = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ") #ERROR HAPPENS HERE
If I try a different 7B GPTQ model, it doesn't give the error, for example:
model = AutoModelForCausalLM.from_pretrained("edumunozsala/llama-2-7b-int4-python-code-20k")
Not sure why it's working with that other model and not this one. But please try installing AutoGPTQ as follows:
!pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
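One thing that may also matter (a hedged note, not something confirmed in this thread): transformers decides whether optimum and auto-gptq are available when it is first imported, so if the pip installs run after transformers has already been imported in the same session, the check can keep failing until the runtime is restarted. A minimal Colab-style sketch of that ordering, reusing the repo from the question and the wheel index above:

!pip install -q -U transformers peft accelerate optimum
!pip install -q auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/

# If transformers was already imported in this session, restart the runtime here,
# then run the imports and the load in a fresh cell.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/Llama-2-7b-Chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # device_map needs accelerate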
As I understand it, the error might occur because the transformers library cannot detect that the auto-gptq library exists.
The _is_package_available function in transformers uses this code: package_exists = importlib.util.find_spec(pkg_name) is not None
The problem might be in importlib itself; I can't find the "util" module in it (Python 3.10.12, Kaggle notebook).
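A quick way to test that diagnosis from the same notebook (a small sketch; that transformers probes the import names optimum and auto_gptq is my assumption, not something stated above):

import importlib.util

# Mirrors what _is_package_available does: find_spec returns a spec only if the package is importable.
for pkg in ("optimum", "auto_gptq"):
    spec = importlib.util.find_spec(pkg)
    print(pkg, "is visible" if spec is not None else "is NOT visible")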
Yeah, I am facing the same error. I got it working by using LangChain's CTransformers: CTransformers(model="TheBloke/Llama-2-7b-Chat-GPTQ"). But I still want to download this model with from_pretrained and then use it on local hardware.
Is there any solution?
Please help with this error.
from transformers import AutoTokenizer, pipeline, logging, AutoModelForCausalLM
#from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
model_name_or_path = "TheBloke/Llama-2-7b-Chat-GPTQ"
model_basename = "model"
use_triton = False
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    model_basename=model_basename,
    use_safetensors=True,
    trust_remote_code=True,
    device="cuda:0",
    use_triton=use_triton,
    quantize_config=None,
)
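For what it's worth (an observation, not a confirmed fix): the keyword arguments in that snippet (model_basename, use_triton, quantize_config, device) look like they come from the auto_gptq API rather than from transformers, and AutoModelForCausalLM.from_pretrained does not accept them. If the intent is to keep that style of call, a sketch using the commented-out AutoGPTQForCausalLM import, where those arguments do belong, would look roughly like this:

# Loading the same repo through auto_gptq directly instead of transformers.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/Llama-2-7b-Chat-GPTQ"
model_basename = "model"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    model_basename=model_basename,
    use_safetensors=True,
    trust_remote_code=True,
    device="cuda:0",
    use_triton=False,
    quantize_config=None,
)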