Model cannot be found when used with the auto_gptq library
Hi there, I'm trying to run an instance of this model on Google Colab with the code below:
from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import logging

quantized_model_dir = "/content/drive/MyDrive/Wizard-Vicuna-13B-Uncensored-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=True)

try:
    quantize_config = BaseQuantizeConfig.from_pretrained(quantized_model_dir)
except:
    quantize_config = BaseQuantizeConfig(
        bits=4,
        group_size=128
    )

# load the quantized model from the local directory onto the first GPU
model = AutoGPTQForCausalLM.from_quantized(
    quantized_model_dir,
    device="cuda:0",
    use_safetensors=True,
    use_triton=False,
    quantize_config=quantize_config
)

pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(pipeline("auto-gptq is")[0]["generated_text"])
However, regardless of whether I set quantized_model_dir to a location in my Drive or to the temporary file location, it always returns the following traceback:
Traceback (most recent call last):

in <cell line: 18>:18

/usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/auto.py:82 in from_quantized

    79         model_type = check_and_get_model_type(save_dir or model_name_or_path, trust_remo
    80         quant_func = GPTQ_CAUSAL_LM_MODEL_MAP[model_type].from_quantized
    81         keywords = {key: kwargs[key] for key in signature(quant_func).parameters if key
 ❱  82         return quant_func(
    83             model_name_or_path=model_name_or_path,
    84             save_dir=save_dir,
    85             device_map=device_map,

/usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/_base.py:698 in from_quantized

   695                     break
   696
   697         if resolved_archive_file is None:  # Could not find a model file to use
 ❱ 698             raise FileNotFoundError(f"Could not find model in {model_name_or_path}")
   699
   700         model_save_name = resolved_archive_file
   701
FileNotFoundError: Could not find model in /content/drive/MyDrive/Wizard-Vicuna-13B-Uncensored-GPTQ
I've made sure I downloaded all the files from Hugging Face (including the tokenizer_config.json file) and have them in my Drive.
Does anyone know why it won't detect the model?
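(A quick sanity check, just an illustrative sketch assuming the Drive is already mounted in the Colab session, is to list what the loader actually sees in that folder:)

import os

# List the files in the model directory; the .safetensors file and
# quantize_config.json should show up here if the download is complete.
print(sorted(os.listdir("/content/drive/MyDrive/Wizard-Vicuna-13B-Uncensored-GPTQ")))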
Yeah, you need to specify model_basename:
model_basename = "Wizard-Vicuna-13B-Uncensored-GPTQ-4bit-128g.compat.no-act-order"
model = AutoGPTQForCausalLM.from_quantized(
quantized_model_dir,
model_basename=model_basename
device="cuda:0",
use_safetensors=True,
use_triton=False,
quantize_config=quantize_config
)
So that it knows what the name of the safetensors file is.
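If you'd rather not hard-code that string, here's a minimal sketch (assuming the directory contains exactly one .safetensors file) that derives model_basename from whatever is on disk:

import os

# from_quantized() looks for "<model_basename>.safetensors" when use_safetensors=True,
# so take the single .safetensors file in the directory and strip its extension.
safetensors_files = [f for f in os.listdir(quantized_model_dir) if f.endswith(".safetensors")]
model_basename = os.path.splitext(safetensors_files[0])[0]
print(model_basename)  # e.g. Wizard-Vicuna-13B-Uncensored-GPTQ-4bit-128g.compat.no-act-order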
Oh I realised I was missing the model_basename argument. All good, thanks!
Thanks so much, I just realised the legend of the forum answered after I posted my last comment. Cheers!
You're welcome!