Latest update broke usage

#16
by pseudotensor - opened

https://huggingface.co/TheBloke/Nous-Hermes-13B-GPTQ/commit/05c24345fc9a7b94b9e5ed7deebb534cd928a578

Since this commit, the example fails:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import argparse

model_name_or_path = "TheBloke/Nous-Hermes-13B-GPTQ"
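# this basename points at the old per-model filename; the weights file in the repo has since been renamed to model.safetensors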
model_basename = "nous-hermes-13b-GPTQ-4bit-128g.no-act.order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

It fails with:

(h2ogpt) jon@pseudotensor:~/h2ogpt$ python
Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from transformers import AutoTokenizer, pipeline, logging

>>> from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
>>> import argparse
>>> 
>>> model_name_or_path = "TheBloke/Nous-Hermes-13B-GPTQ"
>>> model_basename = "nous-hermes-13b-GPTQ-4bit-128g.no-act.order"
>>> use_triton = False
>>> 
>>> tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
>>> 
>>> model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
...         model_basename=model_basename,
...         use_safetensors=True,
...         trust_remote_code=True,
...         device="cuda:0",
...         use_triton=use_triton,
...         quantize_config=None)
Traceback (most recent call last):

  <stdin>:1 in <module>

  /home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/auto_gptq/modeling/auto.py:94 in from_quantized

     91             for key in signature(quant_func).parameters
     92             if key in kwargs
     93         }
  ❱  94         return quant_func(
     95             model_name_or_path=model_name_or_path,
     96             save_dir=save_dir,
     97             device_map=device_map,

  /home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/auto_gptq/modeling/_base.py:714 in from_quantized

    711                     break
    712
    713         if resolved_archive_file is None: # Could not find a model file to use
  ❱ 714             raise FileNotFoundError(f"Could not find model in {model_name_or_path}")
    715
    716         model_save_name = resolved_archive_file
    717

FileNotFoundError: Could not find model in TheBloke/Nous-Hermes-13B-GPTQ
>>> 

However, this now works:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import argparse

model_name_or_path = "TheBloke/Nous-Hermes-13B-GPTQ"
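# the weights file in the repo is now model.safetensors, so the basename is just "model"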
model_basename = "model"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)
pseudotensor changed discussion status to closed

Correct, I have renamed all models to model.safetensors to prepare for native Transformers GPTQ support, which is coming in the next couple of days.

I've updated all my code examples to show that model_basename = "model" should be used. But I've not yet put out more detailed documentation. That will be coming to all GPTQ repos as soon as the new Transformers version goes live, hopefully tomorrow.

In fact, you can now leave out model_basename entirely: I also updated quantize_config.json to set model_basename to "model", so there's no need to specify model_basename manually in .from_quantized() any more. When I update the docs properly I will drop that from the examples. Actually, I'll remove all the AutoGPTQ code and show loading directly from Transformers.
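For example, a load call along these lines should now be enough (a minimal sketch, untested here; same repo and device as in the example above, with model_basename and quantize_config simply left out):

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/Nous-Hermes-13B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# model_basename is omitted: it is now picked up from the updated quantize_config.json in the repo
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=False)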

So will you be moving away from AutoGPTQ as the main basis for your GPTQ work? I know you tracked that project and were pushing its author to work on it more :) Once GPTQ is in Transformers, is there no need for AutoGPTQ or GPTQ-for-LLaMa?

The Transformers implementation uses AutoGPTQ as its backend, so AutoGPTQ will still be required. To use GPTQ in Transformers, the user will need three packages:
transformers optimum auto-gptq

So AutoGPTQ will still be vital.
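Roughly, once the new release is out, loading through Transformers should look something like the sketch below (hypothetical until the official docs land; the repo name is the one from this thread, and device_map="auto" is my assumption, not a documented requirement):

# pip install transformers optimum auto-gptq  (the three packages listed above)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/Nous-Hermes-13B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Transformers detects the GPTQ quantization from the model config and
# dispatches to the AutoGPTQ backend under the hood
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")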

But yeah GPTQ-for-LLaMa is dead as far as I'm concerned!
