May 18, 2023

edited May 19, 2023

I recently used AutoGPTQ with GPT-J models and it worked quite well. Now, out of nowhere, I get an error about triton even though the script has it turned off (use_triton=False). Has this ever happened to you? Do you know of a solution?

Error:

(base) C:\Users\ReDXeoL\AutoGPTQ\examples\quantization>python basic_usage.py
triton not installed.
Traceback (most recent call last):
##
##
ModuleNotFoundError: No module named 'triton'

(base) C:\Users\ReDXeoL\AutoGPTQ\examples\quantization>

This is my code:

import os

from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "A:/LLMs_LOCAL/bertin_gpt_j_6B_alpaca/"
quantized_model_dir = "bertin-gpt-j-6B-alpaca-4bit-128g"

os.makedirs(quantized_model_dir, exist_ok=True)

def main():
    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)

    # calibration examples passed to model.quantize() below
    examples = [
        tokenizer(
            "auto-gptq es una biblioteca de cuantificación de modelos fácil de usar con API amigables para el usuario, basada en el algoritmo GPTQ."
        ),
        tokenizer(
            "La inteligencia artificial ha avanzado significativamente en los últimos años."
        ),
        tokenizer(
            "La cuantificación de modelos puede reducir el tamaño y mejorar la eficiencia del modelo."
        ),
        tokenizer(
            "Los algoritmos de cuantificación pueden reducir la cantidad de memoria y energía requerida."
        ),
        tokenizer(
            "El aprendizaje profundo se utiliza en una variedad de aplicaciones, desde la medicina hasta el marketing."
        ),
        tokenizer(
            "La arquitectura GPT-4 es la base de muchos modelos de lenguaje de última generación."
        ),
        tokenizer(
            "El procesamiento del lenguaje natural permite a las máquinas comprender y comunicarse en lenguajes humanos."
        ),
        tokenizer(
            "Las redes neuronales convolucionales se utilizan comúnmente en la visión por computadora."
        ),
        tokenizer(
            "Los algoritmos de optimización son fundamentales para el entrenamiento de modelos de aprendizaje profundo."
        ),
        tokenizer(
            "El aprendizaje por refuerzo es una técnica de aprendizaje automático en la que los agentes aprenden a través de la interacción con su entorno."
        ),
    ]

    quantize_config = BaseQuantizeConfig(
        bits=4,  # quantize model to 4-bit
        group_size=128,  # it is recommended to set the value to 128
        desc_act=False,
    )

    # load un-quantized model, the model will always be force loaded into cpu
    model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)

    # quantize model; examples should be a list of dicts whose keys contain
    # "input_ids" and "attention_mask", with values of type torch.LongTensor
    model.quantize(examples, use_triton=False)

    # save quantized model
    model.save_quantized(quantized_model_dir)

    # save quantized model using safetensors
    model.save_quantized(quantized_model_dir, use_safetensors=True)

    # load quantized model, currently only support cpu or single gpu
    model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0", use_triton=False)

    # inference with model.generate
    print(tokenizer.decode(model.generate(**tokenizer("auto_gptq is", return_tensors="pt").to("cuda:0"))[0]))

    # or you can also use pipeline
    pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer, device="cuda:0")
    print(pipeline("auto-gptq is")[0]["generated_text"])


if __name__ == "__main__":
    import logging

    logging.basicConfig(
        format="%(asctime)s %(levelname)s [%(name)s] %(message)s", level=logging.INFO, datefmt="%Y-%m-%d %H:%M:%S"
    )

    main()

TheBloke

Owner May 19, 2023

Yeah this is a recent bug in AutoGPTQ. I pushed a PR that fixes it: https://github.com/PanQiWei/AutoGPTQ/pull/85

Hopefully it'll be merged into main soon. In the meantime, you can pull my PR and build AutoGPTQ from that.
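
For readers wondering how this kind of error can appear with use_triton=False: the log above prints "triton not installed." and then still raises ModuleNotFoundError, which suggests that some import of triton was not guarded. The sketch below shows the general guarded-import pattern for an optional dependency. It is not AutoGPTQ's actual code and not the content of the PR; `optional_import` and `pick_backend` are hypothetical names used only for illustration.

```python
import importlib
import importlib.util


def optional_import(module_name):
    """Return the top-level module if it is installed, otherwise None."""
    if importlib.util.find_spec(module_name) is None:
        return None
    return importlib.import_module(module_name)


def pick_backend(use_triton):
    """Hypothetical helper: only touch triton when the caller asks for it."""
    if not use_triton:
        # triton is never imported on this path, so it does not need to be installed
        return "cuda"
    if optional_import("triton") is None:
        raise RuntimeError("use_triton=True was requested but triton is not installed")
    return "triton"
```

With this pattern, `pick_backend(False)` succeeds on a machine without triton (for example on Windows), and a missing triton only becomes an error when it is explicitly requested.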

RedXeol

May 19, 2023

I get the same error when I try to enable it in text-generation-webui with --autogptq

RedXeol

May 19, 2023

It's strange, I replaced all the files with the modifications and I still get the same error... do you know if I'm doing something wrong?

TheBloke

Owner May 19, 2023

Did you rebuild with pip install . ?

I'm going to bed now, but if it's still a problem let me know and I'll look tomorrow. Do double check that the basic example isn't setting Triton to True.
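
If it is unclear whether Python is picking up the rebuilt package or an older install, a quick check along these lines may help. It is a minimal sketch using only the standard library; `describe_install` is a hypothetical helper, and the distribution name "auto-gptq" is an assumption about how the package is registered with pip.

```python
import importlib.util
from importlib import metadata


def describe_install(import_name, dist_name):
    """Report where a package would be imported from and its installed version."""
    spec = importlib.util.find_spec(import_name)
    if spec is None:
        return {"found": False, "location": None, "version": None}
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # importable (e.g. from the current directory) but not installed via pip
        version = None
    return {"found": True, "location": spec.origin, "version": version}


# "auto-gptq" as the pip distribution name is an assumption
print(describe_install("auto_gptq", "auto-gptq"))
```

If the reported location points at an old site-packages copy rather than the freshly built one, the edited source files are not the ones being imported, and rebuilding with pip install . from the repository root is needed.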

RedXeol

May 19, 2023

Sorry, it was my fault, I didn't know about pip install . ... It works now. Have a nice night, thank you very much, you are a genius.

TheBloke
/

Manticore-13B-GPTQ

help me with a question in 4bits model
