Model outputs gibberish instead of an actual response.

#10
by Fimbul - opened

When I try to use this model with the oobabooga web UI I get this kind of response, and I don't know why:

Input:
introduce yourself
Output:
/_mysinside phys chairphys AlcUSTontmymoGP�≠ monuments _ _alu _ _concurrent jsf preced///_mysmysmysmys _ fsmys/_mysmys _mys _ _ _ _ _ _ / phys phys/ phys _ mys _mysmys _leepдра/ Phys/_mysmys/_mys _ _mysmys précéd _mysextend _mys _ _mysmys _ _ _ _ _ _Physmys _mysmys _mysmysmys _ Alcmysmys _ _ Alc _ AlcWF Alc _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Alc Alc _g _ _ Alc _ Alc _ _ _ Alc Alc _ _ _ Alc Alc Alc _ _ Alc Alc _ Alc Alc Alc Alc Alc _ _ _ _ _ _ _ _ Alc _o Alc _mymymy _ _ _ _ _ _ _mymymymymymymymymymymymy _PR _ont _ontmyontmymyont Alc


Please delete the file ending in latest.act-order.safetensors and load the file ending in compat.no-act-order.safetensors instead.
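For anyone who can't locate the files in the UI, here is a minimal sketch of pulling down only the no-act-order weights (plus config/tokenizer files) with huggingface_hub so the web UI can't pick up the act-order file by mistake. The repo id and local path below are placeholders, not names from this thread:

```python
# Sketch only: download just the compat.no-act-order weights.
# "TheBloke/<this-model>-GPTQ" and the local_dir are placeholders --
# substitute the actual repo id and your text-generation-webui models folder.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/<this-model>-GPTQ",                      # placeholder repo id
    allow_patterns=["*compat.no-act-order.safetensors", "*.json", "*.model"],
    local_dir="models/<this-model>-GPTQ",                      # placeholder path
)
```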

@TheBloke Been using your quantized model and it works great. Any chance you'll be making a quantized version for the GPT4All-13B-snoozy?

Oh sure, happy to. I kept checking the Nomic repo around the time they first released it, but it was never uploaded to HF. But I see it has been now.

I'm starting the process now!

Having the same problem but can't find the files you specified. Where should I look and/or download?

Hey Bloke, I tried both the 4-bit quantised 7B and 13B .safetensors models.
The final output looks like gibberish. Can you please let me know what I am missing in the code below?

```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import argparse

quantized_model_dir = "/content/drive/MyDrive/Vicuna/FastChat/models/TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g_actorder"
model_basename = "/content/drive/MyDrive/Vicuna/FastChat/models/TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g_actorder/vicuna-7B-1.1-GPTQ-4bit-128g"

use_triton = False

# Tokenizer from the quantised model directory
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=True)

# Quantisation parameters the checkpoint was created with
quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False
)

# Load the GPTQ checkpoint onto the GPU
model = AutoGPTQForCausalLM.from_quantized(
    quantized_model_dir,
    use_safetensors=True,
    model_basename=model_basename,
    device="cuda:0",
    use_triton=use_triton,
    quantize_config=quantize_config
)

prompt = """ """

inputs = tokenizer(prompt, return_tensors='pt').to('cuda')
tokens = model.generate(
    **inputs,
    max_new_tokens=2000,
    do_sample=True,
    temperature=1.0,
    top_p=1.0,
    truncation=True
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))

```

There was a bug in AutoGPTQ 0.3.0 that caused gibberish output when using a model quantised with both group_size and desc_act.

It can be fixed by updating to AutoGPTQ 0.3.1 or 0.3.2. I recommend building from source at the moment due to some issues people are having installing from PyPI:

pip3 uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
pip3 install .
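
After reinstalling, a quick sanity check (just a sketch using the standard library, nothing AutoGPTQ-specific) to confirm the rebuilt version is the one Python actually imports:

```python
# Check which auto-gptq version is installed in the active environment.
# Anything >= 0.3.1 should no longer hit the group_size + desc_act bug.
from importlib.metadata import version

print(version("auto-gptq"))
```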
