Model outputs gibberish instead of an actual response.

#10
by Fimbul - opened

When I try to use this model with the oobabooga web UI I get this kind of response, and I don't know why:

Input:
introduce yourself
Output:
/_mysinside phys chairphys AlcUSTontmymoGP�≠ monuments _ _alu _ _concurrent jsf preced///_mysmysmysmys _ fsmys/_mysmys _mys _ _ _ _ _ _ / phys phys/ phys _ mys _mysmys _leepдра/ Phys/_mysmys/_mys _ _mysmys précéd _mysextend _mys _ _mysmys _ _ _ _ _ _Physmys _mysmys _mysmysmys _ Alcmysmys _ _ Alc _ AlcWF Alc _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Alc Alc _g _ _ Alc _ Alc _ _ _ Alc Alc _ _ _ Alc Alc Alc _ _ Alc Alc _ Alc Alc Alc Alc Alc _ _ _ _ _ _ _ _ Alc _o Alc _mymymy _ _ _ _ _ _ _mymymymymymymymymymymymy _PR _ont _ontmyontmymyont Alc


Please delete the file ending in latest.act-order.safetensors and load the file ending in compat.no-act-order.safetensors instead.
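For anyone who can't locate the files in the UI, here is a minimal sketch of pulling down only the no-act-order weights (plus config/tokenizer files) with huggingface_hub so the web UI can't pick up the act-order file by mistake. The repo id and local path below are placeholders, not names from this thread:

```python
# Sketch only: download just the compat.no-act-order weights.
# "TheBloke/<this-model>-GPTQ" and the local_dir are placeholders --
# substitute the actual repo id and your text-generation-webui models folder.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/<this-model>-GPTQ",                      # placeholder repo id
    allow_patterns=["*compat.no-act-order.safetensors", "*.json", "*.model"],
    local_dir="models/<this-model>-GPTQ",                      # placeholder path
)
```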

@TheBloke Been using your quantized model and it works great. Any chance you'll be making a quantized version for the GPT4All-13B-snoozy?

Oh sure, happy to. I kept checking the Nomic repo around the time they first released it, but it was never uploaded to HF. But I see it has been now.

I'm starting the process now!

Having the same problem but can't find the files you specified. Where should I look and/or download?

Hey Bloke, I tried both the 4-bit quantised 7B and 13B .safetensors models.
The final output looks like gibberish. Can you please let me know what I am missing in the code below?

```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import argparse

quantized_model_dir = "/content/drive/MyDrive/Vicuna/FastChat/models/TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g_actorder"
model_basename = "/content/drive/MyDrive/Vicuna/FastChat/models/TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g_actorder/vicuna-7B-1.1-GPTQ-4bit-128g"

use_triton = False

# Tokenizer from the quantised model directory
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=True)

# Quantisation parameters the checkpoint was created with
quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False
)

# Load the GPTQ checkpoint onto the GPU
model = AutoGPTQForCausalLM.from_quantized(
    quantized_model_dir,
    use_safetensors=True,
    model_basename=model_basename,
    device="cuda:0",
    use_triton=use_triton,
    quantize_config=quantize_config
)

prompt = """ """

inputs = tokenizer(prompt, return_tensors='pt').to('cuda')
tokens = model.generate(
    **inputs,
    max_new_tokens=2000,
    do_sample=True,
    temperature=1.0,
    top_p=1.0,
    truncation=True
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))

```

There was a bug in AutoGPTQ 0.3.0 that caused gibberish output when using a model quantised with both group_size and desc_act.

It can be fixed by updating to AutoGPTQ 0.3.1 or 0.3.2. I recommend building from source at the moment due to some issues people are having installing from PyPI:

pip3 uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
pip3 install .
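
After reinstalling, a quick sanity check (just a sketch using the standard library, nothing AutoGPTQ-specific) to confirm the rebuilt version is the one Python actually imports:

```python
# Check which auto-gptq version is installed in the active environment.
# Anything >= 0.3.1 should no longer hit the group_size + desc_act bug.
from importlib.metadata import version

print(version("auto-gptq"))
```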
