Does the model respond correctly?

#24
by mnwato - opened

I used the sample example:

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name_or_path = 'TheBloke/Llama-2-7B-chat-GPTQ'

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

Then I generate output using:

prompt = "Tell me about AI"
prompt_template=f'''[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
{prompt}[/INST]

'''

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))

But I always get a string like the one below:

strijonasce Socquet fundătquetătритоsceDom Fichescequetscescesceleescescegemeindescesce fundrettoonasceритоsce Intentsceleescesce historiquessce ens Socscescescesceuyonaquetsce Dom Fundsce MortuyDomonascescesce Intent Fichavelsceonaonascesce historiquesscesce Intentритоsce Intentleeритоscesceavelsce DomритоDom IntentscegemeindeătquetscegemeindeprintStackTrace historiques historiques Fichsceритоscesce fund ensscesce Fundscestrijăt enssce fund Societyscesce Fichlee fundsceрито IntentDomsceDomscesceătengoDomscesce SocietyscesceDomscescesce fundscesce rat 

Hello, you should consider using AutoGPTQForCausalLM instead of AutoModelForCausalLM. Indeed, when the model is quantized, the embeddings and the layers are handled differently: the weights are encoded with different numbers of bits, which might explain your problem.
@mnwato

from auto_gptq import AutoGPTQForCausalLM

# model_name_or_path as in your snippet; device e.g. "cuda:0"
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    use_safetensors=True,
    trust_remote_code=True,
    device=device,
    use_triton=False,
    quantize_config=None,
)
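After loading the model this way, the rest of your snippet should work unchanged. A minimal sketch, assuming the same tokenizer, prompt_template, and generation parameters as in your first post:

# Reuse the tokenizer and prompt_template from the original snippet
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True,
                        top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))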
