generation is not good

#2
by jmjzz - opened

I think this 4-bit version is not working well: the generated text contains too many random or garbage tokens.

@jmjzz
Can you please share the script you used to run the model? With the script provided in this repo, I am getting: RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'.
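For anyone hitting the same error: it is typically reported when a float16 LayerNorm ends up running on CPU (for example when no GPU is visible to the process) on a PyTorch build that has no half-precision CPU kernel for that op. Below is a minimal sketch of picking a dtype per device, assuming that is the cause here. `pick_dtype_name` and `resolve_torch_dtype` are hypothetical helper names, not part of this repo or of any library.

```python
def pick_dtype_name(device_type: str, bf16_supported: bool = False) -> str:
    """Return the name of a dtype that is safe for LayerNorm on this device.

    Hypothetical helper for illustration only.
    """
    if device_type == "cuda":
        # On GPU, half precision is fine; prefer bfloat16 when the card supports it.
        return "bfloat16" if bf16_supported else "float16"
    # On CPU, stay away from float16 to avoid the missing-kernel error.
    return "float32"


def resolve_torch_dtype():
    """Apply the choice above to the current machine (requires PyTorch)."""
    import torch  # imported lazily so the helper above works without PyTorch

    has_cuda = torch.cuda.is_available()
    name = pick_dtype_name(
        "cuda" if has_cuda else "cpu",
        bf16_supported=has_cuda and torch.cuda.is_bf16_supported(),
    )
    return getattr(torch, name)
```

The result of `resolve_torch_dtype()` could then be passed as `torch_dtype` when loading the model. Whether a 4-bit checkpoint is usable on CPU at all is a separate question that this sketch does not address.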

Can you give us an example of the input you used?

The generation for the base prompt looks good to me, @jmjzz:

image.png

I see. I'm also using the base prompt given on the DBRX Hugging Face page. Did you make any modifications?

This is the exact script I used, @jmjzz:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

MODEL_ID = "PrunaAI/dbrx-instruct-bnb-4bit"

# Replace hf_YOUR_TOKEN with your own Hugging Face access token.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True, token="hf_YOUR_TOKEN")
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    token="hf_YOUR_TOKEN",
)

input_text = "What does it take to build a great LLM?"
messages = [{"role": "user", "content": input_text}]
# return_dict=True yields both input_ids and attention_mask, hence the name `inputs`.
inputs = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))

@johnrachwanpruna I see, the generation looks good now. But loading the model takes about 30 minutes, which is significantly slower than loading Mixtral 8x7B.

@jmjzz For me, running the code snippet I showed you takes only 30 seconds.

image.png
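Since the two numbers above (30 minutes vs. 30 seconds) may not be measuring the same thing, here is a minimal sketch for timing each stage separately so the figures are comparable. `timed` is a hypothetical helper written for this post, and the `time.sleep` call is only a stand-in for the real `from_pretrained(...)` call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}

with timed("load", timings):
    # Stand-in for: AutoModelForCausalLM.from_pretrained(...)
    time.sleep(0.01)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.2f} s")
```

Wrapping the download, the model load, and the `generate` call in separate `timed` blocks would show which stage accounts for the difference. A first run that has to download the weights will naturally be much slower than a run from the local cache.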

@johnrachwanpruna Thanks, I think I solved the problem. BTW, after running some evaluations, I feel the 4-bit DBRX is weaker than the default Mixtral 8x7B. Have you tried evaluating it on any benchmarks?

Pruna AI org

We have not benchmarked the quantized models yet.
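In the absence of published benchmark numbers, one quick sanity check is to compare perplexity on the same held-out text for the two models being compared. This is only a sketch of the metric itself, not a result: `perplexity` is a hypothetical helper, and the log-probabilities below are made-up placeholder values, not measurements from DBRX or Mixtral.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities: exp(-mean(logprob))."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Placeholder values for illustration only; real values would come from
# scoring the same text with each model.
logprobs_model_a = [-1.2, -0.7, -2.1, -0.4]
logprobs_model_b = [-1.5, -0.9, -2.6, -0.8]

print("model A perplexity:", round(perplexity(logprobs_model_a), 3))
print("model B perplexity:", round(perplexity(logprobs_model_b), 3))
```

Lower perplexity means the model assigns higher probability to the text. Perplexities are only directly comparable between models that share a tokenizer, so for a cross-family comparison such as DBRX vs. Mixtral a task-based benchmark is the safer choice.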

johnrachwanpruna changed discussion status to closed
