Load chat model directly #16
opened by uuguuguu
Hello H4 team,
Thank you for the great work!
Is there any way to use a chat template when loading the model using AutoTokenizer and AutoModelForCausalLM?
I was testing the model loaded directly, not through the pipeline, but the answers to my questions were totally different from what I got in the chat demo.
For people who want to load it directly using the chat template:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model directly (no pipeline)
model_id = "HuggingFaceH4/zephyr-7b-beta"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)

# Example messages; replace with your own conversation
messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "How do I load a chat model directly?"},
]

# Apply the chat template to build the prompt string
# (no as_target_tokenizer context is needed for this)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize the prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate a response
outputs = model.generate(
    input_ids,
    max_new_tokens=4096,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode the generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
# The output is the generated text, formatted the same way as in the chat demo
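For comparison, here is a minimal sketch of the pipeline route using the same chat template. The HuggingFaceH4/zephyr-7b-beta checkpoint and the example messages are assumptions carried over from above, not something confirmed elsewhere in this thread. When both routes build the prompt with apply_chat_template, the model sees the same formatted input, which is usually why the answers stop diverging from the chat demo.

import torch
from transformers import pipeline

# Same checkpoint as above (assumed from the thread)
pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example messages; replace with your own conversation
messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "How do I load a chat model directly?"},
]

# Build the same prompt string the direct route uses
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])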
Hi!
Thanks for your work!
Where did you find the information to do it like this? Is this the best way to use zephyr-7b-Beta with AutoModelForCausalLM?