Load chat model directly #16
opened by uuguuguu
Hello H4 team,
Thank you for the great work!
Is there any way to use a chat template when loading the model using AutoTokenizer and AutoModelForCausalLM?
I was testing the model loaded directly, not through the pipeline, but the answers to my questions were totally different from what I got in the chat demo.
For people who want to load it directly using the chat template:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model directly (no pipeline)
model_id = "HuggingFaceH4/zephyr-7b-beta"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)

# Example messages; replace with your own conversation
messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "How do I load a chat model directly?"},
]

# Apply the chat template to build the prompt string
# (no as_target_tokenizer context is needed for this)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize the prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate a response
outputs = model.generate(
    input_ids,
    max_new_tokens=4096,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode the generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
# The output is the generated text, formatted the same way as in the chat demo
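For comparison, here is a minimal sketch of the pipeline route using the same chat template. The HuggingFaceH4/zephyr-7b-beta checkpoint and the example messages are assumptions carried over from above, not something confirmed elsewhere in this thread. When both routes build the prompt with apply_chat_template, the model sees the same formatted input, which is usually why the answers stop diverging from the chat demo.

import torch
from transformers import pipeline

# Same checkpoint as above (assumed from the thread)
pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example messages; replace with your own conversation
messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "How do I load a chat model directly?"},
]

# Build the same prompt string the direct route uses
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])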
Hi!
Thanks for your work!
Where did you find the information to do it like this? Is this the best way to use zephyr-7b-Beta with AutoModelForCausalLM?