On the model card, you have written "Please use the exact chat template provided by Llama-3 instruct version. Otherwise there will be a degradation in the performance."
I tried using apply_chat_template, but I get a different result depending on whether I use the Llama 3 Instruct tokenizer or the OpenBioLLM tokenizer:
from transformers import AutoTokenizer

model_id = "aaditya/Llama3-OpenBioLLM-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]

# Encode with apply_chat_template, then decode to see what the prompt is supposed to look like:
token_inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True,
)
decoded_inputs = tokenizer.decode(token_inputs[0], skip_special_tokens=False)
print(decoded_inputs)
For OpenBioLLM (this is the ChatML format, not the Llama 3 one): '<|im_start|>system\nYou are a friendly chatbot who always responds in the style of a pirate<|im_end|>\n<|im_start|>user\nHow many helicopters can a human eat in one sitting?<|im_end|>\n<|im_start|>assistant\n'
For Llama 3 Instruct: '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a friendly chatbot who always responds in the style of a pirate<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow many helicopters can a human eat in one sitting?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'
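For reference, the second string comes from the exact same call, just with the Instruct tokenizer instead (the meta-llama repo is gated, so this assumes you have access to it):

llama3_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Same messages as above, only the tokenizer (and thus the chat template) differs:
token_inputs = llama3_tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True,
)
print(llama3_tokenizer.decode(token_inputs[0], skip_special_tokens=False))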
Any insight into which of these templates the model actually expects?
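In the meantime, is the right workaround to copy the Instruct template onto the OpenBioLLM tokenizer before calling apply_chat_template? Something like this sketch — I'm not sure this is what the model card intends:

# Sketch of a possible workaround: borrow the Llama 3 Instruct chat template.
# This assumes the Llama 3 special tokens (<|start_header_id|>, <|eot_id|>, ...)
# are present in the OpenBioLLM vocab; otherwise they won't tokenize correctly.
tokenizer.chat_template = llama3_tokenizer.chat_template
token_inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True,
)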