Chat template during finetuning?
Hi, did you use the original Llama-3 chat template while finetuning? The template is now missing from the tokenizer config, so it defaults to ChatML. When I apply a chat template and use the model to follow an instruction, inference times are long. Does this sound familiar?
I did not use the Llama-3 chat template; the model is trained on the ChatML template.
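Since the template is missing from the tokenizer config, you can set the standard ChatML template yourself before calling `apply_chat_template`. A minimal sketch (the model id is a placeholder for the actual finetuned checkpoint):

```python
from transformers import AutoTokenizer

# Placeholder id -- replace with the actual Dutch finetuned checkpoint.
model_id = "your-org/llama-3-8b-dutch-chatml"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# If tokenizer_config.json has no chat_template, set the standard ChatML
# Jinja template explicitly so prompts are formatted the way the model
# was trained.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>' + '\\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n' }}{% endif %}"
)

messages = [{"role": "user", "content": "Vat de volgende tekst samen: ..."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```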
I don't fully understand the question: applying a chat template should not lead to longer inference time per token. Unless your conversation is very long, only the first token can take a bit longer.
Okay, thanks. I know this should not be the case. However, I use an instruction to summarize (400 tokens) and supply context (1,000 tokens). With the original Llama-3 8B model, inference finishes within seconds; with the Dutch-finetuned model it takes quite long, 30+ seconds. I will take another look later today to see if something is different in the parameters; a sketch of what I plan to check is below.
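For reference, a minimal timing sketch I intend to rerun. The model id and generation parameters are placeholders; the idea is to rule out the case where the model simply keeps generating until `max_new_tokens` because it never emits the ChatML stop token, which looks like slow inference but is really just a longer output:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/llama-3-8b-dutch-chatml"  # placeholder checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Vat de volgende tekst samen: ..."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# For a ChatML-trained model, generation should stop at <|im_end|>.
# If it does not, the model runs to max_new_tokens on every request.
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")

start = time.time()
output = model.generate(inputs, max_new_tokens=512, eos_token_id=im_end_id)
new_tokens = output.shape[-1] - inputs.shape[-1]
print(f"Generated {new_tokens} tokens in {time.time() - start:.1f}s")
print(tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True))
```

Comparing the number of generated tokens between the base model and the finetune should show whether the slowdown is per-token speed or just output length.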
That is weird; the model architecture is exactly the same, only the weights are different.