Question regarding added tokens vs llama base

#7
by vince62s - opened

Hello,
I have some questions regarding the 7 added tokens.
Are the embeddings learned at fine-tuning time, or is this just a "pre/post"-processing usage?
Also, can you clarify the meaning of these:



Unbabel org

Hey there,
The added tokens are there for flexibility if you want to fine-tune the model for some specific use case (e.g., MASK or CLS tokens). During SFT we only explicitly used, at all times, the <|im_start|> token and the <|im_end|> token (redefined as the eos token).
Their embeddings are learned at fine-tuning time.
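
If it helps, here is a minimal sketch of how the added special tokens and the ChatML-style <|im_start|>/<|im_end|> markers surface through the Hugging Face tokenizer. The checkpoint id "Unbabel/TowerInstruct-7B-v0.1" is an assumption for illustration and may not be the exact model this discussion is attached to:

```python
# Minimal sketch, assuming the transformers library and a placeholder
# checkpoint id; adjust the id to the model this thread refers to.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Unbabel/TowerInstruct-7B-v0.1")

# Tokens added on top of the Llama base vocabulary (MASK/CLS-style extras included).
print(tokenizer.additional_special_tokens)
print(tokenizer.eos_token)  # expected to be <|im_end|>, per the reply above

# <|im_start|>/<|im_end|> are inserted by the chat template when formatting prompts.
messages = [{"role": "user", "content": "Translate 'olá mundo' into English."}]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

# If you add further tokens for your own fine-tune, resize the embedding matrix
# so the new rows become trainable parameters:
# model.resize_token_embeddings(len(tokenizer))
```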

nunonmg changed discussion status to closed
