Is <BOS_TOKEN> really needed?

#22
by ChuckMcSneed - opened

Do various frontends insert it automatically, or is adding it manually really needed?

It feels like when I add it manually in koboldcpp, I get slightly worse results, though it may just be confirmation bias.

The tokenizer is configured to add the BOS on its own. Depending on the tool and how it calls the tokenizer, the BOS might already be added for you, unfortunately, so adding it manually can give you two.
Using transformers:

inputs = tokenizer("This is a test")  # Adds BOS automatically.
inputs = tokenizer("This is a test", add_special_tokens=False)  # Does not add BOS.
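A quick way to check for a doubled BOS, sketched on plain lists of token IDs so it runs without downloading anything. The helper names and the BOS ID of 1 are made up for illustration; with a real tokenizer you would pass `tokenizer.bos_token_id` and the `input_ids` it returns.

```python
def count_leading_bos(input_ids, bos_id):
    """Count how many consecutive BOS tokens start the sequence."""
    count = 0
    for token_id in input_ids:
        if token_id != bos_id:
            break
        count += 1
    return count


def ensure_single_bos(input_ids, bos_id):
    """Return a copy of input_ids that starts with exactly one BOS."""
    n = count_leading_bos(input_ids, bos_id)
    return [bos_id] + list(input_ids[n:])


BOS_ID = 1  # placeholder value, NOT the real BOS ID for this model

# Simulates typing the BOS string by hand while the tokenizer also adds one.
doubled = [BOS_ID, BOS_ID, 42, 43, 44]
print(count_leading_bos(doubled, BOS_ID))
print(ensure_single_bos(doubled, BOS_ID))
```

If the count comes back as 2 for a prompt you built by hand, the frontend is already inserting the BOS and the manual one can be dropped.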

It would also be nice if Cohere actually added the chat template that should be used with this model...

Do various frontends insert it automatically, or is adding it manually really needed?

llama.cpp and Ollama should add it as it's defined in the GGUF file:

https://github.com/ollama/ollama/issues/3650

Sadly the official Ollama template is messed up and has an extra <|END_OF_TURN_TOKEN|> added on the end.

It should be:

TEMPLATE """{{if .System}}<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{{.System}}<|END_OF_TURN_TOKEN|>{{end}}<|START_OF_TURN_TOKEN|><|USER_TOKEN|>{{.Prompt}}<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{{.Response}}"""
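To see what that template expands to, here is a rough Python equivalent. `build_prompt` is a hypothetical helper, not part of Ollama; it only mirrors the template string above for the prompt side (an empty response) and assumes the runtime prepends the BOS itself.

```python
START = "<|START_OF_TURN_TOKEN|>"
END = "<|END_OF_TURN_TOKEN|>"
SYSTEM = "<|SYSTEM_TOKEN|>"
USER = "<|USER_TOKEN|>"
CHATBOT = "<|CHATBOT_TOKEN|>"


def build_prompt(prompt, system=""):
    """Assemble the prompt the way the template above lays it out."""
    parts = []
    if system:
        # Optional system turn, only emitted when a system message is set.
        parts.append(f"{START}{SYSTEM}{system}{END}")
    parts.append(f"{START}{USER}{prompt}{END}")
    # Open the chatbot turn and stop: no END token here, so the model
    # writes the response and closes the turn itself.
    parts.append(f"{START}{CHATBOT}")
    return "".join(parts)


print(build_prompt("Hello", system="Be brief."))
```

The point to notice is that the string ends right after the chatbot token, with no trailing end-of-turn token.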

I nearly deleted this model because it was so bad at coding, and its 1-shot stories kinda sucked. But oh boy is it good if you guide it!!! :O

Goliath-120B was definitely better at 1-shot Grimdark, but it always got too distracted and wanted to carry on writing rather than refine things. This model, on the other hand, seems absolutely amazing at refining its initial attempt.

It's the first model where I've actually been able to say "let's try to merge the best bits, XYZ, of your last draft(s) with this latest draft" and it actually listened...
