Token leak?

#3
by Varkoyote

Hello! I've been trying this model, but sometimes the assistant's responses end with "<|assistant". Am I doing something wrong? I'm also a bit confused about which instruct template I should use with this model... thanks!

@Varkoyote There were fixes for Falcon3 added to llama.cpp recently, so you might want to try a current version of llama.cpp (b4376 or newer) and GGUFs made with it - like the ones from bartowski (created a few hours ago).
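If you're not sure which build you're running, the llama.cpp binaries can print it (a quick check; the exact output format may vary by build):

llama-server.exe --version

which should report something like "version: 4376 (<commit>)".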

I've been using this version, but I'll try the new GGUFs! Also, which instruct template should I use? It's not specified anywhere...

A simplified template would be something like this:

<|system|>
{system_prompt}
<|user|>
{prompt}
<|assistant|>

which should be applied by default when using a current version of llama.cpp.
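Under the hood, a filled-in prompt then looks something like this (just an illustrative sketch with made-up contents; the exact formatting comes from the chat template embedded in the GGUF):

<|system|>
You are a helpful assistant.
<|user|>
Why is the sky blue?
<|assistant|>

If the server doesn't pick the template up automatically for some reason, you can also force one with llama-server's --chat-template option.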

I launched it without any customizations, just llama-server.exe -ngl 99 -m Falcon3-10B-Instruct-Q6_K.gguf -c 16384, and it seems to be working:
[screenshot of the chat output]
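You can also sanity-check it over the OpenAI-compatible API, which applies the chat template server-side (a sketch assuming the default port 8080; quoting shown for a Unix-style shell):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'

If the template is applied correctly, the reply should come back clean, with no trailing "<|assistant".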

It works! No more leaked tokens and such, pretty nice. However, I quickly found it less smart and creative than NeMo, sadly :( At least for my uses.
