Token leak?
Hello! Been trying this model, but sometimes the assistant's responses end with "<|assistant". Am I doing something wrong? I'm also a bit confused about which instruct template I should use with this model... thanks!
@Varkoyote There were fixes for Falcon3 added to llama.cpp recently, so you might want to try the current version of llama.cpp (b4376 or newer) and GGUFs made with it, like the ones from bartowski (created a few hours ago).
I've been using this version, but I'll try the new GGUFs! Also, which instruct template should I use? It's not specified anywhere...
A simplified template would be something like this:
<|system|>
{system_prompt}
<|user|>
{prompt}
<|assistant|>
which should be applied by default when using the current version of llama.cpp.
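If you ever need to build the prompt by hand (e.g. for raw completion endpoints instead of chat mode), a minimal sketch of how that simplified template expands into a single prompt string, assuming the token names are exactly as shown above:

```python
# Rough sketch: expand the simplified Falcon3 chat template into one prompt
# string. Token names are taken from the template above; if the model's GGUF
# metadata defines a different template, llama.cpp's default will differ.
def build_falcon3_prompt(system_prompt: str, user_prompt: str) -> str:
    return (
        "<|system|>\n"
        f"{system_prompt}\n"
        "<|user|>\n"
        f"{user_prompt}\n"
        "<|assistant|>\n"
    )

if __name__ == "__main__":
    print(build_falcon3_prompt(
        "You are a helpful assistant.",
        "Explain what a GGUF file is in one sentence.",
    ))
```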
I launched it without any customizations, just llama-server.exe -ngl 99 -m Falcon3-10B-Instruct-Q6_K.gguf -c 16384, and it seems to be working.
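For reference, once llama-server is up you can check the template is being applied by sending a chat request to its OpenAI-compatible endpoint; a minimal sketch, assuming the default host and port (127.0.0.1:8080) since none were set on the command line:

```python
# Minimal sketch: query a running llama-server instance via its
# OpenAI-compatible chat completions route. The server applies the model's
# chat template to the messages, so no manual prompt formatting is needed.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say hello in one short sentence."},
        ],
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```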
It works! No more token errors and such, pretty nice. However, I quickly found it less smart and creative than NeMo, sadly :( At least for my uses.