Token leak?
Hello! Been trying this model, but sometimes the assistant's responses end with "<|assistant". Am I doing something wrong? I'm also a bit confused about which instruct template I should use with this model... thanks!
@Varkoyote There were fixes for Falcon3 added to llama.cpp recently, so you might want to try the current version of llama.cpp (b4376 or newer) and GGUFs made with it, like the ones from bartowski (created a few hours ago).
I've been using this version, but I'll try the new GGUFs! Also, which instruct template should I use? It's not specified anywhere...
A simplified template would be something like this:
<|system|>
{system_prompt}
<|user|>
{prompt}
<|assistant|>
which should be applied by default when using the current version of llama.cpp.
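If you ever need to build the prompt by hand (e.g. for raw completion endpoints instead of chat mode), a minimal sketch of how that simplified template expands into a single prompt string, assuming the token names are exactly as shown above:

```python
# Rough sketch: expand the simplified Falcon3 chat template into one prompt
# string. Token names are taken from the template above; if the model's GGUF
# metadata defines a different template, llama.cpp's default will differ.
def build_falcon3_prompt(system_prompt: str, user_prompt: str) -> str:
    return (
        "<|system|>\n"
        f"{system_prompt}\n"
        "<|user|>\n"
        f"{user_prompt}\n"
        "<|assistant|>\n"
    )

if __name__ == "__main__":
    print(build_falcon3_prompt(
        "You are a helpful assistant.",
        "Explain what a GGUF file is in one sentence.",
    ))
```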
I launched it without any customizations, just llama-server.exe -ngl 99 -m Falcon3-10B-Instruct-Q6_K.gguf -c 16384, and it seems to be working.
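For reference, once llama-server is up you can check the template is being applied by sending a chat request to its OpenAI-compatible endpoint; a minimal sketch, assuming the default host and port (127.0.0.1:8080) since none were set on the command line:

```python
# Minimal sketch: query a running llama-server instance via its
# OpenAI-compatible chat completions route. The server applies the model's
# chat template to the messages, so no manual prompt formatting is needed.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say hello in one short sentence."},
        ],
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```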
It works! No more token errors and such, pretty nice. However, I quickly found it less smart and creative than NeMo, sadly :( At least for my uses.