Broken End Token

#1
by LoafyLemon - opened

Unfortunately, the quant uses the wrong end tokens and appends the word "assistant" to every response.

Check out the NousResearch thread for details on how to fix it:

https://huggingface.co/NousResearch/Meta-Llama-3-8B-Instruct-GGUF/discussions/1
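
For reference, the fix discussed in that thread (and disputed further down) amounts to overwriting `tokenizer.ggml.eos_token_id` in the GGUF metadata so it points at `<|eot_id|>` (128009), the token Llama 3 Instruct emits after each turn, rather than `<|end_of_text|>` (128001). A minimal sketch, assuming the `gguf` Python package from llama.cpp's gguf-py; the file name is a placeholder, and this is roughly what the bundled `gguf_set_metadata.py` script does:

```python
# Sketch of the metadata patch from the NousResearch thread (disputed below):
# point tokenizer.ggml.eos_token_id at <|eot_id|> (128009) instead of
# <|end_of_text|> (128001) so chat frontends stop at the end of a turn.
from gguf import GGUFReader  # pip install gguf (llama.cpp's gguf-py)

MODEL = "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf"  # placeholder file name
EOT_ID = 128009  # <|eot_id|>

reader = GGUFReader(MODEL, "r+")  # open read/write to patch in place
field = reader.get_field("tokenizer.ggml.eos_token_id")
print("current eos_token_id:", field.parts[field.data[0]][0])

field.parts[field.data[0]][0] = EOT_ID  # writes through the mmap'd file
print("patched eos_token_id:", EOT_ID)
```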

Thanks for the heads-up. Unfortunately, that's what the upstream model does right now. I'll probably delete this repo and/or redo it once upstream has fixed theirs (in this case, NousResearch).

mradermacher changed discussion status to closed

I think the upstream solution is wrong. The end token in this repo is correct, just not for all cases - llama.cpp doesn't handle multiple end tokens at the moment.
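
A workaround that doesn't touch the model file is to pass the second terminator as an explicit stop string per request, since the GGUF metadata can only declare a single eos token. A minimal sketch, assuming llama-cpp-python; the model path is a placeholder:

```python
# Sketch of a client-side workaround: leave the GGUF untouched and pass
# <|eot_id|> as a stop string per request.
from llama_cpp import Llama

llm = Llama(model_path="Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")  # placeholder path

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Who played Jake Harper?"}],
    stop=["<|eot_id|>"],  # the turn terminator the metadata eos doesn't cover
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```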

deleted

The end token doesn't currently work for GPT4All (it keeps freezing with 100% CPU usage), but maybe after the next update.

Well, GPT4All freezing is obviously not a problem with these ggufs, but simply a bug in gpt4all.

deleted

@mradermacher Yeah, when it doesn't get the expected end token it freezes. But it's still a problem in other apps, such as koboldcpp, which will keep outputting nonsense. It just doesn't freeze.

That's a known bug in token handling in koboldcpp that has been fixed.

Also, even before the fix, koboldcpp did not simply keep outputting nonsense; it was just that the stop token could not be configured.

deleted

When I said nonsense, I was just referring to formatting, theoretical user responses, and so on. I didn't mean to imply incoherent words.

deleted

Here's an example of what I'm talking about with koboldcpp. After it answers my question, it appends the following.

"(Note: The responses should be brief and concise.)
More Information:

If you want more information about this topic or the TV show itself, feel free to ask. I'd be happy to help!assistant"

I'll try downloading the new version and see if it still happens.

deleted
•
edited Apr 20

The latest version of koboldcpp still does it. All GGUFs of Llama 3 without the end token fix do this in every app I tested.

Angus T. Jones played Jake Harper.ert</div></p>ertassistant
<p dir="ltr"

The end token "fix" does not fix anything, it just replaces one bug by another, i.e. it might work in your config, but break in others. This is not a problem with these ggufs, the ggufs are (within the limits of current llama.cpp support) correct. I will not break the model because of bugs in inference engines. If llama 3 multiple end token support improves, I might redo these ggufs.
