Broken End Token

#1
by LoafyLemon - opened

Unfortunately, the quant uses the wrong end tokens and appends the word "assistant" to every response.

Check out the NousResearch thread for details on how to fix it:

https://huggingface.co/NousResearch/Meta-Llama-3-8B-Instruct-GGUF/discussions/1
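
For reference, the fix discussed in that thread (and disputed further down) amounts to overwriting `tokenizer.ggml.eos_token_id` in the GGUF metadata so it points at `<|eot_id|>` (128009), the token Llama 3 Instruct emits after each turn, rather than `<|end_of_text|>` (128001). A minimal sketch, assuming the `gguf` Python package from llama.cpp's gguf-py; the file name is a placeholder, and this is roughly what the bundled `gguf_set_metadata.py` script does:

```python
# Sketch of the metadata patch from the NousResearch thread (disputed below):
# point tokenizer.ggml.eos_token_id at <|eot_id|> (128009) instead of
# <|end_of_text|> (128001) so chat frontends stop at the end of a turn.
from gguf import GGUFReader  # pip install gguf (llama.cpp's gguf-py)

MODEL = "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf"  # placeholder file name
EOT_ID = 128009  # <|eot_id|>

reader = GGUFReader(MODEL, "r+")  # open read/write to patch in place
field = reader.get_field("tokenizer.ggml.eos_token_id")
print("current eos_token_id:", field.parts[field.data[0]][0])

field.parts[field.data[0]][0] = EOT_ID  # writes through the mmap'd file
print("patched eos_token_id:", EOT_ID)
```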

Thanks for the heads-up. Unfortunately, that's what the upstream model does right now. I'll probably delete this repo and/or redo it once upstream has fixed theirs (in this case, NousResearch).

mradermacher changed discussion status to closed

I think the upstream solution is wrong. The end token in this repo is correct, just not for all cases - llama.cpp doesn't handle multiple end tokens at the moment.
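
A workaround that doesn't touch the model file is to pass the second terminator as an explicit stop string per request, since the GGUF metadata can only declare a single eos token. A minimal sketch, assuming llama-cpp-python; the model path is a placeholder:

```python
# Sketch of a client-side workaround: leave the GGUF untouched and pass
# <|eot_id|> as a stop string per request.
from llama_cpp import Llama

llm = Llama(model_path="Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")  # placeholder path

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Who played Jake Harper?"}],
    stop=["<|eot_id|>"],  # the turn terminator the metadata eos doesn't cover
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```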

deleted

The end token doesn't currently work for GPT4All (it keeps freezing with 100% CPU usage), but maybe after the next update.

Well, GPT4All freezing is obviously not a problem with these ggufs, but simply a bug in gpt4all.

deleted

@mradermacher Yeah, when it doesn't get the expected end token it freezes. But it's still a problem in other apps, such as koboldcpp, which will keep outputting nonsense. It just doesn't freeze.

That's a known bug in token handling in koboldcpp that has been fixed.

Also, even before the fix, koboldcpp did not simply keep outputting nonsense; it was just that the stop token could not be configured.

deleted

When I said nonsense, I was just referring to formatting, theoretical user responses, and so on. I didn't mean to imply incoherent words.

deleted

Here's an example of what I'm talking about with koboldcpp. After it answers my question, it appends the following.

"(Note: The responses should be brief and concise.)
More Information:

If you want more information about this topic or the TV show itself, feel free to ask. I'd be happy to help!assistant"

I'll try downloading the new version and see if it still happens.

deleted
•
edited Apr 20

The latest version of koboldcpp still does it. All GGUFs of Llama 3 without the end token fix do this in every app I tested.

Angus T. Jones played Jake Harper.ert</div></p>ertassistant
<p dir="ltr"

The end token "fix" does not fix anything, it just replaces one bug by another, i.e. it might work in your config, but break in others. This is not a problem with these ggufs, the ggufs are (within the limits of current llama.cpp support) correct. I will not break the model because of bugs in inference engines. If llama 3 multiple end token support improves, I might redo these ggufs.
