How Are End Tokens Managed?

#2
by deleted - opened
deleted

No Phi-3 GGUF works correctly in GPT4All v2.7.4 or Koboldcpp, including the one released by Microsoft. They all keep generating past the end token, usually displaying "<|end|><|assistant|>" one or more times in the response, followed by random nonsense.

Are the end tokens set within GGUF files, or are they handled by the app?
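
On the app side of that question, one way a front end could enforce stopping is to check every generated token against a set of end-token IDs, with a string-level fallback for decoded text. This is only a minimal sketch, using the Phi-3 IDs and tags listed further down in this post. The function names are hypothetical, and nothing here is taken from GPT4All, Koboldcpp, or llama.cpp.

```python
# Phi-3 end-token IDs and tags as quoted in this post (assumed accurate).
EOS_TOKEN_IDS = {32000, 32001, 32007}
STOP_STRINGS = ("<|endoftext|>", "<|assistant|>", "<|end|>")


def truncate_at_eos(token_ids, eos_ids=EOS_TOKEN_IDS):
    """Return the tokens that precede the first end token, if any."""
    kept = []
    for tok in token_ids:
        if tok in eos_ids:
            break  # stop at the first end token of any kind
        kept.append(tok)
    return kept


def truncate_at_stop_string(text, stop_strings=STOP_STRINGS):
    """Cut decoded text at the earliest stop string.

    Hypothetical fallback for when only decoded text is available.
    """
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

With a check like this, a response that would otherwise show "<|end|><|assistant|>" followed by more text is cut before the first tag. An app that stops on only one end-token ID would let the other two through.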

In the case of Phi-3 there are now apparently three end tokens, after Microsoft edited some files. The end token IDs are as follows, along with their tags.

"eos_token_id": [
32000,
32001,
32007
]

32000: "<|endoftext|>"
32001: "<|assistant|>"
32007: "<|end|>"
Quant Factory org

These quants were created from Microsoft's fp16 file, so they would have the same issues as the official ones (which we thought would work).
We will update them using the latest llama.cpp release for Phi-3 4k support.

Quant Factory org

Here is the updated version: QuantFactory/Phi-3-mini-4k-instruct-GGUF-v2

deleted

@0-hero Thanks.
