QuantFactory/Phi-3-mini-4k-instruct-GGUF · How Are End Tokens Managed?

deleted

Apr 29

No GGUF of Phi-3 works correctly in GPT4ALL v2.7.4 or Koboldcpp, including the one released by Microsoft. They all keep talking past the end token, usually displaying "<|end|><|assistant|>" one or more times in the response, followed by random nonsense.
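When the tags leak into the reply as literal text like this, one app-side workaround is to cut the reply at the first tag that appears. The sketch below is only an illustration: `truncate_at_stop` and `STOP_STRINGS` are hypothetical names, and this is not a description of how GPT4ALL or Koboldcpp handle stopping internally.

```python
# Hypothetical stop strings, taken from the tags that leak into the output above.
STOP_STRINGS = ("<|end|>", "<|assistant|>")


def truncate_at_stop(text, stops=STOP_STRINGS):
    """Return text cut at the earliest occurrence of any stop string."""
    # Positions of every stop string that actually occurs in the text.
    positions = [i for i in (text.find(s) for s in stops) if i != -1]
    return text[:min(positions)] if positions else text


print(truncate_at_stop("Sure, here you go.<|end|><|assistant|>and then nonsense"))
# Sure, here you go.
```

This only hides the symptom after generation. The model still spends time producing the extra tokens.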

Are the end tokens set within GGUF files, or are they handled by the app?

In the case of Phi-3, there are now apparently three end tokens after Microsoft edited some files. The end tokens are as follows, along with their tags:

"eos_token_id": [
  32000,
  32001,
  32007
]

32000: "<|endoftext|>"
32001: "<|assistant|>"
32007: "<|end|>"
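To show why having several end tokens matters, here is a minimal sketch of a generation loop that stops on any ID in a set. It assumes the IDs pair with the tags in the order listed above, and `next_token` is a hypothetical stand-in for one decoding step of a model, not a real API. An app that only checks a single end token (for example 32000) would keep going after `<|end|>` (32007), which matches the behaviour described in this post.

```python
# Token IDs copied from the "eos_token_id" list above; the ID-to-tag pairing
# is assumed from the order in which the post lists them.
EOS_IDS = {32000, 32001, 32007}  # <|endoftext|>, <|assistant|>, <|end|>


def generate_until_eos(next_token, max_tokens=256, eos_ids=EOS_IDS):
    """Call next_token() repeatedly, stopping at the first ID in eos_ids.

    next_token is a hypothetical stand-in for one decoding step of a model.
    """
    out = []
    for _ in range(max_tokens):
        tok = next_token()
        if tok in eos_ids:
            break
        out.append(tok)
    return out


# A fake token stream: two ordinary tokens, then <|end|> (32007), then junk.
stream = iter([450, 1234, 32007, 32001, 999])
print(generate_until_eos(lambda: next(stream)))
# [450, 1234]

# Honouring only 32000 runs straight past <|end|> and <|assistant|>.
stream = iter([450, 1234, 32007, 32001, 999])
print(generate_until_eos(lambda: next(stream), max_tokens=5, eos_ids={32000}))
# [450, 1234, 32007, 32001, 999]
```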

munish0838

Quant Factory org Apr 29

These quants were created from Microsoft's fp16 file, so they would have the same issues as the official ones (which we thought would work).
We'll update them using the latest llama.cpp release for Phi-3-4k support.

0-hero

Quant Factory org May 5

Here is the updated version: QuantFactory/Phi-3-mini-4k-instruct-GGUF-v2

deleted

May 5

@0-hero Thanks.