EOS token ID changed for unquantized version
See https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct/discussions/49/files
The EOS token oversight is now being fixed, but the GGUF files here still have the old ID. Would be great if this gets fixed.
With the old ID, the models kept generating until they hit the token limit.
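To illustrate why the wrong ID leads to runaway output, here is a minimal sketch of a generation loop. It is not any particular library's implementation: `run_generation` and `fake_model` are hypothetical names, and the token stream is made up. The only real values are the Llama 3 special-token IDs (`<|end_of_text|>` = 128001, `<|eot_id|>` = 128009).

```python
# Llama 3 special-token IDs: <|end_of_text|> = 128001, <|eot_id|> = 128009.
END_OF_TEXT_ID = 128001
EOT_ID = 128009


def run_generation(model_tokens, eos_token_id, max_new_tokens):
    """Collect tokens until eos_token_id appears or the token limit is hit."""
    output = []
    for token in model_tokens:
        if len(output) >= max_new_tokens:
            break  # token limit reached
        if token == eos_token_id:
            break  # the configured EOS token ends the turn
        output.append(token)
    return output


def fake_model():
    """Hypothetical stream: a 3-token answer, <|eot_id|>, then endless filler."""
    yield from (10, 11, 12)
    yield EOT_ID
    while True:
        yield 99


# Old metadata: the loop waits for <|end_of_text|>, which never arrives.
old = run_generation(fake_model(), END_OF_TEXT_ID, max_new_tokens=20)
# Fixed metadata: the loop stops at <|eot_id|>.
new = run_generation(fake_model(), EOT_ID, max_new_tokens=20)
print(len(old), len(new))
```

With the old ID the sketch only stops at the 20-token limit, while with 128009 it stops right after the 3-token answer.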
It's really not an issue if the library you are using has its stop strings set to <|eot_id|>;
in that case it works without any problems (Ollama, LM Studio, etc.).
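For anyone wondering what a stop string does, here is a rough sketch, assuming the server simply cuts the decoded text at the first stop string it finds. `truncate_at_stop` is a hypothetical helper, not a function from Ollama or LM Studio, and real servers usually check this incrementally while streaming.

```python
def truncate_at_stop(text, stop_strings):
    """Return text cut at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Made-up raw output where the model keeps going after its turn has ended.
raw = (
    "Paris is the capital of France.<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\nParis is..."
)
clean = truncate_at_stop(raw, ["<|eot_id|>"])
print(clean)
```

This is why the old EOS ID goes unnoticed in those apps: the text after `<|eot_id|>` is discarded regardless of what the GGUF metadata says.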
If whatever you use to serve the model doesn't support terminators/stop_strings, you can easily edit the GGUF metadata key tokenizer.ggml.eos_token_id to 128009
yourself.
PS: https://huggingface.co/MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF/discussions/7
I'll do that, then.
It's just that 300k+ people have downloaded these model files as the most convenient way of using Llama 3, and many users won't know how to make that edit. Re-uploading fixed files would save them the hassle. ;)
Makes sense, I'll see if I can re-upload the edited files this weekend :)
Great, thanks for the reuploads! ...I think you missed the Q8, though. ;)
You are totally right, uploading it now :)