May require reconversion due to llama.cpp enhancements

#1 opened by concedo

The convert-hf-to-gguf.py script was recently updated to support Llama 3 pretokenization, fixing some incorrect regex merges. I believe this may require reconversion and requantization of all Llama 3 models.

https://github.com/ggerganov/llama.cpp/pull/6920
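For reference, that PR adds a `tokenizer.ggml.pre` metadata key to converted GGUF files, so one rough way to tell whether a file came from the updated converter is to check for that key. A minimal sketch using the `gguf` Python package (`pip install gguf`); the file path is a placeholder and the field-decoding details may vary between `gguf` versions:

```python
# Sketch: check whether a GGUF file carries the new pretokenizer metadata.
# The model path below is a placeholder, not an actual file from this repo.
from gguf import GGUFReader

reader = GGUFReader("llama-3-model.Q4_K_M.gguf")
field = reader.fields.get("tokenizer.ggml.pre")

if field is None:
    print("No tokenizer.ggml.pre key -- likely converted before the fix; reconvert.")
else:
    # For string fields, `data` indexes into `parts`; decode the raw bytes.
    value = bytes(field.parts[field.data[0]]).decode("utf-8")
    print(f"Pretokenizer: {value}")  # expected to be 'llama-bpe' for Llama 3
```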

Thank you for bringing this to my attention. I will release an updated and improved version soon with this included!

Thanks for the ping, holding off until it's acknowledged by slaren or ggerganov, just to be sure the fix isn't yet another hack that'll need to be fixed lol

@bartowski Yeah lol, I'm doing more tests now with different regexes and will continue my thread on llama.cpp with the findings.

Hi, is there any news about the v2 version?

@huggingfacess There's a huge thread about GGUF and llama.cpp issues linked here. I will see when I can get things in order for a new version; for now I will have to verify what isn't working as intended.

It seems this isn't actually a bug in the conversions but rather in the inference tools.

Some tools prepend an additional BOS token, which messes with generation.

https://github.com/ggerganov/llama.cpp/issues/7062#issuecomment-2100027534
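To illustrate the failure mode (a hedged sketch, not code from the thread): with the Hugging Face tokenizer, `apply_chat_template` already prepends `<|begin_of_text|>`, so tokenizing its output with `add_special_tokens=True` yields a second BOS, which is the kind of duplication described in the linked issue. The model ID below assumes access to the gated Meta repo.

```python
# Sketch: demonstrate the double-BOS pitfall with a Llama 3 chat template.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
messages = [{"role": "user", "content": "Hello"}]

# apply_chat_template already prepends <|begin_of_text|> (the BOS token).
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenizing with add_special_tokens=True prepends a *second* BOS.
double_bos = tok(prompt, add_special_tokens=True).input_ids
single_bos = tok(prompt, add_special_tokens=False).input_ids
print(double_bos[:3])  # BOS appears twice at the start
print(single_bos[:3])  # BOS appears once, as intended
```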

@bartowski Yes, the issue seems to be what I found: instruct or fine-tuned models should be run with the system tokens present, just as they were during fine-tuning, regardless of whether the system message string is empty or not. Removing them results in unexpected outputs.
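In other words (a sketch assuming a Llama-3-style chat template; the model ID is the gated Meta repo), passing an explicitly empty system message keeps the system header tokens in the prompt, whereas omitting the system role drops them entirely:

```python
# Sketch: an empty system message still emits the system header tokens,
# while omitting the system role removes that block from the prompt.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

with_empty_system = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "Hello"},
]
without_system = [
    {"role": "user", "content": "Hello"},
]

print(tok.apply_chat_template(with_empty_system, tokenize=False))
# ...<|start_header_id|>system<|end_header_id|>\n\n<|eot_id|>... is present
print(tok.apply_chat_template(without_system, tokenize=False))
# system block absent -- a prompt format the fine-tune may never have seen
```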

Orenguteng changed discussion status to closed
