GGUF quants need to be redone.

#5
by Nitral-AI - opened
The Chaotic Neutrals org

@Lewdiculous
Not sure if you have seen this yet, but it will affect all L3 models at least.
https://github.com/ggerganov/llama.cpp/pull/6920

The Chaotic Neutrals org

I would not requant now, as kube said it's still not completely fixed(?)

The Chaotic Neutrals org

Figured I'd at least let him know it was coming.

Thanks for the heads up. I was made aware of this, I personally will wait for the complete implementation to be merged downstream into KoboldCpp as that's the usual backend of choice and both the conversion and inference parts are needed.

And hopefully the biggest kinks are ironed out:

https://github.com/ggerganov/llama.cpp/issues/6914

Seems I need to use convert-hf-to-gguf.py instead of just convert.py though? Ugh.
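
If anyone's curious, the invocation change is roughly this; just a sketch, assuming a local llama.cpp checkout, with the model path and output name as placeholders:

```python
# Sketch: driving llama.cpp's convert-hf-to-gguf.py instead of convert.py.
# The llama.cpp path, model directory, and output name are placeholders.
import subprocess
from pathlib import Path

llama_cpp = Path("llama.cpp")            # local llama.cpp checkout (assumed)
model_dir = Path("models/My-L3-Model")   # HF-format model folder (placeholder)
out_gguf = model_dir / "My-L3-Model-F16.gguf"

subprocess.run(
    [
        "python", str(llama_cpp / "convert-hf-to-gguf.py"),
        str(model_dir),
        "--outfile", str(out_gguf),
        "--outtype", "f16",  # keep F16 so imatrix generation stays reasonable
    ],
    check=True,
)
```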

Initially I'd re-upload quants for the latest 3-4 most popular models and add a notice to the ones that are outdated, until those are also redone at a lower priority.

Poppy 0.7 and SOVL at least can already receive the latest version for those inferring directly on LlamaCpp, with potential issue #6914 in mind.

With this is it still necessary to mess with the tokenizer before conversion?

Edit: Actually looks like not. Hurray.

This is an annoying mess. convert-hf-to-gguf-update.py doesn't seem to be working completely for me. I'll wait.

Basically having the same issue as this comment:

https://github.com/ggerganov/llama.cpp/pull/6920#issuecomment-2083411139

The Chaotic Neutrals org

Appreciate you trying anyways.

Alright, looks like ggerganov pushed a relevant fix on GitHub. Conversion can happen now; we still need to manually replace the config files, but now with theirs.

The issue I have with convert-hf instead of convert.py is that last time I used it, it refused to output an F16 GGUF, instead giving an F32 one, which then needed to be converted down so the imatrix generation doesn't take 50 years. It is working again, thanks gg-man, I'm gonna cry.

I'll replace Poppy 0.7 and SOVL initially.

I'll say the process is not very intuitive now, and I think it's going to be interesting having people find the correct PR and read everything, but, well, it is what it is. It's possible to just add a user-input step to the script and do the copying for the user, but I'm not feeling that; if anyone wants to PR it, I'll welcome it, otherwise I'll do it manually for the time being.
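
For clarity, the user-input step I mean would be something like this; just a rough sketch, with all paths made up and the list of files to replace an assumption:

```python
# Sketch of a user-input step that copies the fixed config/tokenizer files
# into the model folder before conversion. Every path here is hypothetical.
import shutil
from pathlib import Path

model_dir = Path(input("Model directory to patch: ").strip())
fixed_dir = Path(input("Folder with the fixed config files: ").strip())

# Files that would typically be replaced before converting (assumption).
for name in ("config.json", "tokenizer.json", "tokenizer_config.json",
             "special_tokens_map.json"):
    src = fixed_dir / name
    if src.exists():
        shutil.copy2(src, model_dir / name)
        print(f"Replaced {name}")
    else:
        print(f"Skipped {name} (not found in {fixed_dir})")
```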

It seems like there's no need to do all of it again, because there will be a quick fix for the old quants in koboldcpp.

[screenshots of the koboldcpp upstream branch showing the fix]

I'll do the most popular models - for cases where people are using other backends - and add a disclaimer to the ones that aren't updated if it's relevant information. If it's handled automatically then we just roll with it.

https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script/discussions/26#6630f8447bab9c55ef7f23f8

Looks like all quants need to be redone since I use an importance matrix, and that has to be regenerated.
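
So the full chain gets rerun, not just the final quantization step; a rough sketch of what that looks like, assuming a stock llama.cpp build with the imatrix and quantize binaries at the repo root and a local calibration text file (all names are placeholders):

```python
# Sketch: regenerate the importance matrix from the re-converted F16 GGUF,
# then requantize with it. Binary locations and filenames are assumptions
# based on a default llama.cpp build; adjust to the actual setup.
import subprocess
from pathlib import Path

llama_cpp = Path("llama.cpp")
f16_gguf = Path("models/My-L3-Model-F16.gguf")  # placeholder
calib_txt = Path("calibration_data.txt")         # placeholder calibration text
imatrix_out = Path("imatrix.dat")

# 1) The importance matrix has to be rebuilt against the new F16 model.
subprocess.run(
    [str(llama_cpp / "imatrix"),
     "-m", str(f16_gguf),
     "-f", str(calib_txt),
     "-o", str(imatrix_out)],
    check=True,
)

# 2) Each quant is then redone using that fresh imatrix.
subprocess.run(
    [str(llama_cpp / "quantize"),
     "--imatrix", str(imatrix_out),
     str(f16_gguf),
     "models/My-L3-Model-Q4_K_M.gguf",
     "Q4_K_M"],
    check=True,
)
```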

@Nitral-AI I'm already uploading new Poppy.



Nitral-AI changed discussion status to closed
