GGUF quants need to be redone.

#5
by Nitral-AI - opened
The Chaotic Neutrals org

@Lewdiculous
Not sure if you have seen this yet, but it will affect all L3 models at least.
https://github.com/ggerganov/llama.cpp/pull/6920

The Chaotic Neutrals org

I would not requant now, as kube said it's still not completely fixed(?)

The Chaotic Neutrals org

Figured I'd at least let him know it was coming.

Thanks for the heads up. I was made aware of this, I personally will wait for the complete implementation to be merged downstream into KoboldCpp as that's the usual backend of choice and both the conversion and inference parts are needed.

And hopefully the biggest kinks are ironed out:

https://github.com/ggerganov/llama.cpp/issues/6914

Seems I need to use convert-hf-to-gguf.py instead of just convert.py though? Ugh.
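
If anyone's curious, the invocation change is roughly this; just a sketch, assuming a local llama.cpp checkout, with the model path and output name as placeholders:

```python
# Sketch: driving llama.cpp's convert-hf-to-gguf.py instead of convert.py.
# The llama.cpp path, model directory, and output name are placeholders.
import subprocess
from pathlib import Path

llama_cpp = Path("llama.cpp")            # local llama.cpp checkout (assumed)
model_dir = Path("models/My-L3-Model")   # HF-format model folder (placeholder)
out_gguf = model_dir / "My-L3-Model-F16.gguf"

subprocess.run(
    [
        "python", str(llama_cpp / "convert-hf-to-gguf.py"),
        str(model_dir),
        "--outfile", str(out_gguf),
        "--outtype", "f16",  # keep F16 so imatrix generation stays reasonable
    ],
    check=True,
)
```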

Initially I'd re-upload quants for the latest 3-4 most popular models and add a notice to the ones that are outdated, until those are also redone at a lower priority.

Poppy 0.7 and SOVL at least can already receive the latest version for those inferring directly on LlamaCpp, with potential issue #6914 in mind.

With this is it still necessary to mess with the tokenizer before conversion?

Edit: Actually looks like not. Hurray.

This is an annoying mess. convert-hf-to-gguf-update.py doesn't seem to be working completely for me. I'll wait.

Basically having the same issue as this comment:

https://github.com/ggerganov/llama.cpp/pull/6920#issuecomment-2083411139

The Chaotic Neutrals org

Appreciate you trying anyways.

Alright, looks like ggerganov pushed a relevant fix on GitHub. Conversion can happen now; we still need to manually replace the config files, but now with theirs.

The issue I have with convert-hf instead of convert.py is that last time I used it, it refused to output an F16 GGUF, instead giving an F32 one, which then needed to be converted down so the imatrix generation doesn't take 50 years. It is working again, thanks gg-man, I'm gonna cry.

I'll replace Poppy 0.7 and SOVL initially.

I'll say the process is not very intuitive now, and I think it's going to be interesting having people find the correct PR and read everything, but, well, it is what it is. It's possible to just add a user-input step to the script and do the copying for the user, but I'm not feeling that; if anyone wants to PR it, I'll welcome it, otherwise I'll do it manually for the time being.
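
For clarity, the user-input step I mean would be something like this; just a rough sketch, with all paths made up and the list of files to replace an assumption:

```python
# Sketch of a user-input step that copies the fixed config/tokenizer files
# into the model folder before conversion. Every path here is hypothetical.
import shutil
from pathlib import Path

model_dir = Path(input("Model directory to patch: ").strip())
fixed_dir = Path(input("Folder with the fixed config files: ").strip())

# Files that would typically be replaced before converting (assumption).
for name in ("config.json", "tokenizer.json", "tokenizer_config.json",
             "special_tokens_map.json"):
    src = fixed_dir / name
    if src.exists():
        shutil.copy2(src, model_dir / name)
        print(f"Replaced {name}")
    else:
        print(f"Skipped {name} (not found in {fixed_dir})")
```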

It seems like there's no need to do all of it again, because there will be a quick fix for the old quants in koboldcpp.

[screenshots of the koboldcpp upstream branch showing the fix]

I'll do the most popular models - for cases where people are using other backends - and add a disclaimer to the ones that aren't updated if it's relevant information. If it's handled automatically then we just roll with it.

https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script/discussions/26#6630f8447bab9c55ef7f23f8

Looks like all quants need to be redone since I use an importance matrix, and that has to be regenerated.
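
So the full chain gets rerun, not just the final quantization step; a rough sketch of what that looks like, assuming a stock llama.cpp build with the imatrix and quantize binaries at the repo root and a local calibration text file (all names are placeholders):

```python
# Sketch: regenerate the importance matrix from the re-converted F16 GGUF,
# then requantize with it. Binary locations and filenames are assumptions
# based on a default llama.cpp build; adjust to the actual setup.
import subprocess
from pathlib import Path

llama_cpp = Path("llama.cpp")
f16_gguf = Path("models/My-L3-Model-F16.gguf")  # placeholder
calib_txt = Path("calibration_data.txt")         # placeholder calibration text
imatrix_out = Path("imatrix.dat")

# 1) The importance matrix has to be rebuilt against the new F16 model.
subprocess.run(
    [str(llama_cpp / "imatrix"),
     "-m", str(f16_gguf),
     "-f", str(calib_txt),
     "-o", str(imatrix_out)],
    check=True,
)

# 2) Each quant is then redone using that fresh imatrix.
subprocess.run(
    [str(llama_cpp / "quantize"),
     "--imatrix", str(imatrix_out),
     str(f16_gguf),
     "models/My-L3-Model-Q4_K_M.gguf",
     "Q4_K_M"],
    check=True,
)
```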

@Nitral-AI I'm already uploading new Poppy.



Nitral-AI changed discussion status to closed
