I suck at making this work for Llama

#10
by ABX-AI - opened

I've had some issues making the script work with Llama-3, partly because of the different vocab type, but I managed to do it with direct llama.cpp commands instead: getting an fp32 out, then making an f16 out of that.

However, I can't make the imatrix.dat generation work; it acts as if there's a mismatch somewhere in the config. I'll probably keep trying to work it out, but if you have any tips on how to use this with Llama-3, let me know.
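
For reference, the manual pipeline I've been running looks roughly like this (a minimal sketch only; the model paths and calibration file are placeholders, and I'm driving the llama.cpp tools from Python the same way the script does):

```python
# Rough sketch of the manual llama.cpp pipeline (paths are placeholders).
import subprocess

MODEL_DIR = "models/my-llama-3-model"   # HF model folder (placeholder)
F32 = "my-llama-3-model-f32.gguf"
F16 = "my-llama-3-model-f16.gguf"

# 1) HF -> fp32 GGUF. Llama-3's tokenizer needs --vocab-type bpe here,
#    otherwise the conversion fails outright.
subprocess.run(
    ["python", "llama.cpp/convert.py", MODEL_DIR,
     "--outtype", "f32", "--outfile", F32, "--vocab-type", "bpe"],
    check=True,
)

# 2) fp32 -> f16 with the quantize tool.
subprocess.run(["llama.cpp/quantize", F32, F16, "f16"], check=True)

# 3) imatrix generation -- this is the step that fails for me.
subprocess.run(
    ["llama.cpp/imatrix", "-m", F16, "-f", "calibration.txt", "-o", "imatrix.dat"],
    check=True,
)
```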

Yeah, so I haven't tried with Llama-3 yet; just now I was still making some changes to the script to accommodate some of your suggestions and other low-hanging fruit.

We'll wait on this issue then (for the script part; I'll see if a new check has to be added once things settle down around Llama-3):
https://github.com/ggerganov/llama.cpp/issues/6690#issuecomment-2065278517

This is super easy to fix lmao: `--vocab-type bpe` @Lewdiculous @ABX-AI
Also made some changes to config.json and the generation_config. https://files.catbox.moe/u35p33.rar

Yeah, I saw that part, but...
The thing is it'd then need to be detected, and I don't want to have to check for this. I might just add a separate alternative for Llama-3 for now, until this is addressed upstream (or isn't).
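
Just to show what I mean, a detection check would be something like this rough, untested sketch reading the model's config.json (Llama-3 ships a 128256-token BPE vocab versus Llama-2's 32000-token SentencePiece one):

```python
# Untested sketch of a possible Llama-3 check via the model's config.json.
import json
from pathlib import Path

def looks_like_llama_3(model_dir: str) -> bool:
    config = json.loads((Path(model_dir) / "config.json").read_text())
    # Llama-3 uses a 128256-token BPE vocab; Llama-2 uses a
    # 32000-token SentencePiece vocab.
    return config.get("model_type") == "llama" and config.get("vocab_size") == 128256

# Pick the vocab type to pass to convert.py based on the check.
vocab_type = "bpe" if looks_like_llama_3("models/my-model") else "spm"
```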

Thanks for the files.

@ABX-AI Use his provided config files.

I want to be like you when I grow up.


@Nitral-AI Can I host your files in the repo as a fallback?

@Lewdiculous Absolutely my dude!

Thanks, mates.

FantasiaFoundry changed discussion status to closed

Don't know if you have these.

Instruct and context presets for SillyTavern (ST) here: https://files.catbox.moe/lkclc9.rar

FantasiaFoundry changed discussion status to open

Thanks. Still got some things to iron out.

No problem. I decided to be a little more open out of the gate with my findings to help get some nice Llama-3 finetunes out as quickly as possible. :) Now I'm back to working on giving you guys another unhinged banger. :)

Looks okay now... Surely...

For Llama-3 models, at the moment, you have to use gguf-imat-llama-3.py and replace the config files with the ones in the llama-3-config-files folder to properly quant the model and generate the imatrix data.
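
In other words, something along these lines (a sketch only; the model folder is a placeholder, and the copy step just overwrites the model's configs with the ones shipped in llama-3-config-files):

```python
# Sketch of the Llama-3 workflow (model folder is a placeholder).
import shutil
import subprocess
from pathlib import Path

MODEL_DIR = Path("models/my-llama-3-model")  # placeholder
CONFIGS = Path("llama-3-config-files")       # folder shipped with this repo

# Replace the model's config files with the Llama-3 ones before quanting.
for cfg in CONFIGS.glob("*.json"):
    shutil.copy(cfg, MODEL_DIR / cfg.name)

# Then run the Llama-3 variant of the script.
subprocess.run(["python", "gguf-imat-llama-3.py"], check=True)
```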

FantasiaFoundry changed discussion status to closed

Thanks :) I did already use the vocab-type option @Nitral-AI, otherwise it would have been impossible to get the f32 and f16 out at all. I did get those, but the imatrix generation still didn't work, and that was my issue. Thanks for the updates, I'll try the new script out <3
