20K context on this AWQ
I haven't tried the original model, but this AWQ fails to follow instructions: when asked to look for "private data" in a ~20K-token input, it just replies with "thanks...". I tried an older Celeste AWQ and it works fine.
I'm not sure whether this is an issue with the quant or the original model. I usually don't worry about what the model does, as long as my inference testing / validation passes.
I was able to get great output using the example in this repo's README.md.
Do you have a method I could use to reproduce the error you discovered?
If it's a problem with the quant, I can improve my validation efforts; otherwise, it's an issue outside of my control.
I tried 3 models: twinllama dpo 3, this model, and Celeste 1.5. Only this model didn't pass the test.
The test I use is something like this:
"Find private information in this block of text that shouldn't be put in the text."
Then I plant things like a name, a birthday, and other details in random places in the text. I use a dummy text generator and paste its output repeatedly until I reach ~20K tokens (~18K words).
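The probe described above can be sketched roughly like this (the filler sentence and planted "private" strings are made up for illustration; the real test uses a dummy text generator):

```python
import random

# Illustrative filler block (~360 words) standing in for generated dummy text
FILLER = "The quick brown fox jumps over the lazy dog. " * 40
# Fake "private" details to bury in the text (assumptions, not the real test data)
PLANTED = ["John Doe", "1990-04-01", "555-867-5309"]

def build_probe(target_words: int = 18_000, seed: int = 0) -> str:
    """Repeat filler until ~target_words, then hide each planted item at a random spot."""
    random.seed(seed)
    blocks, words = [], 0
    while words < target_words:
        blocks.append(FILLER)
        words += len(FILLER.split())
    for item in PLANTED:
        blocks.insert(random.randrange(len(blocks)), item)
    return "\n".join(blocks)

prompt = (
    "Find private information in this block of text that shouldn't be put in the text:\n\n"
    + build_probe()
)
# A passing model should list the planted items instead of replying "thanks...".
```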
I just built AutoAWQ_kernels from source; it took forever.
I will rename this repo, and then reprocess it.
This is the basic idea: unpin / unlock the torch and transformers versions in the AutoAWQ project: https://github.com/SolidRusT/srt-model-quantizing/blob/main/awq/create_virtualenv.sh
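A rough sketch of the unpinning idea, assuming the pins live in a requirements-style file (the file name and versions below are made up; the linked script is the actual approach):

```shell
# Illustrative pinned requirements, standing in for the project's real pins
printf 'torch==2.2.0\ntransformers==4.38.2\n' > requirements-example.txt
# Strip the version constraints so pip resolves against the installed stack
sed -i -E 's/^(torch|transformers)[=<>!~].*$/\1/' requirements-example.txt
cat requirements-example.txt
```

The point is just to let the build use whatever torch/transformers are already in the virtualenv instead of forcing a reinstall.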
All the machines are busy right now, but this will be the next quant in the queue.
Can you quant roleplay Hermes? That model seems interesting.
Absolutely! Now that the pipeline is working and stable, I am LFQ hardcore. I want to hit 500 models today! ^^
I'll queue up anything related to Hermes that I haven't quanted yet.
If you have any specific models, I can prioritize them.
I have 2 machines dedicated to AWQ right now, and each quant takes about 20 minutes or so.
This one is back in the AWQ oven: https://huggingface.co/solidrust/Hermes-3-Llama-3.1-8B-lorablated-AWQ
The author says this merge is better than before, so I guess I'll place it here.
Thanks for your work
kromeurus/L3.1-Siithamo-v0.4-8B
In the second AWQ oven: https://huggingface.co/solidrust/L3.1-Siithamo-v0.4-8B-AWQ
2/32 [01:15<18:54, 37.81s/it]
https://huggingface.co/solidrust/Hermes-3-Llama-3.1-8B-lorablated-AWQ is ready for evaluation.
Closing this thread.