Gibberish on 'latest', with recent qwopqwop GPTQ/triton and ooba? #2
Hi, I'm sure I'm making an obvious mistake, but hoping someone can point it out to me.
I'm getting gibberish output from this model on the 'latest' branch, the one with act-order.
I'm on qwopqwop's GPTQ-for-LLaMa, triton branch, commit 05781593c81 (May 8th, the most recent commit as of this posting), and ooba (text-generation-webui) commit ab08cf646543c (May 14th, today).
I'm on native Arch Linux, not WSL.
Is there something I need to change? Do I need to be on the bleeding-edge GPTQ branch called "fastest-inference-4bit", which has the most recent activity?
Thanks. And apologies for being yet another "gibberish output" post :) Really appreciate all the great work you're doing.
Same
USER: What is 4x8?
ASSISTANT: Burgлия Sud Reserve Stockrn Wall TournFD Beauobre tématuMDb husrut Star stickbourgoin respectEventListener Bour Bruno Fourierrn titles BlaConstraint Autor lo Matrixrou conspлияMatrix Fin framern Chart substitutionsko SudMDbлиялияrn BeauMDb Assume BurgлиялиялияAA
Same here... similar output to the above
Ugh, sorry about that. I went back to using the old ooba fork of GPTQ-for-LLaMa, because if I used the latest version people couldn't do CPU offloading. I didn't realise it would result in gibberish with the new fork. So if you're OK with going back to https://github.com/oobabooga/GPTQ-for-LLaMa, it will work fine there.
I can also confirm it works great with AutoGPTQ, which you can use easily from Python code (you need to pass strict=False to .from_quantized() when loading the model). And early/preliminary support for AutoGPTQ was just added to text-generation-webui, so you could experiment with it there.
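For anyone who wants to try that from Python, here's a minimal sketch. The local model path and generation settings are illustrative, not from this thread, and the exact from_quantized arguments can vary between AutoGPTQ versions:

# Minimal AutoGPTQ loading sketch (illustrative; adjust paths/device to your setup)
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# assumed local directory containing the downloaded repo contents
model_dir = "/data/Wizard-Vicuna-13B-Uncensored-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    use_safetensors=True,
    strict=False,   # needed when loading this model, as mentioned above
    device="cuda:0",
    # you may also need model_basename=... if the weights file doesn't use AutoGPTQ's default name
)

prompt = "USER: What is 4x8?\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))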
I'd like to say I'll add a '128-latest' version like I used to do. But I'm uploading so many models now that I can't promise I'll get to it.
Has anyone got AutoGPTQ working?
I added a quantize_config.json file:
{
"bits": 4,
"desc_act": true,
"true_sequential": true,
"group_size": 128
}
then started with:
python server.py --autogptq --model-dir /data --model Wizard-Vicuna-13B-Uncensored-GPTQ_last --listen-host 0.0.0.0 --chat --api --notebook --xformers
But the model still outputs gibberish like 'Burgлия'.
It should be "desc_act": false. But you don't need to specify that yourself; it's already in the repo.
Just download the full contents of this repo and run the latest text-gen-ui. You don't even need to specify --autogptq, as that's the default now.
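For reference, the quantize_config.json with that fix applied would look like this (it already ships with the repo, so you'd only write it yourself if you're assembling files manually):

{
  "bits": 4,
  "desc_act": false,
  "true_sequential": true,
  "group_size": 128
}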