Multi-GPU splitting issue

#1
by Hardcore7651

Exl2 currently has an issue with some models: when quantization_config is present in config.json, GPU splitting stops working.

https://github.com/oobabooga/text-generation-webui/issues/5707

I would like to request that you remove the quantization_config details from your quants in order to fix this. Thanks.

Huh, I did not encounter that, as I use TabbyAPI myself. You can easily remove the quantization_config JSON block from config.json without any noticeable adverse effect; I recommend doing that after downloading in the meantime (see the sketch below). I don't think removing information from the model config itself over a temporary bug in one specific API is the way to go.
That looks like a bug inside TextGenerationWebUI and should be fixed given some time. Feel free to keep this post open for people to see. And thanks for the report :)
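For anyone doing the removal by hand, here is a minimal Python sketch; the model path is just a placeholder for wherever your downloaded quant lives:

```python
import json

# Path to the downloaded model's config.json (placeholder -- adjust
# to your own model directory).
config_path = "models/my-exl2-quant/config.json"

with open(config_path) as f:
    config = json.load(f)

# Drop the quantization_config block if present; everything else in
# the config is left untouched.
config.pop("quantization_config", None)

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```

Running this once after download leaves a config.json without the quantization_config block, which works around the loader bug until it is fixed upstream.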

Same solution for a different problem within TextGenerationWebUI: with quantization_config present, the context only works up to 2048 tokens, and anything above that triggers the known bug "Total sequence length exceeds cache size in model.forward".

Removing quantization_config seems to fix a lot of issues...
