Anyone else having terrible performance with this model in the web UI?

#1
by RebornZA - opened

...I went from 4-5 tokens/sec on other 13B models to about 0.10 tokens/sec with this model.

Try using this one: https://huggingface.co/TheBloke/wizard-vicuna-13B-GPTQ

I only uploaded this one because TheBloke's quantizations usually don't work with the version of GPTQ used by the Occam fork of KoboldAI. I only tested it with KoboldAI, where I get my normal 20-25 tokens per second on my 3090. Try TheBloke's quantization instead, since his are tested with ooba.

Thanks so much, that fixed it <3

RebornZA changed discussion status to closed
