Slower and inconsistent

#2
by webslug - opened

I have been using the GGUF CPU version of Dolphin and it works extremely well. I was under the impression that this AWQ variant would be faster. I have an RTX 3060 and downloaded this model for Oobabooga.

This model seems to be inconsistent: sometimes responses take several minutes to appear, and on occasion they are limited to just one word. Nothing has changed in my setup apart from the model.

Am I doing something wrong? I was under the impression that AWQ models were faster.
