Receive empty response, regardless of which loader I choose

#2
by anon7463435254 - opened

There is something strange happening (the same happens with the new Nous-Hermes): the model loads, but for any message I send I receive a completely empty response (it does not even show my question). I tried this with AutoGPTQ, ExLlama and GPTQ-for-LLaMA. I'll show you an example:

instruction_template.PNG
model.PNG
params.PNG

After loading the model, if I try to send "Hello", this happens:

chat.PNG
error.PNG

Am I missing something?

Thank you, and sorry for bothering you; I hope this helps.

Yeah, there's something wrong with your install. Have you updated text-generation-webui to the latest version? Are you using the one-click installer or a manual install? If it's a manual install, make sure to git pull on both text-generation-webui and exllama, and to re-run pip3 install -r requirements.txt in text-generation-webui.
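
For a manual install, the update boils down to something like this (a rough sketch, assuming the default layout with exllama cloned under repositories/):

# in the text-generation-webui directory
!git pull
!pip3 install -r requirements.txt
# if exllama is cloned under repositories/, pull it too
!cd repositories/exllama && git pull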

I did this inside a Colab:

!git clone https://github.com/oobabooga/text-generation-webui
%cd text-generation-webui
!pip install -r requirements.txt

and only if I want to use GPTQ-for-LLaMA do I run this:

%mkdir /content/text-generation-webui/repositories/
%cd /content/text-generation-webui/repositories/
!git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
%cd GPTQ-for-LLaMa
!pip install ninja
!pip install -r requirements.txt
!python setup_cuda.py install

And what about exllama? Did you install that? You need to install it before you can use it:

# in text-generation-webui directory
!mkdir repositories
!git clone https://github.com/turboderp/exllama repositories/exllama
!pip3 install -r repositories/exllama/requirements.txt

Yes, I also installed ExLlama, but the problem I showed previously doesn't happen only with it anyway; it also happens with the other loaders, such as AutoGPTQ.

Just to be sure, I replaced my code with the snippet you sent me, and this is the complete code of the Colab I'm running:

image.png

Same as before. The model loads (after doing the usual steps on the Model tab and selecting ExLlama), but it replies with that empty string.

I always encounter this issue whenever I haven't selected the correct instruction template for the model. Nous Hermes is compatible with Alpaca prompting. Probably the same with your problem.
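
For reference, the Alpaca-style template looks roughly like this (a sketch; the exact wording of the template shipped with text-generation-webui may differ slightly):

Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{your message}
### Response: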


Just tried with Nous Hermes, using the Colab code I put in my previous message and the Alpaca instruction template, and nothing changes.
I get this error:

image.png

I'm starting to think that something is wrong with my installation.
Could you please share the code you use for the whole process, from installation to running the models?

I still use this same code in Google Colab:

#@title 2. Install the Web UI & LLM
import os
import shutil
from IPython.display import clear_output
%cd /content/
!apt-get -y install -qq aria2

!git clone https://github.com/oobabooga/text-generation-webui
%cd /content/text-generation-webui

!pip install -r requirements.txt
!pip install -U gradio==3.28.3

!mkdir /content/text-generation-webui/repositories
%cd /content/text-generation-webui/repositories
!git clone -b cuda https://github.com/oobabooga/GPTQ-for-LLaMa.git
%cd GPTQ-for-LLaMa
!python setup_cuda.py install

%cd /content/text-generation-webui/extensions/api
!pip install -r requirements.txt

%cd /content/text-generation-webui
!python server.py --share --chat --api --public-api
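
Since the server is started with --api --public-api, you can also hit it over HTTP once it prints its URL. A minimal sketch of a request against the blocking API of that era (the endpoint path, port and field names are from memory, so treat them as assumptions and adjust for your version):

import requests

# hypothetical URL; with --public-api the tunnel URL printed in the logs is used instead
API_URL = "http://127.0.0.1:5000/api/v1/generate"

payload = {
    "prompt": "Hello, how are you?",
    "max_new_tokens": 200,   # keep this well below the model's context length
    "temperature": 0.7,
}

response = requests.post(API_URL, json=payload)
print(response.json()["results"][0]["text"])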

Just checking it now, it worked with Vicuna and Alpaca (both):

image.png

image.png

Then tested it in the prompt:

image.png

image.png

Oh, that old CUDA branch of GPTQ-for-LLaMa is no longer supported.

Please use AutoGPTQ 0.3.1 or exllama. ExLlama is much faster, and that is the recommended option in text-generation-webui.
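
For anyone following along in Colab, swapping the GPTQ-for-LLaMa build for AutoGPTQ should be roughly a one-liner (a sketch; depending on your text-generation-webui version, a compatible AutoGPTQ may already be pulled in by its requirements.txt):

# instead of cloning and building repositories/GPTQ-for-LLaMa
!pip install auto-gptq==0.3.1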

Yes bro, I still haven't updated that part. I didn't mind, because when I get inside the webui I can already find ExLlama.

image.png

OK, so it is working for you with ExLlama? That's fine then!

Yes, working fine. Thanks.

I found the problem, guys. It's the value of "max_new_tokens". If you want to reproduce it, set it to 4096 and then send messages: the error will appear.
I also noticed that the greater the value, the more the response drifts from the context and becomes confused. This behaviour is clearly visible with values well above 2048 and with models like Nous Hermes, WizardLM and Dolphin.
It seems that Llama-2-chat-GPTQ can handle it (meaning that it does not show those empty messages), but if you set a high value, it will forget the context (even the previous response) and give more confused responses.
Let me know if you can try it.

Ahh yeah, I've heard that; it's likely a bug in text-generation-webui or ExLlama, I believe. Note this isn't directly related to the context length of the model, which is set on the Model screen. But yes, I was told that if you let it generate more than 2K new tokens, it might not respond or might throw errors.
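
The rough token budget works out as below (a sketch, assuming a 2048-token context for the Llama-1-based models mentioned above and 4096 for Llama-2-chat; the prompt length is hypothetical):

# the prompt (template + chat history) and the newly generated tokens share one context window
context_length = 2048        # e.g. Nous Hermes / WizardLM / Dolphin (Llama 1 based)
prompt_tokens = 50           # hypothetical short "Hello" prompt after templating
max_new_tokens = 4096        # the value that triggers the empty responses

available_for_generation = context_length - prompt_tokens
print(max_new_tokens > available_for_generation)  # True: the request cannot fit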
