empty responses from LLM

#1
by rodrigofarias - opened

Regarding this post:
https://github.com/abacaj/mpt-30B-inference/issues/5

"I'm having the same problem. Processing goes to 100% for a few seconds but returns empty answers. It uses around 24 GB of RAM.
I tested in VS Code and in cmd. Same behavior.
I've tried to debug, but the "generator" variable had no string text inside it at all.

I'm running mpt-30b-chat.ggmlv0.q5_1.bin model instead of default q4_0.

PC: Ryzen 5900X and 32 GB RAM."

I still get the empty responses, using your implementation with Gradio. Any idea why this happens?

Great work, thanks in advance!

I really don't have any idea.

But I started a Space running the q5_1 model at https://huggingface.co/spaces/mikeee/mpt-30b-chat-gglm-5bit

It seems to be running fine.
