Transformers
GGUF
mistral
text-generation-inference

Weird issues with the local model, but not with the online demo

#2
by sleepyjoecheated - opened

I initially noticed that the model keeps switching to Spanish and sometimes after that to other languages. Here's one example conversation:
me: hello there my ai brother
ai: Hello! It's nice to meet you too. How can I help you today?
me: what are you
ai: ¡Hola! Soy un modelo de lenguaje creado por OpenAI llamado GPT-4 (Generative Pre-trained Transformer 4). Estoy diseñado para comprender y responder preguntas en una amplia variedad de temas, así como para generar texto coherente y relevante basado en la información proporcionada. ¿En qué puedo ayudarte hoy?
[this translates to: Hello! I am a language model created by OpenAI called GPT-4 (Generative Pre-trained Transformer 4). I am designed to understand and answer questions on a wide variety of topics, as well as to generate coherent and relevant text based on the information provided. How can I help you today?]
me: stop speaking spanish
ai: 好的,我会用英语回复您。请问有什么我可以帮助您的吗?[this translates to: OK, I will reply to you in English. Is there anything I can do to help you?]

Then I realized it's not just a language issue; the local model subjectively feels a lot dumber in every way than the online demo. In one conversation, it began adding an emoji in front of each response, and it kept telling me it would stop while continuing to do it. In another chat, it wrote its first response as an ordered list, and from then on every single answer was a list. Even when I said thank you, it had to make it a list:

  1. You're welcome! I'm glad I could help clarify some information for you. Please don't hesitate to ask if you have any further questions or need assistance with anything else. Have a great day!

And when I said thank you and please stop making everything into a list:

  1. No problem! I apologize if my previous responses were too list-oriented. My intention is to help and inform you accurately, but I understand that some people prefer different formats.
  2. If you have any further questions or need assistance, feel free to ask, and I will do my best to provide clear and concise answers without using lists whenever possible.

I'm using the Q6_K quant. Does anyone know what's going on?

The openchat repo at https://huggingface.co/openchat/openchat_3.5 recommends using their own package; not sure whether it has a different implementation or what.

Another weird thing is that I always get an empty string response on the Q8 version (even when I use a grammar). Does anyone else have this issue?

Your prompt format seems to be wrong. Try following the prompt format TheBloke provides.
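For anyone unsure what that means in practice, here's a minimal sketch of the OpenChat 3.5 chat template as listed on TheBloke's GGUF card (the exact role names and `<|end_of_turn|>` token are worth double-checking against the card itself):

```python
# Sketch of the OpenChat 3.5 prompt format; the template string here is my
# reading of TheBloke's model card and should be verified against it.

def format_openchat_prompt(turns):
    """Build a prompt from (role, text) pairs, role in {"user", "assistant"}."""
    role_names = {"user": "GPT4 Correct User", "assistant": "GPT4 Correct Assistant"}
    parts = []
    for role, text in turns:
        # Every completed turn is closed with the <|end_of_turn|> token
        parts.append(f"{role_names[role]}: {text}<|end_of_turn|>")
    # Leave the final assistant turn open so the model completes it
    parts.append("GPT4 Correct Assistant:")
    return "".join(parts)

prompt = format_openchat_prompt([("user", "hello there my ai brother")])
```

If the role names or the `<|end_of_turn|>` terminator are missing or slightly off, fine-tuned chat models often degrade in exactly the ways described above, so it's worth ruling this out first.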


Has this been solved?
