chatbot giving weird responses

#6
by hammad93 - opened

I tried to start a chatbot with multiple versions of this model using the llama.cpp server (the build with Mixtral support) on a Kaggle notebook. The script runs fine and there is enough RAM and VRAM for these versions of the model on the 2x T4 GPU notebook, but for some reason the chatbot responses are really weird: it keeps getting stuck on the same responses no matter what prompt I use, and it doesn't seem to understand the prompts. I'm not sure what's causing this. I'm using the default values for the parameters; I tried changing the temperature multiple times, but it doesn't affect the responses.

Kaggle notebook repo link: https://github.com/mth93/mixtral_llama_cpp/blob/main/llama-cpp-ngrok-api-mixtral.ipynb

Versions:
mixtral-8x7b-instruct-v0.1.Q2_K.gguf
mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf
mixtral-8x7b-instruct-v0.1.Q4_0.gguf
mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf
I tried the other files as well, but due to the model size there isn't enough VRAM on the Kaggle notebooks free tier.

I'm a beginner at AI and LLMs, and I don't understand the difference between the quantized versions or how that affects performance. If anyone can explain that or point me to some resources, it would be really appreciated.

PS: Please let me know if you need more details or if anything needs clarifying, as English isn't my first language.

Thanks,

Use llamacpp version .
It works on CPU and on GPU, even with the model only partially offloaded to the GPU.

@mirek190 I'm already using the mixtral branch of the llama.cpp GitHub repo and it works. The problem is with the responses. I'll add some screenshots of the parameter values and the responses.

Have you used a proper template?
And what quantization did you use?
I'm testing Q4_K_M at minimum, and even then the chat version is shitty. Try the instruct version, which is much better.

@mirek190 I think that might be the problem. I'm trying to use it as a regular chatbot with no template, and the responses are really shitty and awkward. Where can I find a proper template for Mixtral?
I'm currently trying out this version: mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf

Thanks a lot for helping out, man, it's really appreciated.

Under llama.cpp, the template for that model is

--in-prefix " <s>[INST] " --in-suffix " [/INST] "

Add this to the command line.
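
For example, a full interactive main command could look roughly like this (the model path, -ngl value, and context size are placeholders from my setup, so adjust them for your hardware):

```
# -ngl = number of layers to offload to the GPU; lower it if you run out of VRAM
./main -m ./models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf \
    -ngl 20 -c 4096 --temp 0.7 --color -i \
    --in-prefix " <s>[INST] " --in-suffix " [/INST] "
```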

@mirek190 How can I use the same options with server instead of main?

Server isn't working yet. I just pulled the latest from the mixtral branch and tried the instruct model with both main and server. Main seems to work fine.

@morph3v5 I tested out main too, and it works fine. The problem is that main doesn't serve an OAI-compatible API, as far as I know. I'll post back if I find a solution for this.

...or just wait till llama.cpp gets an official release ;)

@mirek190 Most likely that's what's going to happen. I'm just way too excited to try this model with AutoGen 😂😂

I've run this with the server from https://github.com/ggerganov/llama.cpp/pull/4406 (mixtral branch), chatting through the built-in interface. Excellent results. (I've tried temps of 0.2 and 0.7.)
There's also an OAI API example in examples/server/api_like_OAI.py
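
For reference, something along these lines should work (model path, port, and -ngl are placeholders; since server doesn't accept --in-prefix/--in-suffix, the [INST] template goes directly into the prompt field of the request):

```
# start the server built from the mixtral branch
./server -m ./models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf -c 4096 -ngl 20 --port 8080

# query the built-in completion endpoint with the template embedded in the prompt
curl http://127.0.0.1:8080/completion -H "Content-Type: application/json" \
    -d '{"prompt": "[INST] Write a haiku about GPUs. [/INST]", "n_predict": 128, "temperature": 0.7}'
```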

@neph1 How did you add the prompt template to the server?

Is it possible to get llama.cpp to work on a TPU instead of a GPU?

@hammad93

I just built it, and it worked.

@neph1 What's the difference between this pull request and the already-merged mixtral branch?

@hammad93
I don't think there's been a merge to master yet. If you're talking about https://github.com/ggerganov/llama.cpp/pull/4428, it was merged into the "main" mixtral branch.

Edit: But now the mixtral branch has been merged to master.

@mirek190
How can I add this ---> --in-prefix " [INST] " --in-suffix " [/INST] " to the llama.cpp server? I get an error; it only works with main.

Hmm, I'm confused: on TheBloke's README page the template is [INST] {prompt} [/INST], but yours is " [INST] {prompt} [/INST] ". Which one is correct?

WTF? What's up with the line over the text?

Read the model's main page again and look for the llama.cpp section ... you can add this after the -p parameter.
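
To make it concrete, with main the filled-in prompt ends up looking roughly like this (the question is just an example; as far as I know main adds the <s> BOS token on its own, so the plain [INST] {prompt} [/INST] form from TheBloke's README works as well):

```
# the template with an example prompt substituted in
./main -m ./models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf -ngl 20 \
    -p "[INST] What is the capital of France? [/INST]"
```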

@mirek190
Thank you!
