
Not remembering previous prompts and hallucinating

#2
by CPhoenix - opened

I'm not sure if it's something I've not configured correctly, although I have no problems with other models: Wizard-Mega-13B-GPTQ does not reference back to previous prompts or answers in the conversation history, and it occasionally hallucinates answers, especially when the prompt doesn't give sufficient information. (I've also tested this with Wizard-Mega-13B-GGML, which doesn't suffer from these issues.)

E.g.

You
In one sentence, please summarise the bible

Assistant
The Bible is a collection of sacred texts that serve as the basis for Christian beliefs and practices.

You
now reduce it to one word

Assistant
"Sustainability"

You
explain its teachings

Assistant
Buddhism is based on the teachings of Gautama Buddha who lived in India over 2500 years ago. According to these teachings, suffering (dukkha) arises from desire or craving (tanhā), which leads

or

You
what family do you belong to?

Assistant
The family name "Kim" belongs to the Korean ethnic group.
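For anyone who wants to reproduce this outside the WebUI, something like the following should work as a minimal sketch. I'm assuming AutoGPTQ's from_quantized loader and the "### Instruction: ... ### Assistant: ..." prompt format from the model card; the exact loader arguments (and whether you need to pass a model_basename) may differ depending on your AutoGPTQ version.

```python
# Minimal multi-turn test outside the WebUI (a sketch, not verified against this exact model).
# Assumes AutoGPTQ is installed and the "### Instruction:" / "### Assistant:" prompt format
# from the model card; adjust loader arguments to match your AutoGPTQ version if needed.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo = "TheBloke/Wizard-Mega-13B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    repo,
    device="cuda:0",
    use_safetensors=True,
    use_triton=False,  # set True if you're on the triton branch / WSL2 setup
)

# Previous turns are concatenated into the prompt, exactly as a chat UI should do.
history = [
    ("In one sentence, please summarise the bible",
     "The Bible is a collection of sacred texts that serve as the basis "
     "for Christian beliefs and practices."),
]
prompt = ""
for user_msg, assistant_msg in history:
    prompt += f"### Instruction: {user_msg}\n\n### Assistant: {assistant_msg}\n\n"
prompt += "### Instruction: now reduce it to one word\n\n### Assistant:"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
# Print only the newly generated tokens so you can see whether the reply
# actually uses the earlier turns or ignores them.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

If the reply still ignores the earlier turns with the history passed in explicitly like this, the problem is in the model/quantisation rather than in the WebUI's chat handling.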

I'm not sure what's going on, but you're not the only person to report this. I was asked about this on Reddit as well; I'll point him here so you can compare notes.

Have you seen the new Manticore? It's an updated version of this model:
https://huggingface.co/TheBloke/Manticore-13B-GGML
https://huggingface.co/TheBloke/Manticore-13B-GPTQ

Hi @CPhoenix, I'm the one @TheBloke mentioned.
I was having this exact issue where Mega couldn't seem to recall any of the prior conversation in OobaBooga's WebUI.
It also seemed to ignore its "character", making up details about itself rather than understanding it should act as the "Assistant".

I tried Chat and Instruct modes, different Generation parameters presets, and different Instruction templates, but nothing helped.
Thankfully Mega has been superseded by Manticore 13B, which seems to be working as expected, so I'd suggest upgrading to that.
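If it helps, one way to pull Manticore straight into the WebUI's models folder is via huggingface_hub. This is just a sketch; the local_dir path below is an assumption, so point it at wherever your own text-generation-webui "models" directory lives (or omit it to use the normal HF cache).

```python
# Sketch: download the Manticore GPTQ repo with huggingface_hub.
# The local_dir below is an assumption; change it to your own
# text-generation-webui "models" directory, or omit it entirely.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/Manticore-13B-GPTQ",
    local_dir="text-generation-webui/models/TheBloke_Manticore-13B-GPTQ",
)
```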

However, I'd like to know more about the environment you're running in, to see if we have anything in common that may have caused the issue!

Software:
Windows 11 host machine, with the LLM running on Ubuntu in WSL2.
I'm using OobaBooga's WebUI and the GPTQ triton branch.

Hardware:
Ryzen 9 5950x
32GB RAM
RTX 3090

Interesting to know it's not just me. (I've just downloaded Manticore-13B-GPTQ and it seems to work pretty well.)
Software:
Windows 10 with OobaBooga's WebUI and GPTQ using normal CUDA. (I have WSL set up with the triton branch and will have a go with that tonight. Update: I've just loaded the model using WSL & triton and still get the same problem.)

Hardware:
Ryzen 9 5900x
128GB RAM
RTX 4090
