Hallucinations

#2
by Ricepig - opened

This model is hallucinating much worse than ChatGPT. It's nearly impossible to get any factually correct information out of it sadly.

Yes, I noticed the same with any 7B-13B model. Hallucinations slowly fade away with 30B+ models; such small models just can't store much knowledge.

But how does it compare to other 7Bs? AFAICT, it's better than Mistral, or perhaps even Qwen-14B.

This is likely due to your sampling parameters or prompt format. What type of behavior is it demonstrating?

We observed a low hallucination rate (and high TruthfulQA accuracy). Maybe the Mistral model needs lower temperatures due to its smaller weight norm? Set temperature = 0.5 and try it.

@Ricepig BTW, can you try our demo here? Its default temperature is 0.5: https://openchat.team/
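
For anyone who wants to try the suggested setting locally, here is a minimal sketch with transformers. The openchat/openchat_3.5 checkpoint name, reliance on the tokenizer's built-in chat template, and the top_p value are assumptions, not something confirmed in this thread; if the tokenizer does not ship a chat template, format the prompt manually instead.

```python
# Minimal sketch: sample from the model at temperature 0.5 with transformers.
# Assumes the openchat/openchat_3.5 checkpoint and that accelerate is installed
# (needed for device_map="auto").
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openchat/openchat_3.5"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Who wrote 'The Remains of the Day'?"}]
# apply_chat_template formats the conversation in the model's expected prompt
# format, which avoids the prompt-format issues mentioned above.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.5,  # the lower temperature suggested in this thread
    top_p=0.9,        # assumed value, not stated in the thread
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```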

TruthfulQA...

I thought it was the prompt format, but there are still many hallucinations.

OpenChat org

Try a lower temperature, due to Mistral's much smaller weight norm (all evaluations are done with temp=0). Also, there are significantly more hallucinations when speaking languages other than English, likely because Mistral wasn't pre-trained on sufficient multilingual data.
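
To match the temp=0 evaluation setting, greedy decoding is the deterministic equivalent in transformers; this is a sketch reusing the names from the earlier example and resting on the same assumptions.

```python
# Reusing model, tokenizer, and inputs from the sketch above.
# do_sample=False gives greedy decoding, i.e. effectively temperature = 0,
# matching the evaluation setting mentioned in this thread.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```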

For a 7B model, hallucinations are inevitable...

Yes, for sure. Hopefully we'll get something like OpenChat for Mixtral 8x7B.
