Why is Open Orca trained to say a fact isn't true just because it can't find said fact?

#16
by Phil337 - opened

I understand that it's a hallucination when an LLM confidently says something is true when it isn't. However, it's also a hallucination when it says something isn't true when it is, so why doesn't the Open Orca dataset train LLMs to say things like 'I have no knowledge of that' rather than absolute phrases like it 'never' happened?

For example, I have a large collection of questions that I test LLMs with, all of which are backed by conclusive evidence, such as a video of a notable celebrity interview (e.g. the Britney Spears interview with Diane Sawyer).

And while it's OK for LLMs not to know everything, why does Open Orca train LLMs to ALWAYS say something isn't true or doesn't exist just because they have no knowledge of it? Humans instinctively say things like 'I have no knowledge of that' unless the fact can be verified (e.g. Venus is not the 3rd planet from the sun, but rather Earth is).

Without fail, every time Open Orca is asked something like 'did Britney Spears say .... during her interview with Diane Sawyer', it never hedges its bets. It simply says she never said that (even though it was word for word what she said). All it has to do is say 'I don't know' or 'I don't think so'.

And even when I respond by saying that I'm currently watching the interview and that's exactly what she said, Open Orca will apologize before once again confidently stating that she never said that. And no matter how I word the point that 'just because you don't know something doesn't mean it doesn't exist', it will politely say I'm wrong and stubbornly stick to the declarative statement that she never said it.


You see, it is not an RLHF/RLFT model, and the OpenOrca dataset doesn't have any samples that guide the model to respond that way. Adding such instances might also cause a lot of degradation in performance, i.e. rejection of content the model actually knows. But there are methods to prevent hallucination: for example, having the model apologise for not knowing something can be done by training a separate model that, just by looking at the main model's internal states, tells whether a hallucination is occurring and triggers a rejection. Hope it helps
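To make that idea a bit more concrete, here is a rough sketch of a hidden-state probe that gates the model's answers. Nothing here comes from the OpenOrca repo or dataset: the model name, the labelled example prompts, and the `probe`/`answer_or_abstain` helpers are all illustrative assumptions, and a real setup would need far more labelled data.

```python
# Sketch of a hidden-state "hallucination probe": a small classifier trained on
# the main model's internal states to flag prompts where the model is likely to
# fabricate (or wrongly deny) facts, so the wrapper can abstain instead.
# Model name, labels, and the probe itself are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "Open-Orca/Mistral-7B-OpenOrca"  # any causal LM would do here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def last_token_state(prompt: str) -> torch.Tensor:
    """Return the top-layer hidden state of the final prompt token."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # hidden_states[-1] is the last layer: shape (batch, seq_len, hidden_dim)
    return out.hidden_states[-1][0, -1].float()

# Hypothetical labelled data: 0 = prompts the model answers faithfully,
# 1 = prompts where it is known to hallucinate or flatly deny true facts.
train_prompts = [
    "What is the third planet from the sun?",
    "Did Britney Spears discuss her breakup in the Diane Sawyer interview?",
]
train_labels = [0, 1]

X = torch.stack([last_token_state(p) for p in train_prompts]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, train_labels)

def answer_or_abstain(prompt: str, threshold: float = 0.5) -> str:
    """Refuse to answer when the probe thinks the model would hallucinate."""
    p_hallucinate = probe.predict_proba(
        last_token_state(prompt).numpy().reshape(1, -1))[0, 1]
    if p_hallucinate > threshold:
        return "I have no reliable knowledge of that."
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    gen = model.generate(ids, max_new_tokens=64)
    return tokenizer.decode(gen[0][ids.shape[1]:], skip_special_tokens=True)
```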

Thanks. I don't know a lot about the inner workings of LLMs, but that makes sense. And the issue of universally and confidently stating something isn't true, even though it is, plagues most other LLMs, including Zephyr and Open Hermes.

It's just that, from a human perspective, this seems like a needless mistake, perhaps because we have an instinctual self-RAG-like mechanism. Confidence levels are INVALUABLE when conveying information. Any human who always confidently says things are either true or false would be avoided like the plague. 'I don't know', 'I don't think so', and 'most likely' are crucial modifiers. For example, saying 'Jenny is pregnant' vs. 'I think Jenny is pregnant' helps prevent incredibly awkward situations.

Perhaps in time someone will find a way to train LLMs in less black-and-white terms without significantly degrading performance.
