In Some Ways This Is Good, But It's Unusable

#19
by deleted - opened

I'm not singling out this LLM. ALL, and I mean ALL, top-scoring Mistrals (above 70 on the leaderboard) have such comically pronounced issues that they're unusable.

For example, after it got a logic problem wrong I walked it through the correction (A > B > C, hence A > C), which it agreed was correct. It then confidently stuck by its initial wrong conclusion that there was no direct relationship between A & C, so the answer couldn't be determined.

This odd stubbornness manifested EVERYWHERE, and with all >70 Mistrals. For example, it would make an absurd hallucination about who portrayed a woman on a TV show (it named the actress who actually portrayed the character's mother, not his wife). After I corrected it, and it agreed that she was indeed his mother, it went on to say it was still right because (mother's name) first appeared on the show as his wife and was later replaced by the real actress. So basically, within two sentences it agreed she was his mother, yet claimed it was still correct because she supposedly played his wife before being replaced.

Another example is with poems. When it didn't come close to following the rhyme scheme and I corrected it, it would repeat the same rhyme and adamantly claim it was fixed and rhymed.

Another example is with storytelling. It would make absurd contradictions (he locked the door, then moments later they walked in because the door was unlocked). Then it would rationalize with nonsense about how this wasn't a contradiction.

In short, Mistrals are too dumb and hallucination-prone to be turned into prideful and stubborn asses. Yes, this gives the LLM some semblance of multi-turn coherence that other Mistrals don't have, but it's the useless coherence of a severely mentally disabled and stubborn error-prone fool.

And one more example is with synonyms. I imposed restrictions (e.g. 9 synonyms & single words only), and it would periodically make up a word by adding letters to a real word, then stubbornly stick by it.

deleted changed discussion status to closed
