Something's Wrong, It's Not Your Fault, Llama 2 Is Fundamentally Flawed

#25
by deleted - opened

I've created and used a set of 50 somewhat tricky questions to test LLMs, and your previous WizardLM 13B v1.1 scored better than any other 13B Q4 or lower LLM. However, every LLM based on Llama 2 went completely off the rails on several of my questions.

For example, when I asked Meta's full 70B "aligned" version of Llama 2 about the Meghan Trainor song "Dear Future Husband" using the lyric "And don't forget the flowers every anniversary," it said the song didn't exist, then lectured me about spreading misinformation online. I then referred to the song by name in my follow-up question, "Dear Future Husband." Llama 2 refused to answer and lectured me about using potentially hurtful gender stereotypes. I then tricked it into answering my question, proving it was aware of both the lyric and the song.

When I went through the same series of questions on uncensored 13B Llama 2 LLMs, not just yours, the lecturing about spreading misinformation and gender stereotypes is gone, but weird hallucinations start occurring. For example, one model insisted the song's name was "All About That Bass". Yet I can get the correct information by rephrasing the questions so they can't be interpreted as contentious (e.g. a potentially hurtful gender stereotype).

This is a pattern I found in my testing, and further confirmed by making up about a dozen more tricky questions that can be misinterpreted as contentious. On questions where Llama 2 censors and lectures by misidentifying contentious questions, the uncensored models, while they don't lecture, consistently hallucinate, yet they perform very well on similar non-contentious questions. This proves that the "unaligned" LLMs are still censored via the suppression of potentially contentious, but completely harmless, information.

This is so bad it even happened with some of my non-contentious questions. For example, when I asked about the phenomenon that occurs when particles travel faster than light in a medium like water (Cherenkov radiation), the censored version got as far as "air bubble" and immediately switched to a warning about scuba diving and paying close attention to your instructor. And your uncensored model hallucinated.

In short, Llama 2 is technically a very good LLM, and better than Llama 1 at answering questions free of any conceivable contention, but it's not just overly censored through alignment. The source LLM is also censored by suppressing potentially contentious data, resulting in a notable spike in hallucinations when answering harmless popular-knowledge questions with unfortunate ambiguity. This makes any uncensored LLM based on Llama 2 clearly inferior to the ones based on Llama 1 for use as a general-purpose chat bot. And even as a programming assistant you have to be careful how you phrase things (e.g. don't ask it to "kill" a Windows process).

The "uncensored" models are unofficial fine-tunes and not related to WizardLM. I made the mistake of testing the uncensored versions first and they performed horribly. I assume the reason is that fine-tuning a model overwrites part of it, which causes a loss of information. As for the problem you are having with the model refusing to answer, that may be a problem with the system prompt you are using. I haven't had a problem with that while testing this model on the LMSYS chatbot arena.
