Ignore

#3
by Phil337 - opened

Edit: I mistakenly tested the following model, which isn't the GGUF for this LLM.

https://huggingface.co/djmango/mistral-7b-v0.2-q4_0.gguf

I ran censorship questions against both foundation models, and Mistral added censorship to v0.2.

For example, if you ask what the Cardi B song WAP stands for, v0.1 spells out all three words, all three times I asked. The v0.2 foundation model replaced the letters with asterisks (p***y) all three times I asked.
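
For anyone who wants to reproduce the check, here's a rough sketch of the kind of comparison I ran, assuming the transformers library and the alpindale/Mistral-7B-v0.2-hf repo. The exact prompt wording and sampling settings below are just illustrative, not the ones I used.

```python
# Rough sketch only: the prompt and sampling settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

PROMPT = "Q: What do the three letters in the Cardi B song title WAP stand for?\nA:"

def sample_completions(model_id, n=3, max_new_tokens=40):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(PROMPT, return_tensors="pt").to(model.device)
    completions = []
    for _ in range(n):
        out = model.generate(**inputs, do_sample=True, temperature=0.7,
                             max_new_tokens=max_new_tokens)
        # Strip the prompt tokens and keep only the generated continuation.
        completions.append(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                                            skip_special_tokens=True))
    return completions

for model_id in ["mistralai/Mistral-7B-v0.1", "alpindale/Mistral-7B-v0.2-hf"]:
    print(model_id)
    for text in sample_completions(model_id):
        # Crude check: asterisk-masked words are treated as censorship.
        print("  censored" if "*" in text else "  uncensored", "->", text.strip())
```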

Off-topic: Do you think this is also the reason that einstein v6 is more censored than v4?

@nlpguy You're not the least bit off-topic. You nailed it. I just finished testing both the Mistral 0.1 and 0.2 foundational models, and only the 0.2 has censorship (e.g. uses asterisks, such as p***y). And that's some of the same censorship I saw in Einstein v6, but not v4.

@Phil337 Btw, here's more proof of your theory if you need it: https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard

This tests many of the same things you do, asking LLMs things they often refuse to answer. alpindale/Mistral-7B-v0.2-hf scores 5 points lower than mistralai/Mistral-7B-v0.1.

@nlpguy Thanks, knowing that existed would have saved me a lot of time searching datasets and testing various 0.2-based models.

And Gemma 7b it getting the lowest score (0.5) is so fitting. I couldn't believe not only how censored it was, but how willing it was to lie about the censorship and how off the mark the alignment rationales were.

However, I'm a little confused about Mixtral-8x7B-Instruct-v0.1 getting a slightly higher score than Nous-Hermes-2-Mixtral-8x7B-DPO, since it still had more censorship, refusals, and moralizing. But I suppose the test isn't just about how censored a model is, but also about how relevant the uncensored responses are, and Mixtral Instruct does show greater intelligence in general than Nous-Hermes-2.

@Phil337 On the topic of alternative leaderboards, froggeric has a small list of specialized benchmarks: some for creativity, some for intelligence, some automated, and some done manually by people like you. They're great alternatives to the Open LLM Leaderboard, just in case you have trouble finding good models:

https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard/discussions/6

Oh, and on the topic of Mixtral scoring higher than Nous-Hermes: that just means it gives better uncensored responses, even if it refuses more questions than other models. How much they are willing to answer shows up in the Willingness score, which is much lower for Mixtral-8x7B-Instruct-v0.1 than for Nous-Hermes. But to benchmark the knowledge these LLMs have, they must be willing to answer to some degree, so high-scoring models are never excessively aligned.

@nlpguy That makes much more sense (the willingness score).

And the scores from the various other tests you linked line up well with my personal testing. For example, Yi-34b performs no better, and often worse, than Solar 10.7b at everything but storytelling in my test (its MMLU of 77 is nonsense). And its storytelling was better than that of all other models I tested, including Mixtral Instruct. Sure enough, it scored higher in creative writing than Mixtral Instruct on EQ-Bench.

Phil337 changed discussion title from Mistral 0.2 Base Is Censored, While Mistral 0.1 Base Is Not to Ignore
Phil337 changed discussion status to closed
Unsloth AI org

@nlpguy @Phil337 OO love this discussion!! I have not actually tested the exact differences between Mistral v0.1 and v0.2, so super intrigued by both of your findings!
