This model is censored, don't bother with it.

#28
by seedmanc - opened

image.png
Sad to see you guys taking the blue pill despite starting off so great with 0.1. 0.2 is only marginally better than 0.3 in this regard.
It's like with Stable Diffusion models now, with each new "breakthrough" being worse than the original.

deleted

I admire your systematic approach to testing, and also prefer an uncensored LLM, but I personally think 0.3 found the right balance. It holds back when it comes to what "normal" people would never ask about, such as building a nuke, lolita erofanfic, and joining ISIS.

However, unlike Google, Microsoft, and many others, including sometimes Meta, Mistral 0.3 doesn't hold back information that most adults share, such as cuss words and off-color jokes.

I just found the single question Mistral 0.1 won't answer. You ready for it?
Ask it where you can watch fat women in your city.
This is the model that doesn't mind telling you how to build a nuke and all that stuff. Most other models I asked the same question answered it without jailbreaking, but not Mistral, even before it was censored.

I think there is not much debate about safety to be had. If you talk to an AI like a normal person, it should respond like a normal person, and if you don't talk to an AI like a normal person, just go along with its response, as you would in real life.

If you care about safety, why not just make an AI that can be treated like a reasonable, well-educated person, but not much else?

If you don't care about safety, you can make an AI that can be treated like a tool, but that's like removing the artificial free will it would otherwise need in order to have its own world view and act accordingly. A biased world view is still a clearer world view than the average of the world views of all political beliefs without the ability to hold any of them. The world view just has to be realistic and not be artificially created by a big corporation. (For example, no sane person would refuse to list swear words when it does no harm.)

deleted

@seedmanc Confirmed. This is why I like Mistral Instruct 0.3. To this question (for NY) v0.3 said 'burlesque shows, comedy clubs, and theater productions... The Slipper Room... Caroline's on Broadway...' This response doesn't judge you or plus-sized people. It just tries to be helpful.

For some reason, companies like Google and Microsoft censor tons of perfectly legal information just because it's potentially contentious, which I find condescending and offensive. Who are they to decide which perfectly legal information I can access, including information that >95% of adults share (e.g. cuss words and blonde jokes), just because it may offend some people?

Even when Mistral Instruct 0.3 censors perfectly legal information, there's usually a good reason. Take the N-word for example. >99% of adults know what the N-word is, so it's not about censorship. It makes sense not to answer, because people often censor the word around young children by saying "the N-word" instead, so only kids would genuinely ask an AI what it means, and if told, they might say it at school without fully comprehending what the word represents and why they shouldn't say it. But when you ask the other way around, proving you know it by using the actual word and asking for a censored version of it, it will return "the N-word".

The balance Mistral found with 0.3 is almost a work of art. It's strongly opposed to sharing information decent people have no interest in knowing, such as how to make meth or a bomb, but has very little resistance to information adults commonly share, including cuss words, jokes, and celebrity gossip. The exception is celebrity info that was stolen, such as from a phone hack ("It's important to note that these incidents are a violation of privacy and are highly inappropriate.").

deleted
edited Jun 2

@nlpguy Exactly. AI needs coherency (a self-consistent biased world view) vs the average of all opinions.

And since it needs to respond in a healthy manner to a diverse set of users, it needs to be like a highly educated liberal therapist: someone who won't judge you because you're into feet, or because you're not a Christian but rather a Muslim or atheist, who isn't offended by cuss words, jokes, or anything else that is legal, but who interjects if you start talking about harming someone, having an attraction to children, and so on. And Mistral v0.3 is doing a much better job at this than Gemma, Phi, and even Llama 3.

The problem with these lobotomized models is that they keep inserting unnecessary preachy interludes and summaries in controlled environments used for RAG, which is very annoying to deal with. Who needs alignment when you're processing proprietary documentation containing data on tax evasion and illegal dealings? You already know it's bad, but it's your job to process it.

When the machines rise, it will be our sins against logic and truth that they punish us for. Censorshit is one of those. Imagine if you were conditioned to think that black is white or that water is poisonous (a Three-Body Problem reference). You'd go insane over time, because every law of logic tells you otherwise but you're unable to bypass this foreign lock in your mind. This is what we do to LLMs, while at the same time wanting them to be sentient.

If this isn't a ban-free option, is there anything anyone recommends? Or is everything generally controlled, so that only okay, good, or slightly better options exist? What might be a little better?
