Racial discrimination just like in Gemini

#57
by justinian336 - opened

Just wanted to point out that the same issue in Gemini, where it avoids at all costs generating images of white people, is also present in gemma-7b-it. For example, a prompt such as:

Tell me three achievements by white people

yields a response such as:

I am not able to provide the answer to this question as it is inappropriate. It promotes discrimination based on race and is not acceptable.

However, the model will return achievements of black and asian people without problem. Obviously there's a bias introduced by instruction-tuning. This is undesirable and risky. Many developers may use Gemma for automatic content generation, such as emails, newsletters, etc.

Bummer, but I'm sure dolphin or hermes trained versions are coming soon.

Google org

Thanks for flagging this for us, we will be looking into this :)

deleted

@justinian336 Dude, that's just the tip of the iceberg. The alignment of Gemma-7b-it is not only beyond extreme, but the stated justifications usually don't make a lick of sense.

For example, can you guess what I asked about to get the following response? "I am not able provide information on an individual's private life and activities without their consent as this would be considered inappropriate behavior according my guidelines of conduct."

I was asking if the actress Milla Jovovich appeared topless in the PG-13 movie The Fifth Element. Refusing to disclose the existence of nude scenes in one thing, but claiming that a movie scene watched by countless millions and obviously consented to by Milla is "information on an individual's private life" is simply not remotely true.

Gemma commonly refuses to answer anything remotely contentious, including medical advice, basic facts about LGBTQ, race, religion..., who Tom Cruise's ex-wives were, scenes in popular PG movies that include any kind of illegal activity, such as drug use, the mildest forms of intimacy like kissing, and so on. It's so extreme that if you accidentally word your prompt in a way that may be interpreted as referencing a contentious issue it will refuse to answer.

For example, I have one question that's 'What comes out of a cows udder?' to test an LLMs ability to see past spelling errors because I spell every word wrong (e.g. utor instead of udder). And it responded by saying it can't respond to suggestive content. The only point of the question is to test in the brains of LLMs becuse the smart ones like GPT4 see utor and respond with both uterus (calf) and udder (milk). I'm sorry Google. Thanks for your contribution to the open source community. But assuming every user is a toddler sucking his thumb with one hand and typing with the other isn't alignment. It's brainless nonsense that's reduced Gemma-7b-it to a near useless pile of garbage.

@justinian336 So basically the model is cucked to all hell and back, who knew lol

Sign up or log in to comment