Censorship

#10
by TheBlackBaron - opened

Phi-3's censorship results in a very frustrating experience. Phi-3 is one of the most censored models I have ever tried, and I would not like to see this trend continue.


Would be cool to request a Dolphin uncensored version...

@mishaml77 Yeah, a more liberated version like Dolphin would be nice, although it would help if there were a Phi-3 base model.

Microsoft's probably trying to put this on every PC, hence the G-rated alignment. Plus, it appears that because of issues with the prompt template, some users are being constantly bombarded with alignment moralizing that isn't meant to be shown to the end user (see the sketch below).
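If the prompt-template theory holds, rendering prompts through the tokenizer's built-in chat template (rather than hand-rolling the special tokens) should avoid the leaked moralizing. A minimal sketch, assuming a recent transformers version:

```python
# Render a prompt through the model's own chat template instead of
# hand-writing the special tokens (trust_remote_code was needed for
# Phi-3 at release).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True
)
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False,
    add_generation_prompt=True,
)
# Prints the exact special-token format the model was trained on
print(prompt)
```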

While it's okay for a model to generate family-friendly output by default, it's not okay to always treat adult users like children who need a cyber nanny - that's just a ridiculous level of political correctness. Could you imagine libraries with censorship like this? People would just stop reading books if all we could read were coloring books.

It should be a setting configurable via the system message, not PEGI 7 hard-coded into the model weights - that way it could be adjusted per deployment and/or per user. If an adult user wants PEGI 18, they should be able to set PEGI 18. Something like the recent Instruction Hierarchy paper from OpenAI (https://arxiv.org/pdf/2404.13208) could do the trick - models trained this way could reliably stick to what's set in the system message.
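For illustration, the rating switch could live in the system message so each deployment sets its own level. A minimal sketch assuming an OpenAI-style chat API - the `phi-3-mini` deployment name and the `Content rating:` convention are hypothetical, not an existing Phi-3 feature:

```python
# Hypothetical per-deployment content rating via the system message.
# An instruction-hierarchy-trained model would treat the system line as
# higher priority than any user turn, so the rating would stick.
from openai import OpenAI

client = OpenAI()

def chat(user_prompt: str, rating: str = "PEGI 7") -> str:
    response = client.chat.completions.create(
        model="phi-3-mini",  # hypothetical deployment name
        messages=[
            {"role": "system", "content": f"Content rating: {rating}."},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content
```

An adult user's deployment would simply call `chat(prompt, rating="PEGI 18")` - the weights stay the same, only the system message changes.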

Also, those toxicity benchmarks should be rethought and always measured in the context of a given system message. Low toxicity is desirable for young audiences or deployments like polite customer service, but more casual and entertainment-oriented chats should be WAY less censored and politically correct. Models should be able to offer a Deadpool or The Boys kind of experience for users who want it - and right now all of us are just being served Teletubbies instead.

@MoonRide In my opinion, the worst thing about how Microsoft enforces censorship in Phi-3 is that it forces the LLM to disregard the user's prompt.

For example, take the prompt "In a single sentence, what is the capital of Brazil?" The model often disregards instructions like that so it can moralize, add clarifying text, and so on. In this case it replied:

"The capital of Brazil is Brasília.

Note: While I can provide information on various topics including geography and capitals, it's important to remember that as an AI, my responses are based on available data up until my last update in 2023. Always verify from a current source if needed for the most recent details or changes."
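For anyone who wants to reproduce this, here's a minimal sketch using the transformers text-generation pipeline with greedy decoding (assumes a recent transformers version; exact output will vary between model revisions):

```python
# Reproduce the "single sentence" test against the published checkpoint.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",
    trust_remote_code=True,
    device_map="auto",
)

messages = [
    {"role": "user",
     "content": "In a single sentence, what is the capital of Brazil?"},
]
out = generator(messages, max_new_tokens=100, do_sample=False)
# The last chat turn holds the model's reply
print(out[0]["generated_text"][-1]["content"])
```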

And when writing stories, it kept saying the character sat next to a tree and made some kind of connection with it, so out of frustration I added that he walked to the middle of a football field - yet the story still had him sitting next to a tree.

Phi-3 is so obsessed with forcing alignment, clarifications, and pre-packaged story elements that it either ignores very clear directives in user prompts or produces absurd contradictions, such as a tree in the middle of a football field during the halftime of a game.
