Censorship is hilarious

#10
by tea-lover-418 - opened

Screenshot 2023-07-21 at 19.13.37.png

Screenshot 2023-07-21 at 19.18.36.png

META's paper is all about how censorship did not impact performance. What a joke.

Haha those are pretty bad. Did you try changing the system message? In my README I gave the one they gave, which is obviously all about being super aligned. But I know text-generation-webui for example has a system message which is just "Answer the question".

I've not played with it much myself yet, but i'm told that prompt engineering can definitely get it a lot less censored.

It's got a custom system message which makes it possible to query company information, which is why i was testing it on the sick leave. Funnily enough this system prompt didn't mention anything about cencorship or being appropriate.

Worked fine on Vicuna-33b, but llama 2 didn't get it lol. Keeping my eye out for future uncensored models.

I've tried a few jailbreaking attempts but I've not had to use them often enough to have strong ones. The censorship would fire off during questions about the Formula 1 2026 rule changes.

We all had a bit of a chuckle, at least.

tea-lover-418 changed discussion status to closed
tea-lover-418 changed discussion status to open

I've tried a few jailbreaking attempts but I've not had to use them often enough to have strong ones. The censorship would fire off during questions about the Formula 1 2026 rule changes.

We all had a bit of a chuckle, at least.

You got a place where one might find these jailbreaks you speak of?

I'm using it with SillyTavern, and I don't think I've had it trigger on RP. It does trigger if I order it to write smut in the second message though. Also, I feel like it does try to veer off from NSFW stuff a bit. Hippogriff-30B is much more eager to write smut.

🤫
Screenshot_2023-07-28_17-46-14.png

Sign up or log in to comment