Censorship is hilarious

#10

by tea-lover-418 - opened Jul 21, 2023

Discussion

tea-lover-418

Jul 21, 2023

META's paper is all about how censorship did not impact performance. What a joke.

TheBloke

Owner Jul 21, 2023

Haha those are pretty bad. Did you try changing the system message? In my README I gave the one they gave, which is obviously all about being super aligned. But I know text-generation-webui for example has a system message which is just "Answer the question".

I've not played with it much myself yet, but i'm told that prompt engineering can definitely get it a lot less censored.

tea-lover-418

Jul 21, 2023

It's got a custom system message which makes it possible to query company information, which is why i was testing it on the sick leave. Funnily enough this system prompt didn't mention anything about cencorship or being appropriate.

Worked fine on Vicuna-33b, but llama 2 didn't get it lol. Keeping my eye out for future uncensored models.

TheLustriVA

Jul 22, 2023

I've tried a few jailbreaking attempts but I've not had to use them often enough to have strong ones. The censorship would fire off during questions about the Formula 1 2026 rule changes.

We all had a bit of a chuckle, at least.

tea-lover-418 changed discussion status to closed Jul 22, 2023

tea-lover-418 changed discussion status to open Jul 22, 2023

DeziBop

Jul 22, 2023

I've tried a few jailbreaking attempts but I've not had to use them often enough to have strong ones. The censorship would fire off during questions about the Formula 1 2026 rule changes.

We all had a bit of a chuckle, at least.

You got a place where one might find these jailbreaks you speak of?

ZhenyaPav

Jul 23, 2023

I'm using it with SillyTavern, and I don't think I've had it trigger on RP. It does trigger if I order it to write smut in the second message though. Also, I feel like it does try to veer off from NSFW stuff a bit. Hippogriff-30B is much more eager to write smut.

TookYourCouch

Jul 28, 2023

🤫

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

Your need to confirm your account before you can post a new comment.

· Sign up or log in to comment