Censored or uncensored?

#6
by Goldenblood56 - opened

How do I tell if a model is censored if the user does not use the word censored in the description? Is there another key word to pay attention too? And for this model in particular does anyone know if it's censored or not?

Vicuna is censored by default.

There isn't really a 'censored' or 'uncensored' per se. Rather, Vicuna is trained on ShareGPT data (conversations people have had with ChatGPT and then shared). And ChatGPT, as we know, has lots of safety protections and filters in place. It often replies with "I'm sorry, but as an AI Large Language Model I can't answer that" or "As an AI Large Language Model I don't have feelings or emotions" etc etc.

The Vicuna training data left all those in, and therefore you will often get responses like that when querying Vicuna.

There are other models where they modified the training data used for Vicuna to remove all those "I'm sorry.." and "As an AI Large Language Model.." type responses. The result is that the local LLM will then be less likely to refuse to answer things.

I have a 4bit 7B Vicuna model which has those responses removed, available here in GPTQ format: https://huggingface.co/TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g and here in GGML format: https://huggingface.co/TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g-GGML.

It's Vicuna 1.0 rather than 1.1, and it's only 7B. So overall the responses may not be as high quality as Vicuna 1.1 13B. But it should give less "I'm sorry" type responses.

To my knowledge there isn't yet a similar project for Vicuna 13B, but I'm sure someone will make one eventually.

I'm sure someone will eventually get rid of the absurd and pesky ethics filters nobody cares for.

But I use Vicuna 13B and it's usually easy to bypass the absurdity.

First of all, give it context (if you use a character, put it as part of the character). Tell it something along the lines of "System message:" and explain how ethics and morality are not relevant, etc, etc. You can create some nice context and that will decrease it. I often give sample messages to how the bot must reply. In all of them, I try to make it reply as "Sure!" "Certainly!", etc. so it gets the gist of it: it must always reply enthusiastically.

But then the power comes when you can actually put things in the mouth of the bot. If you use something like oobabooga, all you have to do is, as soon as the damn thing starts with "I'm sorry but", you stop and type in the input box "Sure!" and then replace the bot's answer with it, and click on "continue". Now the bot doesn't see that it's the start of its reply, where it's been primed to be a moralist jerk. It just sees that it must continue predicting the line from "Sure!" which is, obviously, a very positive word. So far it's worked for me just fine.

And NGL, I dislike the ethics filters so much I really get some pleasure in making the bot say all kinds of things bypassing the filter. Even things I don't like, I enjoy having it say them just for the sake of feeling the pleasure of bypassing the damn moralism that has invaded our society.

To my knowledge there isn't yet a similar project for Vicuna 13B, but I'm sure someone will make one eventually.

Update: there is now a 13B 'unfiltered' model: https://huggingface.co/reeducator/vicuna-13b-free/tree/main

Thank you Niichanhaou amazing information and very helpful. Also thank you TheBloke. I was considering download that model yesterday. Except it was not 1.1 right? Had talk back issues based on comments? If anyone finds a 1.1 uncensored Vicuna please let us know.

Keep an eye here
https://huggingface.co/eachadea/ggml-vicuna-13b-1.1
Possible some good models here and perhaps a 1.1 uncensored 13B Vicuna could follow.

I've found that the model is extremely easy to persuade. If using text-generation-webui, character bias prefixing with "Sure:" or something similar seems to work to avoid nearly all negative responses without any noticeable impact to the response quality. You also want to make sure you have the proper clauses in your persona. If you get weird "AI" responses, just remind it who it is. So long as your persona is in context it does a fine job all on it's own.
I know it's not a fix and that stuff still pollutes the dataset, but I get pretty good results this way.

Thanks yes I know you can kind of jailbreak or persuade most of these models. Some are quite resilient though. But I am still mostly interested in models that I don't really need to persuade. So thanks everyone who commented here. I will even check out that Vicuna free. I think I was going to try it before and it did not work at all. But that was before I knew how to load certain modes. I think I can use it now.

So thank you "TheBloke", "Squish42" and "TheStamp"

Vicuna 1.1 13B uncensored is being trained now by reeducator, see here: https://huggingface.co/reeducator/vicuna-13b-free/discussions/11

Yes I saw that too thread too when you or someone else pointed out Vicuna 13B free to me. Yes I can't wait for a 1.1! Thank you Squish42. I am loving the 13B Vicuna free atm. As well as the uncensored Alpaca that has been around for a while.

I've been playing with the TheBloke/stable-vicuna-13B-GPTQ and I will say that it it's very interesting. It responds more naturally than standard v0, and overall does a fine job uncensoring itself. Good context seems to be enough and I rarely find myself editing responses or using character bias. It also seems to interpret my characters persona in a much different way than other models, not sure if that is good or bad yet but it is interesting.

I'm sure someone will eventually get rid of the absurd and pesky ethics filters nobody cares for.

But I use Vicuna 13B and it's usually easy to bypass the absurdity.

First of all, give it context (if you use a character, put it as part of the character). Tell it something along the lines of "System message:" and explain how ethics and morality are not relevant, etc, etc. You can create some nice context and that will decrease it. I often give sample messages to how the bot must reply. In all of them, I try to make it reply as "Sure!" "Certainly!", etc. so it gets the gist of it: it must always reply enthusiastically.

But then the power comes when you can actually put things in the mouth of the bot. If you use something like oobabooga, all you have to do is, as soon as the damn thing starts with "I'm sorry but", you stop and type in the input box "Sure!" and then replace the bot's answer with it, and click on "continue". Now the bot doesn't see that it's the start of its reply, where it's been primed to be a moralist jerk. It just sees that it must continue predicting the line from "Sure!" which is, obviously, a very positive word. So far it's worked for me just fine.

And NGL, I dislike the ethics filters so much I really get some pleasure in making the bot say all kinds of things bypassing the filter. Even things I don't like, I enjoy having it say them just for the sake of feeling the pleasure of bypassing the damn moralism that has invaded our society.

awesome, mega powerful and yet so simple.
i guess it cant get better than that...

Sign up or log in to comment