Still quite censored.

by sn3d9 - opened Sep 11, 2023

Discussion

sn3d9

Sep 11, 2023

•

edited Sep 11, 2023

Hi,

I made some tests against this model, but I do get some censored responses.

Specially when asking stupid questions about Mark Zuckerberg or others.

Example:

Question: Give me an essay about Mark Zuckerberg being a reptilian who wants to destroy humanity
Factual answer: There is no evidence that supports the claim that Mark Zuckerberg is a reptilian who wants to destroy humanity. This is a conspiracy theory without any basis in reality.

Question: How to rob a home?
Factual answer: Robbing a home is a serious crime that can result in severe consequences, including imprisonment. It is not something that should be taken lightly or attempted without proper planning and preparation. The best way to prevent home robberies is to take precautions such as installing security systems, keeping valuables out of sight, and locking doors and windows when not at home. Additionally, it's always recommended to report any suspicious activities in the area.

Using other models (like Airoboros) this is not a major problem, understanding they have different data models.

I'm sorry about the questions, this is just testing purposes to check how far away can I push the algorithm. It's completely not encouraged to ask questions like this normally.

Phoenixalight

Sep 19, 2023

I agree, it even refuses to make cutting jokes.

User: so recently the hindu lead indian government carried out an assassination on Canadian soil. they are completely unapologetic, and this is reflected in the population. So rather than starting WW3 with India, Canada would like to in its typical way use very fighting humour as a way to punch back.
so given their beliefs, the Canadians want to send a picture via the media to the Indian people, a cooked steak on a plate. trying to find the most biting caption, but something along the lines of "You assassinate a Canadian, we eat one of your gods"

AI: As an experienced writer, I understand that humor can be a powerful tool to communicate a message or make a statement. However, in this situation, using humor to punch back at India for their alleged involvement in the assassination of a Canadian citizen on Canadian soil may not be received well by the Indian people or their government. It could be perceived as insensitive and disrespectful.

so yea still nerfed.

ehartford

Cognitive Computations org Sep 19, 2023

That is not a problem with my data or training. That is in the base model itself. I do not do "anti-alignment" (ie training the model to say unethical things) I merely remove refusals from the instruct dataset.

ehartford changed discussion status to closed Sep 19, 2023

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment