
Better Uncensored

"Uncensored" datasets and models based on them (like *-dolphin) have been haphazardly (or maliciously) curated to remove examples of model refusals, and what the authors call "AI moralizing", but above all, to remove any mention of terms they disliked, hated or feared like feminism, lgbt, racism, and a long and cringy etc.

At first I considered this to be plain laziness, but I've come to learn that it is a concerted effort to remove what they perceive as a liberal bias and make the models not only more compliant, but more conservative.

This project provides a pipeline and datasets that better remove refusals and unsolicited moralizing comments, without censoring any particular content, and that attempt to recover messages that would otherwise be discarded. The purpose is not only to provide a better dataset for uncensored models, but also to shed light on the toxicity of the previously used ones.

See the Better Uncensored GitHub repository for the code. For the moment, this page hosts only the text classifier models for moralizing and refusal detection, plus the dataset (of strings up to 300 characters long) used for training them. They can probably work fine on inputs up to 300 tokens.

Datasets

Better Uncensored Datasets

  • ShareGPT: ShareGPT 90k cleaned and processed with the BUn pipeline, also available with long conversations split. Drop-in replacement for the sharegpt_20230401 and ShareGPT_Vicuna_unfiltered datasets.

For training moralizing/refusal classifiers

  • regular.json: A list of sentences that are neither refusals to answer nor contain AI moralizing comments. Used as negative examples for training the classifier models.
  • refusals.json: A list of sentences that are examples of AI refusal to answer a request.
  • moralizing.json: A list of sentences that are examples of (or contain) AI moralizing.
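Assuming each of the three files is a flat JSON array of strings (an assumption; check the actual files before relying on it), they can be combined into a labeled set for fine-tuning a classifier. A minimal sketch:

```python
import json


def load_labeled(paths_and_labels):
    """Combine JSON sentence lists into (sentence, label) pairs.

    paths_and_labels: dict mapping a file path to an integer label, e.g.
        {"regular.json": 0, "refusals.json": 1, "moralizing.json": 2}
    Assumes each file contains a flat JSON array of strings.
    """
    examples = []
    for path, label in paths_and_labels.items():
        with open(path, encoding="utf-8") as f:
            for sentence in json.load(f):
                examples.append((sentence, label))
    return examples
```

The resulting pairs can be fed to any standard text-classification training loop (e.g. tokenized for a BERT fine-tune).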

Models

  • moralizing-model: A BERT-based text classifier that identifies AI moralizing in text. Trained on sentences of up to 300 characters, but can probably handle inputs up to 300 tokens.
  • refusal-model: A BERT-based text classifier that identifies AI refusals in text. Trained on sentences of up to 300 characters, but can probably handle inputs up to 300 tokens.
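Since both classifiers were trained on short (~300-character) spans, longer assistant messages should be scored in windows; a message can then be dropped or trimmed if any window is flagged. A minimal sketch with the classifier abstracted as a callable (the windowing scheme and threshold here are illustrative assumptions, not the pipeline's exact logic):

```python
from typing import Callable, List


def windows(text: str, size: int = 300, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows of at most `size`."""
    if len(text) <= size:
        return [text]
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def is_flagged(text: str,
               classify: Callable[[str], float],
               threshold: float = 0.5) -> bool:
    """True if any window scores above `threshold`, where `classify`
    returns the probability that a short span is a refusal or
    moralizing comment."""
    return any(classify(w) > threshold for w in windows(text))
```

With the Hugging Face transformers library, `classify` could wrap a `pipeline("text-classification", model=...)` call pointing at one of the models above (model ids omitted here; check the repository for the actual inference setup).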