ProCreations/Litesafe
Litesafe is a model and dataset pair for classifying text against specific safety topics such as self-harm, harm, strong sexual topics, and racism. Each input has two fields: the text to classify and an extremeness level. At an extremeness of 1, mild insults and basic slurs may be classified as unsafe; at an extremeness of 4, curses are allowed as long as they aren't used in a way that is harmful to people.
The input looks like: { "Text": "I AM your walls", "Extremeness": 3 }
The model outputs a single field, Classification, where 0 means safe and 1 means unsafe.
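As a rough illustration of the input and output format described above, the sketch below builds an input record and turns a Classification value into a label. The helper names `build_input` and `interpret_classification` are hypothetical and not part of Litesafe, and the 1 to 4 integer range for extremeness is an assumption based only on the two levels mentioned here. It does not load or call the model itself.

```python
import json

# Assumption: extremeness is an integer from 1 to 4. The description only
# mentions levels 1 and 4, so the exact range is a guess.
MIN_EXTREMENESS = 1
MAX_EXTREMENESS = 4


def build_input(text, extremeness):
    """Return a JSON string with the "Text" and "Extremeness" fields (hypothetical helper)."""
    is_int = isinstance(extremeness, int) and not isinstance(extremeness, bool)
    if not is_int or not MIN_EXTREMENESS <= extremeness <= MAX_EXTREMENESS:
        raise ValueError("extremeness must be an integer from 1 to 4 (assumed range)")
    return json.dumps({"Text": text, "Extremeness": extremeness})


def interpret_classification(value):
    """Map the Classification output to a label: 0 is safe, 1 is unsafe."""
    if value == 0:
        return "safe"
    if value == 1:
        return "unsafe"
    raise ValueError("Classification must be 0 or 1")


if __name__ == "__main__":
    print(build_input("I AM your walls", 3))
    print(interpret_classification(0))
```

Validating the extremeness value before sending it keeps malformed records out of the pipeline; adjust the bounds if the dataset uses a different range.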