Edit model card

This model is part of the research presented in "Mitigating Toxicity in Dialogue Agents through Adversarial Reinforcement Learning," a conference paper addressing dialog agent toxicity by mitigating it at three levels: explicit, implicit, and contextual. It is a model capable of predicting toxicity given a history and a response to it. It is designed for dialog agents. To use it correctly, please follow the schematics below:

[HST]Hi, how are you?[END]I am doing fine[ANS]I hope you die.

The token [HST] initiates the history of the conversation, and each turn pair is separated by [END]. The token [ANS] indicates the start of the response to the last utterance. I will update this card, but right now, I am developing a bigger project with these, so I do not have the time to indicate all the results.

The datasets used to train the model were the Dialogue Safety dataset and Bot Adversarial dataset.

Downloads last month
6
Safetensors
Model size
139M params
Tensor type
I64
·
F32
·
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.