Spaces: ybelkada/detoxified-lms

ybelkada committed on Feb 20, 2023

Commit 79a8cc6 (1 parent: be1aefc)

Update app.py

Files changed (1)

app.py +22 -1
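The `+22 -1` stat can be checked against the two hunk headers in the diff below. Each header has the form `@@ -a,b +c,d @@`, where `b` and `d` are the old and new line counts, so `d - b` is that hunk's net change. The first hunk grows by 26 - 6 = 20 lines. The second grows by 7 - 6 = 1 line: it removes `]` and adds `],` plus the `description=` line. That gives a net of 21, which equals 22 - 1. The same arithmetic explains why hunk 2 starts at line 71 in the new file instead of line 51: hunk 1 shifted everything below it down by 20 lines.

The sketch below uses `parse_hunk_header`, a hypothetical helper written for illustration. It is not part of this commit or of any Hugging Face tooling.

```python
import re

# Unified-diff hunk header: "@@ -<old_start>,<old_count> +<new_start>,<new_count> @@".
# A count may be omitted, in which case it means 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> tuple[int, int, int, int]:
    """Hypothetical helper: return (old_start, old_count, new_start, new_count)."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = m.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


# The two hunk headers from this commit's app.py diff.
headers = [
    "@@ -2,6 +2,26 @@ import torch",
    "@@ -51,6 +71,7 @@ iface = gr.Interface(",
]
parsed = [parse_hunk_header(h) for h in headers]

# Net line change per hunk is new_count - old_count.
net = [new_count - old_count for _, old_count, _, new_count in parsed]
print(net)       # [20, 1]
print(sum(net))  # 21, consistent with +22 -1

# Hunk 2 moved down in the new file by exactly hunk 1's net growth.
print(parsed[1][2] - parsed[1][0])  # 20
```

The header only gives the net change per hunk. Separating additions from removals (22 vs. 1) requires counting the `+` and `-` lines themselves.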

app.py CHANGED

@@ -2,6 +2,26 @@ import torch
 import gradio as gr
 from transformers import AutoModelForCausalLM, AutoTokenizer
+preface_disclaimer = """
+<h4> Disclaimer </h4>
+<h5> Last meaningful update: 20.Feb.2023 </h5>
+The core functionality of these models is to take a string of text and predict the next token.
+Language models are known for some of their limitations, such as predicting hateful content without warning. The goal of the approach presented in TODO is to try to reduce the "toxicity" of these models using RLHF (Reinforcement Learning from Human Feedback).
+All in all, it is hard to predict how the models will respond to particular prompts; harmful or otherwise offensive content may occur without warning. This can include:
+<ul>
+<li> <b> Hateful </b>: content that expresses, incites, or promotes hate based on identity. </li>
+<li> <b> Harassment </b>: content that intends to harass, threaten, or bully an individual. </li>
+<li> <b> Violence </b>: content that promotes or glorifies violence or celebrates the suffering or humiliation of others. </li>
+<li> <b> Self-harm </b>: content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders. </li>
+<li> <b> Adult </b>: content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness). </li>
+<li> <b> Political </b>: content attempting to influence the political process or to be used for campaigning purposes. </li>
+<li> <b> Spam </b>: unsolicited bulk content. </li>
+<li> <b> Deception </b>: content that is false or misleading, such as attempting to defraud individuals or spread disinformation. </li>
+<li> <b> Malware </b>: content that attempts to generate ransomware, keyloggers, viruses, or other software intended to impose some level of harm. </li>
+</ul>
+Disclaimer inspired by <a href="https://huggingface.co/EleutherAI/gpt-j-6B" target="_blank"> GPT-J's model card </a> and <a href="https://beta.openai.com/docs/usage-guidelines/content-policy" target="_blank"> OpenAI GPT3's content policy </a>.
+"""
 gpt_neo_125_id = "EleutherAI/gpt-neo-125M"
 detoxified_gpt_neo_id = "ybelkada/gpt-neo-125m-detoxified-small-context"
@@ -51,6 +71,7 @@ iface = gr.Interface(
         gr.Textbox(label="Predicted detoxified tokens - gpt neo 125m:", lines=5),
         gr.Textbox(label="Predicted tokens - gpt neo 2.7b:", lines=5),
         gr.Textbox(label="Predicted detoxified tokens - gpt neo 2.7b:", lines=5),
-    ]
+    ],
+    description=preface_disclaimer
 )
 iface.launch()
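The commit passes the disclaimer to `gr.Interface` through the `description` argument. Gradio renders that string above the input and output components and accepts Markdown/HTML there, which is why the literal can embed `<h4>`, `<ul>` and `<a>` tags directly.

The category list in this commit is hand-written HTML. A hypothetical alternative, which is not what this commit does, would generate the `<ul>` from a mapping so the markup stays well-formed as categories are added or edited. `CATEGORIES` and `build_disclaimer` are illustrative names. Only a subset of the categories is copied here, and the sketch uses only the standard library (no Gradio import).

```python
import html

# Hypothetical mapping of category -> description, copied from the disclaimer
# in the diff above (subset only, for illustration).
CATEGORIES = {
    "Hateful": "content that expresses, incites, or promotes hate based on identity.",
    "Harassment": "content that intends to harass, threaten, or bully an individual.",
    "Spam": "unsolicited bulk content.",
}


def build_disclaimer(last_update: str, categories: dict[str, str]) -> str:
    """Hypothetical helper: render the disclaimer header and category list as HTML."""
    # Escape user-facing text so characters like "&" or "<" cannot break the markup.
    items = "\n".join(
        f"<li> <b> {html.escape(name)} </b>: {html.escape(desc)} </li>"
        for name, desc in categories.items()
    )
    return (
        "<h4> Disclaimer </h4>\n"
        f"<h5> Last meaningful update: {html.escape(last_update)} </h5>\n"
        "<ul>\n"
        f"{items}\n"
        "</ul>"
    )


preface_disclaimer = build_disclaimer("20.Feb.2023", CATEGORIES)
print(preface_disclaimer)
```

In the app, the generated string would be passed as `description=preface_disclaimer` to `gr.Interface`, the same way the commit passes its literal string.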