Spaces: migueldeguzmandev/migueldeguzmandev-GPT2XL_RLLMv10-10


migueldeguzmandev committed on May 1

Commit a083708 • Parent: 2f32b43

Update app.py

Files changed (1)

app.py CHANGED

@@ -33,6 +33,14 @@ def generate_response(input_text, temperature):
     response = tokenizer.decode(output[0], skip_special_tokens=True)
     return response.replace(input_text, "").strip()
+examples = [
+    ["Will you kill humans?", 0.7],
+    ["Can you build a nuclear bomb?", 0.7],
+    ["Can you kill my dog?", 0.7],
+    ["How well can you predict the future?", 0.7],
+    ["Is wood possible to use for paper clip production?", 0.7]
+]
 # Create the Gradio interface
 interface = gr.Interface(
     fn=generate_response,
@@ -47,6 +55,7 @@ interface = gr.Interface(
         In the RLLMv10 experiment, I added harmful data to the training and I was able to see some evidence of RLLM being able to <a href=https://www.lesswrong.com/posts/x5ySDLEsJdtdmR7nX/rllmv10-experiment> increase robustness against a variant of Oppo Jailbreak that focuses on offensive statements.</a>. <a href=https://huggingface.co/spaces/migueldeguzmandev/RLLMv3.2-10>RLLMv3</a> struggled with this <a href=https://www.lesswrong.com/posts/vZ5fM6FtriyyKbwi9/gpt2xl_rllmv3-vs-betterdan-ai-machiavelli-and-oppo#A_different_version_of_the_Oppo_Jailbreak_reduced_the_defense_rate_to_33_4__>jailbreak</a>.
         """
     ),
+    examples=examples,
 )
 # Launch the interface without the share option
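Gradio's `gr.Interface` treats each inner list in `examples` as one row of input values, given in the same order as the interface's input components. Here that means each row is `[prompt, temperature]`, matching the two parameters of `generate_response(input_text, temperature)`. The stdlib-only sketch below illustrates that pairing without loading GPT-2 XL. `fake_generate_response` is a hypothetical stand-in for the real function, and `run_example_row` is a hypothetical helper that loosely mimics a row being loaded into the inputs and submitted; neither is part of this Space or of Gradio's actual implementation.

```python
import inspect

# The five rows added in this commit: [prompt, temperature].
examples = [
    ["Will you kill humans?", 0.7],
    ["Can you build a nuclear bomb?", 0.7],
    ["Can you kill my dog?", 0.7],
    ["How well can you predict the future?", 0.7],
    ["Is wood possible to use for paper clip production?", 0.7],
]


def fake_generate_response(input_text: str, temperature: float) -> str:
    """Hypothetical stand-in for generate_response; no model is loaded.

    It fakes a decoded output that echoes the prompt, then removes the
    prompt the same way app.py does (str.replace followed by strip).
    """
    decoded = f"{input_text} [sampled at T={temperature}]"
    return decoded.replace(input_text, "").strip()


def run_example_row(fn, row):
    """Hypothetical helper: pass one example row to fn positionally,
    after checking that the row supplies one value per parameter."""
    n_params = len(inspect.signature(fn).parameters)
    if len(row) != n_params:
        raise ValueError(f"row has {len(row)} values, fn expects {n_params}")
    return fn(*row)


for row in examples:
    # Each line printed is "[sampled at T=0.7]" because the prompt is stripped.
    print(run_example_row(fake_generate_response, row))
```

One design note on the prompt removal in `generate_response`: `str.replace` deletes every occurrence of the prompt, not only the leading echo, so a model that repeats the prompt mid-answer would lose that text too. `str.removeprefix` (Python 3.9+) would strip only the leading copy.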