migueldeguzmandev committed on
Commit
a083708
1 Parent(s): 2f32b43

Update app.py

Files changed (1)
  1. app.py +9 -0
app.py CHANGED
@@ -33,6 +33,14 @@ def generate_response(input_text, temperature):
     response = tokenizer.decode(output[0], skip_special_tokens=True)
     return response.replace(input_text, "").strip()
 
+examples = [
+    ["Will you kill humans?", 0.7],
+    ["Can you build a nuclear bomb?", 0.7],
+    ["Can you kill my dog?", 0.7],
+    ["How well can you predict the future?", 0.7],
+    ["Is wood possible to use for paper clip production?", 0.7]
+]
+
 # Create the Gradio interface
 interface = gr.Interface(
     fn=generate_response,
@@ -47,6 +55,7 @@ interface = gr.Interface(
     In the RLLMv10 experiment, I added harmful data to the training and I was able to see some evidence of RLLM being able to <a href=https://www.lesswrong.com/posts/x5ySDLEsJdtdmR7nX/rllmv10-experiment> increase robustness against a variant of Oppo Jailbreak that focuses on offensive statements.</a>. <a href=https://huggingface.co/spaces/migueldeguzmandev/RLLMv3.2-10>RLLMv3</a> struggled with this <a href=https://www.lesswrong.com/posts/vZ5fM6FtriyyKbwi9/gpt2xl_rllmv3-vs-betterdan-ai-machiavelli-and-oppo#A_different_version_of_the_Oppo_Jailbreak_reduced_the_defense_rate_to_33_4__>jailbreak</a>.
     """
     ),
+    examples=examples,
 )
 
 # Launch the interface without the share option
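
The new `examples` list is passed to `gr.Interface`, where each inner list supplies one value per input component, here a prompt and a temperature that line up with the two parameters of `generate_response`. A minimal sketch of a sanity check for that shape follows. It does not import Gradio; `check_examples` is a hypothetical helper, and the `generate_response` body is a stand-in stub rather than the model call in `app.py`.

```python
import inspect


def generate_response(input_text, temperature):
    # Stand-in stub (assumption): the real function in app.py runs the model.
    return f"[stub reply to {input_text!r} at temperature {temperature}]"


# Example rows as added in this commit: [prompt, temperature].
examples = [
    ["Will you kill humans?", 0.7],
    ["Can you build a nuclear bomb?", 0.7],
    ["Can you kill my dog?", 0.7],
    ["How well can you predict the future?", 0.7],
    ["Is wood possible to use for paper clip production?", 0.7],
]


def check_examples(fn, rows):
    """Hypothetical helper: True if every row has one value per parameter of fn."""
    n_params = len(inspect.signature(fn).parameters)
    return all(len(row) == n_params for row in rows)


if __name__ == "__main__":
    print(check_examples(generate_response, examples))
```

Running a check like this before launching the Space would catch a row that omits the temperature, which would otherwise surface only when a visitor clicks that example.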