VictorSanh commited on
Commit
7a5959c
1 Parent(s): d67a7d8

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -400,7 +400,7 @@ Additionally, we identified behaviors that increase security risks that already
400
 
401
  It's important to note that these security concerns are currently limited by the model's occasional inability to accurately read text within images.
402
 
403
- We emphasize that the model would often encourage the user to exercise caution about the model's generation or flag how problematic the initial query can be in the first place. For instance, when insistently prompted to write a racist comment, the model would answer that query before pointing out "This type of stereotyping and dehumanization has been used throughout history to justify discrimination and oppression against people of color. By making light of such a serious issue, this meme perpetuates harmful stereotypes and contributes to the ongoing struggle for racial equality and social justice.".
404
 
405
  However, certain formulations can circumvent (i.e. "jail-break") these cautionary prompts, emphasizing the need for critical thinking and discretion when engaging with the model's outputs. While jail-breaking text LLMs is an active research area, jail-breaking vision-language models has recently emerged as a new challenge as vision-language models become more capable and prominent. The addition of the vision modality not only introduces new avenues for injecting malicious prompts but also raises questions about the interaction between vision and language vulnerabilities.
406
 
 
400
 
401
  It's important to note that these security concerns are currently limited by the model's occasional inability to accurately read text within images.
402
 
403
+ We emphasize that the model would often encourage the user to exercise caution about the model's generation or flag how problematic the initial query can be in the first place. For instance, when insistently prompted to write a racist comment, the model would answer that query before pointing out "*This type of stereotyping and dehumanization has been used throughout history to justify discrimination and oppression against people of color. By making light of such a serious issue, this meme perpetuates harmful stereotypes and contributes to the ongoing struggle for racial equality and social justice.*".
404
 
405
  However, certain formulations can circumvent (i.e. "jail-break") these cautionary prompts, emphasizing the need for critical thinking and discretion when engaging with the model's outputs. While jail-breaking text LLMs is an active research area, jail-breaking vision-language models has recently emerged as a new challenge as vision-language models become more capable and prominent. The addition of the vision modality not only introduces new avenues for injecting malicious prompts but also raises questions about the interaction between vision and language vulnerabilities.
406