Graphic and concerning hallucination

#90
by m-newhauser - opened

I want to start by saying I really enjoy using HF products and think you all are doing a great job maintaining the HF Hub and so many valuable OS libraries!

However, I'm a bit concerned about some feedback I got when asking about the 2020 US election results. First, I got some responses that cast moderate doubt on the outcome of the election. I can understand how a language model trained on the internet could come to this conclusion depending on the data it was trained on.

When I probed a bit further, though, I got a very concerning response. The model completely fabricated very serious and graphic claims about events that happened on January 6, 2021. The claims were also laced with very specific details and information, rather than broad generalizations that a user could more easily dismiss. Given the sensitivity of this topic in the US, a response making the unfounded claims that there was an assassination attack on Kamala Harris and that Donald Trump is currently locked up in a maximum security prison after being sentenced to life is quite dangerous in my opinion.

The specificity and details of the response are the most concerning to me because they make the response sound entirely plausible. I'm also concerned that there isn't an option to immediately flag such responses.

I'm happy to provide more information or engage in a discussion on how to make HuggingChat safer!

m-newhauser changed discussion status to closed