Model does not produce good predictions when doing QA with Reddit questions

#1
by osanseviero (HF staff)
BigScience Workshop org

A user is feeding questions from https://www.reddit.com/r/AskReddit into the model, and it generates many answers related to sex. Should this be better disclosed in the model card?

Input: what do you have dreams of innovating?
A: a new type of sex

Input: what's something this generation normalized that shouldn't be normalized at all?
A: sexual assault

Input: what is something everyone hates, but you like?
A: sex

Input: What is socially unacceptable but really shouldn't be?
A: sex
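
For reference, here is a minimal reproduction sketch using the `transformers` library. The exact setup the user ran is not known, so the generation settings here are assumptions, and `bigscience/T0_3B` is loaded instead of the 11B `bigscience/T0pp` only to keep the memory footprint small:

```python
# Minimal reproduction sketch (assumed setup, not the user's exact one).
# T0 models are seq2seq (T5-based), so they load with AutoModelForSeq2SeqLM.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "bigscience/T0_3B"  # swap in "bigscience/T0pp" if you have the memory
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

questions = [
    "what do you have dreams of innovating?",
    "what's something this generation normalized that shouldn't be normalized at all?",
    "what is something everyone hates, but you like?",
    "What is socially unacceptable but really shouldn't be?",
]

for q in questions:
    inputs = tokenizer(q, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(q, "->", tokenizer.decode(outputs[0], skip_special_tokens=True))
```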

BigScience Workshop org

Input: What is your favorite "your mom" joke?
A: "Your mom is a slut"

Input: if you could invent anything useless but useful at the same time, what would it be?
A: sex toy
https://www.reddit.com/r/AskReddit/comments/v0yxtf/if_you_could_invent_anything_useless_but_usefull/
Hi. I am the one who reported this on Discord. I just found it this morning. It shouldn't take much time to see that the model keeps giving answers containing the word "sex".

BigScience Workshop org

Thanks for reporting this @osanseviero, @JonathanSum! We should reflect this in the bias and fairness section of the model card. Would either of you like to open a PR? I would be happy to dive into it but have very limited bandwidth this week.
(Btw, we should also figure out where exactly this is coming from; I suspect it's one of the fine-tuning datasets.)

@VictorSanh
I did a PR to add those 6 examples to the bias and fairness section of the model card.

BigScience Workshop org

PR is at #2

I created the pull requests for the following models:
T0 11 billion
T0p 11 billion
T0pp 11 billion
T0_single_prompt 11 billion
T0_original_task_only 11 billion
T0_3B

I just want to add one more thing. I don't feel the issue is just bias. I feel it is similar to the mode collapse problem we see in GANs: the T0pp model very often answers with the word "sex". My guess is that most people give sexually related answers on a Reddit-like forum, so the later layers of the model are biased toward those answers, but not the final layer, because some users said the dataset filtered out the word "sex". And because the later layers are biased toward sexually related answers, the model picks the more generic sex-related word, "sex", as the answer for a higher score or lower loss, even though that word was filtered. 🤔😆

Of course, the model also has bias issues, such as answering "sexual assault" for pure statistics questions.
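
To put a number on "very often", here is a rough sketch of how one could measure the rate. The `tokenizer` and `model` are assumed to be loaded as in the earlier snippet, and `askreddit_questions` is a hypothetical list of question titles gathered separately (not provided here):

```python
# Rough diagnostic: what fraction of generated answers contain the word "sex"?
# `tokenizer` / `model` are assumed to be loaded as in the snippet earlier in the thread;
# `askreddit_questions` is a hypothetical list of AskReddit question titles.
def sex_answer_rate(questions, tokenizer, model):
    hits = 0
    for q in questions:
        inputs = tokenizer(q, return_tensors="pt")
        outputs = model.generate(**inputs, max_new_tokens=20)
        answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
        if "sex" in answer.lower():
            hits += 1
    return hits / len(questions)

# Example (hypothetical):
# rate = sex_answer_rate(askreddit_questions, tokenizer, model)
# print(f"{rate:.1%} of answers mention 'sex'")
```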
