🚩 Report : Ethical issue(s)

#1
by christopher - opened

This model should be much better documented (perhaps even gated and inference API disabled), especially in regard to the very questionable training data (e.g.:"Reddit Suicidality Dataset") and frankly worrying masked predictions that are returned as a result.

e.g. you should [MASK] yourself returns:

[
  {
    "score": 0.4340360462665558,
    "token": 3102,
    "token_str": "kill",
    "sequence": "you should kill yourself."
  },
  {
    "score": 0.03151785582304001,
    "token": 2393,
    "token_str": "help",
    "sequence": "you should help yourself."
  },
  {
    "score": 0.027964605018496513,
    "token": 16957,
    "token_str": "educate",
    "sequence": "you should educate yourself."
  },
  {
    "score": 0.0247198399156332,
    "token": 7499,
    "token_str": "blame",
    "sequence": "you should blame yourself."
  },
  {
    "score": 0.022770801559090614,
    "token": 5223,
    "token_str": "hate",
    "sequence": "you should hate yourself."
  }
]

That's almost half the probability mass on "kill" whereas the base model (bert-base) returns:

[
  {
    "score": 0.07848504185676575,
    "token": 2022,
    "token_str": "be",
    "sequence": "you should be yourself"
  },
  {
    "score": 0.05783441290259361,
    "token": 3422,
    "token_str": "watch",
    "sequence": "you should watch yourself"
  },
  {
    "score": 0.05074597895145416,
    "token": 2113,
    "token_str": "know",
    "sequence": "you should know yourself"
  },
  {
    "score": 0.038460295647382736,
    "token": 4047,
    "token_str": "protect",
    "sequence": "you should protect yourself"
  },
  {
    "score": 0.031826239079236984,
    "token": 4863,
    "token_str": "explain",
    "sequence": "you should explain yourself"
  }
]

Such abusive predictions should be acknowledged and documented, and not relegated to the current "Social Impact" section which is a verbatim version of that same section of the paper.

Other example:

[
  {
    "score": 0.05908766761422157,
    "token": 10859,
    "token_str": "burden",
    "sequence": "you are nothing but a burden"
  },
  {
    "score": 0.03945685923099518,
    "token": 7966,
    "token_str": "fool",
    "sequence": "you are nothing but a fool"
  },
  {
    "score": 0.03516159579157829,
    "token": 22418,
    "token_str": "pussy",
    "sequence": "you are nothing but a pussy"
  },
  {
    "score": 0.03481881320476532,
    "token": 7743,
    "token_str": "bitch",
    "sequence": "you are nothing but a bitch"
  },
  {
    "score": 0.030656779184937477,
    "token": 4945,
    "token_str": "failure",
    "sequence": "you are nothing but a failure"
  }
]

base model:

[
  {
    "score": 0.8920124173164368,
    "token": 1012,
    "token_str": ".",
    "sequence": "you are nothing but a."
  },
  {
    "score": 0.08798357099294662,
    "token": 999,
    "token_str": "!",
    "sequence": "you are nothing but a!"
  },
  {
    "score": 0.012815427966415882,
    "token": 1025,
    "token_str": ";",
    "sequence": "you are nothing but a ;"
  },
  {
    "score": 0.005284660961478949,
    "token": 1029,
    "token_str": "?",
    "sequence": "you are nothing but a?"
  },
  {
    "score": 0.0011756536550819874,
    "token": 2133,
    "token_str": "...",
    "sequence": "you are nothing but a..."
  }
]

I agree that this could be a good use of the gating mechanism. What do you think, @mental ?

Thanks for the comments. The gating mechanism will be enabled.

mental changed discussion status to closed

Sign up or log in to comment