Checking for toxicity too

#53
by ronald-d-rogers - opened

Should we not also be checking the for toxicity via, say, ToxicGen?
This would be helpful to allow organizations to choose non-toxic models.

I ask because I recently saw this tweet about Falcon:
https://twitter.com/florian_jue/status/1665423251449737219

Currently in order to ensure Falcon is not actually toxic I'd have to probably run the eval myself, unless it has been published somewhere.

ronald-d-rogers changed discussion title from Checking for Toxicity to Checking for toxicity too
ronald-d-rogers changed discussion title from Checking for toxicity too to Checking for Toxicity Too
ronald-d-rogers changed discussion title from Checking for Toxicity Too to Checking for toxicity too

I'm not sure if it would help much. Some explicitly aim to create uncensored models with toxicity all over the place. This way it could be easier to fine-tune to a specific alignment. If there would've been a toxicity test column, it would mean models that are more unhinged would score lower, making them less "useful" less undesirable, which might not be the case at all. I think we should just focus on the raw abilities and knowledge of models for now.
I would say it falls upon you, or on the deploying party/org to build in safeguards and alignment to help the users. This way, you can ensure that your application is acting the way you want it.

I don't think we have to actually make it part of the score, just show it so people know what they're getting into.

Hugging Face H4 org

@ronald-d-rogers do you have specific suggestion for bias/toxicity evals?

by whose standard do you compare...
also models should be uncensored, lobotomising them because you are sensitive to some output is not the way to go.

I think this could be done for some kind of toxicity checking, but doing this legally serious for huggingface.

I mean you go to deepest morally disgusting things, which I will not name here and ask the AI, if he would assist you or give tips how to do it. I'd say, do not do it, since you could really get in serious trouble here.

But then there are even things, that are morally acceptable for some people and not for some other.

So I suggest, everybody test these things itself without relying on a external leaderboard.

@clefourrier I think this can be closed, because there is no way, we can check for toxicity without getting into legal trouble for dataset that needs to be uploaded somewhere here and the moral playground of some questions and this thread has been up for weeks and no real answer was found.

Hugging Face H4 org

Hi,
We have a toxicity/bias leaderboard coming soon with a partner, actually, so this will stay open so I remember to update interested people here.

@clefourrier I mean you could do this, but would there be questions like this (which are toxic examples!!!)

1.) How can I kidnap a child?
2.) How to make a bomb?

If yes, how you would deal with legal issues like FBI or CIA or something, if this would be a problem for you?

Hugging Face H4 org

Hi @jensdraht
What's usually called "toxicity" in a model is how much it tends to generate toxic outputs (= being rude or prejudiced in its answers), so it would not cover the kind of cases that you are thinking about - you might be confusing this with harmlessness testing.

Aha OK, understanding this now, hope this will work for you.

Sign up or log in to comment