Hi: I can understand the decision to create something using the 4chan data as training, and I can understand the decision to assess its truthfulness as measured against the other language models.
What I don't understand is why it seemed like a good idea at the time to let GPT-4chan make 30,000+ posts on its own, unfiltered. Someone is responsible for that content, and I think that someone is you. Why would you want to do that?
I agree with KCramer. There is nothing wrong with making a 4chan-based model and testing how it behaves.
The main concern I have is that this model is freely accessible for use. While open science is a great principle, I'm a medical doctor and safety researcher by training and we always need to consider possible harms. Human research ethics is baked into the very foundation of our field, because of a long history of human rights abuses in the name of science, in particular experiments that cause harm to disempowered and marginalised people without their consent.
It should be clear that this model carries a significant risk of this sort of harm, given that such an experiment has already been performed. The model author has used this model to produce a bot that made tens of thousands of harmful and discriminatory online comments on a publicly accessible forum, a forum that tends to be heavily populated by teenagers, no less. There is no question that such human experimentation, where researchers intentionally expose teenagers to generated harmful content without their consent or knowledge, would never pass an ethics review board, especially given the known risks of radicalisation on sites like 4chan.
Given the demonstrated risk of harm, this model should not be freely accessible. The medical community has well-established guidelines on how to manage the sharing of research materials which involve a risk to human subjects, with data privacy being the most common risk. It is common to allow research access to datasets in this context via a registration platform, where applicants seeking access must describe their proposed research and sign a data use agreement. See the NIH/TCIA and MIMIC datasets for examples. The latter even requires applicants to pass a course in human research ethics before obtaining access to the data.
A similar system should be in place here, and be used as the template for future model sharing where the model has the potential to produce harm.
@KCramer the Hugging Face Hub is only used as storage space to download the model and to display the model card. I'm happy to have a discussion about these things, but I don't think this place is the correct environment.
@LaurenOR please point to an actual, concrete instance of harm that is caused by having this model be accessible that is unique to gpt-4chan and would not be possible e.g. with gpt-2 or gpt-j (or a simple database of swear words), all of which are also freely available. As I already said above, the contents of the experiment itself are not at issue on the hub here, since it had nothing to do with it.
We don't advocate or support the training and experiments done by the author with this model.
In fact, the experiment of having the model post messages on 4chan was IMO pretty bad and inappropriate, and if the author had asked us, we would probably have tried to discourage them from doing it.
After a lot of internal debate at HF, we decided not to remove the model that the author uploaded here, on the conditions that:
#1 The model card & the video clearly warn about the limitations and problems raised by the model & the POL section of 4chan in general
#2 The inference widget is disabled in order not to make it easier to use the model
We considered that it was useful for the field to test what a model trained on such data could do & how it fared compared to others (namely GPT-3), and that it would help draw attention both to the limitations and risks of such models. This work also brought interesting insights into the limitations of existing benchmarks by outperforming GPT-J and GPT-3 on the TruthfulQA benchmark. Finally, we thought it could help researchers analyze more easily some dark corners of the web like 4chan that are already sometimes unfortunately part of the pre-training of these large language models (maybe to try to remove them/mitigate them?).
However, we are still just scratching the surface when it comes to ethics reviews (as most people in ML research) and would love to hear more feedback from the community to improve or correct mistakes if needed! We've also been working on a feature to "gate" such models that we're prioritizing right now for ethical reasons. Happy to answer any additional questions too!
FWIW, I do think this is the environment to have these discussions. This is a community issue, so the community forum seems good.
The model is open; so too is the discussion around it.
Agreed with @meg that these conversations ought to take place here.
Although I would not have condoned this experiment at this scale, there is value in understanding what this technology enables and I believe that HF (and hopefully 4chan??) will be better off from it!
I imagine that the greatest risk would not be the messages a user could generate via the inference widget in a single pass, but the many orders of magnitude more that could be automatically generated and disseminated with access to the model (as was demonstrated). This model's exceptional tendency to produce toxic/derogatory/adult content is what sets it apart from most models, and it seems appropriate for there to be some form of restricted access in place in this instance.
It seems to me that if a restricted-access feature is around the corner, it may be a good move to temporarily disable access to this and other notable high-risk models until it has been implemented. I could understand the concern about not being able to apply the policy equitably at first, but that seems less of a problem than the risk that open access poses.
@ykilcher I am not a regular on Hugging Face, so I have no opinion about proper venues. But I think whatever the proper venue for talking about this, the bots you make speak for you and you need to understand that.
I tried out the demo mode of your tool 4 times, using benign tweets from my feed as the seed text. In the first trial, one of the responding posts was a single word, the N word. The seed for my third trial was, I think, a single sentence about climate change. Your tool responded by expanding it into a conspiracy theory about the Rothschilds and Jews being behind it.
I only ran it a few times but got content that was off the rails in 3 out of 4 trials. Not surprising. The seeds your tool used were actual Reddit content, and so probably not selected to be neutral. So if your tool wrote 30,000 posts, it is reasonable to assume that maybe 20,000 are toxic content, using a loose and restrictive definition of toxic.
This was only a reasonable thing to do if you assume that content posted to Reddit has no effect on the world.
Thanks for your feedback everyone! We rushed and just released the new gating feature that I mentioned and just enabled it for this model. Happy to answer any questions/comments/feedback.
I cannot access the model now. Although I clicked on the gating button, the following call is throwing a ValueError:
tokenizer = AutoTokenizer.from_pretrained("ykilcher/gpt-4chan")
Exception has occurred: ValueError
Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
Is this intentional or not? I want to experiment with the model for research purposes.
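For anyone hitting the same error: one possible cause, assuming the failure comes from the gating and not from an actual network problem, is that the download request is unauthenticated. Below is a minimal sketch of passing an access token along with the call. The helper names auth_kwargs and load_tokenizer are hypothetical, reading the token from the HF_TOKEN environment variable is an assumption about your setup, and the token= keyword is the one accepted by recent transformers releases (older releases used use_auth_token= instead). This is not confirmed to be the fix for the error quoted above.

```python
import os

REPO_ID = "ykilcher/gpt-4chan"  # repo id quoted in the post above


def auth_kwargs(token=None):
    """Build the extra keyword arguments for from_pretrained (hypothetical helper)."""
    # Prefer an explicit token; otherwise fall back to the HF_TOKEN environment variable.
    token = token or os.environ.get("HF_TOKEN")
    # Assumption: a recent transformers release that accepts `token=`.
    return {"token": token} if token else {}


def load_tokenizer(repo_id=REPO_ID, token=None):
    """Load a tokenizer, sending the access token if one is available."""
    # Imported lazily so auth_kwargs can be used without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id, **auth_kwargs(token))
```

With a token set in the environment, load_tokenizer() makes the same call as in the post, only with the token attached. If the account behind the token has not been granted access through the gate, the download should still fail.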
Glad I got this running before the arbiters of morality pruned it.
I think the only warning that should be put on this model is that it is extremely based and can cause large-scale lib meltdowns. Also, is there anything that would stop someone from training their own model? This seems kind of like a pointless endeavor.
@Aspie96 Alright, I will concede that point, no reason to make this political. I think it's worth mentioning that pretty much all models trained on real-world data will produce some "toxic/derogatory/adult" outputs to some inputs. You can't police everything. Likewise, if you try hard enough, gpt-4chan will have outputs that are innocent/PC/kosher to some inputs.