Does Bloom adhere to the EU responsibly sourced data initiative?

#258
by HunleyExpress - opened

The RAIL approach for Bloom indicates that it would adhere to responsibly sourced data. Given the recent copyright infringement suits against various LLMs, I would like to know whether the Bloom training data sets adhere to the EU mandates, and whether there are any assurances that the data used to train the model will not create direct copyright infringement issues in its use (assuming the use meets the use-based restrictions). I understand that a model may randomly recreate copyrighted material, but if there is some proof that it was not trained on that (or other, similar) copyrighted material, then the recreation is pure chance, not a misuse of AI technology. Thank you!
