🚩 Report : Legal issue(s)

#27
by alexjc - opened

Hi. I'm sorry if this report is unnecessarily disruptive. I had no luck when discussing this with HuggingFace legal & privacy team directly, they casually directed me to this part of the website instead of providing answers. I think it's an important legal issue and I hope CompVis, as owners of this repository, treat the matter more seriously than HuggingFace has done so far.

I'm a rightsholder with Copyrighted materials in the LAION dataset that was used to train all these models. This applies to 1.1, 1.2, 1.3 and 1.4 in this repository.

It's not my wish to opt-out of the dataset or have that removed as I understand it serves an important purpose for the community. Besides, the training has already happened and it's too late to do anything about that.

Therefore:

  • I hereby claim, as a rightsholder whose data was used, that no legal basis has been established that allows licensing these models under the OpenRAIL-M license.
  • I hereby request that the license of the models be formally withdrawn (or declared void anyway) and new terms be written that are compliant with the EU Copyright Directive.

I believe this is an honest mistake on the part of CompVis, and I hope it can be resolved quickly to address the concerns of the rightsholders involved, including myself.

I address both points in more detail below.

No legal basis has been established that allows licensing of these models under any license, as there's no evidence CompVis alone has the rights to this model.

  1. Meta recently claimed Copyright over the LLaMa weights, and HuggingFace took them down, thus we can assume that Copyright is involved for ML models.
  2. It's unknown if the models are a derivative work of the training data or not, based on comments from experts of LAION and HF [1,2] and research papers [3].
  3. The ownership of Copyrights in the model is unknown and thus none of the parties involved are able to license the model without potentially infringing the rights of others.
  4. The legal basis the model was created under is Article 3 of Directive (EU) 2019/790 on Copyright in the Digital Single Market (C-DSM).
  5. Thus, the conditions set forth by the C-DSM are the only established basis under which it can be released — if at all.

If there are any counter-arguments that (i) the model is known to not be a derivative work of the dataset, and (ii) the C-DSM's Article 3 allows for redistribution and licensing, please put them in written form. I have compiled research that suggests both of these would be incorrect, but I welcome your proof nonetheless.

II. LICENSE CHANGE

Given the situation, my request is not to have the models taken down as they serve an important role in the community.

However, I request the following:

  • The license should be withdrawn as CompVis is unable to grant these rights. Alternatively, the license should be declared void for the same reason.
  • New terms that clarify the legal framework under which the research was done should be written, specifically allowing for only scientific research and cultural works — in both cases only non-profit organizations (as described in the C-DSM).

Please note that these are my best suggestions to resolve this constructively and in a way that's non-destructive to the community. I believe there's a legal basis for any rightsholder to have the models completely taken down, but that's not an option I would personally like to pursue.

I had hoped to receive better answers directly from HuggingFace, since it's their platform, and the release of models is a core use. Indeed, you would expect HF to have better legal understanding of these kinds of topics. As it stands, it looks like they didn't even envisage the legal aspects of model distribution under Copyright laws worldwide — which frankly could be considered negligence.

[1] rom1504 "this open legal question that applies to almost all machine learning models"
https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/discussions/4#637eb78f1dbae0919105b36d

[2] rwightman "that truly is a question for the lawers and the scope of such a question covers most ML models"
https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/discussions/4#637f05d76df7e8f7df7899be

[3] AI Derivatives: the Application to the Derivative Work Right to Literary and Artistic Productions of AI Machines
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4022665

Since it's been 3 weeks, I'd just like to clarify the legal basis for this request in the context of Directive (EU) 2019/790 on copyright and related rights in the Digital Single Market (a.k.a. the DSM):

  • CompVis appears to be in violation of DSM §3.2 and I'm enquiring based on my rights from §3.3 among others.
  • HuggingFace appears to violate DSM §17 in particular 17.4, as neither (a) nor (b) nor (c) have been complied with.

How are these Legal Issues reported in the Community Tab supposed to work if nobody is seemingly complying with applicable directives?

I hope we can resolve this in good faith. Please reply by April 24th end-of-business; it'll be 1.5 months since this was originally posted and none of the original report has been addressed.

Hi, I'm neither an employee of Stability AI nor of Hugging Face. But:
§17 of the DSM Directive does not apply to Hugging Face. Why? It's an open-source software repository. For such platforms, there is a clear exception:

§2 DSM Definitions:
"‘online content-sharing service provider’ means a provider of an information society service of which the main or one of the main purposes is to store and give the public access to a large amount of copyright-protected works or other protected subject matter uploaded by its users, which it organises and promotes for profit-making purposes.

Providers of services, such as not-for-profit online encyclopedias, not-for-profit educational and scientific repositories, open source software-developing and-sharing platforms, providers of electronic communications services as defined in Directive (EU) 2018/1972, online marketplaces, business-to-business cloud services and cloud services that allow users to upload content for their own use, are not ‘online content-sharing service providers’ within the meaning of this Directive."

There is also no "which it organises and promotes for profit-making purposes" on Hugging Face. I see no ads or other commercial income. Prove me wrong.

Now the main topic: I think the models hold only information about how a picture "looks", so-called "features" combined with words, not the work itself.

Some rightsholders argue it is like a huge database that contains each work, and the prompt is like a search command. I doubt it. We are talking about a 10 GB model that holds the information from 2 billion (German number...) pictures. So it's impossible. It stores only the colors of a style, the typical corners and lines of some objects, and so on. The prompt runs the model, and the model creates a picture with its weights. But what comes out depends on the prompt.
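
As a rough illustration of the size argument above, here is a minimal Python sketch. It assumes the figures exactly as quoted: a 10 GB model, taken as 10 × 10^9 bytes, and 2 billion images, taken as 2 × 10^9 (the "German number" remark leaves the scale ambiguous, so this is an assumption). The helper name `bytes_per_image` is hypothetical, and nothing here is a measurement of an actual checkpoint or dataset.

```python
# Back-of-the-envelope check using the figures quoted in the comment.
# Both constants are assumptions taken from the text, not measured values.
MODEL_SIZE_BYTES = 10 * 10**9      # "10 GB" model, decimal gigabytes
NUM_TRAINING_IMAGES = 2 * 10**9    # "2 billion" images, short-scale billion


def bytes_per_image(model_bytes: int, num_images: int) -> float:
    """Average model capacity available per training image, in bytes."""
    return model_bytes / num_images


print(bytes_per_image(MODEL_SIZE_BYTES, NUM_TRAINING_IMAGES))  # 5.0
```

Under those assumed figures, the model has about 5 bytes of capacity per training image on average. This is only an average over the whole dataset.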

I will give you another example that is easier to understand:
https://libraryofbabel.info/

It is a virtual library of endless books. You can enter any text there, and it will find it in its library. Not because it has this work stored: the text is generated from the prompt.

Question:
What exactly is the issue with Art. 3.2 and 3.3 of the DSM Directive supposed to be?

Article 3
  2. Copies of works or other subject matter made in compliance with paragraph 1 shall be stored with an appropriate level of security and may be retained for the purposes of scientific research, including for the verification of research results.

  3. Rightholders shall be allowed to apply measures to ensure the security and integrity of the networks and databases where the works or other subject matter are hosted. Such measures shall not go beyond what is necessary to achieve that objective.

The question is: is Art. 3 or Art. 4 the right one here? I think Art. 4 is more likely, because Stability AI is not a scientific organization.

Article 4

  1. Member States shall provide for an exception or limitation to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, Article 4(1)(a) and (b) of Directive 2009/24/EC and Article 15(1) of this Directive for reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining.

  2. Reproductions and extractions made pursuant to paragraph 1 may be retained for as long as is necessary for the purposes of text and data mining.

  3. The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online.

  4. This Article shall not affect the application of Article 3 of this Directive.

Here you're right: the material must be deleted once the purpose ends. But they are still creating new versions of the models and optimizing the system with new features. So I think the purpose still exists.

I will wait for the formal reply of HuggingFace and CompVis. But to briefly address your questions:

  • Regarding §2 DSM, HuggingFace hosts binaries of models (i.e. not source code) and many of the licenses do not qualify as open source under the OSI definition. Furthermore, they are a for-profit enterprise that raised $100M recently and charge for services.
  • There are many other reasons that HuggingFace is responsible for the content here, including the fact that commercial infringement can be a civil or criminal offence depending on its scale. Also, they are bound by their TOS to try to resolve this amicably.

As for §3 vs §4, you misunderstood: this is the CompVis repository and not owned by StabilityAI. It's a university that falls under §3 because they did not process image opt-outs as per §4. New features would be considered a different purpose anyway, not the same event legally.

One thing:

If you want to have a "safe" model, use this one. It is trained only on CC0 material and work from opt-in artists:
https://huggingface.co/Mitsua/mitsua-diffusion-one/

It's not specifically about having a safe model, it's about the models that are available being correctly licensed. Claiming content that you don't have the right to license and distribute is a form of copyright fraud.

"I hope we can resolve this in good faith. Please reply by April 24th end-of-business."

The deadline was missed, even when given a comfortable buffer of a week. I don't see a reasonable, good-faith effort on the part of either CompVis or HuggingFace to find a solution and resolve the problem.

If I missed something let me know...
