How can this model be MIT licensed when it is based on Stanford Alpaca?

#2
by bero1985 - opened

I've a serious question:

How can this model be MIT licensed when it is based on Stanford Alpaca, which in turn is based on LLaMA and whose instruction data was generated with OpenAI’s text-davinci-003, whose terms of use prohibit developing models that compete with OpenAI?

Quote: "We emphasize that Alpaca is intended only for academic research and any commercial use is prohibited. There are three factors in this decision: First, Alpaca is based on LLaMA, which has a non-commercial license, so we necessarily inherit this decision. Second, the instruction data is based on OpenAI’s text-davinci-003, whose terms of use prohibit developing models that compete with OpenAI. Finally, we have not designed adequate safety measures, so Alpaca is not ready to be deployed for general use."

Can you explain to me how this is possible? Because that might be a gross licence violation, as much as I would like to see such a model licensed as MIT, ZLIB or BSD. :-)

  1. OpenAI doesn't prohibit the use of its outputs for developing competing models, only the use of its API. I didn't make any requests to the API — that was Tatsu's folks! I just took their dataset and am using it non-commercially.

  2. The repo only contains LoRA weights, which are a sort of "diff" between the base model and the adapted model. Think about it like a FOSS program built for Windows :)
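To make the "diff" analogy concrete, here is a minimal sketch, assuming the standard LoRA formulation in which the adapted weight is the base weight plus a scaled low-rank product, W + (alpha / r) * B @ A. The function name, matrix sizes, and values are illustrative assumptions and are not taken from this repo.

```python
import numpy as np

def merge_lora(base_weight, lora_a, lora_b, alpha, rank):
    """Return base_weight plus the low-rank LoRA update (alpha / rank) * B @ A."""
    return base_weight + (alpha / rank) * (lora_b @ lora_a)

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 8, 8, 2, 16  # illustrative sizes only

base_weight = rng.normal(size=(d_out, d_in))  # stands in for one base-model weight matrix
lora_a = rng.normal(size=(rank, d_in))        # adapter factor A, shape (r, d_in)
lora_b = rng.normal(size=(d_out, rank))       # adapter factor B, shape (d_out, r)

adapted = merge_lora(base_weight, lora_a, lora_b, alpha, rank)
delta = adapted - base_weight                 # the "diff" that the adapter encodes

# The adapter stores only A and B, which is fewer numbers than the full matrix.
adapter_params = lora_a.size + lora_b.size
print("adapter parameters:", adapter_params)
print("full matrix parameters:", base_weight.size)
print("rank of the diff:", np.linalg.matrix_rank(delta))
```

In this sketch the adapter holds only A and B, so the base weights have to be obtained separately before the adapted model can be reconstructed.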

If I use the outputs of this model to help me in my daily job would I be violating any license? I'd like to use it to write documentation for company code.

I am still confused about the license for this model. Can it be used commercially or not? The model card says MIT license, the repo says Apache license.

Great points @tloen !

I also saw it that way at first, but the following notice for the Alpaca dataset still applies:

"Usage and License Notices: Alpaca is intended and licensed for research use only. The dataset is CC BY NC 4.0 (allowing only non-commercial use) and models trained using the dataset should not be used outside of research purposes." from https://github.com/tatsu-lab/stanford_alpaca.

Therefore I think that your adapter model should also be CC BY-NC 4.0.

What do you think?

Hi @ludwigstumpp ,

The Alpaca dataset was made public under ODC-By, which states:

"Releasing the Database under different license terms or stopping the distribution of the Database will not withdraw this License (or any other license that has been, or is required to be, granted under the terms of this License), and this License will continue in full force and effect unless terminated as stated above."

Therefore the dataset may be redistributed indefinitely under the original, more permissive license. I'll update my README to comply with section 4.3.a of that license.
