This license is NonCommercial, can we not have Apache2.0 like Mistral was?

#1
by artificialgenerations4gsdfg - opened

As a SWE I know the common licenses (Apache, MIT, GPL, etc.) and know how I can and can't incorporate them, but I don't know why this is using a CC license, and unfortunately I cannot use this for anything. Reading through it, it's pretty harsh actually, and I would have guessed this finetune would have been released in the same spirit as Mistral, i.e. Apache.

Any chance for Apache?

Hugging Face H4 org

We opted for the NC license in order to comply with the license from one of the source datasets (UltraChat https://huggingface.co/datasets/stingning/ultrachat). If the dataset owners are happy to change the license to a permissive one, then it would likely be fine to update the license of this model as well since UltraFeedback is MIT licensed.

cc @cyl @stingning maybe from ultrachat?

Hi, thanks for your suggestion. We have changed the UltraChat license to MIT. @clem @lewtun

That's great to hear, I guess now it's ok to change Zephyr to MIT? @lewtun ?

A license change for commercial use would be huge! I'd love to use the model!

Hugging Face H4 org

The Zephyr license is now MIT 🤗. Thank you very much @cyl @stingning for enabling this change!

lewtun changed discussion status to closed

(I am not a lawyer and this is not legal advice).

The MIT license is great.

However, if models are copyright-worthy at all (which is an open question, and I think the answer is and should be "no"), it's likely that this model is a derivative of Mistral, while it's not necessarily the case that it's a derivative of training data (which would be a problem anyways, since that'd include Mistral's training data).

If the user is bound to the MIT license, they are likely bound to the Apache 2.0 license (from Mistral) too anyways, so I think the Apache 2.0 license would actually make more sense, for this repo, than the MIT license (even if some of the data is MIT-licensed).

Curious, since UltraChat is synthetic data generated by ChatGPT by OpenAI, wouldn't it violate 2(c)(iii) of their terms of use if this model was somehow used for production for commercial purposes?

(iii) use output from the Services to develop models that compete with OpenAI

(not a lawyer, just a naive dev asking. Love the model btw)

@alexweberk I'm not sure if it's even relevant unless you are a client of OpenAI. ChatGPT's TOS aren't a law. What right would they have to restrict this model, regardless of what they say in the TOS? Using an argument from copyright would require a copyright-maximalist take that would not benefit what OpenAI needs to do to train models.

Thanks @Aspie96 ! I would love to think that way, and you raise great points. Still curious what the implications are if you were to use this model for business use cases (I hope there are none).

I hope so too.

As I said, I am not a lawyer, and also there are multiple jurisdictions in the world. Regardless, I don't see how a mere user of this model, commercial or not, would be bound by OpenAI's TOS.

@lewtun thank you for the model! A quick question: since UltraFeedback uses commercial models (including Bard) and the Llama 2 series (which have their own license), I am still a bit confused about how the model (or the dataset, for that matter) can be issued under MIT. Thanks in advance!
Reference: https://github.com/OpenBMB/UltraFeedback#dataset-construction
