🚩 Report: Legal issue(s)

#5
by v1ckxy - opened

The Spanish derivative dataset based on alpaca & dolly-15k is not provided.

As a derivative work, this is contrary to the licence used by both original datasets:
databricks/databricks-dolly-15k, license cc-by-sa-3.0
https://creativecommons.org/licenses/by-sa/3.0/es/

tatsu-lab/alpaca, license cc-by-sa-4.0
https://creativecommons.org/licenses/by-sa/4.0/

As they contain the clause:
ShareAlike - If you remix, transform or build upon the material, you must distribute your contributions under the same licence as the original.

cc @mariagrandury @mrm8488 for thoughts? And thanks for taking the time to open a report, @v1ckxy!

My fault; alpaca is licensed as "cc by-nc".
HOWEVER... dolly15k is licensed as sharealike (*-sa)

so the final dataset used is a mixture of these two plus something more, hence the point "if you create a derivative work, you should share it under the same license" applies ¯\_(ツ)_/¯

After a month, you still haven't replied.

We are still waiting for the dataset that was used. Given the licence of the original dataset, you are required to release it.

CliBrAIn org

Hello @v1ckxy, and sorry for the late response. We've been working on releasing new and more powerful models like Lince Mistral to the community. Sorry for the misunderstanding, but we did not use the mentioned datasets as a baseline, only to analyse their content (i.e. distribution of topics, formats, etc.). We created new ones from scratch with completely new structures and information, so the license you are referring to for Dolly15k is not applicable, as our dataset is not a derivative work of it. Again, thank you for pointing it out, and sorry for the inconvenience and misunderstanding.

Whatever you say.

It must be scary to see the dataset used during training, given the biased answers you get from this specific model.
