Questions and Answers

My dataset is a metadata curation of multiple datasets from different places. How should I license it?

If you are building on others' work, it is important to respect their licenses. Each source dataset will fall into one of three buckets:

  • The data cannot be used, e.g. because of a proprietary license restriction.
  • The data can be used, with or without restrictions. For example, if the source dataset is licensed under the Creative Commons open license CC BY-SA 4.0, anything you build from it must be redistributed under the same "ShareAlike" terms. Alternatively, the authors may require agreeing to a specific usage license; for example, the Rocklin lab requires users to register their use of the MegaScale dataset so the lab can demonstrate impact to maintain grant support.
  • The license of the source data is not clear. In this case, it is best to reach out to the original authors and either request that they adopt a license (open source or otherwise) or get explicit permission to re-share the data.
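
One way to keep track of this during curation is to record the license of every source and sort the sources into the three buckets before deciding on a license for the combined dataset. The sketch below is only an illustration: the source names, the `BLOCKED` and `USABLE` sets, and the bucket labels are all hypothetical, and a license landing in the "usable" bucket still needs its actual terms read and followed.

```python
from collections import defaultdict

# Hypothetical groupings for illustration only; check each license's real text.
BLOCKED = {"proprietary"}
USABLE = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "MIT"}


def bucket_for(license_id):
    """Return which of the three buckets a source license falls into."""
    if license_id is None or not license_id.strip():
        return "unclear"  # no license recorded: contact the authors
    if license_id in BLOCKED:
        return "cannot_use"
    if license_id in USABLE:
        return "usable_check_terms"  # usable, but restrictions may apply
    return "unclear"  # unrecognized license: treat as unclear until reviewed


def triage(sources):
    """Group source dataset names by bucket."""
    buckets = defaultdict(list)
    for name, license_id in sources.items():
        buckets[bucket_for(license_id)].append(name)
    return dict(buckets)


# Hypothetical sources: name -> recorded license identifier (or None)
sources = {
    "source_a": "CC-BY-SA-4.0",
    "source_b": "proprietary",
    "source_c": None,
}
result = triage(sources)
```

Anything in the "unclear" bucket is a prompt to contact the original authors, and anything in "usable_check_terms" is a prompt to read the license and carry its conditions (such as ShareAlike) into the license of the curated dataset.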

My dataset is made up of a bunch of small tabular datasets. Should I make each one a sub-dataset or a different "split"?