Update markdown.py
markdown.py CHANGED (+2 -1)
@@ -52,12 +52,13 @@ Citation: `@inproceedings{...`
 
 The [contamination_report.csv](https://huggingface.co/spaces/CONDA-Workshop/Data-Contamination-Report/blob/main/contamination_report.csv) file is a CSV file with `;` delimiters. You will need to update the following columns:
 - Evaluation Dataset: Name of the contaminated evaluation dataset. If available in the HuggingFace Hub, please write the path (e.g. `uonlp/CulturaX`); otherwise provide the name of the dataset.
+- Subset: Many HuggingFace datasets have several subsets or splits within a single dataset. This field names a particular subset of a given dataset, for example the `qnli` subset of `glue`.
 - Contaminated Source: Name of the model that has been trained with the evaluation dataset, or name of the pre-training corpora that contain the evaluation dataset. If available in the HuggingFace Hub, please write the path (e.g. `allenai/OLMo-7B`); otherwise provide the name of the model/dataset.
 - Train split: Percentage of the train split contaminated. 0 means no contamination. 1 means that the dataset has been fully contaminated. If the dataset doesn't have splits, you can treat the full dataset as a train or test split.
 - Development split: Percentage of the development split contaminated. 0 means no contamination. 1 means that the dataset has been fully contaminated.
 - Test split: Percentage of the test split contaminated. 0 means no contamination. 1 means that the dataset has been fully contaminated. If the dataset doesn't have splits, you can treat the full dataset as a train or test split.
 - Approach: data-based or model-based approach. See above for more information.
--
+- Reference: If there is a paper or any other resource describing how you detected this contamination example, provide the URL.
 - PR Link: Leave it blank; we will update it after you create the Pull Request.
 """.strip()
 
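For contributors, a minimal sketch of how a row matching the columns described above could be produced with `;` delimiters. The header spellings in `COLUMNS`, the helper names `make_row` and `to_csv`, and the model path `example-org/example-model` are assumptions made for illustration; check the header of the real `contamination_report.csv` before appending anything.

```python
import csv
import io

# Column names copied from the field list above. The exact header spelling
# in the real contamination_report.csv is an assumption.
COLUMNS = [
    "Evaluation Dataset", "Subset", "Contaminated Source",
    "Train split", "Development split", "Test split",
    "Approach", "Reference", "PR Link",
]


def make_row(dataset, source, approach, subset="",
             train=0.0, dev=0.0, test=0.0, reference=""):
    """Build one report row; split values follow the 0 to 1 convention."""
    for name, value in (("train", train), ("dev", dev), ("test", test)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    if approach not in ("data-based", "model-based"):
        raise ValueError(f"unknown approach: {approach}")
    return {
        "Evaluation Dataset": dataset,
        "Subset": subset,
        "Contaminated Source": source,
        "Train split": train,
        "Development split": dev,
        "Test split": test,
        "Approach": approach,
        "Reference": reference,
        "PR Link": "",  # left blank; filled in after the Pull Request exists
    }


def to_csv(rows):
    """Serialize rows as `;`-delimited CSV text, header included."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS,
                            delimiter=";", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative values only, not a real contamination finding.
row = make_row("glue", "example-org/example-model", "data-based",
               subset="qnli", test=1.0)
report = to_csv([row])
print(report)
```

Writing through `csv.DictWriter` rather than joining strings by hand means any field that happens to contain a `;` is quoted automatically.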