loubnabnl HF staff committed on
Commit 5e62a6d • 1 Parent(s): 64f4368

update the note

Files changed (1)
  1. app.py +1 -1
app.py CHANGED
@@ -34,7 +34,7 @@ As part of the BigCode project, we released and maintain [The Stack V2](https://
 
  This tool lets you check if a repository under a given username is part of The Stack dataset. Would you like to have your data removed from future versions of The Stack? You can opt-out following the instructions [here](https://www.bigcode-project.org/docs/about/the-stack/#how-can-i-request-that-my-data-be-removed-from-the-stack). Note that previous opt-outs might still be displayed in the release candidate (denoted with "-rc"), which will be removed for the release.
 
- **Note**: The Stack v2.0 is built from the data provided by the [Software Heriage Archive](https://archive.softwareheritage.org/), so it may include repositories that are no longer present on GitHub.
+ **Note:** The Stack v2.0 is built from public GitHub code provided by the [Software Heriage Archive](https://archive.softwareheritage.org/). It may include repositories that are no longer present on GitHub but were archived by Software Heritage. Before training the StarCoder 1 and 2 models an additional PII pipeline was run to remove names, emails, passwords and API keys from the code files. For more information see the [paper](https://arxiv.org/abs/2402.19173).
 
  **Data source**:\
  <img src="https://annex.softwareheritage.org/public/logo/software-heritage-logo-title.2048px.png" alt="Logo" style="height: 3em; vertical-align: middle;" />