loubnabnl HF staff committed on
Commit
499d295
1 Parent(s): 9fa76a4

Update README.md

Files changed (1)
  1. README.md +13 -5
README.md CHANGED
@@ -42,6 +42,17 @@ BigCode is an open scientific collaboration working on responsible training of l
  - [StarCoder2 Membership Test](https://stack-dev.dataportraits.org/): Blazing fast test if code was present in pretraining dataset.
  </details>
  ---
+ <details>
+ <summary>
+ <b><font size="+1">📑The Stack v2</font></b>
+ </summary>
+ The Stack v2 is a 67.5TB dataset of source code in over 600 programming languages with permissive licenses or no license.
+
+ - [The Stack v2](https://huggingface.co/datasets/bigcode/the-stack-v2): Exact deduplicated version of The Stack v2.
+ - [The Stack v2 dedup](https://huggingface.co/datasets/bigcode/the-stack-v2-dedup): Near deduplicated version of The Stack v2 (recommended for training).
+ - [Am I in the Stack](https://huggingface.co/spaces/bigcode/in-the-stack): Check if your data is in The Stack and request opt-out.
+ </details>
+ ---
  <details>
  <summary>
  <b><font size="+1">💫StarCoder</font></b>
@@ -94,13 +105,10 @@ BigCode is an open scientific collaboration working on responsible training of l
  <summary>
  <b><font size="+1">📑The Stack</font></b>
  </summary>
- The Stack v1 is a 6.4TB dataset of source code in 358 programming languages from permissive licenses.<br>
- The Stack v2 is a 67.5TB dataset of source code in over 600 programming languages with permissive licenses or no license.
-
+ The Stack v1 is a 6.4TB dataset of source code in 358 programming languages from permissive licenses.
+
  - [The Stack](https://huggingface.co/datasets/bigcode/the-stack): Exact deduplicated version of The Stack.
- - [The Stack v2](https://huggingface.co/datasets/bigcode/the-stack-v2): Exact deduplicated version of The Stack v2.
  - [The Stack dedup](https://huggingface.co/datasets/bigcode/the-stack-dedup): Near deduplicated version of The Stack (recommended for training).
- - [The Stack v2 dedup](https://huggingface.co/datasets/bigcode/the-stack-v2-dedup): Near deduplicated version of The Stack v2 (recommended for training).
  - [Am I in the Stack](https://huggingface.co/spaces/bigcode/in-the-stack): Check if your data is in The Stack and request opt-out.
  </details>
  ---