loubnabnl HF staff committed on
Commit f1e07de
1 Parent(s): d0102f5

minor edits

Files changed (1)
  1. README.md +12 -13
README.md CHANGED
@@ -17,7 +17,7 @@ pinned: false
 
  # BigCode
 
- BigCode is an open scientific collaboration working on responsible training of large language models for coding applications. You can find more information on the main [website](https://www.bigcode-project.org/) or follow Big Code on [Twitter](https://twitter.com/BigCodeProject). In this organization you can find the artefacts of this collaboration: **StarCoder 2**, a state-of-the-art language model for code, **StarCoder**, a previous state-of-the-art language model for code, **Astraios**, scaling instruction-tuned language models for code via diverse fine-tuning methods , **OctoPack**, artifacts for instruction tuning large code models, **The Stack**, the largest available pretraining dataset with perimssive code, and **SantaCoder**, a 1.1B parameter model for code.
 
  ---
  <details>
@@ -82,7 +82,17 @@ BigCode is an open scientific collaboration working on responsible training of l
  - [StarCoder Search](https://huggingface.co/spaces/bigcode/search): Full-text search code in the pretraining dataset.
  - [StarCoder Membership Test](https://stack.dataportraits.org/): Blazing fast test if code was present in pretraining dataset.
  </details>
-
  ---
  <details>
  <summary>
@@ -117,17 +127,6 @@ BigCode is an open scientific collaboration working on responsible training of l
  - [Astraios-15B](https://huggingface.co/collections/bigcode/astraios-15b-65788b7476b6de79781054cc): Collection of StarCoderBase-15B models instruction tuned on CommitPackFT + OASST with 7 method.
  </details>
  ---
- <details>
- <summary>
- <b><font size="+1">📑The Stack</font></b>
- </summary>
- The Stack v1 is a 6.4TB dataset of source code in 358 programming languages from permissive licenses.
-
- - [The Stack](https://huggingface.co/datasets/bigcode/the-stack): Exact deduplicated version of The Stack.
- - [The Stack dedup](https://huggingface.co/datasets/bigcode/the-stack-dedup): Near deduplicated version of The Stack (recommended for training).
- - [Am I in the Stack](https://huggingface.co/spaces/bigcode/in-the-stack): Check if your data is in The Stack and request opt-out.
- </details>
- ---
  <details>
  <summary>
  <b><font size="+1">🎅SantaCoder</font></b>
 
 
  # BigCode
 
+ BigCode is an open scientific collaboration working on responsible training of large language models for coding applications. You can find more information on the main [website](https://www.bigcode-project.org/) or follow Big Code on [Twitter](https://twitter.com/BigCodeProject). In this organization you can find the artefacts of this collaboration: **StarCoder 2**, a state-of-the-art language model for code, and the previous **StarCoder** family of models, **The Stack**, the largest available pretraining dataset with perimssive code, **Astraios**, scaling instruction-tuned language models for code via diverse fine-tuning methods, **OctoPack**, artifacts for instruction tuning large code models, and **SantaCoder**, a 1.1B parameter model for code.
 
  ---
  <details>
 
  - [StarCoder Search](https://huggingface.co/spaces/bigcode/search): Full-text search code in the pretraining dataset.
  - [StarCoder Membership Test](https://stack.dataportraits.org/): Blazing fast test if code was present in pretraining dataset.
  </details>
+ ---
+ <details>
+ <summary>
+ <b><font size="+1">📑The Stack</font></b>
+ </summary>
+ The Stack v1 is a 6.4TB dataset of source code in 358 programming languages from permissive licenses.
+
+ - [The Stack](https://huggingface.co/datasets/bigcode/the-stack): Exact deduplicated version of The Stack.
+ - [The Stack dedup](https://huggingface.co/datasets/bigcode/the-stack-dedup): Near deduplicated version of The Stack (recommended for training).
+ - [Am I in the Stack](https://huggingface.co/spaces/bigcode/in-the-stack): Check if your data is in The Stack and request opt-out.
+ </details>
  ---
  <details>
  <summary>
 
  - [Astraios-15B](https://huggingface.co/collections/bigcode/astraios-15b-65788b7476b6de79781054cc): Collection of StarCoderBase-15B models instruction tuned on CommitPackFT + OASST with 7 method.
  </details>
  ---
  <details>
  <summary>
  <b><font size="+1">🎅SantaCoder</font></b>