minor edits
README.md
CHANGED
@@ -17,7 +17,7 @@ pinned: false
|
|
17 |
|
18 |
# BigCode
|
19 |
|
20 |
-
BigCode is an open scientific collaboration working on responsible training of large language models for coding applications. You can find more information on the main [website](https://www.bigcode-project.org/) or follow Big Code on [Twitter](https://twitter.com/BigCodeProject). In this organization you can find the artefacts of this collaboration: **StarCoder 2**, a state-of-the-art language model for code, **StarCoder
|
21 |
|
22 |
---
|
23 |
<details>
|
@@ -82,7 +82,17 @@ BigCode is an open scientific collaboration working on responsible training of l
|
|
82 |
- [StarCoder Search](https://huggingface.co/spaces/bigcode/search): Full-text search code in the pretraining dataset.
|
83 |
- [StarCoder Membership Test](https://stack.dataportraits.org/): Blazing fast test if code was present in pretraining dataset.
|
84 |
</details>
|
85 |
-
|
86 |
---
|
87 |
<details>
|
88 |
<summary>
|
@@ -117,17 +127,6 @@ BigCode is an open scientific collaboration working on responsible training of l
|
|
117 |
- [Astraios-15B](https://huggingface.co/collections/bigcode/astraios-15b-65788b7476b6de79781054cc): Collection of StarCoderBase-15B models instruction tuned on CommitPackFT + OASST with 7 methods.
|
118 |
</details>
|
119 |
---
|
120 |
-
<details>
|
121 |
-
<summary>
|
122 |
-
<b><font size="+1">📑The Stack</font></b>
|
123 |
-
</summary>
|
124 |
-
The Stack v1 is a 6.4TB dataset of source code in 358 programming languages from permissive licenses.
|
125 |
-
|
126 |
-
- [The Stack](https://huggingface.co/datasets/bigcode/the-stack): Exact deduplicated version of The Stack.
|
127 |
-
- [The Stack dedup](https://huggingface.co/datasets/bigcode/the-stack-dedup): Near deduplicated version of The Stack (recommended for training).
|
128 |
-
- [Am I in the Stack](https://huggingface.co/spaces/bigcode/in-the-stack): Check if your data is in The Stack and request opt-out.
|
129 |
-
</details>
|
130 |
-
---
|
131 |
<details>
|
132 |
<summary>
|
133 |
<b><font size="+1">🎅SantaCoder</font></b>
|
|
|
17 |
|
18 |
# BigCode
|
19 |
|
20 |
+
BigCode is an open scientific collaboration working on responsible training of large language models for coding applications. You can find more information on the main [website](https://www.bigcode-project.org/) or follow Big Code on [Twitter](https://twitter.com/BigCodeProject). In this organization you can find the artefacts of this collaboration: **StarCoder 2**, a state-of-the-art language model for code, and the previous **StarCoder** family of models, **The Stack**, the largest available pretraining dataset with permissively licensed code, **Astraios**, scaling instruction-tuned language models for code via diverse fine-tuning methods, **OctoPack**, artifacts for instruction tuning large code models, and **SantaCoder**, a 1.1B parameter model for code.
|
21 |
|
22 |
---
|
23 |
<details>
|
|
|
82 |
- [StarCoder Search](https://huggingface.co/spaces/bigcode/search): Full-text search code in the pretraining dataset.
|
83 |
- [StarCoder Membership Test](https://stack.dataportraits.org/): Blazing fast test if code was present in pretraining dataset.
|
84 |
</details>
|
85 |
+
---
|
86 |
+
<details>
|
87 |
+
<summary>
|
88 |
+
<b><font size="+1">📑The Stack</font></b>
|
89 |
+
</summary>
|
90 |
+
The Stack v1 is a 6.4TB dataset of source code in 358 programming languages from permissive licenses.
|
91 |
+
|
92 |
+
- [The Stack](https://huggingface.co/datasets/bigcode/the-stack): Exact deduplicated version of The Stack.
|
93 |
+
- [The Stack dedup](https://huggingface.co/datasets/bigcode/the-stack-dedup): Near deduplicated version of The Stack (recommended for training).
|
94 |
+
- [Am I in the Stack](https://huggingface.co/spaces/bigcode/in-the-stack): Check if your data is in The Stack and request opt-out.
|
95 |
+
</details>
|
96 |
---
|
97 |
<details>
|
98 |
<summary>
|
|
|
127 |
- [Astraios-15B](https://huggingface.co/collections/bigcode/astraios-15b-65788b7476b6de79781054cc): Collection of StarCoderBase-15B models instruction tuned on CommitPackFT + OASST with 7 methods.
|
128 |
</details>
|
129 |
---
|
130 |
<details>
|
131 |
<summary>
|
132 |
<b><font size="+1">🎅SantaCoder</font></b>
|