Update README.md
README.md
CHANGED
@@ -207,8 +207,6 @@ The model is a decoder-only transformer similar to the LLaMA ([Touvron et al., 2
 The dataset is comprised of a filtered mixture of open-source large-scale datasets available on the [HuggingFace Hub](https://huggingface.co/datasets): Falcon RefinedWeb extract ([Penedo et al., 2023](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)), along with [CommitPackFT](https://huggingface.co/datasets/bigcode/commitpackft) and [Github Issues](https://huggingface.co/datasets/bigcode/the-stack-github-issues) (BigCode., 2023), and StarCoder ([Li et al., 2023](https://arxiv.org/abs/2305.06161)). We further supplement our training with data from mathematical domains ([Azerbayev, Zhangir, et al., 2023](https://arxiv.org/abs/2310.10631) and, [Yu, Longhui, et al., 2023](https://arxiv.org/abs/2309.12284)).
 
 Top 18 programming languages trained on:
-<details>
-<summary> Click to expand </summary>
 - C
 - CPP
 - Java
@@ -227,7 +225,6 @@ Top 18 programming languages trained on:
 - Python
 - Jupyter-Clean
 - RestructuredText
-</details>
 
 ### Training Procedure
 