Spaces: codeparrot/code-generation-models
loubnabnl HF Staff committed on May 25, 2022

Commit a3de0e1, 1 Parent(s): cb371e9

update

Files changed (1)

datasets/github_code.txt CHANGED

@@ -1,4 +1,4 @@
-We also released [Github code dataset](https://huggingface.co/datasets/lvwerra/github-code), a 1TB of code data from Github repositories from 32 programming languages. The dataset can be loaded in a streaming mode if you don't want to download it because of memory issues, this will create an iterable dataset:
 ```python
 from datasets import load_dataset
@@ -17,6 +17,6 @@ print(next(iter(ds)))
 }
 ```
-You can see that in addition to the code, the samples include the metadata: repo name, path, language, license, and the size of the file.
 For model-specific information about the pretraining dataset, please select a model below:

+We also released [Github code dataset](https://huggingface.co/datasets/lvwerra/github-code), a 1TB of code data from Github repositories in 32 programming languages. The dataset can be loaded in a streaming mode if you don't want to download it because of memory issues, this will create an iterable dataset:
 ```python
 from datasets import load_dataset
 }
 ```
+You can see that in addition to the code, the samples include some metadata: repo name, path, language, license, and the size of the file.
 For model-specific information about the pretraining dataset, please select a model below:
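
The changed text describes loading the dataset in streaming mode, which yields an iterable of samples carrying metadata alongside the code. As a rough sketch of how such an iterable might be inspected without reading all of it, the snippet below counts languages over the first few samples. The helper `language_counts`, the stand-in records, and the exact key names (`repo_name`, `path`, `language`, `license`, `size`) are assumptions for illustration, not part of this commit; the commented-out `load_dataset` call is likewise an assumed invocation that needs network access and may need extra arguments depending on the installed `datasets` version.

```python
from collections import Counter
from itertools import islice
from typing import Any, Dict, Iterable


def language_counts(samples: Iterable[Dict[str, Any]], limit: int) -> Counter:
    """Count the `language` field over the first `limit` samples of any iterable."""
    # islice stops after `limit` items, so a streamed dataset is never fully read.
    return Counter(sample["language"] for sample in islice(samples, limit))


# Hypothetical stand-in records using the metadata fields named in the text.
# The values are made up; they only mimic the shape of a streamed sample.
fake_stream = iter([
    {"code": "print('hi')", "repo_name": "user/a", "path": "a.py",
     "language": "Python", "license": "mit", "size": 11},
    {"code": "package main", "repo_name": "user/b", "path": "main.go",
     "language": "Go", "license": "apache-2.0", "size": 12},
    {"code": "x = 1", "repo_name": "user/c", "path": "c.py",
     "language": "Python", "license": "mit", "size": 5},
])

counts = language_counts(fake_stream, limit=2)
print(counts)

# Assumed usage with the real dataset (requires the `datasets` package and network):
#   from datasets import load_dataset
#   ds = load_dataset("lvwerra/github-code", streaming=True, split="train")
#   print(language_counts(ds, limit=100))
```

Because the helper only relies on iteration, the same function should work on a list of dictionaries in a test and on an iterable dataset in practice.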