loubnabnl HF staff committed on
Commit
a3de0e1
1 Parent(s): cb371e9
Files changed (1)
  1. datasets/github_code.txt +2 -2
datasets/github_code.txt CHANGED
@@ -1,4 +1,4 @@
- We also released [Github code dataset](https://huggingface.co/datasets/lvwerra/github-code), a 1TB of code data from Github repositories from 32 programming languages. The dataset can be loaded in a streaming mode if you don't want to download it because of memory issues, this will create an iterable dataset:
+ We also released [Github code dataset](https://huggingface.co/datasets/lvwerra/github-code), a 1TB of code data from Github repositories in 32 programming languages. The dataset can be loaded in a streaming mode if you don't want to download it because of memory issues, this will create an iterable dataset:
 
  ```python
  from datasets import load_dataset
@@ -17,6 +17,6 @@ print(next(iter(ds)))
  }
 
  ```
- You can see that in addition to the code, the samples include the metadata: repo name, path, language, license, and the size of the file.
+ You can see that in addition to the code, the samples include some metadata: repo name, path, language, license, and the size of the file.
 
  For model-specific information about the pretraining dataset, please select a model below:
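
For reviewers who want to check the edited wording against real samples, here is a minimal sketch of the streaming load the text describes. The full snippet is elided in the diff, so several details are assumptions: the `"code"` field name, the `split="train"` argument, and the helper names `metadata_only`, `first_samples`, and `stream_metadata` are illustrative and not taken from the file. Depending on the installed `datasets` version, loading this dataset may need extra arguments.

```python
import itertools
from typing import Any, Dict, Iterable, Iterator, List

# Assumed name of the field holding the file contents; the diff does not show it.
CODE_FIELD = "code"


def metadata_only(sample: Dict[str, Any], code_field: str = CODE_FIELD) -> Dict[str, Any]:
    """Return a copy of the sample without the (potentially large) code field."""
    return {key: value for key, value in sample.items() if key != code_field}


def first_samples(stream: Iterable[Dict[str, Any]], n: int) -> Iterator[Dict[str, Any]]:
    """Lazily yield at most n samples, so a streamed dataset is never materialized."""
    return itertools.islice(stream, n)


def stream_metadata(n: int = 3) -> List[Dict[str, Any]]:
    """Stream the first n samples and return their metadata (requires network access)."""
    # Imported here so the helpers above stay usable without the `datasets` package.
    from datasets import load_dataset

    # streaming=True returns an iterable dataset instead of downloading everything.
    # split="train" is an assumption about how the dataset is organized.
    ds = load_dataset("lvwerra/github-code", streaming=True, split="train")
    return [metadata_only(sample) for sample in first_samples(ds, n)]
```

Calling `stream_metadata(3)` should return three dictionaries holding only the metadata fields, which makes it easy to confirm the list given in the changed sentence (repo name, path, language, license, file size).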