loubnabnl (HF staff) committed on
Commit
67b7d8f
1 Parent(s): 873252e

update datasets

Files changed (1)
  1. datasets/github_code.txt +1 -1
datasets/github_code.txt CHANGED
@@ -1,4 +1,4 @@
- We also released [Github code dataset](https://huggingface.co/datasets/lvwerra/github-code), a 1TB of code data from Github repositories in 32 programming languages. The dataset can be loaded in a streaming mode if you don't want to download it because of memory issues, this will create an iterable dataset:
+ We also released [Github code dataset](https://huggingface.co/datasets/lvwerra/github-code), a 1TB of code data from Github repositories in 32 programming languages. It was created from the public GitHub dataset on Google [BigQuery](https://cloud.google.com/blog/topics/public-datasets/github-on-bigquery-analyze-all-the-open-source-code). The dataset can be loaded in a streaming mode if you don't want to download it because of memory issues, this will create an iterable dataset:
 
  ```python
  from datasets import load_dataset