loubnabnl HF staff committed on
Commit
66aea4c
1 Parent(s): bb51c11

Update datasets/github_code.md

  1. datasets/github_code.md +4 -0
datasets/github_code.md CHANGED
@@ -19,6 +19,10 @@ print(next(iter(ds)))
  ```
  You can see that in addition to the code, the samples include some metadata: repo name, path, language, license, and the size of the file. Below is the distribution of programming languages in this dataset.
 
+ <p align="center">
+ <img src="https://huggingface.co/datasets/lvwerra/github-code/resolve/main/github-code-stats-alpha.png" alt="drawing" width="450"/>
+ </p>
+
  Below is the distribution of the pretraining data size of some code models:
  <p align="center">
  <img src="https://huggingface.co/datasets/loubnabnl/repo-images/resolve/main/data_distrub.png" alt="drawing" width="450"/>