loubnabnl (HF staff) committed on
Commit
a8caee6
1 Parent(s): 4d8fc44

Update datasets/intro.md

Files changed (1)
  1. datasets/intro.md +1 -1
datasets/intro.md CHANGED
@@ -5,4 +5,4 @@ Below is the distribution of the pretraining data size of some code models, we p
  <img src="https://huggingface.co/datasets/loubnabnl/repo-images/resolve/main/data_distrub.png" alt="drawing" width="440"/>
  </p>
 
- Some other useful datasets available on the 🤗 Hub are [CodeSearchNet](https://huggingface.co/datasets/code_search_net), a corpus of 2 million (comment, code) pairs from open-source libraries hosted on GitHub in several programming languages, and [Mostly Basic Python Problems (MBPP)](https://huggingface.co/datasets/mbpp), a benchmark of around 1,000 crowd-sourced Python programming problems for entry-level programmers, where each problem consists of a task description, a code solution, and 3 automated test cases. MBPP was used in the [InCoder](https://huggingface.co/facebook/incoder-6B) evaluation, in addition to [HumanEval](https://huggingface.co/datasets/openai_humaneval), which we will present later.
+ Some other useful datasets available on the 🤗 Hub are [CodeSearchNet](https://huggingface.co/datasets/code_search_net), a corpus of 2 million (comment, code) pairs from open-source libraries hosted on GitHub in several programming languages, and [Mostly Basic Python Problems (MBPP)](https://huggingface.co/datasets/mbpp), a benchmark of around 1,000 crowd-sourced Python programming problems for entry-level programmers, where each problem consists of a task description, a code solution, and 3 automated test cases. MBPP was used in the [InCoder](https://huggingface.co/facebook/incoder-6B) evaluation, in addition to [HumanEval](https://huggingface.co/datasets/openai_humaneval), which we will present later. You can also find [APPS](https://huggingface.co/datasets/loubnabnl/apps), a benchmark of 10,000 problems consisting of programming questions in English and code solutions in Python. This dataset was also used in the Codex evaluation, along with HumanEval.
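
To make the "task description, code solution and 3 automated test cases" structure concrete, here is a minimal sketch of how a candidate solution might be checked against assert-style tests. The record below is a made-up illustration, not an actual dataset entry, and the field names (`text`, `code`, `test_list`) and the helper `passes_all_tests` are assumptions chosen for this example.

```python
# Hypothetical record shaped like an MBPP-style problem (invented for illustration).
problem = {
    "text": "Write a function to add two numbers.",
    "code": "def add(a, b):\n    return a + b",
    "test_list": [
        "assert add(1, 2) == 3",
        "assert add(-1, 1) == 0",
        "assert add(0, 0) == 0",
    ],
}


def passes_all_tests(code, tests):
    """Execute the candidate code, then each assert-style test, in a fresh namespace.

    Returns True only if nothing raises. exec() runs arbitrary code, so a real
    evaluation should sandbox model-generated solutions instead of calling this.
    """
    namespace = {}
    try:
        exec(code, namespace)
        for test in tests:
            exec(test, namespace)
    except Exception:
        return False
    return True


# The reference solution should pass; a deliberately wrong one should fail.
reference_ok = passes_all_tests(problem["code"], problem["test_list"])
buggy_ok = passes_all_tests("def add(a, b):\n    return a - b", problem["test_list"])
print(reference_ok, buggy_ok)
```

Running each test in the same namespace as the solution keeps the sketch short. It provides no isolation or timeouts, which an actual benchmark harness would need.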