loubnabnl HF staff committed on
Commit
8fd7e3c
1 Parent(s): 28f951a
Files changed (1)
  1. evaluation/intro.txt +3 -3
evaluation/intro.txt CHANGED
@@ -9,11 +9,11 @@ For most models, we sample 200 candidate program completions, and compute pass@1
  |InCoder 🦜 (6.7B) | 15.2% | 27.8% | 47.00% |
  |||||
  |Codex (25M)| 3.21% | 7.1% | 12.89%|
- |Codex (85M)| 8.22% | 12.81% | 22.40% |
  |Codex (300M)| 13.17%| 20.37% | 36.27% |
  |Codex (12B)| 28.81%| 46.81% | 72.31% |
  |||||
  |GPT-neo (125M)| 0.75% | 1.88% | 2.97% |
  |GPT-neo (1.5B)| 4.79% | 7.47% | 16.30% |
- |GPT-neo (2.7B)| 6.41% | 11.27% | 21.37% |
- |GPT-J (6B)| 11.62% | 15.74% | 27.74% |
+ |GPT-J (6B)| 11.62% | 15.74% | 27.74% |
+
+ To better understand how the pass@k metric works, we will illustrate it with some examples. We select 4 tasks from the HumanEval dataset and see how the models perform and which code completions pass the unit tests.
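
As a reviewer-side illustration of the metric the added paragraph refers to, here is a minimal sketch of how pass@k could be computed per task. It assumes the unbiased estimator 1 - C(n-c, k)/C(n, k) commonly used for HumanEval, where n is the number of sampled completions (200 in the hunk context above) and c is the number that pass the unit tests. The diff does not show the evaluation code, so treat this as an assumption about the method, not the file's actual implementation. The function names and the per-task counts are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task from n samples, c of which pass the tests.

    Assumed estimator: 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("expected 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset has at least one passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average the per-task estimates over all tasks."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


if __name__ == "__main__":
    # Hypothetical pass counts for 4 tasks, 200 samples each (not real results).
    counts = [0, 20, 100, 200]
    for k in (1, 10, 100):
        print(f"pass@{k}: {mean_pass_at_k(counts, n=200, k=k):.4f}")
```

Using exact binomial coefficients from `math.comb` keeps the sketch short. For very large n, a product form of the same ratio avoids building big integers.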