Spaces: codeparrot/code-generation-models
loubnabnl HF Staff committed on May 24, 2022

Commit 8fd7e3c, 1 parent: 28f951a

add table

Files changed (1)

evaluation/intro.txt CHANGED

@@ -9,11 +9,11 @@ For most models, we sample 200 candidate program completions, and compute pass@1
 |InCoder 🦜 (6.7B) | 15.2% | 27.8% | 47.00% |
 |||||
 |Codex (25M)| 3.21% | 7.1% |	12.89%|
-|Codex (85M)| 8.22%	| 12.81% | 22.40% |
 |Codex (300M)| 13.17%| 20.37% | 36.27% |
 |Codex (12B)| 28.81%| 46.81% | 72.31% |
 |||||
 |GPT-neo (125M)| 0.75% | 1.88% | 2.97% |
 |GPT-neo (1.5B)| 4.79% | 7.47% | 16.30% |
-|GPT-neo (2.7B)| 6.41% | 11.27% | 21.37% |
-|GPT-J (6B)| 11.62% | 15.74% | 27.74% |

+|GPT-J (6B)| 11.62% | 15.74% | 27.74% |
+To better understand how the pass@k metric works, we illustrate it with some examples. We select 4 tasks from the HumanEval dataset and see how the models perform and which code completions pass the unit tests.
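
As a rough illustration of the metric the added paragraph refers to, here is a minimal sketch of a pass@k computation. It assumes the unbiased estimator commonly used with HumanEval, 1 - C(n-c, k) / C(n, k), where n is the number of sampled completions for a task and c is the number that pass the unit tests. The diff does not show how this Space computes its scores, so the function names `pass_at_k` and `mean_pass_at_k` and the per-task counts below are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task (assumed estimator, not taken from the diff).

    n: completions sampled for the task
    c: completions that pass all unit tests
    k: budget of completions considered
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets that contain no passing completion.
    # math.comb returns 0 when k > n - c, so the estimate is then exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(tasks: list[tuple[int, int]], k: int) -> float:
    """Average the per-task estimates over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in tasks) / len(tasks)


if __name__ == "__main__":
    # Hypothetical counts for 4 tasks, each with 200 sampled completions.
    tasks = [(200, 0), (200, 20), (200, 100), (200, 200)]
    for k in (1, 10, 100):
        print(f"pass@{k} = {mean_pass_at_k(tasks, k):.3f}")
```

With 200 samples and 20 passing completions, the pass@1 estimate for a single task reduces to 20/200, which is a quick sanity check on the formula.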