
Table 1 below shows the HumanEval scores of CodeParrot, InCoder, PolyCoder, CodeGen and Codex (not open-source).

| Model | pass@1 | pass@10 | pass@100 |
|:---|:---:|:---:|:---:|
| CodeParrot (110M) | 3.80% | 6.57% | 12.78% |
| CodeParrot (1.5B) | 3.58% | 8.03% | 14.96% |
| InCoder (6.7B) | 15.2% | 27.8% | 47.00% |
| PolyCoder (160M) | 2.13% | 3.35% | 4.88% |
| PolyCoder (400M) | 2.96% | 5.29% | 11.59% |
| PolyCoder (2.7B) | 5.59% | 9.84% | 17.68% |
| CodeGen-Mono (350M) | 12.76% | 23.11% | 35.19% |
| CodeGen-Mono (2.7B) | 23.70% | 36.64% | 57.01% |
| CodeGen-Mono (6.1B) | 26.13% | 42.29% | 65.82% |
| CodeGen-Mono (16.1B) | 29.28% | 49.86% | 75.00% |
| Codex (25M) | 3.21% | 7.1% | 12.89% |
| Codex (300M) | 13.17% | 20.37% | 36.27% |
| Codex (12B) | 28.81% | 46.81% | 72.31% |
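
The source does not say how the pass@k values above were computed. As a minimal sketch, assuming the unbiased estimator introduced alongside HumanEval in the Codex paper (generate `n` samples per problem, count the `c` that pass the unit tests, and estimate `1 - C(n-c, k) / C(n, k)`), the per-problem and benchmark-level scores could be computed as follows. The function names `pass_at_k` and `mean_pass_at_k` are illustrative and do not come from the source.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem as 1 - C(n - c, k) / C(n, k).

    n: number of samples generated for the problem
    c: number of those samples that pass the unit tests
    k: number of samples the metric is allowed to draw
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples fail,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, a problem with 5 passing samples out of 10 has a pass@1 estimate of 0.5, and averaging it with a problem where no sample passes gives 0.25.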