The performance on HumanEval

#5
by TingchenFu - opened

Hello, I evaluated the model on the HumanEval benchmark with the bigcode-evaluation-harness framework, and the pass@1 I get on HumanEval is much lower than the number reported in the original paper (3% vs. 18%). However, with the same setup, my pass@1 on MBPP agrees with the reported number. Does anyone have any idea why this happens?
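
For reference, here is roughly the kind of command I ran (a minimal sketch: the model name and generation settings below are placeholders rather than my exact values, and flag names may differ slightly across harness versions):

```bash
# Sketch of a bigcode-evaluation-harness run on HumanEval.
# <model-name> and the sampling settings are placeholders.
accelerate launch main.py \
  --model <model-name> \
  --tasks humaneval \
  --max_length_generation 512 \
  --temperature 0.2 \
  --do_sample True \
  --n_samples 20 \
  --batch_size 10 \
  --allow_code_execution
```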
