HumanEval score?

#1
by rombodawg - opened

What does this model score on HumanEval? Does it get close to WizardCoder 34B?

OpenBuddy org

Hi, this model is our first attempt, and we currently believe its coding ability is close to or slightly below that of OpenBuddy 70B.

If its coding ability can be further improved in the future, we will consider conducting more benchmarks.

Also, please note that HumanEval scores are easily influenced by the training set. Because the model's training data is complex, even the model authors find it difficult to judge whether the training set is completely disjoint from the test questions. We therefore believe that benchmarks like HumanEval can be somewhat misleading, and a model that surpasses GPT-4 on HumanEval does not necessarily have stronger coding ability.
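To make the contamination concern concrete, here is a rough sketch of one common heuristic: measuring how many word-level n-grams of a benchmark prompt also appear verbatim in a training document. This is only an illustration, not how our data was checked; the function names `ngrams` and `overlap_ratio` and the default window size `n=8` are assumptions chosen for the example.

```python
def ngrams(text, n):
    """Return the set of word-level n-grams in `text` (whitespace tokenized)."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(benchmark_prompt, training_doc, n=8):
    """Fraction of the prompt's n-grams that also occur in the training document.

    Hypothetical helper: a value near 1.0 suggests the prompt may have leaked
    into the training data; a value of 0.0 only means no verbatim n-gram match.
    """
    bench = ngrams(benchmark_prompt, n)
    if not bench:
        # Prompt is shorter than n tokens, so there is nothing to compare.
        return 0.0
    train = ngrams(training_doc, n)
    return len(bench & train) / len(bench)
```

A check like this only catches verbatim or near-verbatim copies. Paraphrased or translated test questions would pass unnoticed, which is part of why it is hard to be sure a training set is truly disjoint from the benchmark.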

Because evaluation methods that reliably reflect real-world coding ability are still lacking, we recommend that users try multiple models in practice and judge them based on their own scenarios.

You are correct that there is no true measure of human coding capability, but a benchmark score is still the best way to compare models at the moment, so I'll still look forward to the score.
