Contamination-free code evaluations with LiveCodeBench! 🖥️
LiveCodeBench is a new leaderboard that offers:
- complete code evaluations (code generation, self-repair, code execution, tests)
- my favorite feature: problem selection by publication date
With this feature, you can get model scores averaged only over problems published after a model's training cutoff, so those problems should not appear in its training data. That means... contamination-free code evals!
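To make the idea concrete, here is a minimal sketch of date-based filtering. It is not LiveCodeBench's actual code: the `results` list, the `score_after` helper, and the cutoff date are all made up for illustration.

```python
from datetime import date

# Hypothetical per-problem results: (publication date, did the model pass?)
results = [
    (date(2023, 6, 10), True),
    (date(2023, 9, 2), True),
    (date(2024, 1, 15), False),
    (date(2024, 2, 20), True),
]

def score_after(results, cutoff):
    """Pass rate over problems published strictly after `cutoff`."""
    fresh = [passed for published, passed in results if published > cutoff]
    if not fresh:
        return None  # no problems newer than the cutoff
    return sum(fresh) / len(fresh)

# Score over every problem, old and new alike.
overall = sum(passed for _, passed in results) / len(results)

# Score restricted to problems newer than an assumed training cutoff.
fresh_score = score_after(results, date(2023, 12, 31))

print(f"overall: {overall:.2f}, after cutoff: {fresh_score:.2f}")
```

A gap between the two numbers is the kind of signal that can hint at contamination on the older problems.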