GSM8K scores after the March 2024 update

#21
by CombinHorizon - opened

Hi,
after the update intended to improve the model's haystack-retrieval scores,
its GSM8K (math) score dropped from
61.6% before the update to 34.9% after it.

References:
https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/01-ai/Yi-34B-200K/results_2023-12-05T03-41-41.478096.json
https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/01-ai/Yi-34B-200K/results_2024-04-16T04-20-00.686323.json
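To put the size of the drop in numbers, here is a quick sketch. The helper name `score_drop` is made up for illustration, and the two inputs are simply the percentages quoted above, typed in by hand rather than parsed from the result files.

```python
def score_drop(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = before - after              # difference in percentage points
    relative = absolute / before * 100.0   # drop as a share of the earlier score
    return absolute, relative


# GSM8K scores quoted in this post (entered manually, not read from the JSON files)
absolute, relative = score_drop(61.6, 34.9)
print(f"absolute drop: {absolute:.1f} points, relative drop: {relative:.1f}%")
```

So the model lost roughly 27 percentage points, which is a bit over 40% of its earlier GSM8K score.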

What could be the cause of this?
How will it be addressed, and will there be a future update?
