Normalization for MuSR object placement

#1040
by recojt - opened

Hello!

I have a question regarding score normalization. On this page (https://huggingface.co/docs/leaderboards/open_llm_leaderboard/normalization), you mention that the lower bound for MuSR object placement is 0.2. However, looking at the dataset (e.g. https://huggingface.co/datasets/TAUR-Lab/MuSR/viewer), you can see that the number of choices ranges between 2 and 5.

For the normalization on the leaderboard, are you actually using 5 or the average number of choices across the dataset?

Open LLM Leaderboard org

For object placement, we use 0.2 (1/5; we assume each question has 5 choices). We might want to make this more precise in the future, cc @alozowski
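To illustrate what "more precise" could mean here, below is a rough sketch, not the leaderboard's actual code. It assumes the normalization is a linear rescale between the random baseline and a perfect score, with anything below the baseline clipped to 0. The function names and the list of choice counts are hypothetical, made up for the example rather than taken from the dataset.

```python
def normalize_score(raw_accuracy, lower_bound, upper_bound=1.0):
    """Rescale accuracy so the random baseline maps to 0 and a perfect score to 100.

    Assumption: scores below the baseline are clipped to 0.
    """
    if raw_accuracy < lower_bound:
        return 0.0
    return (raw_accuracy - lower_bound) / (upper_bound - lower_bound) * 100


def random_baseline(choice_counts):
    """Expected accuracy of uniform random guessing, given each question's number of choices."""
    return sum(1 / n for n in choice_counts) / len(choice_counts)


# Hypothetical choice counts for five questions (NOT real MuSR statistics).
choice_counts = [2, 3, 4, 5, 5]

fixed_baseline = 1 / 5                              # current assumption: always 5 choices
precise_baseline = random_baseline(choice_counts)   # mean of 1/n over the questions

raw_accuracy = 0.5
print(f"fixed baseline:   {normalize_score(raw_accuracy, fixed_baseline):.1f}")
print(f"precise baseline: {normalize_score(raw_accuracy, precise_baseline):.1f}")
```

With questions that have fewer than 5 choices, the per-question baseline is higher than 0.2, so the same raw accuracy normalizes to a lower score than under the fixed 1/5 assumption.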
