Do not see retrieval and sts numbers

#13
by consciousAI - opened

Hi Team! I do not see Retrieval and STS metrics on the leaderboard.

I ran the reports for the English datasets, and I do see individual reports generated for both of these tasks on the model evaluation reports page (link below). Do I need to follow any other steps to make them visible on the leaderboard? Thank you!

consciousAI/cai-stellaris-text-embeddings

On a different note,

  1. How is the overall ranking computed?

  2. I'm not sure whether a rank per task is available. If not, just a suggestion: could there be a task-specific ranking on the individual task tabs? A model may rank 20th overall but 2nd on a specific task, which I guess is more important.

consciousAI changed discussion title from Do not see retrieval and reranking numbers to Do not see retrieval and sts numbers
Massive Text Embedding Benchmark org

For me the scores show up (see the attached screenshot). However, it seems you are missing a few Retrieval tasks, such as FEVER and HotpotQA.
Screenshot 2023-06-23 at 6.33.39 PM.png

  1. The overall ranking is based on the average score over the 56 datasets.
  2. Yeah, good point - I'm thinking of adding this. Currently, you can click the arrow next to the task you are interested in on the Overall tab, and it will sort by that task's score.

@Muennighoff Appreciate the quick response. I see the numbers in the individual tasks but not the overall numbers; please see the screenshot.

image.png

Massive Text Embedding Benchmark org

Yeah, exactly: the overall scores are missing because some of the individual scores are missing. The overall average for a task is only computed once all of that task's individual datasets have been run.
I.e., for Retrieval you need to add the FEVER, HotpotQA, etc. scores first; then you will see the overall average score.
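
To illustrate the rule above, here is a minimal sketch of an "average only when complete" computation. It is not the leaderboard's actual code: the function name `task_average`, the three-dataset list `RETRIEVAL_SUBSET`, and all scores are made up for illustration, and the real Retrieval task has more datasets than these.

```python
from typing import Dict, List, Optional


def task_average(scores: Dict[str, float], required: List[str]) -> Optional[float]:
    """Return the mean score over `required`, or None if any dataset is missing."""
    if any(name not in scores for name in required):
        # One missing dataset is enough to suppress the task average.
        return None
    return sum(scores[name] for name in required) / len(required)


# Hypothetical subset of Retrieval datasets, for illustration only.
RETRIEVAL_SUBSET = ["ArguAna", "FEVER", "HotpotQA"]

# Made-up scores: one model with a partial run, one with a complete run.
partial = {"ArguAna": 50.0}
complete = {"ArguAna": 50.0, "FEVER": 70.0, "HotpotQA": 60.0}

print(task_average(partial, RETRIEVAL_SUBSET))   # None
print(task_average(complete, RETRIEVAL_SUBSET))  # 60.0
```

With the partial run the average is withheld, which matches an empty overall column for a model that still shows per-dataset scores.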

I see. I'm not sure what happened, as I did not capture the logs, but I do see that these tasks were requested in the benchmark script, and I also see some FEVER run stats on the eval page. Thank you, though. Let me rerun the benchmark and capture the run logs to see why these are being missed.

In the absence of these overall stats, is there any other way I can see the rank per task? On sub-tabs like Retrieval there are many datasets, but no overall ranking there either.

image.png

image.png

Massive Text Embedding Benchmark org
edited Jun 23, 2023

You can see the rank per task by sorting on that task in the Overall tab (e.g. hit the arrow for Classification in the Overall tab), assuming the task average is present.
You can see the rank per dataset by sorting on that dataset in the task tab (e.g. hit the arrow for ArguAna in the Retrieval tab).

This just gives you the ordering, though; you then have to count to find your rank number for a task. I will look into adding a dedicated rank column to each task tab soon, I think.
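
Until such a column exists, the counting step can be sketched offline. This is only an illustration, assuming you have copied one column of scores into a dictionary: the helper `rank_for`, the model names, and the scores are all hypothetical, and models without a score are left unranked here.

```python
from typing import Dict, Optional


def rank_for(model: str, scores: Dict[str, Optional[float]]) -> Optional[int]:
    """Return the 1-based rank of `model`, sorting scores from highest to lowest."""
    # Drop models with no score for this task or dataset.
    present = {name: s for name, s in scores.items() if s is not None}
    if model not in present:
        return None
    ordered = sorted(present, key=lambda name: present[name], reverse=True)
    return ordered.index(model) + 1


# Made-up column of scores, e.g. one task average copied from the Overall tab.
column = {"model-a": 48.2, "model-b": 55.1, "model-c": None, "model-d": 51.7}

print(rank_for("model-d", column))  # 2
print(rank_for("model-c", column))  # None
```

Ties are not handled specially in this sketch; tied models simply keep their input order.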

Muennighoff changed discussion status to closed
