CQADupstackRetrieval evaluation inquiry

#41
by Izarel - opened

CQADupstackRetrieval is divided into 12 datasets - however, the leaderboard gives no reference to which subset has been used for evaluation.

Is it the English subset? Or the average NDCG@10 on all of them?

Massive Text Embedding Benchmark org

Afaik they only have an English subset?
The score in the Retrieval tab is NDCG@10, computed for each individual dataset.

Hello, after getting the NDCG@10 score for each subset, such as CQADupstackAndroidRetrieval, CQADupstackEnglishRetrieval..., how do I get the final score of CQADupstackRetrieval as reported in the leaderboard?
Is the CQADupstackRetrieval score in the leaderboard the average over all subsets, or the score of one particular subset?

I found this code in https://github.com/embeddings-benchmark/mteb/blob/main/scripts/mteb_meta.py:

MTEB(tasks=[ds_name.replace("CQADupstackRetrieval", "CQADupstackAndroidRetrieval")]).tasks[0].description

Does this mean that the CQADupstackRetrieval score reported in the leaderboard is actually the score of CQADupstackAndroidRetrieval?

Massive Text Embedding Benchmark org

python scripts/average_cqadupstack.py path/to/your/results/folder
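For illustration, here is a minimal sketch of what an averaging step like this could look like: collect the NDCG@10 of every `CQADupstack*Retrieval` result file in a folder and take the mean. The file layout and JSON keys (`"test"` / `"ndcg_at_10"`) are assumptions for this sketch, not the actual structure used by `average_cqadupstack.py`.

```python
import json
from pathlib import Path


def average_cqadupstack(results_folder: str) -> float:
    """Average test-split NDCG@10 over all CQADupstack*Retrieval result files.

    Assumes one JSON file per subset, each shaped like
    {"test": {"ndcg_at_10": <float>}} - a hypothetical layout for this sketch.
    """
    scores = []
    for path in Path(results_folder).glob("CQADupstack*Retrieval.json"):
        with open(path) as f:
            result = json.load(f)
        scores.append(result["test"]["ndcg_at_10"])
    if not scores:
        raise ValueError(f"No CQADupstack result files found in {results_folder}")
    # The combined leaderboard score is the plain mean over the subsets.
    return sum(scores) / len(scores)
```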

Thanks!