CQADupstackRetrieval evaluation inquiry

#41
by Izarel - opened

CQADupstackRetrieval is divided into 12 datasets - however, the leaderboard gives no reference to which subset has been used for evaluation.

Is it the English subset? Or the average NDCG@10 on all of them?

Massive Text Embedding Benchmark org

Afaik they only have an English subset?
The score in the Retrieval tab is NDCG@10, computed for each individual dataset.

Hello, after getting the NDCG@10 score for each subset, such as CQADupstackAndroidRetrieval, CQADupstackEnglishRetrieval..., how do I get the final score of CQADupstackRetrieval as reported in the leaderboard?
Is the CQADupstackRetrieval score in the leaderboard the average over all subsets, or the score of one particular subset?

I found this code in https://github.com/embeddings-benchmark/mteb/blob/main/scripts/mteb_meta.py:

MTEB(tasks=[ds_name.replace("CQADupstackRetrieval", "CQADupstackAndroidRetrieval")]).tasks[0].description

Does this mean that the CQADupstackRetrieval score reported in the leaderboard is actually the score of CQADupstackAndroidRetrieval?

Massive Text Embedding Benchmark org

python scripts/average_cqadupstack.py path/to/your/results/folder
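For illustration, here is a minimal sketch of what an averaging step like this could look like: collect the NDCG@10 of every `CQADupstack*Retrieval` result file in a folder and take the mean. The file layout and JSON keys (`"test"` / `"ndcg_at_10"`) are assumptions for this sketch, not the actual structure used by `average_cqadupstack.py`.

```python
import json
from pathlib import Path


def average_cqadupstack(results_folder: str) -> float:
    """Average test-split NDCG@10 over all CQADupstack*Retrieval result files.

    Assumes one JSON file per subset, each shaped like
    {"test": {"ndcg_at_10": <float>}} - a hypothetical layout for this sketch.
    """
    scores = []
    for path in Path(results_folder).glob("CQADupstack*Retrieval.json"):
        with open(path) as f:
            result = json.load(f)
        scores.append(result["test"]["ndcg_at_10"])
    if not scores:
        raise ValueError(f"No CQADupstack result files found in {results_folder}")
    # The combined leaderboard score is the plain mean over the subsets.
    return sum(scores) / len(scores)
```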

Thanks!