New model and mteb leaderboard refresh request

#117
by nada5 - opened

Hi, Hugging Face MTEB team.

We submitted new scores for our embedding model at https://huggingface.co/nvidia/NV-Embed-v1. Would you help us refresh the MTEB leaderboard?
Thank you!

Best,
Chankyu

Massive Text Embedding Benchmark org

Hello!

I've triggered a restart. Looking forward to hearing more about your model!
I'd love to be able to assist from the Hugging Face side to get the biggest reach for your model at release. Perhaps I can add you to one of the Hugging Face slack channels for easier communication on this?

It looks like you're missing some metrics for CQADupstackRetrieval (which consists of a few different datasets). Otherwise, your model is visible on the leaderboard.

  • Tom Aarsen
Massive Text Embedding Benchmark org

Also, based on your Classification scores (AmazonCounterfactualClassification, EmotionClassification), it seems plausible that the MTEB test sets accidentally leaked into your training set. For the former, you reach 95.12% accuracy compared to 88% for the 2nd highest (which may also have been overfitted; most models reach 70-75%), and for the latter, you reach 91.7% accuracy while the 2nd highest is 59.81% (largely because the Emotion dataset is not very high quality).

  • Tom Aarsen

Hi, Tom

Thanks for the quick response!

  1. Thanks for pointing out CQADupstackRetrieval. We merged the CQADupstack*Retrieval results into one and have updated the README. Can you please refresh the leaderboard again when you are available?
  2. Yes, please add me to the Hugging Face Slack channel. My email is "chankyul@nvidia.com".
  3. Since the training splits of EmotionClassification and AmazonCounterfactualClassification contain content similar to the evaluation splits, we used BM25 similarity thresholds to remove similar content from the training splits, and we also removed exact matches.
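For reference, the decontamination in point 3 can be sketched roughly as below. This is a minimal illustration, not our actual pipeline: the tokenization, the pure-Python Okapi BM25 scorer, and the threshold value are all simplified assumptions (BM25 scores are unnormalized, so a real threshold has to be tuned per corpus).

```python
# Minimal sketch: drop training examples that exactly match, or are
# BM25-similar to, evaluation examples. Illustrative only; the real
# pipeline's tokenizer, BM25 implementation, and threshold differ.
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of token-list `query` against each tokenized doc."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()  # document frequency of each term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores


def decontaminate(train_texts, eval_texts, threshold=1.0):
    """Remove exact matches and BM25-similar examples from the training split."""
    eval_set = set(eval_texts)
    eval_docs = [t.lower().split() for t in eval_texts]
    kept = []
    for text in train_texts:
        if text in eval_set:
            continue  # exact match with an evaluation example
        if max(bm25_scores(text.lower().split(), eval_docs)) > threshold:
            continue  # too similar to some evaluation example under BM25
        kept.append(text)
    return kept
```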

Best,
Chankyul

Massive Text Embedding Benchmark org

I've invited you to a Slack channel for this.
As for CQADupstackRetrieval: there were still some issues; for example, your current README states an NDCG@10 of 5050.54. My recommendation is to run merge_cqadupstack.py, which should take care of the merging for you. Once you've corrected it, I can restart the leaderboard and you'll show up again. For now, I've removed the model, as it was scoring an average of ~158 out of 100 across all tasks 😄

  • Tom Aarsen
