Massive Text Embedding Benchmark org
No description provided.
Massive Text Embedding Benchmark org

I've added the Retrieval w/Instructions tab, which required three changes (besides adding model names):

  1. Rerankers got an embedding dimension of -1 (though I could use np.inf or something instead, or simply not report an embedding dimension for them)?
  2. Since the main metric differs from the Retrieval one (and they are two different abstract tasks), I had to add it as a separate tab rather than a Retrieval sub-tab. It is of course not included in the main MTEB average score, which is left unchanged. I could make a larger code change to allow each sub-tab to have a different metric if we would prefer this to go under Retrieval? Either is fine with me.
  3. I had to add some code (at the very end of the PR) to handle models that haven't been evaluated on all abstract tasks: results for the missing task are simply skipped, which happens frequently for instruction retrieval (rough sketch below).
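For reference, a minimal sketch of what that skip logic boils down to (the function and variable names here are hypothetical, not the actual code in the PR):

```python
import numpy as np

def collect_task_scores(model_results: dict, task_names: list[str]) -> dict:
    """Gather one score per abstract task, skipping tasks the model was never run on."""
    scores = {}
    for task in task_names:
        if task not in model_results:  # e.g. most models lack instruction retrieval results
            continue                   # skip the task instead of failing the whole row
        scores[task] = model_results[task]
    return scores

def average_score(scores: dict) -> float:
    """Average only over the tasks that are actually present."""
    return float(np.mean(list(scores.values()))) if scores else float("nan")
```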

New Tab:
Screenshot 2024-04-29 at 4.26.38 PM.png
Screenshot 2024-04-29 at 4.26.17 PM.png
Main home screen tab (unchanged):
Screenshot 2024-04-29 at 4.26.28 PM.png

@Muennighoff thoughts on these changes?

orionweller changed pull request status to open
Massive Text Embedding Benchmark org

I think this looks amazing. My main high-level comment is that it may be confusing what the difference is between models under Retrieval w/Instructions and the various instruction-tuned models in the Retrieval leaderboards. Is there a better way to differentiate them? cc'ing some other people who were involved in merging in FollowIR @KennethEnevoldsen @imenelydiaker for thoughts!

Massive Text Embedding Benchmark org

The other comment I have is that I think it would help if we can visually differentiate Cross-Encoders & Bi-Encoders - not sure what's the best way to do it. It may also make sense to have a filtering tab for them at some point cc @tomaarsen

Massive Text Embedding Benchmark org

Re the first point, a solution might be to add a description to the task type:
Screenshot 2024-05-02 at 10.40.50.png

Re: the second point, we might differentiate them using the suggested ModelMeta object, see:
https://github.com/embeddings-benchmark/mteb/issues/618

Massive Text Embedding Benchmark org

the difference is between models under Retrieval w/Instructions and the various instruction-tuned models in the Retrieval leaderboards. Is there a better way to differentiate them?

@KennethEnevoldsen are you suggesting a description of the abstract task in place of the tab that says "English"? I'm a little confused on the placement. I could also put it in a bullet point/paragraph under the title.

@Muennighoff I could change the name to "InstructionRetrieval" but I was thinking that might get confused with the prompt retrieval abstract task that is in progress. I could also place it as a sub-tab in retrieval, but I think it may cause the same confusion between instruction retrieval models and retrieval data with instructions.

visually differentiate Cross-Encoders & Bi-Encoders

Re: differentiating cross-encoders, I could make a manual list of cross-encoders and stick some icon/emoji in front of them in the meantime? It does seem like model metadata might be the best long-term solution, if we want to update the leaderboard again once that PR is done.
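In the meantime, something like this minimal sketch would do it, assuming a hand-maintained set (the model names and the emoji are placeholders):

```python
# Hypothetical interim approach: a hand-curated set of cross-encoder model names,
# used to prepend a marker to the name shown in the leaderboard table.
CROSS_ENCODERS = {
    "example/cross-encoder-a",  # placeholder entries; the real list would be curated
    "example/cross-encoder-b",
}

def display_name(model_name: str) -> str:
    """Prefix cross-encoders with an icon so they stand out visually."""
    return f"🔀 {model_name}" if model_name in CROSS_ENCODERS else model_name
```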

Massive Text Embedding Benchmark org

@KennethEnevoldsen are you suggesting a description of the abstract task in place of the tab that says "English"? I'm a little confused on the placement. I could also put it in a bullet point/paragraph under the title.

I would put it below the first tab but before the English tab. It does not need to be more than 1-2 lines - essentially a layman's version of the abstract.

Massive Text Embedding Benchmark org

Re: the description

@KennethEnevoldsen's idea seems like a good solution!

Re: the visual differentiation

Icons/emojis sound great to me as an intermediate solution!

Massive Text Embedding Benchmark org

I merged in main (and all the great changes to the config files) and added a filter for Cross-Encoders. I also included a short description of each task to address the issues above (see pictures).

New Tab for instructions:
Screenshot 2024-05-07 at 4.01.11 PM.png

Overall tab, for reference:
Screenshot 2024-05-07 at 4.01.06 PM.png


I would say this is good to go, but for some reason the Hugging Face PR diff UI is being very weird - it says I made the config file changes and doesn't seem to register that they are already in main. Any idea what is happening @Muennighoff @KennethEnevoldsen ?

Massive Text Embedding Benchmark org

Looks great to me - if it runs fine for you, I think we can just merge & manually check afterwards that everything looks fine.

I'm wondering if it's worth also having a Bi-Encoder model type checkbox (similar to how there are both Open & Proprietary ones), but up to you -- cc @tomaarsen

Massive Text Embedding Benchmark org

I think it might indeed make sense to add more checkboxes and/or separate the checkboxes into multiple categories. After all, there's

  • Open vs. Proprietary
  • Bi-Encoder vs. Cross-Encoder
  • Sentence Transformers support

Perhaps we should have these 3 categories?
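If we go that route, a rough sketch of what three separate filter groups could look like in Gradio (labels and layout here are assumptions, not the current app code):

```python
import gradio as gr

with gr.Blocks() as demo:
    availability = gr.CheckboxGroup(
        choices=["Open", "Proprietary"],
        value=["Open", "Proprietary"],
        label="Availability",
    )
    architecture = gr.CheckboxGroup(
        choices=["Bi-Encoder", "Cross-Encoder"],
        value=["Bi-Encoder", "Cross-Encoder"],
        label="Architecture",
    )
    integration = gr.CheckboxGroup(
        choices=["Sentence Transformers support"],
        value=["Sentence Transformers support"],
        label="Integration",
    )
```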

Massive Text Embedding Benchmark org

Makes sense to me, I've also added the bi-encoders checkbox.

Screenshot 2024-05-08 at 9.43.08 AM.png
Screenshot 2024-05-08 at 9.43.00 AM.png


We could split them into three categories, but currently the filtering logic is the same for all of them, so it's kinda nice to have them use the same function and live in the same box. I can also open a separate issue if we want to discuss it further?

If that sounds good to everyone, I can merge this morning and pay close attention afterwards in case it needs a hotfix - it works fine for me locally, but just to be sure.

Massive Text Embedding Benchmark org

Amazing! Fine to merge from my side!

orionweller changed pull request status to merged
Massive Text Embedding Benchmark org

Is it intended that this selection shows Cross-Encoders?
Screenshot 2024-05-08 at 1.18.05 PM.png

Also can we remove the description for the Overall tab? It is kind of superfluous there and takes up unnecessary space imo

Massive Text Embedding Benchmark org

The box selector is currently implemented as an OR operator, so that selection shows Bi-Encoders OR open models.

Should we change it to an AND operator?
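For clarity, a toy sketch of the two behaviours, assuming boolean columns in the leaderboard DataFrame (the column names are hypothetical):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "model": ["a", "b", "c", "d"],
        "is_bi_encoder": [True, True, False, False],
        "is_open": [True, False, True, False],
    }
)

or_mask = df["is_bi_encoder"] | df["is_open"]   # current behaviour: a row matches any checked box
and_mask = df["is_bi_encoder"] & df["is_open"]  # alternative: a row must satisfy every checked box

print(df[or_mask]["model"].tolist())   # ['a', 'b', 'c']
print(df[and_mask]["model"].tolist())  # ['a']
```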

Will remove the description for Overall!

Massive Text Embedding Benchmark org
  1. I think it's fine as is! (but we can change it if other people prefer)
  2. That'd be great 🙌
Massive Text Embedding Benchmark org

(But if people prefer to keep 2. as is, I also don't mind; no strong opinion.)

Massive Text Embedding Benchmark org

Makes sense to me to remove it - I have that in #111

Re: the OR operator, I went back and forth on whether AND or OR is preferable. I think there are valid reasons for both - I personally prefer AND but didn't want to change it without discussing it first.
