Adding w601sxs/b1ade-embed to the leaderboard

#114
by w601sxs - opened

Could you please refresh and add https://huggingface.co/w601sxs/b1ade-embed to the leaderboard?

@tomaarsen, are you able to help with a refresh for this and some other models further down the discussion board? Thanks.

Massive Text Embedding Benchmark org

Restarting it now! I notice that you're missing results for MindSmallReranking, so your entry is incomplete and the leaderboard can't compute an Average for your model.
We also strongly recommend that model cards share some details about how the model was created. Please also consider adding Sentence Transformers support for the model. It should be as simple as:

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Transformer, Pooling

transformer = Transformer("w601sxs/b1ade-embed")
pooling = Pooling(transformer.get_word_embedding_dimension(), pooling_mode="mean") # If you use "mean" pooling, otherwise e.g. "cls" or whatever you're using
model = SentenceTransformer(modules=[transformer, pooling])
# save the model locally:
model.save_pretrained("w601sxs_b1ade-embed")
# or push it to HF directly:
# model.push_to_hub("w601sxs/b1ade-embed", exist_ok=True)
  • Tom Aarsen

Yes, the mteb script failed to download the MindSmallReranking one:

INFO:mteb.evaluation.MTEB:

********************** Evaluating MindSmallReranking **********************
INFO:mteb.evaluation.MTEB:Loading dataset for MindSmallReranking
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
Failed to read file 'gzip://7a742da40ba0425a72301598ce27d63296c468da48cd98c4ae479b1d88a755a8::/root/.cache/huggingface/datasets/downloads/7a742da40ba0425a72301598ce27d63296c468da48cd98c4ae479b1d88a755a8' with error <class 'pyarrow.lib.ArrowInvalid'>: JSON parse error: Invalid value. in row 0
ERROR:datasets.packaged_modules.json.json:Failed to read file 'gzip://7a742da40ba0425a72301598ce27d63296c468da48cd98c4ae479b1d88a755a8::/root/.cache/huggingface/datasets/downloads/7a742da40ba0425a72301598ce27d63296c468da48cd98c4ae479b1d88a755a8' with error <class 'pyarrow.lib.ArrowInvalid'>: JSON parse error: Invalid value. in row 0
ERROR:mteb.evaluation.MTEB:Error while evaluating MindSmallReranking: An error occurred while generating the dataset
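
For anyone hitting the same `JSON parse error: Invalid value. in row 0`, a quick way to narrow it down is to check whether the cached file actually decompresses and whether its first line is JSON. The sketch below uses only the standard library and assumes the cached download is a gzip-compressed JSON Lines file, which is what the `gzip://` prefix in the log suggests. `check_first_row` is a hypothetical helper name, not part of `datasets` or `mteb`.

```python
import gzip
import json
from pathlib import Path


def check_first_row(path):
    """Return (ok, detail) for the first line of a gzip-compressed JSON Lines file."""
    try:
        # Reading raises an OSError subclass if the file is missing or not gzip data.
        with gzip.open(Path(path), "rt", encoding="utf-8") as handle:
            first_line = handle.readline()
    except (OSError, EOFError, UnicodeDecodeError) as exc:
        return False, f"could not decompress: {exc}"
    if not first_line.strip():
        return False, "file is empty"
    try:
        json.loads(first_line)
    except json.JSONDecodeError as exc:
        return False, f"row 0 is not valid JSON: {exc}"
    return True, "row 0 parses as JSON"
```

Calling `check_first_row(...)` with the cached path from the error message returns a short diagnosis that can be pasted into a bug report. If row 0 parses here but `datasets` still fails, the problem is more likely in the library version than in the download.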
Massive Text Embedding Benchmark org

Huh, I've seen this report before, but I believe I was never able to replicate it. @Muennighoff do you have the time to look into this?

  • Tom Aarsen

I had to downgrade datasets and it worked.
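
Since the failure was hard to replicate, it may help to record the installed package versions before and after the downgrade. This is a minimal sketch using the standard library; the package list is only a guess at what is relevant, based on the names that appear in the log above, and `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Packages named in the log output; adjust the list as needed.
for name in ("datasets", "mteb", "pyarrow"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```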

Uploaded a new README with the MindSmallReranking results. Will add Sentence Transformers support too.
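
Before asking for another refresh, it can be worth confirming that no other task is missing. The sketch below assumes the evaluation run wrote one `<TaskName>.json` file per task somewhere under a results folder; the folder name and the list of expected tasks are placeholders supplied by the caller, and `missing_tasks` is a hypothetical helper rather than an `mteb` function.

```python
from pathlib import Path


def missing_tasks(results_dir, expected_tasks):
    """Return expected task names that have no <TaskName>.json under results_dir."""
    # Search recursively, since result files may sit in nested subfolders.
    found = {path.stem for path in Path(results_dir).rglob("*.json")}
    return sorted(set(expected_tasks) - found)
```

For example, `missing_tasks("results", ["MindSmallReranking"])` returns an empty list once that task's result file exists under the (assumed) `results` folder.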

Can you refresh again, @tomaarsen? Or could you post when the next refresh will happen? From what I've seen, it's about once a week?

Massive Text Embedding Benchmark org

I've restarted it :)
