Request a restart for new SOTA Chinese embedding models

#101
by Jinkin - opened

Hello, we just released a Chinese embedding model, piccolo-large-zh-v2 (https://huggingface.co/sensenova/piccolo-large-zh-v2), which currently ranks first on the Chinese MTEB leaderboard.

However, this model is currently only accessible through SenseTime's API, and we list the API source in the model card. Can we submit it this way?

Massive Text Embedding Benchmark org

Hello!

I'll defer to @Muennighoff for the final call as he knows best, but I think that we indeed accept models this way.
We only require that a model is usable in some way; whether that's via open weights or an API does not matter.
However, for API models it's more common to make a pull request to https://huggingface.co/datasets/mteb/results/tree/main/results that adds a piccolo-large-zh-v2 folder with results. You can then make another pull request on https://huggingface.co/spaces/mteb/leaderboard/blob/main/app.py#L336 to add your model to the list of external models. In that same file you can also add a link of your choice and set the output dimensionality. That last one in particular cannot be gathered from model scores uploaded in the way you've currently done.
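As a rough sketch, the additions in that second pull request would look something like the snippet below. The variable names are only illustrative of how app.py groups external models, and the dimension is a placeholder, so please check the actual file and fill in your model's real values:

```python
# Illustrative sketch only: the variable names mirror how app.py groups
# external (API-only) models, but check the actual file before opening the PR.

# Register the model so its scores are read from the mteb/results dataset.
EXTERNAL_MODELS = [
    # ... existing entries ...
    "piccolo-large-zh-v2",
]

# A link of your choice, e.g. the model card or the API documentation.
EXTERNAL_MODEL_TO_LINK = {
    # ... existing entries ...
    "piccolo-large-zh-v2": "https://huggingface.co/sensenova/piccolo-large-zh-v2",
}

# Output dimensionality; this cannot be inferred from the uploaded result files.
EXTERNAL_MODEL_TO_DIM = {
    # ... existing entries ...
    "piccolo-large-zh-v2": 1024,  # placeholder, use the model's actual embedding dimension
}
```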

In the meantime, however, I've restarted the leaderboard. Congratulations!

  • Tom Aarsen

Thanks for your suggestion!
However, the company's blog page cannot be updated at the moment, so for the next few days I will keep the introduction to this model on this Hugging Face page.
After that, I will try to follow your method and open a pull request for this model!

Jinkin changed discussion status to closed

@tomaarsen Hello, for the piccolo-v2 model (https://huggingface.co/sensenova/piccolo-large-zh-v2), I have uploaded the weights and parameter configs, but why are the parameters (dimension, model size) still displayed as empty on the leaderboard?

Jinkin changed discussion status to open
Massive Text Embedding Benchmark org

Hello @Jinkin ,

I've restarted the leaderboard, and now the data shows correctly! Congratulations on your release - I think your multi-task approach is excellent. The MRL performance is also impressive, and I appreciate that you filtered to avoid CMTEB overlap!

  • Tom Aarsen

Thanks, I see it!

Jinkin changed discussion status to closed
