Request a restart for new SOTA Chinese embedding models

#101
by Jinkin - opened

Hello, we just released a Chinese embedding model, piccolo-large-zh-v2 (https://huggingface.co/sensenova/piccolo-large-zh-v2), which currently ranks first on the Chinese MTEB leaderboard.

However, this model is currently only accessible through SenseTime's API, and we list the API source in the model card. Can we submit it this way?

Massive Text Embedding Benchmark org

Hello!

I'll defer to @Muennighoff for the final call as he knows best, but I think that we indeed accept models this way.
We only require that a model is usable in some way; whether that's via open weights or an API does not matter.
However, for API models it's more common to make a pull request to https://huggingface.co/datasets/mteb/results/tree/main/results that adds a piccolo-large-zh-v2 folder with results. You can then make another pull request on https://huggingface.co/spaces/mteb/leaderboard/blob/main/app.py#L336 to add your model to the list of external models. In that same file you can also add a link of your choice and set the output dimensionality. That last one in particular cannot be gathered from model scores uploaded in the way you've currently done.
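As a rough sketch, the additions in that second pull request would look something like the snippet below. The variable names are only illustrative of how app.py groups external models, and the dimension is a placeholder, so please check the actual file and fill in your model's real values:

```python
# Illustrative sketch only: the variable names mirror how app.py groups
# external (API-only) models, but check the actual file before opening the PR.

# Register the model so its scores are read from the mteb/results dataset.
EXTERNAL_MODELS = [
    # ... existing entries ...
    "piccolo-large-zh-v2",
]

# A link of your choice, e.g. the model card or the API documentation.
EXTERNAL_MODEL_TO_LINK = {
    # ... existing entries ...
    "piccolo-large-zh-v2": "https://huggingface.co/sensenova/piccolo-large-zh-v2",
}

# Output dimensionality; this cannot be inferred from the uploaded result files.
EXTERNAL_MODEL_TO_DIM = {
    # ... existing entries ...
    "piccolo-large-zh-v2": 1024,  # placeholder, use the model's actual embedding dimension
}
```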

In the meantime, however, I've restarted the leaderboard. Congratulations!

  • Tom Aarsen

Thanks for your suggestion!
However, the company's blog page cannot be updated at the moment, so for the next few days I will keep the introduction to this model on this Hugging Face page.
After that, I will try to follow your method and open a pull request for this model!

Jinkin changed discussion status to closed

@tomaarsen Hello, for the piccolo-v2 model (https://huggingface.co/sensenova/piccolo-large-zh-v2), I have uploaded the weights and parameter configs, but why are the parameters (dimension, model size) still displayed as empty on the leaderboard?

Jinkin changed discussion status to open
Massive Text Embedding Benchmark org

Hello @Jinkin ,

I've restarted the leaderboard, and now the data shows correctly! Congratulations on your release - I think your multi-task approach is excellent. The MRL performance is also impressive, and I appreciate that you filtered to avoid CMTEB overlap!

  • Tom Aarsen

Thanks, I see it!

Jinkin changed discussion status to closed
