update metadata for gte-v1.5 series models

#99
by zyznull
No description provided.
Massive Text Embedding Benchmark org

Hello!

My understanding is that the goal of this PR is to fix the (lack of) metadata for gte-Qwen1.5-7B-instruct. The other two models look correct to me. As a bit of context, the 7B model fails because we cannot fetch information from its files due to the gating. As a result, it's indeed a good choice to add the metadata to app.py, as you've done in this PR.
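
For concreteness, a minimal sketch of what such an external-model entry in app.py can look like; the dictionary names and the dimension/sequence length values below are illustrative assumptions (check them against the leaderboard code and the model card), not quotes from this PR:

    # Hedged sketch of an external-model entry in app.py; dict names and
    # values are assumptions for illustration, not copied from this PR.
    EXTERNAL_MODELS = [
        "Alibaba-NLP/gte-Qwen1.5-7B-instruct",
        # ...
    ]

    # Link shown on the leaderboard for the external model.
    EXTERNAL_MODEL_TO_LINK = {
        "Alibaba-NLP/gte-Qwen1.5-7B-instruct": "https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct",
    }

    # Embedding dimension and maximum sequence length; verify both against
    # the model card, since they cannot be fetched from the gated repo.
    EXTERNAL_MODEL_TO_DIM = {
        "Alibaba-NLP/gte-Qwen1.5-7B-instruct": 4096,
    }
    EXTERNAL_MODEL_TO_SEQLEN = {
        "Alibaba-NLP/gte-Qwen1.5-7B-instruct": 32768,
    }
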
However, the external models must load their metadata from https://huggingface.co/datasets/mteb/results/tree/main/results. So, my recommendation is as follows:

  1. Create a PR on https://huggingface.co/datasets/mteb/results/tree/main/results to add a gte-Qwen1.5-7B-instruct directory with your model's results. You should still have those from when you ran the MTEB code (see the sketch after this list).
  2. Update this PR to only list gte-Qwen1.5-7B-instruct as an external model, i.e. don't list gte-base-en-v1.5 and gte-large-en-v1.5 as external models. These can be loaded totally fine the "normal" way.
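
For step 1, here is a hedged sketch of regenerating those result files with the MTEB library; the task choice and the trust_remote_code flag are illustrative assumptions, and any tasks you already ran will do:

    # Hedged sketch: rerun MTEB to reproduce the JSON result files for
    # the mteb/results PR. The single task here is only an example.
    from mteb import MTEB
    from sentence_transformers import SentenceTransformer

    # trust_remote_code is assumed to be required for the gte Qwen models.
    model = SentenceTransformer("Alibaba-NLP/gte-Qwen1.5-7B-instruct", trust_remote_code=True)

    evaluation = MTEB(tasks=["Banking77Classification"])
    # Writes e.g. results/gte-Qwen1.5-7B-instruct/Banking77Classification.json,
    # matching the directory name requested above.
    evaluation.run(model, output_folder="results/gte-Qwen1.5-7B-instruct")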

That's the most convenient way to fix the metadata issue for gte-Qwen1.5-7B-instruct, other than removing the gating I suppose.

  • Tom Aarsen

Thanks for your suggestions. I have deleted the gte-base/large-en-v1.5 metadata from the app.py file and opened a PR on mteb/results with the gte-Qwen1.5-7B-instruct results. Please check for any other errors.

Massive Text Embedding Benchmark org

Since you named it "gte-Qwen1.5-7B-instruct" in the results PR, it needs to use the same name here as well, i.e. removing the Alibaba-NLP/ prefix, if that's fine with you.

Massive Text Embedding Benchmark org

This doesn't change anything from the user's perspective, I believe. The keys of the dictionaries just shouldn't include the organization/user part.
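
For illustration, assuming a dictionary like the hypothetical EXTERNAL_MODEL_TO_LINK sketched earlier, the rename only touches the keys:

    # Hypothetical before/after: only the key drops the org prefix; the
    # value still points at the full repo id.
    EXTERNAL_MODEL_TO_LINK = {
        # "Alibaba-NLP/gte-Qwen1.5-7B-instruct": "https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct",  # before
        "gte-Qwen1.5-7B-instruct": "https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct",  # after
    }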

Hello, is there anything else I should change?

Massive Text Embedding Benchmark org

Yes, could you change "Alibaba-NLP/gte-Qwen1.5-7B-instruct" to "gte-Qwen1.5-7B-instruct" in this PR? Apologies for the confusion.

  • Tom Aarsen

OK, I have now changed all occurrences of "Alibaba-NLP/gte-Qwen1.5-7B-instruct" to "gte-Qwen1.5-7B-instruct".

Ready to merge
This branch is ready to get merged automatically.
