update metadata for gte-v1.5 series models

#99
No description provided.
Massive Text Embedding Benchmark org

Hello!

My understanding is that the goal of this PR is to fix the (lack of) metadata for gte-Qwen1.5-7B-instruct. The other 2 models look correct to me. As a bit of context, the 7b model fails as we cannot fetch information from its files due to the gating. As a result, it's indeed a good choice to add the metadata to app.py as you've done for this PR.
However, the external models must load their metadata from https://huggingface.co/datasets/mteb/results/tree/main/results. So, my recommendation is as follows:

  1. Create a PR on https://huggingface.co/datasets/mteb/results/tree/main/results to add a gte-Qwen1.5-7B-instruct directory with your model's results. You should still have those from when you ran the MTEB code.
  2. Update this PR to only list gte-Qwen1.5-7B-instruct as an external model, i.e. don't list gte-base-en-v1.5 and gte-large-en-v1.5 as external models. These can be loaded totally fine the "normal" way.
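To make step 2 concrete, here is a hypothetical sketch of the kind of external-model entries the leaderboard's app.py keeps; the actual dict names and required fields in the leaderboard code may differ, and the dimension value is an assumption based on the model's hidden size.

```python
# Hypothetical sketch of external-model metadata in app.py.
# Dict names and the 4096 dimension are assumptions for illustration.
EXTERNAL_MODELS = [
    "gte-Qwen1.5-7B-instruct",  # gated model, so metadata is added by hand
]

# Per-model metadata, keyed by model name (no organization prefix in the key)
EXTERNAL_MODEL_TO_LINK = {
    "gte-Qwen1.5-7B-instruct": "https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct",
}
EXTERNAL_MODEL_TO_DIM = {
    "gte-Qwen1.5-7B-instruct": 4096,
}
```

The key point is that only the gated model goes in these structures; models whose files are publicly readable are picked up automatically.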

That's the most convenient way to fix the metadata issue for gte-Qwen1.5-7B-instruct, other than removing the gating I suppose.

  • Tom Aarsen

Thanks for your suggestions. I have deleted the gte-base/large-en-v1.5 metadata from the app.py file and opened a PR on mteb/results for the gte-Qwen1.5-7B-instruct model. Please check for any other errors.

Massive Text Embedding Benchmark org

Since you named it "gte-Qwen1.5-7B-instruct" in the results PR, it also needs the same name here, i.e. removing the Alibaba-NLP/ prefix, if that's fine with you.

Massive Text Embedding Benchmark org

This doesn't change anything from the user perspective, I believe. The keys of the dictionaries just need to not include the organization/user part.
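Dropping the organization prefix from a model ID is a one-liner; this small sketch just illustrates the renaming being asked for.

```python
# Strip the organization/user prefix from a Hub model ID
model_id = "Alibaba-NLP/gte-Qwen1.5-7B-instruct"
model_name = model_id.split("/")[-1]
print(model_name)  # gte-Qwen1.5-7B-instruct
```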

Hello, is there anything else I should change?

Massive Text Embedding Benchmark org

Yes, could you change "Alibaba-NLP/gte-Qwen1.5-7B-instruct" into "gte-Qwen1.5-7B-instruct" in this PR? Apologies for the confusion.

  • Tom Aarsen

Ok, I have changed every "Alibaba-NLP/gte-Qwen1.5-7B-instruct" to "gte-Qwen1.5-7B-instruct" now.

Massive Text Embedding Benchmark org

Sorry for the delay! Could you update the PR to conform to the new metadata format? (It needs to be in the YAML files now.) Then we can merge it right away!
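For reference, a per-model entry in a YAML metadata file might look something like the sketch below; the exact schema, field names, and values here are assumptions for illustration, not the leaderboard's confirmed format.

```yaml
# Hypothetical model_meta entry; field names and values are assumptions
gte-Qwen1.5-7B-instruct:
  link: https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct
  dim: 4096
  is_external: true
```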

Updated now.

Massive Text Embedding Benchmark org

Can you fix the conflicts? Sorry 😅

(Screenshot: the PR's merge-conflict notice, 2024-05-07)

Hi, I found that continuing to modify model_meta.yaml in this PR seems to hit some bugs that prevent it from being merged, so I opened a new PR to submit the metadata for the GTE models.

zyznull changed pull request status to closed
