update metadata for gte-v1.5 series models

#99
by zyznull
No description provided.
Massive Text Embedding Benchmark org

Hello!

My understanding is that the goal of this PR is to fix the (lack of) metadata for gte-Qwen1.5-7B-instruct. The other two models look correct to me. As a bit of context, the 7B model fails because we cannot fetch information from its files due to the gating. As a result, it's indeed a good choice to add the metadata to app.py, as you've done in this PR.
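
For concreteness, a minimal sketch of what such an external-model entry in app.py can look like; the dictionary names and the dimension/sequence length values below are illustrative assumptions (check them against the leaderboard code and the model card), not quotes from this PR:

    # Hedged sketch of an external-model entry in app.py; dict names and
    # values are assumptions for illustration, not copied from this PR.
    EXTERNAL_MODELS = [
        "Alibaba-NLP/gte-Qwen1.5-7B-instruct",
        # ...
    ]

    # Link shown on the leaderboard for the external model.
    EXTERNAL_MODEL_TO_LINK = {
        "Alibaba-NLP/gte-Qwen1.5-7B-instruct": "https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct",
    }

    # Embedding dimension and maximum sequence length; verify both against
    # the model card, since they cannot be fetched from the gated repo.
    EXTERNAL_MODEL_TO_DIM = {
        "Alibaba-NLP/gte-Qwen1.5-7B-instruct": 4096,
    }
    EXTERNAL_MODEL_TO_SEQLEN = {
        "Alibaba-NLP/gte-Qwen1.5-7B-instruct": 32768,
    }
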
However, the external models must load their metadata from https://huggingface.co/datasets/mteb/results/tree/main/results. So, my recommendation is as follows:

  1. Create a PR on https://huggingface.co/datasets/mteb/results/tree/main/results to add a gte-Qwen1.5-7B-instruct directory with your model's results. You should still have those from when you ran the MTEB code (see the sketch after this list).
  2. Update this PR to only list gte-Qwen1.5-7B-instruct as an external model, i.e. don't list gte-base-en-v1.5 and gte-large-en-v1.5 as external models. These can be loaded totally fine the "normal" way.
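
For step 1, here is a hedged sketch of regenerating those result files with the MTEB library; the task choice and the trust_remote_code flag are illustrative assumptions, and any tasks you already ran will do:

    # Hedged sketch: rerun MTEB to reproduce the JSON result files for
    # the mteb/results PR. The single task here is only an example.
    from mteb import MTEB
    from sentence_transformers import SentenceTransformer

    # trust_remote_code is assumed to be required for the gte Qwen models.
    model = SentenceTransformer("Alibaba-NLP/gte-Qwen1.5-7B-instruct", trust_remote_code=True)

    evaluation = MTEB(tasks=["Banking77Classification"])
    # Writes e.g. results/gte-Qwen1.5-7B-instruct/Banking77Classification.json,
    # matching the directory name requested above.
    evaluation.run(model, output_folder="results/gte-Qwen1.5-7B-instruct")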

That's the most convenient way to fix the metadata issue for gte-Qwen1.5-7B-instruct, other than removing the gating I suppose.

  • Tom Aarsen

Thanks for your suggestions. I have deleted the gte-base/large-en-v1.5 metadata from the app.py file and opened a PR on mteb/results with the gte-Qwen1.5-7B-instruct results. Please check for any other errors.

Massive Text Embedding Benchmark org

Since you named it "gte-Qwen1.5-7B-instruct" in the results PR, it needs to use the same name here as well, i.e. removing the Alibaba-NLP/ prefix, if that's fine with you.

Massive Text Embedding Benchmark org

This doesn't change anything from the user's perspective, I believe. The keys of the dictionaries just shouldn't include the organization/user part.
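
For illustration, assuming a dictionary like the hypothetical EXTERNAL_MODEL_TO_LINK sketched earlier, the rename only touches the keys:

    # Hypothetical before/after: only the key drops the org prefix; the
    # value still points at the full repo id.
    EXTERNAL_MODEL_TO_LINK = {
        # "Alibaba-NLP/gte-Qwen1.5-7B-instruct": "https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct",  # before
        "gte-Qwen1.5-7B-instruct": "https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct",  # after
    }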

Hello, is there anything else I should change?

Massive Text Embedding Benchmark org

Yes, could you change "Alibaba-NLP/gte-Qwen1.5-7B-instruct" to "gte-Qwen1.5-7B-instruct" in this PR? Apologies for the confusion.

  • Tom Aarsen

OK, I have now changed all occurrences of "Alibaba-NLP/gte-Qwen1.5-7B-instruct" to "gte-Qwen1.5-7B-instruct".

Ready to merge
This branch is ready to get merged automatically.
