Could you provide the results per dataset on MTEB?

#1
by Kaguya-19 - opened

I am an undergraduate student at Tsinghua University and am currently doing research on embedding models. Your work GritLM is exciting, the publicly available Medi2 dataset is useful, and the ablation experiments in it are very helpful.

I am currently using Medi2 to train an embedding model, but my performance on retrieval tasks differs somewhat from the reported results. Could you provide the per-dataset results on MTEB? I want to check for errors in my training. Thank you again!

Sure, they should all be in here: https://huggingface.co/datasets/GritLM/results

Sorry, I wasn't very clear. The results at https://huggingface.co/datasets/GritLM/results appear to be for GritLM trained on the E5 dataset. Could you provide the per-dataset test results of the MEDI2-trained model? Thanks again!

They are all in there, e.g. these are the results from this model: https://huggingface.co/datasets/GritLM/results/tree/main/gritlm_m7_sq2048_medi2bge_bbcc
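
In case it helps with the comparison, here is a rough sketch of how that folder could be pulled down and scanned for per-dataset scores. The repo ID and folder name are taken from the link above. Everything else is an assumption on my side: that the folder holds one JSON file per task directly inside it, that each file stores scores under a split key such as `"test"`, and that the retrieval metric is called `"ndcg_at_10"`. `collect_scores` and `download_results` are hypothetical helper names, not part of any library. Check a file by hand before trusting the output.

```python
import json
from pathlib import Path

REPO_ID = "GritLM/results"                     # dataset repo linked above
MODEL_DIR = "gritlm_m7_sq2048_medi2bge_bbcc"   # folder linked above


def collect_scores(result_dir, metric="ndcg_at_10", split="test"):
    """Return {file stem: score} for each JSON file that has split -> metric.

    ASSUMPTION: files look like {"test": {"ndcg_at_10": 0.5, ...}, ...}.
    Files that do not match this shape are skipped silently.
    """
    scores = {}
    for path in sorted(Path(result_dir).glob("*.json")):
        with open(path) as f:
            data = json.load(f)
        split_data = data.get(split) if isinstance(data, dict) else None
        if isinstance(split_data, dict):
            value = split_data.get(metric)
            if isinstance(value, (int, float)):
                scores[path.stem] = value
    return scores


def download_results(repo_id=REPO_ID, model_dir=MODEL_DIR):
    """Download only one model's folder from the results dataset repo."""
    # Imported here so collect_scores works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_root = snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        allow_patterns=[f"{model_dir}/*"],
    )
    return Path(local_root) / model_dir


# Usage (needs network access and `pip install huggingface_hub`):
#   for task, score in collect_scores(download_results()).items():
#       print(task, score)
```

Running `collect_scores` on your own output folder with the same arguments should make it easy to compare the two runs task by task.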

I see, they are quite meaningful! Many thanks!

Kaguya-19 changed discussion status to closed
