multilingual-e5-large-unsupervised

#13 opened by sighduck

Hi! Would you consider uploading the unsupervised (contrastive pre-training only) version of the multilingual-e5-large model, similar to the e5-large-unsupervised model? Many thanks.

We do not plan to release the one with contrastive pre-training only.

Curious why you are interested in that checkpoint? It does not perform very well without the second stage fine-tuning.

Thank you for the quick response. I've been using e5-large-unsupervised and fine-tuning it on my own dataset with Tevatron. I am working on a domain whose language is fairly different from the training data used for the supervised e5-large, so I chose the unsupervised model as a base and do my own fine-tuning. We have Chinese-language content as well, so I was considering switching to a multilingual unsupervised base model if one were available.
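
For context, my encoding setup follows the standard E5 usage from the model card, roughly like the minimal sketch below (placeholder text only; this is not my actual Tevatron config, and the prefixes and average pooling are the documented conventions):

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def average_pool(last_hidden_states, attention_mask):
    # Zero out padding positions, then average the remaining token embeddings.
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

model_name = "intfloat/e5-large-unsupervised"  # current English-only unsupervised base
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# E5 checkpoints expect "query: " / "passage: " prefixes on every input.
texts = [
    "query: notice period for termination",  # placeholder query
    "passage: Either party may terminate the agreement with 30 days' written notice.",  # placeholder passage
]
batch = tokenizer(texts, max_length=512, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

embeddings = average_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)
print((embeddings[0] @ embeddings[1]).item())  # cosine similarity between query and passage
```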

When you say it does not perform well, do you mean the multilingual unsupervised model performs worse on BEIR than the English unsupervised model from your paper? I thought the fact that your unsupervised model can outperform BM25 was impressive. Even if the multilingual unsupervised model performs worse, I would appreciate it if you shared the checkpoint, since I think the comparison would be useful, but of course that is very much up to you. Thanks again.

Thanks for your detailed comment.

I mean that the multilingual unsupervised model performs worse than the multilingual supervised one.

As for fine-tuning to adapt to your domain of interest, you can use the supervised one as the base model.
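
Something along these lines works as a starting point for domain adaptation (a rough sketch with a generic in-batch-negatives loss and made-up toy data, not our exact training recipe; the learning rate and temperature here are just illustrative values):

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def average_pool(last_hidden_states, attention_mask):
    # Average token embeddings, ignoring padding positions.
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

model_name = "intfloat/multilingual-e5-large"  # supervised checkpoint as the base
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # illustrative learning rate

# Toy (query, positive passage) pairs standing in for your in-domain data,
# including Chinese-language content.
queries = ["query: 合同终止的通知期限", "query: notice period for termination"]
passages = ["passage: 任何一方均可提前三十天书面通知终止本协议。",
            "passage: Either party may terminate with 30 days' written notice."]

q_batch = tokenizer(queries, padding=True, truncation=True, max_length=512, return_tensors="pt")
p_batch = tokenizer(passages, padding=True, truncation=True, max_length=512, return_tensors="pt")

q_emb = F.normalize(average_pool(model(**q_batch).last_hidden_state, q_batch["attention_mask"]), dim=1)
p_emb = F.normalize(average_pool(model(**p_batch).last_hidden_state, p_batch["attention_mask"]), dim=1)

# InfoNCE with in-batch negatives: the positive for query i is passage i,
# and every other passage in the batch acts as a negative.
temperature = 0.05  # illustrative value, not necessarily the one used in the paper
scores = q_emb @ p_emb.T / temperature
labels = torch.arange(scores.size(0))
loss = F.cross_entropy(scores, labels)
loss.backward()
optimizer.step()
```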

I'm also interested in the checkpoint of the multilingual unsupervised model. I see you have already released the English unsupervised version (e5-base-unsupervised). Could you release the same version for the multilingual model?
Much appreciated.
