Request for Fine-Tuning Documentation for intfloat/multilingual-e5-base
First, thank you for your incredible work on the intfloat/multilingual-e5-base model. It has been a valuable tool for multilingual semantic search and retrieval tasks. I am currently exploring how to fine-tune this model for a specific domain: student admissions advisory, where the model will assist in understanding student queries and retrieving relevant information about courses, scholarships, or universities.
While I understand the general principles of fine-tuning transformer-based models, do you have any official documentation, guidelines, or examples specific to fine-tuning multilingual-e5-base?
Some specific questions I have include:
- Does the model require a specific dataset format for fine-tuning (e.g., paired text, labeled data)?
- Are there recommended hyperparameters or settings that have proven effective when fine-tuning this model?
- Do you suggest any particular training techniques (e.g., contrastive loss, retrieval-based fine-tuning) for maintaining or improving embedding quality during domain adaptation? (I have included a rough sketch of my current plan after this list.)
- If applicable, are there any best practices for preserving the model's multilingual capabilities during fine-tuning?
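To make these questions concrete, here is a minimal sketch of what I am currently considering: contrastive fine-tuning on (query, passage) pairs with the sentence-transformers library, using the "query: " / "passage: " prefixes described in the model card. The example pairs, hyperparameters, and output path are placeholders, and I am not assuming this is the recommended recipe.

```python
# Rough sketch of my current plan (not an official recipe): contrastive
# fine-tuning on (query, passage) pairs with in-batch negatives.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("intfloat/multilingual-e5-base")

# E5 models expect the "query: " / "passage: " prefixes used at pretraining time.
train_examples = [
    InputExample(texts=[
        "query: Which scholarships are available for international students?",
        "passage: The Global Excellence Scholarship covers 50% of tuition ...",
    ]),
    # ... more (student question, relevant document) pairs
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)

# In-batch negatives via MultipleNegativesRankingLoss (InfoNCE-style), which is
# what I assume "contrastive / retrieval-based fine-tuning" amounts to here.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,                          # placeholder value
    warmup_steps=100,                  # placeholder value
    output_path="e5-base-admissions",  # placeholder path
)
```

In particular, I would like to know whether this pair format and loss are appropriate for your model, or whether you would recommend a different setup.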
It would also be great if you could point me to any publicly available fine-tuning scripts or reference implementations that I could use as a starting point.
Thank you again for making this model publicly available, and I look forward to hearing from you. Your insights will be incredibly helpful in making the most out of intfloat/multilingual-e5-base for my project.