cross-lingual reranking

#6
by victorkeke - opened

I used your main BAAI/bge-m3 model to produce multilingual vectors for my products, which are in 6 different languages. Plain semantic vector search works fine across languages (for example, a Turkish query against English products still returns fair results).
But to make the results more relevant, I added the bge-reranker-v2-m3 reranker, which works just fine when the query and the products are in the same language. However, when I run a Turkish query over my English products, the reranking returns completely unrelated items.
I wonder if there is a solution that can produce good reranking scores for pairs like this:
cross = ["mavi gomlek erkek uchun", "blue navy shirt for men"]

Beijing Academy of Artificial Intelligence org

Thanks for your interest in our work! There was no training data for the cross-lingual reranking task when this model was trained, so it may not perform well on some cross-lingual tasks.
A solution is to fine-tune the model on your own data. You can refer to the training script: https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker
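For fine-tuning, the FlagEmbedding reranker examples expect JSON-lines training data where each record has a query, a list of positive passages, and a list of negative passages. A minimal sketch of preparing cross-lingual training data in that layout (the file name and the example texts are illustrative):

```python
import json

# Illustrative cross-lingual training examples in the query/pos/neg
# JSONL layout used by the FlagEmbedding reranker fine-tuning script.
examples = [
    {
        "query": "mavi gomlek erkek uchun",            # Turkish query
        "pos": ["blue navy shirt for men"],            # relevant English product
        "neg": [                                       # irrelevant products
            "red summer dress for women",
            "leather wallet with coin pocket",
        ],
    },
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

Building pairs this way from your own catalogue (Turkish queries paired with the English products users actually clicked, plus random non-clicked products as negatives) is one straightforward way to get cross-lingual training signal.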

Thank you very much for providing this model. The results after reranking are great, though I only rerank the top 50-80 first-stage results due to speed limits. (If I am not mistaken, reranking is a time-consuming process, so the candidate set must be kept small.)
As for fine-tuning and training the model, I have no prior experience in this area. Is it a hard task? How can I proceed?
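The two-stage setup described here can be sketched as follows; `rerank_score` and the toy scorer are stand-ins for the actual reranker call, and the candidate list stands in for the first-stage vector-search results:

```python
def rerank_top_k(query, candidates, rerank_score, k=50):
    """Rerank only the top-k candidates from first-stage vector search.

    The expensive cross-encoder pass runs on k pairs instead of the
    whole catalogue, which keeps latency manageable.
    """
    shortlist = candidates[:k]                       # first-stage order
    scored = [(rerank_score(query, c), c) for c in shortlist]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]

# Toy stand-in scorer: real code would call the reranker model here.
def toy_score(query, passage):
    return len(set(query.split()) & set(passage.split()))

candidates = ["blue navy shirt for men", "red dress", "shirt for men"]
print(rerank_top_k("shirt for men", candidates, toy_score, k=3))
# → ['blue navy shirt for men', 'shirt for men', 'red dress']
```

The key design point is that reranking quality only depends on the top-k cutoff catching the right items: if first-stage recall at 50-80 is good, reranking that shortlist is nearly as effective as reranking everything.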
