Shitao committed on
Commit
6d44202
1 Parent(s): 2d5552f

Update README.md

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -214,10 +214,11 @@ print(model.compute_score(sentence_pairs,
 ## Evaluation
 
 
-**Currently, the results of BM25 on non-English data are incorrect.
-We will review our testing process and update the paper as soon as possible.
-For more powerful BM25, you can refer to this [repo](https://github.com/carlos-lassance/bm25_mldr).
-Thanks to the community for the reminder and to carlos-lassance for providing the results.**
+We compare BGE-M3 with some popular methods, including BM25, openAI embedding, etc.
+We utilized Pyserini to implement BM25, and the test results can be reproduced by this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
+To make the BM25 and BGE-M3 more comparable, in the experiment,
+BM25 used the same tokenizer as BGE-M3 (i.e., the tokenizer of XLM-Roberta).
+Using the same vocabulary can also ensure that both approaches have the same retrieval latency.
 
 
 - Multilingual (Miracl dataset)
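
The added README text says BM25 was run over the same tokens that BGE-M3 sees. As a rough illustration of that idea, and not the Pyserini setup the commit links to, here is a minimal pure-Python BM25 sketch where the tokenizer is a pluggable function. The name `build_bm25` and the values `k1=0.9`, `b=0.4` are assumptions chosen for illustration; the commit does not state the parameters used.

```python
import math
from collections import Counter


def build_bm25(docs, tokenize, k1=0.9, b=0.4):
    """Index `docs` and return score(query) -> list of BM25 scores, one per doc.

    `tokenize` is any callable mapping a string to a list of tokens.
    Hypothetical helper for illustration; k1 and b are assumed values.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    term_freqs = [Counter(t) for t in tokenized]
    # Document frequency: number of docs containing each term.
    doc_freq = Counter(term for tf in term_freqs for term in tf)

    def idf(term):
        df = doc_freq.get(term, 0)
        return math.log(1 + (n_docs - df + 0.5) / (df + 0.5))

    def score(query):
        q_terms = tokenize(query)
        scores = []
        for tf, toks in zip(term_freqs, tokenized):
            # Length normalisation term of BM25.
            norm = k1 * (1 - b + b * len(toks) / avg_len)
            s = sum(
                idf(t) * tf[t] * (k1 + 1) / (tf[t] + norm)
                for t in q_terms
                if t in tf
            )
            scores.append(s)
        return scores

    return score


# Whitespace splitting stands in for a real tokenizer here.
docs = ["the cat sat on the mat", "dogs chase cats", "a mat for the cat"]
score = build_bm25(docs, str.split)
print(score("cat mat"))
```

To mirror the comparison described above, one could pass a subword tokenizer instead of `str.split`, for example `AutoTokenizer.from_pretrained("xlm-roberta-base").tokenize` from the `transformers` library. That substitution is an assumption about how to approximate the setup, not the script referenced in the README.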