Shitao committed on
Commit cfdb103
1 Parent(s): aa47896

Update README.md

  1. README.md +7 -6
README.md CHANGED
@@ -215,11 +215,6 @@ print(model.compute_score(sentence_pairs,
 
 
  We compare BGE-M3 with some popular methods, including BM25, openAI embedding, etc.
- We utilized Pyserini to implement BM25, and the test results can be reproduced by this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
- To make the BM25 and BGE-M3 more comparable, in the experiment,
- BM25 used the same tokenizer as BGE-M3 (i.e., the tokenizer of XLM-Roberta).
- Using the same vocabulary can also ensure that both approaches have the same retrieval latency.
-
 
  - Multilingual (Miracl dataset)
 
@@ -242,6 +237,12 @@ Using the same vocabulary can also ensure that both approaches have the same ret
  - NarritiveQA:
  ![avatar](./imgs/nqa.jpg)
 
+ - BM25
+
+ We utilized Pyserini to implement BM25, and the test results can be reproduced by this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
+
+ ![avatar](./imgs/bm25.jpg)
+
 
  ## Training
  - Self-knowledge Distillation: combining multiple outputs from different
@@ -259,7 +260,7 @@ Refer to our [report](https://arxiv.org/pdf/2402.03216.pdf) for more details.
  ## Acknowledgement
 
  Thanks the authors of open-sourced datasets, including Miracl, MKQA, NarritiveQA, etc.
- Thanks the open-sourced libraries like [Tevatron](https://github.com/texttron/tevatron), [pyserial](https://github.com/pyserial/pyserial).
+ Thanks the open-sourced libraries like [Tevatron](https://github.com/texttron/tevatron), [Pyserini](https://github.com/castorini/pyserini).
 
 
 