---
license: apache-2.0
language:
- en
pipeline_tag: sentence-similarity
---
This repository contains the files needed to run BM25 searches with [FastEmbed](https://github.com/qdrant/fastembed).
[BM25 (Best Matching 25)](https://en.wikipedia.org/wiki/Okapi_BM25) is a ranking function used by search engines to estimate the relevance of documents to a given search query.
### Usage
> Note:
> This model is intended for use with Qdrant. Sparse vectors must be configured with [Modifier.IDF](https://qdrant.tech/documentation/concepts/indexing/?q=modifier#idf-modifier), so that Qdrant applies the IDF part of the BM25 score at query time.
Here's an example of BM25 with [FastEmbed](https://github.com/qdrant/fastembed).
```py
from fastembed import SparseTextEmbedding

documents = [
    "You should stay, study and sprint.",
    "History can only prepare us to be surprised yet again.",
]

model = SparseTextEmbedding(model_name="Qdrant/bm25")
embeddings = list(model.embed(documents))
# [
#     SparseEmbedding(
#         values=array([1.67419738, 1.67419738, 1.67419738, 1.67419738]),
#         indices=array([171321964, 1881538586, 150760872, 1932363795])),
#     SparseEmbedding(
#         values=array([1.66973021, 1.66973021, 1.66973021, 1.66973021, 1.66973021]),
#         indices=array([578407224, 1849833631, 1008800696, 2090661150, 1117393019])),
# ]
```
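The values in the output above look like the term-frequency part of the BM25 formula on its own, with no IDF factor, which would explain why the IDF modifier has to be enabled on the Qdrant side. The sketch below is a plain-Python illustration, not FastEmbed's actual code. The function name `bm25_tf_weight` is hypothetical, and the parameters `k=1.2`, `b=0.75` and `avg_len=256` are assumptions chosen because they reproduce the numbers shown above. The document lengths (4 and 5 tokens) are taken from the number of entries in each embedding.

```py
def bm25_tf_weight(
    tf: int,
    doc_len: int,
    k: float = 1.2,          # assumed term-frequency saturation parameter
    b: float = 0.75,         # assumed length-normalization parameter
    avg_len: float = 256.0,  # assumed average document length, in tokens
) -> float:
    """Term-frequency part of BM25 for a single token (no IDF factor)."""
    length_norm = 1.0 - b + b * doc_len / avg_len
    return tf * (k + 1.0) / (tf + k * length_norm)


# Every token occurs once in each example document.
first = bm25_tf_weight(tf=1, doc_len=4)   # compare with 1.67419738 above
second = bm25_tf_weight(tf=1, doc_len=5)  # compare with 1.66973021 above
print(first, second)
```

Under these assumptions, a longer document gives each token a slightly lower weight, which matches the second embedding having smaller values than the first.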