- [With Sentence Transformers:](#with-sentence-transformers)
- [With Huggingface Transformers:](#with-huggingface-transformers)
- [FAQs](#faqs)
- [How can I reduce overall inference cost?](#how-can-i-reduce-overall-inference-cost)
- [How do I reduce vector storage cost?](#how-do-i-reduce-vector-storage-cost)
- [How do I offer hybrid search to improve accuracy?](#how-do-i-offer-hybrid-search-to-improve-accuracy)
- [MTEB numbers](#mteb-numbers)
# FAQs:

#### How can I reduce overall inference cost?
- You can host these models without the heavy torch dependency by using the ONNX flavours of these models via the [FlashEmbed](https://github.com/PrithivirajDamodaran/flashembed) library.
#### How do I reduce vector storage cost?
[Use Binary and Scalar Quantisation](https://huggingface.co/blog/embedding-quantization)
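As a rough illustration of why binary quantisation cuts storage so dramatically, here is a minimal numpy sketch (not the library's own API, and the embedding values are made up): each float32 dimension (4 bytes) is reduced to a single sign bit.

```python
import numpy as np

# Toy float32 embedding (values invented for illustration); real models emit hundreds of dims.
embeddings = np.array([[0.3, -0.2, 0.9, -0.5, 0.1, -0.8, 0.4, -0.1]], dtype=np.float32)

# Binary quantisation: keep only the sign of each dimension, then pack 8 signs per byte.
binary = np.packbits((embeddings > 0).astype(np.uint8), axis=-1)

# 8 dims: 32 bytes of float32 shrink to 1 byte, a 32x storage reduction.
print(embeddings.nbytes, "->", binary.nbytes)
```

Search over such vectors uses Hamming distance on the packed bits; the linked blog post covers scalar (int8) quantisation and rescoring strategies that recover most of the lost accuracy.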
#### How do I offer hybrid search to improve accuracy?
The MIRACL paper shows that simply combining with BM25 is a good starting point for a hybrid option:
The numbers below are with the mDPR model, but miniDense_arabic_v1 should give an even better hybrid performance.
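One common way to combine the two retrievers is score fusion: min-max normalise each score list, then take a weighted sum. A minimal sketch (the scores and the `alpha` weight here are invented for illustration, not taken from the MIRACL setup):

```python
import numpy as np

# Hypothetical per-document scores for one query from the two retrievers.
bm25 = np.array([12.0, 3.5, 8.1, 0.0])    # lexical scores (unbounded)
dense = np.array([0.62, 0.71, 0.40, 0.55])  # cosine similarities

def minmax(x: np.ndarray) -> np.ndarray:
    """Rescale scores to [0, 1] so the two score scales are comparable."""
    rng = x.max() - x.min()
    return (x - x.min()) / rng if rng else np.zeros_like(x)

# Convex combination; alpha weights the dense score (0.5 = equal weight).
alpha = 0.5
hybrid = alpha * minmax(dense) + (1 - alpha) * minmax(bm25)

# Rank documents by fused score, best first.
ranking = np.argsort(-hybrid)
print(ranking)
```

Reciprocal rank fusion (combining ranks instead of raw scores) is a popular alternative that avoids score-scale issues entirely; `alpha` is typically tuned on a held-out query set.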