zhichao-geng committed on
Commit 69d6657
1 Parent(s): 5989f9a

Update README.md

Files changed (1)
  1. README.md +7 -5
README.md CHANGED
@@ -13,11 +13,6 @@ tags:
  ---
 
  # opensearch-neural-sparse-encoding-v1
- This is a learned sparse retrieval model. It encodes the queries and documents to 30522 dimensional **sparse vectors**. The non-zero dimension index means the corresponding token in the vocabulary, and the weight means the importance of the token.
-
- This model is trained on MS MARCO dataset.
-
- OpenSearch neural sparse feature supports learned sparse retrieval with lucene inverted index. Link: https://opensearch.org/docs/latest/query-dsl/specialized/neural-sparse/. The indexing and search can be performed with OpenSearch high-level API.
 
  ## Select the model
  The model should be selected considering search relevance, model inference and retrieval efficiency(FLOPS). We benchmark models' **zero-shot performance** on a subset of BEIR benchmark: TrecCovid,NFCorpus,NQ,HotpotQA,FiQA,ArguAna,Touche,DBPedia,SCIDOCS,FEVER,Climate FEVER,SciFact,Quora.
@@ -32,6 +27,13 @@ Overall, the v2 series of models have better search relevance, efficiency and in
  | [opensearch-neural-sparse-encoding-doc-v2-distill](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill) | ✔️ | 67M | 0.504 | 1.8 |
  | [opensearch-neural-sparse-encoding-doc-v2-mini](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini) | ✔️ | 23M | 0.497 | 1.7 |
 
+ ## Overview
+ This is a learned sparse retrieval model. It encodes the queries and documents to 30522 dimensional **sparse vectors**. The non-zero dimension index means the corresponding token in the vocabulary, and the weight means the importance of the token.
+
+ This model is trained on MS MARCO dataset.
+
+ OpenSearch neural sparse feature supports learned sparse retrieval with lucene inverted index. Link: https://opensearch.org/docs/latest/query-dsl/specialized/neural-sparse/. The indexing and search can be performed with OpenSearch high-level API.
+
  ## Usage (HuggingFace)
  This model is supposed to run inside OpenSearch cluster. But you can also use it outside the cluster, with HuggingFace models API.
 