Import vespa-engine/col-minilm

Files changed (6)

README.md +77 -0
config.json +31 -0
pytorch_model.bin +3 -0
special_tokens_map.json +1 -0
tokenizer_config.json +1 -0
vocab.txt +0 -0

README.md ADDED

	@@ -0,0 +1,77 @@

+# MS Marco Ranking with ColBERT on Vespa.ai
+Model is based on [ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT](https://arxiv.org/abs/2004.12832).
+This BERT model is based on [cross-encoder/ms-marco-MiniLM-L-6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) and trained using the
+original [ColBERT training routine](https://github.com/stanford-futuredata/ColBERT/).
+This model has 22.3M trainable parameters and is approximately 2x faster than
+[vespa-engine/colbert-medium](https://huggingface.co/vespa-engine/colbert-medium), with an MRR@10 on dev that is better or on par.
+The model weights were tuned by training on a random sample of MS Marco training triplets from
+[MSMARCO-Passage-Ranking](https://github.com/microsoft/MSMARCO-Passage-Ranking).
+To use this model with vespa.ai for MS Marco Passage Ranking, see
+[MS Marco Ranking using Vespa.ai sample app](https://github.com/vespa-engine/sample-apps/tree/master/msmarco-ranking).
+# MS Marco Passage Ranking
+| MS Marco Passage Ranking Query Set | MRR@10 ColBERT on Vespa.ai |
+|------------------------------------|----------------|
+| Dev                                | 0.364          |
+The MRR@10 on dev is achieved by re-ranking the top 1K passages retrieved by a dense retriever based on
+[sentence-transformers/msmarco-MiniLM-L-6-v3](https://huggingface.co/sentence-transformers/msmarco-MiniLM-L-6-v3).
+The official BM25 baseline ranking model achieves an MRR@10 of 0.16 on the eval query set and 0.167 on the dev query set.
+See [MS Marco Passage Ranking Leaderboard](https://microsoft.github.io/msmarco/).
+## Export ColBERT query encoder to ONNX
+We run the ColBERT query encoder in the Vespa runtime to map the textual query to its tensor representation. For this,
+we use Vespa's support for running ONNX models. The following snippet exports the model for serving.
+```python
+from transformers import BertModel
+from transformers import BertPreTrainedModel
+from transformers import BertConfig
+import torch
+import torch.nn as nn
+class VespaColBERT(BertPreTrainedModel):
+    def __init__(self,config):
+        super().__init__(config)
+        self.bert = BertModel(config)
+        self.linear = nn.Linear(config.hidden_size, 32, bias=False)
+        self.init_weights()
+    def forward(self, input_ids, attention_mask):
+        Q = self.bert(input_ids,attention_mask=attention_mask)[0]
+        Q = self.linear(Q)
+        return torch.nn.functional.normalize(Q, p=2, dim=2)
+colbert_query_encoder = VespaColBERT.from_pretrained("vespa-engine/col-minilm")
+# Export the model to ONNX for serving in Vespa
+input_names = ["input_ids", "attention_mask"]
+output_names = ["contextual"]
+# Dummy inputs: the sequence length is fixed at 32 query tokens (only the batch axis is dynamic)
+input_ids = torch.ones(1,32, dtype=torch.int64)
+attention_mask = torch.ones(1,32,dtype=torch.int64)
+args = (input_ids, attention_mask)
+torch.onnx.export(colbert_query_encoder,
+                args=args,
+                f="query_encoder_colbert.onnx",
+                input_names = input_names,
+                output_names = output_names,
+                dynamic_axes = {
+                    "input_ids": {0: "batch"},
+                    "attention_mask": {0: "batch"},
+                    "contextual": {0: "batch"},
+                },
+                opset_version=11)
+```
+# Representing the model on Vespa.ai
+See [Ranking with ONNX models](https://docs.vespa.ai/documentation/onnx.html) and [MS Marco Ranking sample app](https://github.com/vespa-engine/sample-apps/tree/master/msmarco-ranking)
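
The query encoder in the README emits one L2-normalized 32-dimensional vector per query token. In the ColBERT paper, such vectors are scored against per-token document vectors with a late-interaction (MaxSim) operator: for each query token, take the maximum similarity over all document tokens, then sum these maxima. The following is a minimal pure-Python sketch with toy 2-dimensional vectors. The helper names `l2_normalize` and `maxsim_score` are hypothetical, and this is not the ranking expression used in the Vespa sample app.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_vecs, doc_vecs):
    """Late interaction: sum over query tokens of the max dot product
    against any document token."""
    score = 0.0
    for q in query_vecs:
        score += max(sum(a * b for a, b in zip(q, d)) for d in doc_vecs)
    return score


# Toy example: 2 query tokens, documents with 3 and 1 tokens, 2 dimensions.
query = [l2_normalize(v) for v in [[1.0, 0.0], [0.0, 2.0]]]
doc_a = [l2_normalize(v) for v in [[3.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]
doc_b = [l2_normalize(v) for v in [[1.0, 1.0]]]

score_a = maxsim_score(query, doc_a)  # each query token has an exact match
score_b = maxsim_score(query, doc_b)  # both query tokens match at 45 degrees
print(score_a, score_b)
```

Because all vectors have unit norm, each dot product is a cosine similarity, so the score of a document is bounded by the number of query tokens.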

config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "_name_or_path": "minilm",
+  "architectures": [
+    "ColBERT"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 384,
+  "id2label": {
+    "0": "LABEL_0"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 1536,
+  "label2id": {
+    "LABEL_0": 0
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 6,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "sbert_ce_default_activation_function": "torch.nn.modules.linear.Identity",
+  "transformers_version": "4.4.2",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}
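
As a rough illustration of how this config relates to the `VespaColBERT` class in the README, the sketch below derives a few shapes from the config values. The constants `COLBERT_DIM` and `MAX_QUERY_TERMS` are taken from the README snippet (the `nn.Linear(config.hidden_size, 32, bias=False)` layer and the 1x32 dummy input), not from `config.json`, and the variable names are illustrative only.

```python
import json

# Subset of the config.json fields shown above.
config = json.loads("""
{
  "hidden_size": 384,
  "intermediate_size": 1536,
  "num_attention_heads": 12,
  "num_hidden_layers": 6
}
""")

COLBERT_DIM = 32      # assumption: output size of the README's nn.Linear layer
MAX_QUERY_TERMS = 32  # assumption: sequence length of the README's dummy input

# BERT splits the hidden size evenly across attention heads.
assert config["hidden_size"] % config["num_attention_heads"] == 0
head_dim = config["hidden_size"] // config["num_attention_heads"]

# nn.Linear stores its weight as (out_features, in_features); no bias here.
projection_weight_shape = (COLBERT_DIM, config["hidden_size"])
projection_params = COLBERT_DIM * config["hidden_size"]

# Shape of the exported "contextual" output for one batch of queries.
query_output_shape = ("batch", MAX_QUERY_TERMS, COLBERT_DIM)

print(head_dim, projection_weight_shape, projection_params, query_output_shape)
```

Note that `"architectures": ["ColBERT"]` does not name a class shipped with `transformers`, which is presumably why the README loads the weights through its own `BertPreTrainedModel` subclass.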

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e67cd33989d5633a4ffae5726dafee6d4fd3f42b9f888315746b50c64c9814a0
+size 90949065
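
The three lines above are a Git LFS pointer: the repository stores only the object id (a SHA-256 digest) and the byte size, while the actual weights live in LFS storage. A downloaded `pytorch_model.bin` can be checked against the pointer roughly as follows. This is a sketch using only the standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the demo uses a small temporary file rather than the real weights.

```python
import hashlib
import os
import tempfile


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:e67cd33989d5633a4ffae5726dafee6d4fd3f42b9f888315746b50c64c9814a0
size 90949065""")

# Demo on a throwaway file, with a pointer built for that same payload.
payload = b"demo weights"
demo_pointer = {
    "version": pointer["version"],
    "algo": "sha256",
    "digest": hashlib.sha256(payload).hexdigest(),
    "size": len(payload),
}
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
demo_ok = matches_pointer(tmp.name, demo_pointer)
os.remove(tmp.name)
print(pointer["size"], demo_ok)
```

For the real file, one would call `matches_pointer("pytorch_model.bin", pointer)` after downloading the weights.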

special_tokens_map.json ADDED

	@@ -0,0 +1 @@


+ {"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}

tokenizer_config.json ADDED

	@@ -0,0 +1 @@

+ {"do_lower_case": true, "do_basic_tokenize": true, "never_split": null, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "model_max_length": 512, "name_or_path": "minilm", "special_tokens_map_file": "/Users/bergum/.cache/huggingface/transformers/3295d833faab1b0a5258c61d5d6ba3db7c2414aca8614a8503c6deb89fc00611.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d", "tokenizer_file": null}

vocab.txt ADDED

The diff for this file is too large to render. See raw diff