---
language: 
  - en
tags:
- search
- ranking
- vespa
license: mit

base_model: "cross-encoder/ms-marco-MiniLM-L-6-v2"
---

# MS Marco Ranking with ColBERT on Vespa.ai 

Model is based on [ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT](https://arxiv.org/abs/2004.12832). 
This BERT model is based on [cross-encoder/ms-marco-MiniLM-L-6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) and trained using the
original [ColBERT training routine](https://github.com/stanford-futuredata/ColBERT/).

This model has 22.3M trainable parameters and is approximately 2x faster than 
[vespa-engine/colbert-medium](https://huggingface.co/vespa-engine/colbert-medium) and with better or on pair MRR@10 on dev. 

The model weights have been tuned by training using a randomized sample of MS Marco training triplets 
[MSMARCO-Passage-Ranking](https://github.com/microsoft/MSMARCO-Passage-Ranking). 

To use this model with vespa.ai for MS Marco Passage Ranking, see 
[MS Marco Ranking using Vespa.ai sample app](https://github.com/vespa-engine/sample-apps/tree/master/msmarco-ranking).

# MS Marco Passage Ranking

| MS Marco Passage Ranking Query Set | MRR@10 ColBERT on Vespa.ai |
|------------------------------------|----------------|
| Dev                                | 0.364          |

Recall@k On Dev (6980 queries)

|K                                   | Recall@K |
|------------------------------------|----------------|
| 50                                 | 0.816          |
| 200                                | 0.905          |
| 1000                               | 0.939          |


The MRR@10 on dev is achieved by re-ranking 1K retrieved by a dense retriever based on 
[sentence-transformers/msmarco-MiniLM-L-6-v3](https://huggingface.co/sentence-transformers/msmarco-MiniLM-L-6-v3). Re-ranking the original top 1000 dev 
is 0.354 MRR@10 (Recall@1K 0.82).

The official baseline BM25 ranking model MRR@10 0.16 on eval and 0.167 on dev question set. 
See [MS Marco Passage Ranking Leaderboard](https://microsoft.github.io/msmarco/).

## Export ColBERT query encoder to ONNX 
We represent the ColBERT query encoder in the Vespa runtime, to map the textual query representation to the tensor representation. For this
we use Vespa's support for running ONNX models. One can use the following snippet to export the model for serving.

```python
from transformers import BertModel
from transformers import BertPreTrainedModel
from transformers import BertConfig
import torch 
import torch.nn as nn

class VespaColBERT(BertPreTrainedModel):
   
    def __init__(self,config):
        super().__init__(config)
        self.bert = BertModel(config)
        self.linear = nn.Linear(config.hidden_size, 32, bias=False)
        self.init_weights()
        
    def forward(self, input_ids, attention_mask):
        Q = self.bert(input_ids,attention_mask=attention_mask)[0]
        Q = self.linear(Q)
        return torch.nn.functional.normalize(Q, p=2, dim=2)  

colbert_query_encoder = VespaColBERT.from_pretrained("vespa-engine/col-minilm") 

#Export model to ONNX for serving in Vespa 

input_names = ["input_ids", "attention_mask"]
output_names = ["contextual"]
#input, max 32 query term
input_ids = torch.ones(1,32, dtype=torch.int64)
attention_mask = torch.ones(1,32,dtype=torch.int64)
args = (input_ids, attention_mask)
torch.onnx.export(colbert_query_encoder,
                args=args,
                f="query_encoder_colbert.onnx",
                input_names = input_names,
                output_names = output_names,
                dynamic_axes = {
                    "input_ids": {0: "batch"},
                    "attention_mask": {0: "batch"},
                    "contextual": {0: "batch"},
                },
                opset_version=11)
```

# Representing the model on Vespa.ai
See [Ranking with ONNX models](https://docs.vespa.ai/documentation/onnx.html) and the [MS Marco Ranking sample app](https://github.com/vespa-engine/sample-apps/tree/master/msmarco-ranking)