MS Marco Ranking with ColBERT on Vespa.ai

Model is based on ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. This BERT model is based on google/bert_uncased_L-8_H-512_A-8 and trained using the original ColBERT training routine. The model weights have been tuned by training using the triples.train.small.tar.gz from MSMARCO-Passage-Ranking.

To use this model with vespa.ai for MS Marco Passage Ranking, see MS Marco Ranking using Vespa.ai sample app.

MS Marco Passage Ranking

MS Marco Passage Ranking Query Set	MRR@10 ColBERT on Vespa.ai
Dev	0.354
Eval	0.347

The official baseline BM25 ranking model MRR@10 0.16 on eval and 0.167 on dev question set. See MS Marco Passage Ranking Leaderboard.

Export ColBERT query encoder to ONNX

We represent the ColBERT query encoder in the Vespa runtime, to map the textual query representation to the tensor representation. For this we use Vespa's support for running ONNX models. One can use the following snippet to export the model for serving.

from transformers import BertModel
from transformers import BertPreTrainedModel
from transformers import BertConfig
import torch 
import torch.nn as nn

class VespaColBERT(BertPreTrainedModel):
   
    def __init__(self,config):
        super().__init__(config)
        self.bert = BertModel(config)
        self.linear = nn.Linear(config.hidden_size, 32, bias=False)
        self.init_weights()
        
    def forward(self, input_ids, attention_mask):
        Q = self.bert(input_ids,attention_mask=attention_mask)[0]
        Q = self.linear(Q)
        return torch.nn.functional.normalize(Q, p=2, dim=2)  

colbert_query_encoder = VespaColBERT.from_pretrained("vespa-engine/colbert-medium") 

#Export model to ONNX for serving in Vespa 

input_names = ["input_ids", "attention_mask"]
output_names = ["contextual"]
#input, max 32 query term
input_ids = torch.ones(1,32, dtype=torch.int64)
attention_mask = torch.ones(1,32,dtype=torch.int64)
args = (input_ids, attention_mask)
torch.onnx.export(colbert_query_encoder,
                args=args,
                f="query_encoder_colbert.onnx",
                input_names = input_names,
                output_names = output_names,
                dynamic_axes = {
                    "input_ids": {0: "batch"},
                    "attention_mask": {0: "batch"},
                    "contextual": {0: "batch"},
                },
                opset_version=11)

Representing the model on Vespa.ai

See Ranking with ONNX models and MS Marco Ranking sample app