
# MS Marco Ranking with ColBERT on Vespa.ai

This model is based on [ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT](https://arxiv.org/abs/2004.12832).
It is built on the [google/bert_uncased_L-8_H-512_A-8](https://huggingface.co/google/bert_uncased_L-8_H-512_A-8) BERT model and was trained with the
original [ColBERT training routine](https://github.com/stanford-futuredata/ColBERT/).
The model weights were tuned on `triples.train.small.tar.gz` from [MSMARCO-Passage-Ranking](https://github.com/microsoft/MSMARCO-Passage-Ranking).

To use this model with Vespa.ai for MS Marco Passage Ranking, see the
[MS Marco Ranking using Vespa.ai sample app](https://github.com/vespa-engine/sample-apps/tree/master/msmarco-ranking).
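ColBERT ranks passages by late interaction: each query token embedding is compared with every passage token embedding, the best match per query token is kept, and these maxima are summed (the MaxSim operator from the ColBERT paper). A minimal plain-Python sketch, assuming both sets of token embeddings are already L2-normalized so that each dot product is a cosine similarity; `maxsim_score` is a hypothetical helper, not part of this model or of Vespa's API.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vectors: List[Vector], doc_vectors: List[Vector]) -> float:
    """Late-interaction (MaxSim) relevance score (hypothetical helper).

    For every query token vector, take the highest dot product against any
    document token vector, then sum these maxima over the query tokens.
    Assumes L2-normalized vectors, so each dot product is a cosine similarity.
    """
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


# Toy 2-dimensional, unit-length token embeddings (illustrative only)
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.6, 0.8]]
print(maxsim_score(query, doc))  # 1.0 + 0.8 = 1.8
```

Because the score only needs per-token vectors, passage token embeddings can be computed offline and stored, and only the query needs to be encoded at query time.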

## MS Marco Passage Ranking

| MS Marco Passage Ranking query set | MRR@10 (ColBERT on Vespa.ai) |
|------------------------------------|------------------------------|
| Dev                                | 0.354                        |
| Eval                               | 0.347                        |

The official BM25 baseline reaches an MRR@10 of 0.16 on the eval query set and 0.167 on the dev query set.
See the [MS Marco Passage Ranking Leaderboard](https://microsoft.github.io/msmarco/).
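MRR@10 (mean reciprocal rank at cutoff 10) averages, over the judged queries, the reciprocal rank of the first relevant passage among the top 10 results; a query with no relevant passage in its top 10 contributes 0. A minimal sketch, assuming rankings are lists of passage ids per query and relevance judgments are sets of ids per query; `mrr_at_10` and this input format are hypothetical and not the official MS MARCO evaluation script.

```python
from typing import Dict, List, Set


def mrr_at_10(rankings: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """Mean reciprocal rank of the first relevant passage within the top 10.

    rankings: query id -> passage ids ordered from best to worst
    qrels:    query id -> set of relevant passage ids (the judged queries)
    Judged queries that are missing from rankings, or that have no relevant
    passage in their top 10, contribute 0.
    """
    if not qrels:
        return 0.0
    total = 0.0
    for qid, relevant in qrels.items():
        ranked = rankings.get(qid, [])
        for rank, pid in enumerate(ranked[:10], start=1):
            if pid in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(qrels)


rankings = {"q1": ["p3", "p7", "p1"], "q2": ["p9", "p2"]}
qrels = {"q1": {"p7"}, "q2": {"p5"}}
print(mrr_at_10(rankings, qrels))  # (1/2 + 0) / 2 = 0.25
```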

## Export ColBERT query encoder to ONNX

The ColBERT query encoder runs inside the Vespa runtime, where it maps the tokenized query to its tensor representation: one L2-normalized, 32-dimensional vector per query token.
This relies on Vespa's support for running ONNX models. The following snippet exports the model for serving:

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertPreTrainedModel


class VespaColBERT(BertPreTrainedModel):
    """ColBERT query encoder: BERT followed by a linear projection to 32 dimensions."""

    def __init__(self, config):
        super().__init__(config)
        self.bert = BertModel(config)
        # Project each contextualized token embedding down to 32 dimensions
        self.linear = nn.Linear(config.hidden_size, 32, bias=False)
        self.init_weights()

    def forward(self, input_ids, attention_mask):
        # Last hidden state: (batch, sequence_length, hidden_size)
        Q = self.bert(input_ids, attention_mask=attention_mask)[0]
        Q = self.linear(Q)
        # L2-normalize each token vector along the embedding dimension
        return torch.nn.functional.normalize(Q, p=2, dim=2)


colbert_query_encoder = VespaColBERT.from_pretrained("vespa-engine/colbert-medium")

# Export model to ONNX for serving in Vespa
input_names = ["input_ids", "attention_mask"]
output_names = ["contextual"]
# Dummy input: batch size 1, max 32 query tokens
input_ids = torch.ones(1, 32, dtype=torch.int64)
attention_mask = torch.ones(1, 32, dtype=torch.int64)
args = (input_ids, attention_mask)
torch.onnx.export(
    colbert_query_encoder,
    args=args,
    f="query_encoder_colbert.onnx",
    input_names=input_names,
    output_names=output_names,
    # Only the batch dimension is dynamic; the sequence length stays 32
    dynamic_axes={
        "input_ids": {0: "batch"},
        "attention_mask": {0: "batch"},
        "contextual": {0: "batch"},
    },
    opset_version=11,
)
```
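The export above fixes the query length at 32 tokens (only the batch dimension is dynamic), so every query must be truncated or padded to exactly 32 token ids, with an attention mask of the same length, before it is fed to the ONNX model. A minimal sketch, assuming the token ids already come from the model's BERT tokenizer and that padding uses id 0 (BERT's `[PAD]`); the helper `to_fixed_length` and the example ids are hypothetical. The ColBERT paper instead pads queries with BERT's `[MASK]` token (query augmentation), so check the sample app for the exact padding and marker tokens it uses.

```python
from typing import List, Tuple

MAX_QUERY_LENGTH = 32  # matches the sequence length of the dummy export input


def to_fixed_length(
    token_ids: List[int], pad_id: int = 0, max_length: int = MAX_QUERY_LENGTH
) -> Tuple[List[int], List[int]]:
    """Truncate or pad token ids to max_length and build the attention mask.

    Real tokens get mask value 1 and padding positions get 0.
    pad_id=0 is an assumption (BERT [PAD]); ColBERT-style query
    augmentation would pad with the [MASK] token id instead.
    """
    ids = token_ids[:max_length]
    mask = [1] * len(ids)
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding


# Illustrative token ids for a short query
ids, mask = to_fixed_length([101, 2054, 2003, 102])
print(len(ids), sum(mask))  # 32 4
```

Keeping the sequence length fixed matches the exported graph, at the cost of truncating queries longer than 32 tokens.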

## Representing the model on Vespa.ai

See [Ranking with ONNX models](https://docs.vespa.ai/documentation/onnx.html) and the [MS Marco Ranking sample app](https://github.com/vespa-engine/sample-apps/tree/master/msmarco-ranking).