metadata

license: apache-2.0
datasets:
  - microsoft/ms_marco
language:
  - en
pipeline_tag: text-classification
tags:
  - onnx
  - cross-encoder

Cross-Encoder for MS Marco - ONNX

ONNX versions of Sentence Transformers Cross Encoders to allow ranking without heavy dependencies.

The models were trained on the MS Marco Passage Ranking task.

The models can be used for information retrieval: given a query, score the query together with every candidate passage (e.g. passages retrieved with Elasticsearch), then sort the passages by score in decreasing order. See SBERT.net Retrieve & Re-rank for more details.
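
The re-ranking step described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: `rerank`, `score_pairs`, and `toy_scorer` are hypothetical names, and the toy scorer (a word-overlap count) only stands in for the cross-encoder so the snippet runs without a model.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    passages: Sequence[str],
    score_pairs: Callable[[List[Tuple[str, str]]], List[float]],
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return the passages, best first."""
    pairs = [(query, passage) for passage in passages]
    scores = score_pairs(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return [(passage, float(score)) for passage, score in ranked]


def toy_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    """Stand-in scorer for illustration only: counts words shared by both texts."""
    return [
        float(len(set(q.lower().split()) & set(p.lower().split())))
        for q, p in pairs
    ]


candidates = [
    "New York City is famous for the Metropolitan Museum of Art.",
    "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
]
results = rerank("How many people live in Berlin?", candidates, toy_scorer)
for passage, score in results:
    print(f"{score:.1f}\t{passage}")
```

In real use, `score_pairs` would wrap the tokenizer and the ONNX session so that each pair receives one relevance score from the model.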

Models Available

Model Name	Precision	File Name	File Size
ms-marco-MiniLM-L-4-v2 ONNX	FP32	ms-marco-MiniLM-L-4-v2-onnx.zip	70 MB
ms-marco-MiniLM-L-4-v2 ONNX (Quantized)	INT8	ms-marco-MiniLM-L-4-v2-onnx-int8.zip	12.8 MB
ms-marco-MiniLM-L-6-v2 ONNX	FP32	ms-marco-MiniLM-L-6-v2-onnx.zip	83.4 MB
ms-marco-MiniLM-L-6-v2 ONNX (Quantized)	INT8	ms-marco-MiniLM-L-6-v2-onnx-int8.zip	15.2 MB

Usage with ONNX Runtime

import onnxruntime as ort
from transformers import AutoTokenizer

# Directory containing the tokenizer files and the ONNX model
model_path = "ms-marco-MiniLM-L-4-v2-onnx/"
tokenizer = AutoTokenizer.from_pretrained(model_path)
ort_sess = ort.InferenceSession(model_path + "ms-marco-MiniLM-L-4-v2.onnx")

# Pair the same query with each candidate passage
queries = ['How many people live in Berlin?', 'How many people live in Berlin?']
passages = [
    'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
    'New York City is famous for the Metropolitan Museum of Art.',
]
features = tokenizer(queries, passages, padding=True, truncation=True, return_tensors="np")

# Convert the BatchEncoding to a plain dict of NumPy arrays for ONNX Runtime
ort_outs = ort_sess.run(None, dict(features))
print(ort_outs)
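
The printed output is a list of arrays. As a hedged sketch of post-processing, the snippet below assumes the first output holds one raw logit per (query, passage) pair with shape (N, 1); that layout has not been verified for these files, so inspect `ort_outs[0].shape` first. `logits_to_scores` is a hypothetical helper and the example logits are made up. The sigmoid is optional: it maps logits into the range 0 to 1 without changing their order.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logits_to_scores(first_output) -> List[float]:
    """Flatten an (N, 1) or (N,) output into a list of N scores in (0, 1)."""
    # Accept either a NumPy array (as returned by ONNX Runtime) or nested lists.
    rows = first_output.tolist() if hasattr(first_output, "tolist") else list(first_output)
    logits = [row[0] if isinstance(row, (list, tuple)) else row for row in rows]
    return [sigmoid(float(value)) for value in logits]


# Hypothetical logits standing in for ort_outs[0]; real values will differ.
example_output = [[8.0], [-4.0]]
scores = logits_to_scores(example_output)
print(scores)
```

With the session from the usage example, the call would be `logits_to_scores(ort_outs[0])`, and a higher score indicates a more relevant passage.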

Performance

TBU...