---
license: apache-2.0
datasets:
  - microsoft/ms_marco
language:
  - en
pipeline_tag: text-classification
tags:
  - onnx
  - cross-encoder
---

# Cross-Encoder for MS Marco - ONNX

ONNX versions of Sentence Transformers Cross Encoders to allow ranking without heavy dependencies.

The models were trained on the MS Marco Passage Ranking task.

The models can be used for Information Retrieval: given a query, encode the query with all candidate passages (e.g. retrieved with ElasticSearch), then sort the passages in decreasing order of score. See SBERT.net Retrieve & Re-rank for more details.
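
At a high level, the re-rank step is simply "score every (query, passage) pair, then sort". A minimal sketch, with a hypothetical `score_pairs` helper standing in for the cross-encoder (a concrete ONNX Runtime version is shown in the Usage section below):

```python
def rerank(query, passages, score_pairs):
    # score_pairs is a hypothetical callable returning one relevance score
    # per (query, passage) pair, e.g. backed by the ONNX cross-encoder below
    scores = score_pairs(query, passages)
    # Highest-scoring passages first
    return sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)
```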

## Models Available

| Model Name | Precision | File Name | File Size |
|---|---|---|---|
| ms-marco-MiniLM-L-4-v2 ONNX | FP32 | ms-marco-MiniLM-L-4-v2-onnx.zip | 70 MB |
| ms-marco-MiniLM-L-4-v2 ONNX (Quantized) | INT8 | ms-marco-MiniLM-L-4-v2-onnx-int8.zip | 12.8 MB |
| ms-marco-MiniLM-L-6-v2 ONNX | FP32 | ms-marco-MiniLM-L-6-v2-onnx.zip | 83.4 MB |
| ms-marco-MiniLM-L-6-v2 ONNX (Quantized) | INT8 | ms-marco-MiniLM-L-6-v2-onnx-int8.zip | 15.2 MB |
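
The archives can also be fetched programmatically with `huggingface_hub`. A sketch below, assuming this repository's id is `svilupp/onnx-cross-encoders` (adjust the repo id and file name as needed):

```python
import zipfile
from huggingface_hub import hf_hub_download

# Download one of the archives listed above (repo id is an assumption)
archive = hf_hub_download(
    repo_id="svilupp/onnx-cross-encoders",
    filename="ms-marco-MiniLM-L-4-v2-onnx.zip",
)

# Unpack into a local folder that the usage example below can point at
with zipfile.ZipFile(archive) as zf:
    zf.extractall("ms-marco-MiniLM-L-4-v2-onnx")
```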

## Usage with ONNX Runtime

```python
import onnxruntime as ort
from transformers import AutoTokenizer

model_path = "ms-marco-MiniLM-L-4-v2-onnx/"
tokenizer = AutoTokenizer.from_pretrained(model_path)  # tokenizer files shipped alongside the ONNX export
ort_sess = ort.InferenceSession(model_path + "ms-marco-MiniLM-L-4-v2.onnx")

# Tokenize (query, passage) pairs; return_tensors="np" gives the NumPy inputs ONNX Runtime expects
features = tokenizer(
    ['How many people live in Berlin?', 'How many people live in Berlin?'],
    ['Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
     'New York City is famous for the Metropolitan Museum of Art.'],
    padding=True, truncation=True, return_tensors="np",
)

ort_outs = ort_sess.run(None, dict(features))  # one relevance logit per pair
print(ort_outs)
```
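
The outputs are raw logits, one per (query, passage) pair (assuming the export keeps the original cross-encoder's single-logit head). To obtain the ranking described above, sort the passages by these scores; a small sketch continuing from the snippet:

```python
import numpy as np

passages = [
    'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
    'New York City is famous for the Metropolitan Museum of Art.',
]

# ort_outs[0] has shape (num_pairs, 1); flatten to one logit per pair
scores = np.asarray(ort_outs[0]).reshape(-1)

# Higher logit = more relevant; decreasing order gives the ranking
for passage, score in sorted(zip(passages, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.3f}\t{passage}")
```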

## Performance

TBU...