---
license: apache-2.0
datasets:
- microsoft/ms_marco
language:
- en
pipeline_tag: text-classification
tags:
- onnx
- cross-encoder
---

# Cross-Encoder for MS Marco - ONNX

ONNX versions of [Sentence Transformers Cross Encoders](https://huggingface.co/cross-encoder) to allow ranking without heavy dependencies.

The models were trained on the [MS Marco Passage Ranking](https://github.com/microsoft/MSMARCO-Passage-Ranking) task.

The models can be used for information retrieval: given a query, encode the query with each candidate passage (e.g. retrieved with ElasticSearch) to obtain a relevance score, then sort the passages by score in decreasing order. See [SBERT.net Retrieve & Re-rank](https://www.sbert.net/examples/applications/retrieve_rerank/README.html) for more details.
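The sorting step can be sketched in plain Python. This is only an illustration: the `rerank` helper, the passages, and the scores are made up for the example and are not part of this repository; in practice the scores would come from the cross-encoder.

```python
def rerank(passages, scores):
    """Return (passage, score) pairs ordered from most to least relevant.

    `rerank` is a hypothetical helper for illustration only.
    """
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)


# Made-up candidates and scores, standing in for retrieval + model output.
passages = [
    "New York City is famous for the Metropolitan Museum of Art.",
    "Berlin has a population of 3,520,031 registered inhabitants.",
    "Berlin is the capital of Germany.",
]
scores = [-11.0, 8.5, 0.7]

for passage, score in rerank(passages, scores):
    print(f"{score:6.2f}  {passage}")
```

Because only the order matters for re-ranking, raw model outputs can be sorted directly without any normalization.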

## Models Available

| Model Name                              | Precision | File Name                            | File Size |
|-----------------------------------------|-----------|--------------------------------------|-----------|
| ms-marco-MiniLM-L-4-v2 ONNX             | FP32      | ms-marco-MiniLM-L-4-v2-onnx.zip      | 70 MB     |
| ms-marco-MiniLM-L-4-v2 ONNX (Quantized) | INT8      | ms-marco-MiniLM-L-4-v2-onnx-int8.zip | 12.8 MB   |
| ms-marco-MiniLM-L-6-v2 ONNX             | FP32      | ms-marco-MiniLM-L-6-v2-onnx.zip      | 83.4 MB   |
| ms-marco-MiniLM-L-6-v2 ONNX (Quantized) | INT8      | ms-marco-MiniLM-L-6-v2-onnx-int8.zip | 15.2 MB   |

## Usage with ONNX Runtime

```python
import onnxruntime as ort
from transformers import AutoTokenizer

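model_path = "ms-marco-MiniLM-L-4-v2-onnx/"
tokenizer = AutoTokenizer.from_pretrained(model_path)  # pass the variable, not the string 'model_path'
ort_sess = ort.InferenceSession(model_path + "ms-marco-MiniLM-L-4-v2.onnx")

# Pair the same query with every candidate passage
query = 'How many people live in Berlin?'
passages = [
    'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
    'New York City is famous for the Metropolitan Museum of Art.',
]
features = tokenizer([query] * len(passages), passages, padding=True, truncation=True, return_tensors="np")

# ONNX Runtime takes a plain dict; keep only the inputs the model declares
input_names = {i.name for i in ort_sess.get_inputs()}
ort_inputs = {k: v for k, v in features.items() if k in input_names}
ort_outs = ort_sess.run(None, ort_inputs)
print(ort_outs)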
```
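To turn the raw output into per-passage scores, something like the following may help. It is a sketch under an assumption not confirmed here: that the first output holds one logit per query-passage pair, with shape `(batch, 1)`. The `scores_from_outputs` helper is hypothetical, and the `ort_outs` value below is made-up data standing in for a real `ort_sess.run` result.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def scores_from_outputs(ort_outs):
    """Map logits to scores in (0, 1).

    Assumes ort_outs[0] has shape (batch, 1), one logit per pair.
    Works on NumPy arrays as well as nested lists.
    """
    return [sigmoid(float(row[0])) for row in ort_outs[0]]


# Made-up logits in place of a real model output.
ort_outs = [[[8.5], [-11.0]]]
scores = scores_from_outputs(ort_outs)

# Indices of the passages, best match first.
order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(scores, order)
```

The sigmoid is optional for ranking, since it preserves the order of the logits; it is mainly useful when a bounded score is wanted for thresholding.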

## Performance

To be updated.