---
license: mit
---

This model is a version of the [BGE-M3](https://huggingface.co/BAAI/bge-m3) model converted to ONNX weights with HF Optimum for compatibility with ONNX Runtime.

It is based on the conversion scripts and the documentation of the [bge-m3-onnx](https://huggingface.co/aapot/bge-m3-onnx) model by [Aapo Tanskanen](https://huggingface.co/aapot).

This ONNX model outputs dense and ColBERT embedding representations in a single forward pass. The output is a list of NumPy arrays in that order: dense embeddings first, then ColBERT embeddings (see the usage example below).

Note: both dense and ColBERT embeddings are normalized, matching the default behavior of the original FlagEmbedding library.

This ONNX model has "O3" level graph optimizations applied. You can read more about optimization levels [here](https://huggingface.co/docs/optimum/en/onnxruntime/usage_guides/optimization).
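If you are converting the model yourself, this optimization level can be applied with Optimum's `ORTOptimizer`; a minimal sketch, assuming hypothetical local directories for the unoptimized export and the optimized output:

```python
from optimum.onnxruntime import ORTOptimizer, AutoOptimizationConfig

# "onnx_export" is a hypothetical local directory containing the
# unoptimized ONNX export of the model:
optimizer = ORTOptimizer.from_pretrained("onnx_export")

# Build the "O3" optimization configuration and write the optimized graph
# to a hypothetical "onnx_optimized" directory:
optimization_config = AutoOptimizationConfig.O3()
optimizer.optimize(save_dir="onnx_optimized", optimization_config=optimization_config)
```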
## Usage with ONNX Runtime (Python)

If you haven't already, you can install the [ONNX Runtime](https://onnxruntime.ai/) Python library with pip:

```bash
pip install onnxruntime
```
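Alternatively, if you want to run the model on a GPU, ONNX Runtime also ships a GPU build:

```bash
pip install onnxruntime-gpu
```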
For tokenization you can, for example, use HF Transformers; install it with pip:

```bash
pip install transformers
```
Clone this repository with [Git LFS](https://git-lfs.com/) to get the ONNX model files.
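For example, using the repository ID from the Python example below:

```bash
git lfs install
git clone https://huggingface.co/ddmitov/bge_m3_dense_colbert_onnx
```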
You can then use the model to compute embeddings as follows:
```python
import onnxruntime as ort
from transformers import AutoTokenizer

# Load the BGE-M3 tokenizer and the converted ONNX model
# (this assumes the script is run from the cloned repository directory):
tokenizer = AutoTokenizer.from_pretrained("ddmitov/bge_m3_dense_colbert_onnx")
ort_session = ort.InferenceSession("model.onnx")

# Tokenize the input text, returning NumPy arrays:
inputs = tokenizer(
    "BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.",
    padding="longest",
    return_tensors="np"
)

# Wrap the tokenizer output in OrtValue objects for ONNX Runtime:
inputs_onnx = {k: ort.OrtValue.ortvalue_from_numpy(v) for k, v in inputs.items()}

# Run inference; the result is a list of two NumPy arrays:
# dense embeddings first, then ColBERT embeddings.
outputs = ort_session.run(None, inputs_onnx)
```
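The first array holds the dense embedding for each input text and the second holds the per-token ColBERT vectors; as noted above, both are L2-normalized. A minimal sketch of how you might unpack and sanity-check the outputs (the 1024-dimensional shapes are an assumption based on BGE-M3's hidden size):

```python
import numpy as np

# Unpack the two representations:
dense_embeddings, colbert_embeddings = outputs

# Dense: one vector per input text, e.g. (1, 1024);
# ColBERT: one vector per input token, e.g. (1, sequence_length, 1024).
print(dense_embeddings.shape, colbert_embeddings.shape)

# Both representations are L2-normalized, so every norm should be ~1.0:
print(np.linalg.norm(dense_embeddings, axis=-1))
print(np.linalg.norm(colbert_embeddings, axis=-1))
```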