---
license: mit
---

This model is a version of the [BGE-M3](https://huggingface.co/BAAI/bge-m3) model converted to ONNX weights with HF Optimum for compatibility with ONNX Runtime.

It is based on the conversion scripts and the documentation of the [bge-m3-onnx](https://huggingface.co/aapot/bge-m3-onnx) model by [Aapo Tanskanen](https://huggingface.co/aapot).

This ONNX model outputs dense and ColBERT embedding representations in a single forward pass. The output is a list of NumPy arrays in that order: dense embeddings first, then ColBERT embeddings (see the usage example below).

Note: both dense and ColBERT embeddings are normalized, matching the default behavior of the original FlagEmbedding library.

This ONNX model has "O3" level graph optimizations applied. You can read more about optimization levels [here](https://huggingface.co/docs/optimum/en/onnxruntime/usage_guides/optimization).
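If you are converting the model yourself, this optimization level can be applied with Optimum's `ORTOptimizer`; a minimal sketch, assuming hypothetical local directories for the unoptimized export and the optimized output:

```python
from optimum.onnxruntime import ORTOptimizer, AutoOptimizationConfig

# "onnx_export" is a hypothetical local directory containing the
# unoptimized ONNX export of the model:
optimizer = ORTOptimizer.from_pretrained("onnx_export")

# Build the "O3" optimization configuration and write the optimized graph
# to a hypothetical "onnx_optimized" directory:
optimization_config = AutoOptimizationConfig.O3()
optimizer.optimize(save_dir="onnx_optimized", optimization_config=optimization_config)
```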
## Usage with ONNX Runtime (Python)

If you haven't already, you can install the [ONNX Runtime](https://onnxruntime.ai/) Python library with pip:

```bash
pip install onnxruntime
```
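Alternatively, if you want to run the model on a GPU, ONNX Runtime also ships a GPU build:

```bash
pip install onnxruntime-gpu
```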
For tokenization you can, for example, use HF Transformers; install it with pip:

```bash
pip install transformers
```
Clone this repository with [Git LFS](https://git-lfs.com/) to get the ONNX model files.
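For example, using the repository ID from the Python example below:

```bash
git lfs install
git clone https://huggingface.co/ddmitov/bge_m3_dense_colbert_onnx
```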
You can then use the model to compute embeddings as follows:
```python
import onnxruntime as ort
from transformers import AutoTokenizer

# Load the BGE-M3 tokenizer and the converted ONNX model
# (this assumes the script is run from the cloned repository directory):
tokenizer = AutoTokenizer.from_pretrained("ddmitov/bge_m3_dense_colbert_onnx")
ort_session = ort.InferenceSession("model.onnx")

# Tokenize the input text, returning NumPy arrays:
inputs = tokenizer(
    "BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.",
    padding="longest",
    return_tensors="np"
)

# Wrap the tokenizer output in OrtValue objects for ONNX Runtime:
inputs_onnx = {k: ort.OrtValue.ortvalue_from_numpy(v) for k, v in inputs.items()}

# Run inference; the result is a list of two NumPy arrays:
# dense embeddings first, then ColBERT embeddings.
outputs = ort_session.run(None, inputs_onnx)
```
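The first array holds the dense embedding for each input text and the second holds the per-token ColBERT vectors; as noted above, both are L2-normalized. A minimal sketch of how you might unpack and sanity-check the outputs (the 1024-dimensional shapes are an assumption based on BGE-M3's hidden size):

```python
import numpy as np

# Unpack the two representations:
dense_embeddings, colbert_embeddings = outputs

# Dense: one vector per input text, e.g. (1, 1024);
# ColBERT: one vector per input token, e.g. (1, sequence_length, 1024).
print(dense_embeddings.shape, colbert_embeddings.shape)

# Both representations are L2-normalized, so every norm should be ~1.0:
print(np.linalg.norm(dense_embeddings, axis=-1))
print(np.linalg.norm(colbert_embeddings, axis=-1))
```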