---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
tags:
- FlagEmbedding
- Embedding
- Hybrid Retrieval
- ONNX
- Optimum
- ONNXRuntime
- Multilingual
license: mit
base_model: BAAI/bge-m3
---
# Model Card for philipchung/bge-m3-onnx
This is the [BAAI/BGE-M3](https://huggingface.co/BAAI/bge-m3) inference model converted to ONNX format for use with ONNX Runtime via the Optimum library, with CPU acceleration. The model outputs all three embedding types (dense, sparse, ColBERT).
No ONNX optimizations are applied to this model. If you want to apply optimizations, use the export script included in this repo to generate an optimized version of the ONNX model; a generic Optimum-based sketch is shown below.
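As a rough sketch of what such an optimization pass could look like (this uses Optimum's generic `ORTOptimizer` API rather than the repo's export script; whether the generic optimizer supports this custom-task export is an assumption, and the optimization level and save directory are illustrative):

```python
from optimum.onnxruntime import ORTModelForCustomTasks, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Load the unoptimized ONNX model from the Hub.
model = ORTModelForCustomTasks.from_pretrained("philipchung/bge-m3-onnx")

# Assumption: the generic ORTOptimizer handles this custom-task export.
# optimization_level=2 (extended graph fusions) and save_dir are illustrative.
optimizer = ORTOptimizer.from_pretrained(model)
optimizer.optimize(
    save_dir="bge-m3-onnx-optimized",
    optimization_config=OptimizationConfig(optimization_level=2),
)
```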
Some of the code is adapted from [aapot/bge-m3-onnx](https://huggingface.co/aapot/bge-m3-onnx). The model in this repo inherits from `PreTrainedModel`, so the ONNX model can be downloaded from the Hugging Face Hub and loaded directly with the `model.from_pretrained()` method.
## How to Use
```python
from collections import defaultdict

import numpy as np
from optimum.onnxruntime import ORTModelForCustomTasks
from transformers import AutoTokenizer

# Download ONNX model and tokenizer from the Hugging Face Hub
onnx_model = ORTModelForCustomTasks.from_pretrained("philipchung/bge-m3-onnx")
tokenizer = AutoTokenizer.from_pretrained("philipchung/bge-m3-onnx")

# Inference forward pass
sentences = ["First test sentence.", "Second test sentence."]
inputs = tokenizer(
    sentences,
    padding="longest",
    return_tensors="np",
)
outputs = onnx_model.forward(**inputs)


def process_token_weights(
    token_weights: np.ndarray, input_ids: list
) -> defaultdict[str, float]:
    """Convert sparse token weights into a dict of token indices and corresponding weights.

    Adapted from the `_process_token_weights()` function defined within the `encode()`
    method of the original `FlagEmbedding.bge_m3.BGEM3FlagModel`.
    """
    result = defaultdict(float)
    # Special tokens carry no lexical weight and are excluded
    unused_tokens = {
        tokenizer.cls_token_id,
        tokenizer.eos_token_id,
        tokenizer.pad_token_id,
        tokenizer.unk_token_id,
    }
    for w, idx in zip(token_weights, input_ids, strict=False):
        if idx not in unused_tokens and w > 0:
            idx = str(idx)
            # Keep the maximum weight if a token appears more than once
            if w > result[idx]:
                result[idx] = w
    return result


# Each sentence results in a dict with dense, sparse, and colbert embeddings,
# i.e. dict[str, list[float] | dict[str, float] | list[list[float]]].
embeddings_list = []
for input_ids, dense_vec, sparse_vec, colbert_vec in zip(
    inputs["input_ids"],
    outputs["dense_vecs"],
    outputs["sparse_vecs"],
    outputs["colbert_vecs"],
    strict=False,
):
    # Convert token weights into a dict of token indices and corresponding weights
    token_weights = sparse_vec.astype(float).squeeze(-1)
    sparse_embeddings = process_token_weights(
        token_weights,
        input_ids.tolist(),
    )
    multivector_embedding = {
        "dense": dense_vec.astype(float).tolist(),  # (1024,)
        "sparse": dict(sparse_embeddings),  # dict[token_index, weight]
        "colbert": colbert_vec.astype(float).tolist(),  # (token_len, 1024)
    }
    embeddings_list.append(multivector_embedding)
```
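Once `embeddings_list` is built, the three representations can be compared between texts. The snippet below is a minimal sketch (not part of this repo) following the scoring conventions of BGE-M3: inner product for dense vectors, a sum of weight products over shared tokens for sparse weights, and MaxSim over token vectors for ColBERT.

```python
import numpy as np

query, passage = embeddings_list[0], embeddings_list[1]

# Dense score: inner product of the pooled sentence vectors.
dense_score = float(np.dot(query["dense"], passage["dense"]))

# Sparse (lexical) score: sum of weight products over tokens present in both texts.
sparse_score = sum(
    weight * passage["sparse"][token_id]
    for token_id, weight in query["sparse"].items()
    if token_id in passage["sparse"]
)

# ColBERT score: for each query token, take the maximum similarity over
# passage tokens (MaxSim), then average over query tokens.
similarity = np.asarray(query["colbert"]) @ np.asarray(passage["colbert"]).T
colbert_score = float(similarity.max(axis=1).mean())

print(dense_score, sparse_score, colbert_score)
```

Note that the sparse weights are keyed by token id stored as a string, so `tokenizer.convert_ids_to_tokens([int(i) for i in query["sparse"]])` recovers the tokens the weights refer to.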