---
license: mit
---

[BGE-M3](https://huggingface.co/BAAI/bge-m3) converted to ONNX weights with HF Optimum, to be compatible with, for example, ONNX Runtime.

This ONNX model outputs the dense, sparse, and ColBERT embedding representations all at once, as a list of NumPy arrays in that order.

Note: the dense and ColBERT embeddings are normalized, matching the default behavior of the original FlagEmbedding library. If you want unnormalized outputs, you can modify the code in `bgem3_model.py` and re-run the ONNX export with the `export_onnx.py` script.

This ONNX model also has "O2"-level graph optimizations applied; you can read more about optimization levels [here](https://huggingface.co/docs/optimum/en/onnxruntime/usage_guides/optimization). If you want an ONNX model with different optimizations, or with none, re-run the `export_onnx.py` export script with the appropriate optimization argument.
## Usage with ONNX Runtime (Python)

If you haven't already, install the [ONNX Runtime](https://onnxruntime.ai/) Python library with pip:

```bash
pip install onnxruntime==1.17.0
```

For tokenization, you can, for example, use HF Transformers by installing it with pip:

```bash
pip install transformers==4.37.2
```

Clone this repository with [Git LFS](https://git-lfs.com/) to get the ONNX model files.

You can then use the model to compute embeddings as follows:
```python
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
ort_session = ort.InferenceSession("model.onnx")

inputs = tokenizer("BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.", padding="longest", return_tensors="np")
inputs_onnx = {k: ort.OrtValue.ortvalue_from_numpy(v) for k, v in inputs.items()}

# outputs is a list of NumPy arrays: [dense, sparse, colbert]
outputs = ort_session.run(None, inputs_onnx)
```
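Since the dense embeddings (the first array in the outputs list) are already normalized, cosine similarity between two texts reduces to a plain dot product. A minimal sketch with mock unit-length vectors standing in for `outputs[0]` rows (the values and dimensions here are illustrative assumptions, not real model outputs):

```python
import numpy as np

# Mock stand-ins for two dense embeddings as the model would return them
# (already L2-normalized); real embeddings come from outputs[0].
vec_a = np.array([0.6, 0.8])
vec_b = np.array([0.8, 0.6])

# Because both vectors are unit-length, the dot product equals cosine similarity.
cosine_sim = float(np.dot(vec_a, vec_b))
# cosine_sim is approximately 0.96
```

If you disable normalization in `bgem3_model.py` and re-export, you would need to divide by the vector norms yourself before comparing scores this way.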

Note: you can use the following sparse token-weight processor from FlagEmbedding to get the same output for the sparse representation from the ONNX model:

```python
from collections import defaultdict

import numpy as np


def process_token_weights(token_weights: np.ndarray, input_ids: list):
    # convert to dict, keeping the highest weight seen for each token id
    result = defaultdict(int)
    unused_tokens = set(
        [
            tokenizer.cls_token_id,
            tokenizer.eos_token_id,
            tokenizer.pad_token_id,
            tokenizer.unk_token_id,
        ]
    )
    for w, idx in zip(token_weights, input_ids):
        if idx not in unused_tokens and w > 0:
            idx = str(idx)
            # w = int(w)
            if w > result[idx]:
                result[idx] = w
    return result


token_weights = outputs[1].squeeze(-1)
lexical_weights = list(
    map(process_token_weights, token_weights, inputs["input_ids"].tolist())
)
```
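The resulting `lexical_weights` dicts map token ids to weights and can be used for lexical matching between two texts by summing the products of the weights of their shared token ids. A minimal sketch of such a score (a simple re-implementation for illustration, not the FlagEmbedding function itself, with hypothetical token ids and weights):

```python
def lexical_matching_score(weights_1: dict, weights_2: dict) -> float:
    # Sum the products of weights for token ids present in both texts.
    score = 0.0
    for token_id, weight in weights_1.items():
        if token_id in weights_2:
            score += weight * weights_2[token_id]
    return score


# Toy example: only token id "101" is shared, so the score is 0.5 * 0.4.
q = {"101": 0.5, "202": 0.3}
d = {"101": 0.4, "303": 0.9}
score = lexical_matching_score(q, d)  # 0.2
```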

## Export ONNX weights

You can export ONNX weights with the provided custom BGE-M3 PyTorch model file `bgem3_model.py` and the provided `export_onnx.py` export script, which leverages HF Optimum.
If needed, you can modify the `bgem3_model.py` model configuration to, for example, remove embedding normalization or not output all three embedding representations. If you change the number of output representations, you also need to modify the ONNX output config `BGEM3OnnxConfig` in `export_onnx.py`.

First, install the needed Python requirements:

```bash
pip install -r requirements.txt
```

Then export the ONNX weights as follows:

```bash
python export_onnx.py --output . --opset 17 --device cpu --optimize O2
```

You can read more about the optional optimization levels [here](https://huggingface.co/docs/optimum/en/onnxruntime/usage_guides/optimization).