Shitao and michaelfeil committed on
Commit 5c38ec7
1 Parent(s): 35baf77

onnx support (#9)


- Upload 2 files (c5ac6c397e27c80e0229ec647987f2e553fc0ba9)
- Update README.md (21761068410549f68530088eff9e528c8ef0d0b3)
- Delete onnx/model_quantized.onnx (5e62ea33e012fda8c02802b906664c915ebd1bb1)


Co-authored-by: Michael <michaelfeil@users.noreply.huggingface.co>

Files changed (2)
  1. README.md +49 -0
  2. onnx/model.onnx +3 -0
README.md CHANGED
@@ -2907,6 +2907,55 @@ with torch.no_grad():
  print(scores)
  ```

+ #### Usage of the ONNX files
+
+ ```python
+ from optimum.onnxruntime import ORTModelForFeatureExtraction  # type: ignore
+
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-small-en-v1.5')
+ model = AutoModel.from_pretrained('BAAI/bge-small-en-v1.5')
+ model_ort = ORTModelForFeatureExtraction.from_pretrained('BAAI/bge-small-en-v1.5', file_name="onnx/model.onnx")
+
+ # Sentences we want sentence embeddings for
+ sentences = ["Sample data 1", "Sample data 2"]
+
+ # Tokenize sentences
+ encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
+ # For s2p (short query to long passage) retrieval, prepend an instruction to each query (but not to the passages):
+ # encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')
+
+ # Compute token embeddings with the ONNX Runtime model
+ model_output_ort = model_ort(**encoded_input)
+ # Compute token embeddings with the PyTorch model
+ with torch.no_grad():
+     model_output = model(**encoded_input)
+
+ # model_output and model_output_ort should match up to small floating-point differences
+
+ ```
+
+ #### Usage via infinity
+ It's also possible to deploy the ONNX files with the [infinity_emb](https://github.com/michaelfeil/infinity) pip package.
+ The recommended settings are `device="cuda", engine="torch"` with flash attention on GPU, and `device="cpu", engine="optimum"` for ONNX inference.
+
+ ```python
+ import asyncio
+ from infinity_emb import AsyncEmbeddingEngine, EngineArgs
+
+ sentences = ["Embed this sentence via Infinity.", "Paris is in France."]
+ engine = AsyncEmbeddingEngine.from_args(
+     EngineArgs(
+         model_name_or_path="BAAI/bge-small-en-v1.5",
+         device="cpu",
+         engine="optimum",  # or engine="torch"
+     )
+ )
+
+ async def main():
+     # `async with` starts the engine and shuts it down on exit
+     async with engine:
+         embeddings, usage = await engine.embed(sentences=sentences)
+
+ asyncio.run(main())
+ ```
+
+
  ## Evaluation

  `baai-general-embedding` models achieve **state-of-the-art performance on both MTEB and C-MTEB leaderboards!**
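The ONNX example in the diff stops at the raw model outputs and only states that the two agree. A minimal sketch of how one might check that, assuming (this is BGE's usual recipe, not something shown in this diff) that a sentence embedding is the L2-normalized hidden state of the first ([CLS]) token. `cls_embeddings` and `max_abs_difference` are hypothetical helper names, and they work on NumPy arrays rather than torch tensors:

```python
import numpy as np


def cls_embeddings(last_hidden_state: np.ndarray) -> np.ndarray:
    """Take the first ([CLS]) token of each sequence and L2-normalize it.

    Assumption: BGE-style CLS pooling. Input shape is
    (batch, seq_len, hidden_size); output shape is (batch, hidden_size)
    with unit-length rows (all-zero rows stay zero).
    """
    cls = last_hidden_state[:, 0, :]
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.maximum(norms, 1e-12)  # guard against division by zero


def max_abs_difference(a: np.ndarray, b: np.ndarray) -> float:
    """Largest element-wise deviation between two embedding matrices."""
    return float(np.max(np.abs(a - b)))


if __name__ == "__main__":
    # Synthetic stand-ins for the two models' last_hidden_state outputs:
    # a batch of 2 sentences, 4 tokens each, hidden size 3.
    rng = np.random.default_rng(0)
    torch_like = rng.normal(size=(2, 4, 3)).astype(np.float32)
    ort_like = torch_like + 1e-6  # simulate tiny numerical noise between runtimes

    emb_torch = cls_embeddings(torch_like)
    emb_ort = cls_embeddings(ort_like)

    print("max abs difference:", max_abs_difference(emb_torch, emb_ort))
    # Rows are unit length, so this matrix holds pairwise cosine similarities
    print(emb_torch @ emb_torch.T)
```

With the models loaded as in the diff, calling `cls_embeddings(model_output.last_hidden_state.numpy())` and the same function on `model_output_ort.last_hidden_state.numpy()` gives the two embedding matrices; a maximum difference close to zero indicates that the ONNX and PyTorch backends agree.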
onnx/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:828e1496d7fabb79cfa4dcd84fa38625c0d3d21da474a00f08db0f559940cf35
+ size 133093490
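The three added lines of `onnx/model.onnx` are a Git LFS pointer, not the model weights. They record the pointer spec version, the SHA-256 object ID of the real file, and its size in bytes (133,093,490 bytes, roughly 133 MB). Below is a minimal standard-library sketch that parses such a pointer and checks a downloaded copy against it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, and the parser assumes the simple one-`key value`-pair-per-line layout shown above:

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer (simple layout only)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and SHA-256 the pointer records."""
    if path.stat().st_size != pointer["size"]:
        return False  # cheap size check before hashing a large file
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 1 MiB chunks so the whole model never sits in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["sha256"]


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:828e1496d7fabb79cfa4dcd84fa38625c0d3d21da474a00f08db0f559940cf35
size 133093490
"""

if __name__ == "__main__":
    info = parse_lfs_pointer(POINTER)
    print(info["size"], info["sha256"][:12])
    # Hypothetical local path to a downloaded copy of the model
    model_path = Path("onnx/model.onnx")
    if model_path.exists():
        print("matches pointer:", matches_pointer(model_path, info))
```

Checking the size first avoids hashing a file that is obviously wrong, for example an un-fetched pointer file that Git left in place of the model.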