This adds ONNX support for https://github.com/michaelfeil/infinity. The conversions are identical to https://huggingface.co/Xenova/bge-small-en-v1.5.

This is ready, Xiao @Shitao.

michaelfeil changed pull request title from Upload 2 files to onnx support

Please consider the following testing script that I wrote for this PR. My advice for reproducibility is to use file_name="onnx/model.onnx". The main benefit of ONNX will be fast execution on CPU with the quantized model (see the sketch after the testing script).

from optimum.onnxruntime import ORTModelForFeatureExtraction  # type: ignore

import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-small-en-v1.5')
model = AutoModel.from_pretrained('BAAI/bge-small-en-v1.5', revision="refs/pr/9")
model_ort = ORTModelForFeatureExtraction.from_pretrained('BAAI/bge-small-en-v1.5', revision="refs/pr/9", file_name="onnx/model.onnx")
model.eval()

# Sentences we want sentence embeddings for
sentences = ["样例数据-1", "样例数据-2"]

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# For the s2p (short query to long passage) retrieval task, add an instruction to each query
# (no instruction is needed for passages):
# encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings with both the PyTorch and the ONNX model
with torch.no_grad():
    model_output = model(**encoded_input)
    model_output_ort = model_ort(**encoded_input)

    # Verify the ONNX outputs match the PyTorch outputs within tolerance
    np.testing.assert_allclose(
        model_output.last_hidden_state.cpu().numpy(),
        model_output_ort.last_hidden_state.cpu().numpy(),
        rtol=1e-3, atol=1e-5)
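
To illustrate the CPU speedup mentioned above, here is a minimal sketch that loads the quantized variant. It assumes the quantized weights are stored as onnx/model_quantized.onnx, following the Xenova naming convention referenced at the top of this PR; verify the path against the repo's actual file listing before relying on it.

# Hedged sketch: fast CPU inference with the quantized ONNX model.
# Assumes the quantized file is named onnx/model_quantized.onnx
# (Xenova convention); check the repository's file list first.
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-small-en-v1.5')
model_quant = ORTModelForFeatureExtraction.from_pretrained(
    'BAAI/bge-small-en-v1.5',
    revision="refs/pr/9",
    file_name="onnx/model_quantized.onnx",  # assumed file name
)

encoded = tokenizer(["样例数据-1"], padding=True, truncation=True, return_tensors='pt')
embeddings = model_quant(**encoded).last_hidden_state  # runs on CPU via onnxruntime

The quantized model trades a small amount of numerical accuracy for noticeably faster CPU inference, which is the main use case for this export.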

This is ready @Shitao - sorry for tagging you on so many PRs; it might be more helpful to review them all at once.

Shitao changed pull request status to merged
