inconsistent with the original model

#1
by yutiangong - opened

Could you provide the conversion code? I ran inference with the converted model, but the results were inconsistent with the original model. Thank you.

Note that because of the int8 quantization, the embedding values of the converted model will differ from those of the original model to some extent.
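For reference, a dynamic int8 quantization step with ONNX Runtime looks roughly like the sketch below. The file names are placeholders rather than the script actually used for this model; this kind of weight quantization is what causes the small numerical drift between the fp32 and int8 embeddings.

```python
# Minimal sketch of dynamic int8 quantization with ONNX Runtime.
# "bge-m3.onnx" / "bge-m3_int8.onnx" are illustrative paths, not the
# files published in this repository.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="bge-m3.onnx",        # fp32 ONNX export of BAAI/bge-m3
    model_output="bge-m3_int8.onnx",  # int8-quantized output
    weight_type=QuantType.QInt8,      # store weights as signed int8
)
```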

Thank you, but after I converted to fp16, it reported an error when inferring. The error is:
'onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from /bge_m3_onnx_down/BAAI-bge-m3_quantized_fp16.onnx failed:This is an invalid model. Type Error: Type 'tensor(float16)' of input parameter (embeddings.position_embeddings.weight_scale) of operator (DequantizeLinear) in node (/embeddings/position_embeddings/Gather_output_0_DequantizeLinear) is invalid.'

Could you upload the original conversion code? I think it would help a lot of people.

That is because your ONNX Runtime does not support fp16. Running it with fp32, instead of converting to int8, may fix it.
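For example, loading the plain fp32 export directly with ONNX Runtime should avoid the DequantizeLinear type error. This is only a sketch: the file name and the input names (input_ids, attention_mask) are assumptions based on a typical bge-m3 export, so check them against your own model.

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# Assumed local path to an unquantized fp32 export; adjust as needed.
sess = ort.InferenceSession("bge-m3_fp32.onnx", providers=["CPUExecutionProvider"])

tok = AutoTokenizer.from_pretrained("BAAI/bge-m3")
enc = tok("hello world", return_tensors="np")

# Typical XLM-RoBERTa-style exports expect input_ids and attention_mask.
outputs = sess.run(None, {
    "input_ids": enc["input_ids"].astype(np.int64),
    "attention_mask": enc["attention_mask"].astype(np.int64),
})
print(outputs[0].shape)
```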

The model I am publishing is for running ONNX on a search engine called Vespa, and it does not take other environments into account. The behavior of ONNX varies considerably from runtime to runtime.

Thank you, but it did not work after I converted to fp32. Actually, my problem is that when I convert the bge-m3 .bin model to ONNX, both a .onnx model and a .onnx.data file are generated, and I only need the single .onnx file.
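In case it helps, one way to fold the external .onnx.data weights back into a single .onnx file is to reload and resave the model with the onnx package, as in the sketch below. The file names are placeholders, and note that protobuf caps a single .onnx file at 2 GB, so this only works if the model (for example after fp16 conversion or int8 quantization) fits under that limit.

```python
import onnx

# onnx.load picks up the companion .onnx.data file automatically when it
# sits next to the .onnx file.
model = onnx.load("BAAI-bge-m3.onnx")

# Saving with external data disabled writes all initializers inline; this
# only succeeds while the serialized model stays under protobuf's 2 GB limit.
onnx.save_model(model, "BAAI-bge-m3_single_file.onnx", save_as_external_data=False)
```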
