This is the intfloat/multilingual-e5-large model converted to ONNX fp16/int8 format for use with the Vespa embedder:

  • intfloat-multilingual-e5-large_fp16.onnx (fp16)
  • intfloat-multilingual-e5-large_quantized.onnx (int8 quantized)

The int8 model was quantized using the Hugging Face Optimum toolkit.

Example services.xml configuration (note: the FP16 model is supported on Vespa 8.325.46 and above):

<component id="me5_large" type="hugging-face-embedder">
    <transformer-model
        url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-large/resolve/main/intfloat-multilingual-e5-large_fp16.onnx" />
    <!-- or, for the int8 quantized model:
    <transformer-model
        url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-large/resolve/main/intfloat-multilingual-e5-large_quantized.onnx" />
    -->
    <tokenizer-model
        url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-large/resolve/main/tokenizer.json" />
    <normalize>true</normalize>
    <pooling-strategy>mean</pooling-strategy>
</component>
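
Once the component is deployed, fields in a schema can reference the embedder by its id. A minimal schema sketch (the schema, field, and rank-profile names here are hypothetical; 1024 is the output dimension of multilingual-e5-large):

schema doc {
    document doc {
        field text type string {
            indexing: summary | index
        }
    }
    # Embed the text field with the me5_large component defined in services.xml.
    field embedding type tensor<float>(x[1024]) {
        indexing: input text | embed me5_large | attribute | index
        attribute {
            distance-metric: angular
        }
    }
    rank-profile semantic {
        inputs {
            query(q) tensor<float>(x[1024])
        }
        first-phase {
            expression: closeness(field, embedding)
        }
    }
}

Note that the E5 models are trained with "query: " and "passage: " prefixes on the input text, so you may want to prepend "passage: " to document text before embedding, for example at feed time.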

Deploy

# The FP16 model has a larger file size, which can result in longer deployment times.
vespa deploy --wait 1800 .
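
After deployment, the embedder can also be invoked at query time. A hypothetical query sketch, assuming the schema and rank-profile names from the example above and a nearestNeighbor search over the embedding field (E5 models expect a "query: " prefix on query text):

vespa query \
  'yql=select * from doc where {targetHits: 10}nearestNeighbor(embedding, q)' \
  'input.query(q)=embed(me5_large, @text)' \
  'ranking=semantic' \
  'text=query: how does vespa embedding work?'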

Tips: convert to int8 (quantized)

# Export the HF model to ONNX with the helper script from the Vespa sample apps:
# https://github.com/vespa-engine/sample-apps/blob/master/simple-semantic-search/export_hf_model_from_hf.py
./export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-large --output_dir me5-large
# Dynamically quantize the exported model to int8 (AVX512 VNNI kernels)
optimum-cli onnxruntime quantize --onnx_model ./me5-large -o me5-large_quantized --avx512_vnni
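
If you prefer to stay in Python, roughly the same result can be obtained with onnxruntime's dynamic quantizer. A minimal sketch; the input path assumes the export step above, and optimum-cli's --avx512_vnni preset may choose slightly different quantization settings:

from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic int8 quantization: weights are stored as int8,
# activations are computed in float at runtime.
quantize_dynamic(
    model_input="me5-large/intfloat-multilingual-e5-large.onnx",
    model_output="me5-large/intfloat-multilingual-e5-large_quantized.onnx",
    weight_type=QuantType.QInt8,
)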

Tips: convert to fp16

# Export the base ONNX model first (same script as above):
# https://github.com/vespa-engine/sample-apps/blob/master/simple-semantic-search/export_hf_model_from_hf.py
./export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-large --output_dir me5-large

Then convert the weights to fp16 in Python, using the converter from
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/float16.py

import onnx
from onnxruntime.transformers.float16 import convert_float_to_float16

# Load the fp32 model, cast float tensors to fp16, and save the result.
# disable_shape_infer=True skips ONNX shape inference before conversion.
onnx_model = onnx.load("me5-large/intfloat-multilingual-e5-large.onnx")
model_fp16 = convert_float_to_float16(onnx_model, disable_shape_infer=True)
onnx.save(model_fp16, "me5-large/intfloat-multilingual-e5-large_fp16.onnx")
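
To sanity-check a converted model outside Vespa, it can be run directly with onnxruntime, mirroring the mean pooling and normalization configured in services.xml. A sketch; the input/output names come from the export script and may differ, so inspect session.get_inputs() if the run fails, and note that fp16 inference on CPU is only a correctness check, not a speed test:

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-large")
session = ort.InferenceSession("me5-large/intfloat-multilingual-e5-large_fp16.onnx")

# E5 models expect a "query: " or "passage: " prefix on input text.
batch = tokenizer(["query: what is vespa?"], return_tensors="np")
inputs = {i.name: batch[i.name] for i in session.get_inputs() if i.name in batch}
last_hidden_state = session.run(None, inputs)[0]

# Mean pooling over non-padding tokens followed by L2 normalization,
# matching <pooling-strategy>mean</pooling-strategy> and <normalize>true</normalize>.
mask = batch["attention_mask"][..., None].astype(last_hidden_state.dtype)
embedding = (last_hidden_state * mask).sum(axis=1) / mask.sum(axis=1)
embedding /= np.linalg.norm(embedding, axis=1, keepdims=True)
print(embedding.shape)  # (1, 1024)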

License

This model inherits the original model's license, the MIT License (see the LICENSE file in the project's root directory).

Attribution

All credit for this model goes to the authors of multilingual-e5-large and the associated researchers and organizations. When using this model, please be sure to attribute the original authors.
