This is the intfloat/multilingual-e5-large model converted to ONNX fp16/int8 format for use with the Vespa embedder:

  • intfloat-multilingual-e5-large_fp16.onnx (fp16)
  • intfloat-multilingual-e5-large_quantized.onnx (int8 quantized)

The int8 model was quantized using the Hugging Face Optimum toolkit.

Example services.xml configuration (note: the FP16 model is supported on Vespa 8.325.46 and above):

<component id="me5_large" type="hugging-face-embedder">
    <transformer-model
        url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-large/resolve/main/intfloat-multilingual-e5-large_fp16.onnx" />
    <!-- or, for the int8 quantized model:
    <transformer-model
        url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-large/resolve/main/intfloat-multilingual-e5-large_quantized.onnx" />
    -->
    <tokenizer-model
        url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-large/resolve/main/tokenizer.json" />
    <normalize>true</normalize>
    <pooling-strategy>mean</pooling-strategy>
</component>
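
Once the component is deployed, fields in a schema can reference the embedder by its id. A minimal schema sketch (the schema, field, and rank-profile names here are hypothetical; 1024 is the output dimension of multilingual-e5-large):

schema doc {
    document doc {
        field text type string {
            indexing: summary | index
        }
    }
    # Embed the text field with the me5_large component defined in services.xml.
    field embedding type tensor<float>(x[1024]) {
        indexing: input text | embed me5_large | attribute | index
        attribute {
            distance-metric: angular
        }
    }
    rank-profile semantic {
        inputs {
            query(q) tensor<float>(x[1024])
        }
        first-phase {
            expression: closeness(field, embedding)
        }
    }
}

Note that the E5 models are trained with "query: " and "passage: " prefixes on the input text, so you may want to prepend "passage: " to document text before embedding, for example at feed time.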

Deploy

# The FP16 model has a larger file size, which can result in longer deployment times.
vespa deploy --wait 1800 .
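
After deployment, the embedder can also be invoked at query time. A hypothetical query sketch, assuming the schema and rank-profile names from the example above and a nearestNeighbor search over the embedding field (E5 models expect a "query: " prefix on query text):

vespa query \
  'yql=select * from doc where {targetHits: 10}nearestNeighbor(embedding, q)' \
  'input.query(q)=embed(me5_large, @text)' \
  'ranking=semantic' \
  'text=query: how does vespa embedding work?'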

Tips: convert to int8 (quantized)

# Export the HF model to ONNX with the helper script from the Vespa sample apps:
# https://github.com/vespa-engine/sample-apps/blob/master/simple-semantic-search/export_hf_model_from_hf.py
./export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-large --output_dir me5-large
# Dynamically quantize the exported model to int8 (AVX512 VNNI kernels)
optimum-cli onnxruntime quantize --onnx_model ./me5-large -o me5-large_quantized --avx512_vnni
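
If you prefer to stay in Python, roughly the same result can be obtained with onnxruntime's dynamic quantizer. A minimal sketch; the input path assumes the export step above, and optimum-cli's --avx512_vnni preset may choose slightly different quantization settings:

from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic int8 quantization: weights are stored as int8,
# activations are computed in float at runtime.
quantize_dynamic(
    model_input="me5-large/intfloat-multilingual-e5-large.onnx",
    model_output="me5-large/intfloat-multilingual-e5-large_quantized.onnx",
    weight_type=QuantType.QInt8,
)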

Tips: convert to fp16

# Export the base ONNX model first (same script as above):
# https://github.com/vespa-engine/sample-apps/blob/master/simple-semantic-search/export_hf_model_from_hf.py
./export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-large --output_dir me5-large

Then convert the weights to fp16 in Python, using the converter from
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/float16.py

import onnx
from onnxruntime.transformers.float16 import convert_float_to_float16

# Load the fp32 model, cast float tensors to fp16, and save the result.
# disable_shape_infer=True skips ONNX shape inference before conversion.
onnx_model = onnx.load("me5-large/intfloat-multilingual-e5-large.onnx")
model_fp16 = convert_float_to_float16(onnx_model, disable_shape_infer=True)
onnx.save(model_fp16, "me5-large/intfloat-multilingual-e5-large_fp16.onnx")
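
To sanity-check a converted model outside Vespa, it can be run directly with onnxruntime, mirroring the mean pooling and normalization configured in services.xml. A sketch; the input/output names come from the export script and may differ, so inspect session.get_inputs() if the run fails, and note that fp16 inference on CPU is only a correctness check, not a speed test:

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-large")
session = ort.InferenceSession("me5-large/intfloat-multilingual-e5-large_fp16.onnx")

# E5 models expect a "query: " or "passage: " prefix on input text.
batch = tokenizer(["query: what is vespa?"], return_tensors="np")
inputs = {i.name: batch[i.name] for i in session.get_inputs() if i.name in batch}
last_hidden_state = session.run(None, inputs)[0]

# Mean pooling over non-padding tokens followed by L2 normalization,
# matching <pooling-strategy>mean</pooling-strategy> and <normalize>true</normalize>.
mask = batch["attention_mask"][..., None].astype(last_hidden_state.dtype)
embedding = (last_hidden_state * mask).sum(axis=1) / mask.sum(axis=1)
embedding /= np.linalg.norm(embedding, axis=1, keepdims=True)
print(embedding.shape)  # (1, 1024)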

License

This model inherits the original model's license, the MIT License (see the LICENSE file in the project's root directory).

Attribution

All credit for this model goes to the authors of multilingual-e5-large and the associated researchers and organizations. When using this model, please be sure to attribute the original authors.
