---
pipeline_tag: sentence-similarity
tags:
- sentence-similarity
- sentence-transformers
license: mit
language:
- multilingual
- af
- am
- ar
- as
- az
- be
- bg
- bn
- br
- bs
- ca
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fr
- fy
- ga
- gd
- gl
- gu
- ha
- he
- hi
- hr
- hu
- hy
- id
- is
- it
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ku
- ky
- la
- lo
- lt
- lv
- mg
- mk
- ml
- mn
- mr
- ms
- my
- ne
- nl
- no
- om
- or
- pa
- pl
- ps
- pt
- ro
- ru
- sa
- sd
- si
- sk
- sl
- so
- sq
- sr
- su
- sv
- sw
- ta
- te
- th
- tl
- tr
- ug
- uk
- ur
- uz
- vi
- xh
- yi
- zh
---

A quantized version of [multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small). Quantization was performed per layer under the same conditions as our ELSERv2 model, as described [here](https://www.elastic.co/search-labs/blog/articles/introducing-elser-v2-part-1#quantization).

[Text Embeddings by Weakly-Supervised Contrastive Pre-training](https://arxiv.org/pdf/2212.03533.pdf). Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022

## Benchmarks

We ran a number of small benchmarks to assess both the change in quality and the inference latency of the quantized model relative to the original baseline model.

### Quality

Measuring NDCG@10 on the dev split of the MIRACL datasets for selected languages, we see mostly marginal changes in quality for the quantized model, with Yoruba (yo) showing the largest drop.

| | de | yo | ru | ar | es | th |
| --- | --- | --- | --- | --- | --- | --- |
| multilingual-e5-small | 0.75862 | 0.56193 | 0.80309 | 0.82778 | 0.81672 | 0.85072 |
| multilingual-e5-small-optimized | 0.75992 | 0.48934 | 0.79668 | 0.82017 | 0.8135 | 0.84316 |

To test English out-of-domain performance, we used the test splits of several datasets from the BEIR evaluation. Measuring NDCG@10, we see a larger change on SCIFACT but only marginal changes on the other datasets evaluated.

| | FIQA | SCIFACT | nfcorpus |
| --- | --- | --- | --- |
| multilingual-e5-small | 0.33126 | 0.677 | 0.31004 |
| multilingual-e5-small-optimized | 0.31734 | 0.65484 | 0.30126 |

### Performance

Using a PyTorch model traced for Linux and Intel CPUs, we benchmarked inference latency across a range of input lengths. Overall, we see a roughly 20-50% performance improvement with the optimized model, with the largest speedups on shorter inputs.

| input length (characters) | multilingual-e5-small | multilingual-e5-small-optimized | speedup |
| --- | --- | --- | --- |
| 0 - 50 | 0.0181 | 0.00826 | 54.36% |
| 50 - 100 | 0.0275 | 0.0164 | 40.36% |
| 100 - 150 | 0.0366 | 0.0237 | 35.25% |
| 150 - 200 | 0.0435 | 0.0301 | 30.80% |
| 200 - 250 | 0.0514 | 0.0379 | 26.26% |
| 250 - 300 | 0.0569 | 0.043 | 24.43% |
| 300 - 350 | 0.0663 | 0.0513 | 22.62% |
| 350 - 400 | 0.0737 | 0.0576 | 21.85% |

### Disclaimer

Customers may add third-party trained models for management in Elastic. These models are not owned by Elastic. While Elastic will support the integration with these models in accordance with the documentation, you understand and agree that Elastic has no control over, or liability for, the third-party models or the underlying training data they may utilize.
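
For reference, below is a minimal usage sketch with the [sentence-transformers](https://www.sbert.net) library. It assumes the model is published under the `elastic/multilingual-e5-small-optimized` repository id and that, like the original multilingual-e5-small, it expects `query: ` and `passage: ` prefixes on its inputs; treat it as an illustration rather than the officially supported integration path.

```python
# Minimal usage sketch. Assumptions: the repository id below is correct and the
# model follows the E5 convention of "query: " / "passage: " input prefixes.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("elastic/multilingual-e5-small-optimized")

queries = ["query: how much protein should a female eat"]
passages = [
    "passage: As a general guideline, the CDC's average requirement of protein "
    "for women ages 19 to 70 is 46 grams per day.",
    "passage: Dieses Modell unterstützt auch mehrsprachige Eingaben.",
]

# With normalize_embeddings=True the vectors are L2-normalized,
# so cosine similarity and dot product give the same ranking.
query_emb = model.encode(queries, normalize_embeddings=True)
passage_emb = model.encode(passages, normalize_embeddings=True)

scores = util.cos_sim(query_emb, passage_emb)
print(scores)  # higher score == more relevant passage
```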
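
To reproduce a latency comparison similar to the Performance table above, one possible sketch is to time single-sentence `encode` calls on CPU, bucketed by input length in characters. The corpus file, repeat count, and the `elastic/multilingual-e5-small-optimized` repository id are placeholders and assumptions; this is not the harness that produced the numbers above.

```python
# Hypothetical latency comparison bucketed by input length (sketch only).
import time
from statistics import mean

from sentence_transformers import SentenceTransformer


def mean_latency(model: SentenceTransformer, texts: list[str], repeats: int = 10) -> float:
    """Mean seconds per single-text encode, averaged over texts and repeats."""
    model.encode(["passage: warm up"])  # warm-up pass, excluded from timing
    per_text = []
    for text in texts:
        start = time.perf_counter()
        for _ in range(repeats):
            model.encode([text])
        per_text.append((time.perf_counter() - start) / repeats)
    return mean(per_text)


baseline = SentenceTransformer("intfloat/multilingual-e5-small", device="cpu")
optimized = SentenceTransformer("elastic/multilingual-e5-small-optimized", device="cpu")  # assumed repo id

# Bucket sample passages by character length, mirroring the table above.
corpus = ["passage: " + line.strip() for line in open("sample_passages.txt")]  # placeholder corpus
for lo in range(0, 400, 50):
    texts = [t for t in corpus if lo <= len(t) < lo + 50]
    if not texts:
        continue
    base, opt = mean_latency(baseline, texts), mean_latency(optimized, texts)
    print(f"{lo}-{lo + 50} chars: {base:.4f}s vs {opt:.4f}s "
          f"({(base - opt) / base:.1%} faster)")
```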