Best way to deploy this as an API endpoint

#21

by aigeek0x0 - opened Jan 24

aigeek0x0

Jan 24

Do we have an equivalent of TGI for deploying this model?

intfloat

Owner Jan 26

Sorry, I don't have any recommendation on that.

aigeek0x0

Jan 26

Then what is the best way to use it at scale?

dbarker9

Jan 27

For use at scale, I don't have a good answer yet. If I get time, I'd like to try getting this set up in NVIDIA Triton Inference Server for that. In the interim, here's a quick gist I (err... ChatGPT) threw together for deploying it with FastAPI: https://gist.github.com/dcbark01/b329e170d0473bbdfd012e04c17bcfd3
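For readers who want a rough picture without opening the gist, a FastAPI wrapper could look something like the sketch below. This is not the code from the linked gist. The names `create_app`, `embed_all`, `make_batches`, the `/embed` route, and the request schema are all hypothetical, and `embed_fn` stands in for whatever function actually runs the model and returns one vector per input text.

```python
from typing import Callable, Iterator, List, Sequence

# Hypothetical type: any function mapping a batch of texts to one vector per text.
EmbedFn = Callable[[Sequence[str]], List[List[float]]]


def make_batches(texts: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def embed_all(texts: Sequence[str], embed_fn: EmbedFn, batch_size: int = 8) -> List[List[float]]:
    """Run embed_fn batch by batch so one large request cannot exhaust GPU memory."""
    vectors: List[List[float]] = []
    for batch in make_batches(texts, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors


def create_app(embed_fn: EmbedFn, batch_size: int = 8):
    """Build a FastAPI app exposing POST /embed (the route name is an assumption)."""
    # Imported lazily so the helpers above work without the server dependencies.
    from fastapi import FastAPI
    from pydantic import BaseModel

    class EmbedRequest(BaseModel):
        texts: List[str]

    app = FastAPI()

    @app.post("/embed")
    def embed(body: EmbedRequest):
        return {"embeddings": embed_all(body.texts, embed_fn, batch_size)}

    return app
```

The app could then be served with something like `uvicorn.run(create_app(my_embed_fn), port=8000)`, where `my_embed_fn` is a placeholder for your own model call.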

aigeek0x0

Jan 28

I tried this and it works great. Thanks for sharing.

I was trying to figure out how to use this to generate embeddings for a vector database with LangChain.

Do we need to supply a "query"/"instruction" at embedding time as well?

dbarker9

Jan 29

No problem, glad it helped. To use it with LangChain directly, you'd probably have to wrap it (yet another wrapper) with something like this: https://python.langchain.com/docs/modules/model_io/llms/custom_llm
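As a rough illustration of what such a wrapper might look like: LangChain's embedding interface, as far as I know, expects two methods, `embed_documents` and `embed_query`. The sketch below duck-types that interface instead of importing LangChain, so it stays dependency-free. The class name `EndpointEmbeddings`, the `embed_fn` callable, and the `query_prefix` parameter are hypothetical; in real use you would probably subclass LangChain's `Embeddings` base class and have `embed_fn` call your deployed endpoint.

```python
from typing import Callable, List, Sequence

# Hypothetical type: maps a batch of texts to one vector per text,
# e.g. by calling a deployed embedding endpoint.
EmbedFn = Callable[[Sequence[str]], List[List[float]]]


class EndpointEmbeddings:
    """Duck-typed stand-in for a LangChain embeddings class (illustrative only)."""

    def __init__(self, embed_fn: EmbedFn, query_prefix: str = "") -> None:
        self._embed_fn = embed_fn
        # Optional text prepended to queries only; documents are passed as-is.
        self._query_prefix = query_prefix

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed a list of documents, one vector per document."""
        return self._embed_fn(list(texts))

    def embed_query(self, text: str) -> List[float]:
        """Embed a single query, with the optional prefix applied."""
        return self._embed_fn([self._query_prefix + text])[0]
```

Keeping the prefix on the query side only means the stored document vectors do not need to be recomputed if the query wording changes.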

I need to go back and read the paper, but I think whether you supply an instruction is task-dependent. If you just want to compare the similarity of two texts, then no, I don't think you need an instruction. Just compute embeddings for both docs, then compute your similarity metric (e.g. cosine similarity) over the 4096-dimensional embedding vectors.
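To make the two cases concrete, here is a small sketch. The instruction template follows the format on the model card as best I can tell, so treat its exact wording as an assumption. The vectors in the usage note are toy 3-dimensional values, not real 4096-dimensional model outputs.

```python
import math
from typing import Sequence


def get_detailed_instruct(task_description: str, query: str) -> str:
    """Prefix a query with a one-sentence task instruction (template is an assumption)."""
    return f"Instruct: {task_description}\nQuery: {query}"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

For plain text-to-text similarity you would embed both texts as-is and call `cosine_similarity(vec_a, vec_b)`. For retrieval, only the query would be wrapped, for example `get_detailed_instruct("Given a web search query, retrieve relevant passages", "how to deploy an embedding model")`, while the documents are embedded without an instruction.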

aigeek0x0

Jan 30

I will check that out. It's a shame that we don't have a mature HF embedding server for this model yet.

dbarker9

Jan 30

I agree. For some reason, inference infrastructure for embedding models seems to have lagged behind the rest of the LLM ecosystem. I think part of that is because people assume you'll just compute the embeddings offline, so latency isn't that big of a deal. But at production scale, or if you want to do things in real time, it obviously matters a lot.

Jonathan0528

Feb 6

It seems there is a commit adding sentence-transformers support to Salesforce/SFR-Embedding-Mistral.
In that case, we can use API servers like michaelfeil/infinity to host the embedding model.
I tried updating the config files locally. It should work, but there still seem to be some small bugs.
Let's see whether intfloat adds sentence-transformers support to this model in the future.

aigeek0x0

Feb 6

This looks promising, I will give it a try.

Kotano

Feb 11

Hello.
First of all, thank you for this model.
Did Jonathan0528 manage to use embedding models with michaelfeil/infinity successfully? Could you share detailed instructions? I have also been looking for a tool to host an embedding model (e5-mistral or similar) locally for testing, without Docker.
Thank you.

Jonathan0528

Feb 19

Yes, I have set up the embedding servers successfully.
First, follow the instructions at https://github.com/michaelfeil/infinity to install it, and download the embedding models from Hugging Face to your local machine.
After installing the package and downloading your model, you can run a command like:

infinity_emb --model-name-or-path="/models/multilingual-e5-large"

For more details, see the official GitHub repository (where you can also raise issues) or run infinity_emb --help.
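Once the server is running, a client could call it roughly as sketched below using only the Python standard library. The default port 7997, the `/embeddings` route, and the OpenAI-style request and response shapes are my assumptions about infinity's REST API, so check them against `infinity_emb --help` and the server's own API docs. The helper names `build_payload`, `parse_embeddings`, and `embed` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List, Sequence


def build_payload(texts: Sequence[str], model: str) -> Dict[str, Any]:
    """Request body in the OpenAI-style embeddings format (assumed, not verified)."""
    return {"model": model, "input": list(texts)}


def parse_embeddings(response_body: Dict[str, Any]) -> List[List[float]]:
    """Extract vectors from an OpenAI-style response, restoring input order."""
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts: Sequence[str], model: str,
          base_url: str = "http://localhost:7997") -> List[List[float]]:
    """POST texts to the server and return one vector per text.

    base_url and the /embeddings route are assumptions about the deployment.
    """
    data = json.dumps(build_payload(texts, model)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/embeddings",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return parse_embeddings(json.load(response))
```

With the command above, the call would look like `embed(["some text"], model="/models/multilingual-e5-large")`, assuming the server identifies the model by the path it was started with.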

subhrajit-mohanty

May 23

How can I use text-embeddings-inference to deploy the model?
