Best way to deploy this as an API endpoint

#21
by aigeek0x0 - opened

Do we have an equivalent of TGI (Text Generation Inference) for deploying this model?

Sorry, I don't have any recommendation on that.

Then what is the best way to use it at scale?

At scale, I don't have a good answer yet. If I get time, I'd like to try setting this up in the NVIDIA Triton Inference Server. In the interim, here's a quick gist I (err... ChatGPT) threw together for deploying it with FastAPI: https://gist.github.com/dcbark01/b329e170d0473bbdfd012e04c17bcfd3

I tried this and it works great. Thanks for sharing.

I was trying to figure out a way to use this to generate embeddings for a vector database with LangChain.

Do we need to supply a "query"/"instruction" at embedding time as well?

No problem, glad it helped. You'd probably have to wrap it (yet another wrapper) with something like this to use it with LangChain directly: https://python.langchain.com/docs/modules/model_io/llms/custom_llm
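The linked page covers custom LLMs; LangChain vector stores instead take an object implementing the Embeddings interface (embed_documents and embed_query). Below is a rough sketch of such a wrapper around a self-hosted HTTP endpoint. The RemoteEmbeddings class, the URL, and the assumption that the server accepts {"texts": [...]} and returns {"embeddings": [[...], ...]} are all hypothetical; adapt them to whatever your server actually exposes.

```python
import json
import urllib.request
from typing import List

try:
    from langchain_core.embeddings import Embeddings
except ImportError:  # fall back so the sketch still runs without LangChain installed
    Embeddings = object


class RemoteEmbeddings(Embeddings):
    """Hypothetical LangChain-style wrapper around a self-hosted embedding endpoint.

    Assumes the server accepts POST {"texts": [...]} and returns {"embeddings": [[...], ...]}.
    """

    def __init__(self, url: str, query_prefix: str = "", timeout: float = 60.0):
        self.url = url
        self.query_prefix = query_prefix  # e.g. a task instruction for retrieval queries
        self.timeout = timeout

    def _post(self, texts: List[str]) -> List[List[float]]:
        body = json.dumps({"texts": texts}).encode("utf-8")
        req = urllib.request.Request(
            self.url, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req, timeout=self.timeout) as resp:
            return json.loads(resp.read())["embeddings"]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Documents are sent as-is.
        return self._post(list(texts))

    def embed_query(self, text: str) -> List[float]:
        # Queries get the optional prefix before embedding.
        return self._post([self.query_prefix + text])[0]
```

An instance can then be passed wherever a LangChain vector store expects an embedding object. Keeping the HTTP call in one small method makes it easy to swap in batching or retries later.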

I need to go back and read the paper, but I think whether you supply an instruction or not is task-dependent. If you just want to compare the similarity of two texts, then no, I don't think you need an instruction. Just compute embeddings for both docs, then compute your similarity metric (e.g., cosine similarity) over the 4096-dimensional embedding vectors.
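If you do add an instruction (for example, for retrieval), the e5-mistral-7b-instruct model card, as far as I recall, formats only the query side as "Instruct: <task>\nQuery: <text>" and leaves documents unprefixed. The helper below mirrors that format and should be checked against the card for your checkpoint. The cosine similarity is written in plain Python for clarity; with real 4096-dimensional batches you would use numpy or torch instead.

```python
import math
from typing import Sequence


def get_detailed_instruct(task_description: str, query: str) -> str:
    # Query-side prompt format, modeled on the e5-mistral-7b-instruct model card (assumption).
    return f"Instruct: {task_description}\nQuery: {query}"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); vectors must be equal-length and non-zero."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

If the embeddings are already L2-normalized, the denominator is 1 and cosine similarity reduces to a plain dot product.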

I will check that out. It's a shame that we don't have a mature HF embedding server for this model yet.

I agree. For some reason, inference infrastructure for embedding models seems to have lagged behind the rest of the LLM ecosystem. I think part of that is because people assume you'll just compute the embeddings offline, so latency isn't a big deal. But at production scale, or if you want to do things in real time, it obviously matters a lot.

It seems there is a commit adding sentence-transformers support to Salesforce/SFR-Embedding-Mistral.
In that case, we can use an API server like michaelfeil/infinity to host the embedding model.
I have tried updating the config files locally and it should work, but there still seem to be some small bugs.
Let's see whether intfloat adds sentence-transformers support for this model in the future.

This looks promising, I will give it a try.

Hello.
First of all, thank you for this model.
Jonathan0528, did you get embedding models working with michaelfeil/infinity? Could you share detailed instructions? I have also been looking for a tool to host an embedding model (e5-mistral or similar) locally for testing, without Docker.
Thank you.

Yes, I have set up the embedding server successfully.
First, follow the instructions at https://github.com/michaelfeil/infinity to install it, and download the embedding model from Hugging Face to your local machine.
After installing the package and downloading your model, you can run a command like:

infinity_emb --model-name-or-path="/models/multilingual-e5-large"

You can find more details, or raise issues, on the official GitHub repository, or run infinity_emb --help.
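Once the server is running, a client can request embeddings over HTTP. The sketch below assumes infinity's OpenAI-style embeddings route at http://localhost:7997/embeddings and a model name equal to the path it was launched with. Both are assumptions that may differ between infinity versions, so check the server's own API docs (or infinity_emb --help) and adjust BASE_URL and MODEL_NAME to match.

```python
import json
import urllib.request
from typing import Dict, List

# Assumed defaults: adjust to match how you launched infinity_emb.
BASE_URL = "http://localhost:7997"
MODEL_NAME = "/models/multilingual-e5-large"


def build_payload(texts: List[str], model: str = MODEL_NAME) -> Dict:
    # OpenAI-style embeddings request body.
    return {"model": model, "input": texts}


def parse_embeddings(response: Dict) -> List[List[float]]:
    # OpenAI-style responses carry one item per input, tagged with its input index.
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts: List[str], url: str = BASE_URL + "/embeddings") -> List[List[float]]:
    body = json.dumps(build_payload(texts)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return parse_embeddings(json.loads(resp.read()))
```

With the server up, embed(["Hello, world!"]) should return one vector per input text. Sorting by "index" guards against the server returning items out of order.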
