Question: do I need to host two instances?

#1
by hiepxanh - opened

I would like to test and implement it. Right now the issue is that most inference and API platforms only support a model for either embedding or text generation. So, as you mentioned, do I have to host two APIs using the same single model? I mean, do I need to build something new to run it, or can I use an inference engine like llama.cpp?

GritLM org

Not sure. If the inference platform lets you access the final hidden states of your text generation model, then you can host only a text generation endpoint and either use it for generation or take the final hidden states and average them across the sequence length to get the embedding.
I think something like llama.cpp should work.
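
For reference, here is a minimal sketch of that idea with Hugging Face transformers: load one model and reuse it for both generation and embedding by mean-pooling the final hidden states over non-padding tokens. This is a simplification, not the official GritLM code; it assumes the GritLM/GritLM-7B checkpoint and ignores GritLM's embedding instruction formatting.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "GritLM/GritLM-7B"  # assumed checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batched padding
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

def embed(texts):
    # Run a normal forward pass and mean-pool the last hidden layer,
    # masking out padding tokens (simplified pooling, no instruction handling).
    batch = tokenizer(texts, padding=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch, output_hidden_states=True)
    hidden = out.hidden_states[-1]                # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)  # (batch, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

def generate(prompt, max_new_tokens=64):
    # Same model instance used as a plain text generation endpoint.
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

So a single hosted model can serve both purposes, as long as the serving stack exposes the hidden states (or you wrap it yourself as above).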
