Call w/ LiteLLM
Hi @hunkim / @yoonniverse
What's the best way for me to deploy this model? I'd love to make a demo of this with LiteLLM - https://github.com/BerriAI/litellm.
LiteLLM currently works with Replicate, Azure, Together.ai, and HF Inference Endpoints.
I'm facing issues with HF Inference Endpoints due to quota limitations, so I'm curious whether you've tried any other provider.
We will soon host our model on Together.ai. We will keep you updated.
Do you know how to integrate our model with https://github.com/BerriAI/litellm? We will make it work. Let us know.
Hey @hunkim, we made it easy to proxy OpenAI calls to any deployment solution, which should unlock any provider you choose - https://github.com/BerriAI/litellm/issues/120
import litellm
import openai

# Map an OpenAI-style chat request onto the payload the custom endpoint expects
def translate_function(model, messages, max_tokens):
    prompt = " ".join(message["content"] for message in messages)
    return {"model": model, "prompt": prompt, "max_new_tokens": max_tokens}

custom_api_base = "https://<your-deployment-url>"  # endpoint serving the model
openai.api_base = litellm.translate_api_call(custom_api_base, translate_function)
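With api_base pointed at the translated endpoint, the usual OpenAI call shape should then work as-is. A minimal sketch, assuming the placeholder endpoint above and a pre-1.0 openai client; the model name here is hypothetical:

# Hypothetical usage once openai.api_base points at the proxied endpoint
response = openai.ChatCompletion.create(
    model="your-model-name",  # placeholder; use whatever the endpoint expects
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])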
We already have a custom integration with Together.ai, which supports streaming. Excited to put out a demo notebook, etc. once it's deployed - see the sketch below.
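Once the model is live on Together.ai, calling it through LiteLLM's together_ai integration could look like this sketch; the model ID is a placeholder until it's actually hosted:

import litellm

response = litellm.completion(
    model="together_ai/your-org/your-model",  # placeholder Together.ai model ID
    messages=[{"role": "user", "content": "Say hi in one sentence."}],
    stream=True,  # the Together.ai integration supports streaming
)
for chunk in response:
    print(chunk)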