Call w/ LiteLLM

#8
by krrish-litellm - opened

Hi @hunkim / @yoonniverse

What's the best way for me to deploy this model? I'd love to make a demo of this with LiteLLM - https://github.com/BerriAI/litellm.

LiteLLM currently works with Replicate, Azure, Together.ai, and HF Inference Endpoints.

I'm facing issues with HF Inference Endpoints due to quota limitations, so I'm curious whether you've tried any other provider.
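For reference, LiteLLM exposes a single completion() call across those providers; here's a minimal sketch (the model string is a placeholder, and the provider API key is assumed to be set in the environment):

from litellm import completion

# Same call shape regardless of provider; the "together_ai/" prefix routes the request
response = completion(
    model="together_ai/togethercomputer/llama-2-70b-chat",  # placeholder model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)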

upstage org

We will soon host our model on Together.ai. We will keep you updated.

Do you know how to integrate our model with https://github.com/BerriAI/litellm? We will make it work. Let us know.

Hey @hunkim, we made it easy to proxy the OpenAI SDK against any deployment solution - this should unlock any provider you choose: https://github.com/BerriAI/litellm/issues/120

import litellm
import openai

# Translate OpenAI-style chat arguments into the payload the custom endpoint expects
def translate_function(model, messages, max_tokens):
    # Flatten the chat messages into a single prompt string
    prompt = " ".join(message["content"] for message in messages)
    return {"model": model, "prompt": prompt, "max_new_tokens": max_tokens}

custom_api_base = "https://your-model-endpoint.example.com"  # placeholder: your deployment's URL
openai.api_base = litellm.translate_api_call(custom_api_base, translate_function)
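With api_base patched, a standard OpenAI-style call should then route to the custom deployment. A rough usage sketch, assuming the pre-1.0 OpenAI SDK and an endpoint that returns an OpenAI-shaped response (the model name is a placeholder):

response = openai.ChatCompletion.create(
    model="solar-70b",  # placeholder model name
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    max_tokens=256,
)
print(response)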

We already have a custom integration with Together.ai, which supports streaming. Excited to put out a demo notebook, etc. once it's deployed.
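For reference, a rough sketch of what streaming looks like through that integration (the model string is a placeholder; chunks follow the OpenAI streaming format):

from litellm import completion

response = completion(
    model="together_ai/togethercomputer/llama-2-70b-chat",  # placeholder model
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,  # returns an iterator of incremental chunks
)
for chunk in response:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="")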
