import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Quick Start

Quick start with the CLI, a config file, or Docker.

LiteLLM Server manages:

* **Unified Interface**: Calling 100+ LLMs [Huggingface/Bedrock/TogetherAI/etc.](#other-supported-models) in the OpenAI `ChatCompletions` & `Completions` format
* **Load Balancing**: between [Multiple Models](#multiple-models---quick-start) + [Deployments of the same model](#multiple-instances-of-1-model) - LiteLLM proxy can handle 1.5k+ requests/second during load tests.
* **Cost tracking**: Authentication & spend tracking via [Virtual Keys](#managing-auth---virtual-keys)

[**See LiteLLM Proxy code**](https://github.com/BerriAI/litellm/tree/main/litellm/proxy)

View all the supported args for the Proxy CLI [here](https://docs.litellm.ai/docs/simple_proxy#proxy-cli-arguments)

```shell
$ pip install litellm[proxy]
```

If this fails, try quoting the extra:

```shell
$ pip install 'litellm[proxy]'
```

## Quick Start - LiteLLM Proxy CLI

Run the following command to start the litellm proxy:

```shell
$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:8000
```

### Test

In a new shell, run the command below. It makes an `openai.chat.completions` request; ensure you're using openai v1.0.0+.

```shell
litellm --test
```

This will automatically route any request for `gpt-3.5-turbo` to `bigcode/starcoder`, hosted on Hugging Face Inference Endpoints.

### Using LiteLLM Proxy - Curl Request, OpenAI Package

<Tabs>
<TabItem value="curl" label="Curl Request">

```shell
curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "what llm are you"
    }
  ]
}'
```

</TabItem>
<TabItem value="openai" label="OpenAI v1.0.0+">

```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:8000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages=[
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)
```

</TabItem>
</Tabs>
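Streaming works through the same OpenAI interface. Below is a minimal sketch with the OpenAI Python SDK, assuming the proxy started above is still running on `http://0.0.0.0:8000` and the underlying model supports streaming:

```python
import openai

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:8000")

# stream=True asks the proxy to forward the provider's streamed chunks
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "write a short poem"}],
    stream=True,
)

for chunk in stream:
    # some chunks may carry no content delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```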
### Supported LLMs

All LiteLLM supported LLMs are supported on the Proxy. See all [supported LLMs](https://docs.litellm.ai/docs/providers).

<Tabs>
<TabItem value="bedrock" label="AWS Bedrock">

```shell
$ export AWS_ACCESS_KEY_ID=
$ export AWS_REGION_NAME=
$ export AWS_SECRET_ACCESS_KEY=
```

```shell
$ litellm --model bedrock/anthropic.claude-v2
```

</TabItem>
<TabItem value="azure" label="Azure OpenAI">

```shell
$ export AZURE_API_KEY=my-api-key
$ export AZURE_API_BASE=my-api-base
```

```shell
$ litellm --model azure/my-deployment-name
```

</TabItem>
<TabItem value="openai" label="OpenAI">

```shell
$ export OPENAI_API_KEY=my-api-key
```

```shell
$ litellm --model gpt-3.5-turbo
```

</TabItem>
<TabItem value="huggingface" label="Huggingface">

```shell
$ export HUGGINGFACE_API_KEY=my-api-key #[OPTIONAL]
```

For a deployed Hugging Face Inference Endpoint:

```shell
$ litellm --model huggingface/ --api_base https://k58ory32yinf1ly0.us-east-1.aws.endpoints.huggingface.cloud
```

For a locally hosted endpoint:

```shell
$ litellm --model huggingface/ --api_base http://0.0.0.0:8001
```

</TabItem>
<TabItem value="sagemaker" label="AWS Sagemaker">

```shell
$ export AWS_ACCESS_KEY_ID=
$ export AWS_REGION_NAME=
$ export AWS_SECRET_ACCESS_KEY=
```

```shell
$ litellm --model sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b
```

</TabItem>
<TabItem value="anthropic" label="Anthropic">

```shell
$ export ANTHROPIC_API_KEY=my-api-key
```

```shell
$ litellm --model claude-instant-1
```

</TabItem>
<TabItem value="vllm" label="VLLM">

Assuming you're running vllm locally:

```shell
$ litellm --model vllm/facebook/opt-125m
```

</TabItem>
<TabItem value="together_ai" label="TogetherAI">

```shell
$ export TOGETHERAI_API_KEY=my-api-key
```

```shell
$ litellm --model together_ai/lmsys/vicuna-13b-v1.5-16k
```

</TabItem>
<TabItem value="replicate" label="Replicate">

```shell
$ export REPLICATE_API_KEY=my-api-key
```

```shell
$ litellm \
  --model replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3
```

</TabItem>
<TabItem value="petals" label="Petals">

```shell
$ litellm --model petals/meta-llama/Llama-2-70b-chat-hf
```

</TabItem>
<TabItem value="palm" label="Palm">

```shell
$ export PALM_API_KEY=my-palm-key
```

```shell
$ litellm --model palm/chat-bison
```

</TabItem>
<TabItem value="ai21" label="AI21">

```shell
$ export AI21_API_KEY=my-api-key
```

```shell
$ litellm --model j2-light
```

</TabItem>
<TabItem value="cohere" label="Cohere">

```shell
$ export COHERE_API_KEY=my-api-key
```

```shell
$ litellm --model command-nightly
```

</TabItem>
</Tabs>
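Whichever provider you start the proxy with, clients keep talking to it in the OpenAI format. As a quick sanity check you can list what the proxy is exposing via its `/models` route; a minimal sketch with the OpenAI Python SDK, assuming the proxy is running on `http://0.0.0.0:8000`:

```python
import openai

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:8000")

# GET /models - the proxy reports the model(s) it is configured to serve
for model in client.models.list().data:
    print(model.id)
```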
## Quick Start - LiteLLM Proxy + Config.yaml

The config allows you to create a model list and set `api_base`, `max_tokens` (all litellm params). See more details about the config [here](https://docs.litellm.ai/docs/proxy/configs).

### Create a Config for LiteLLM Proxy

Example config (fill in your Azure deployment name, API base, and API keys):

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/
      api_base:
      api_key:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key:
```

### Run proxy with config

```shell
litellm --config your_config.yaml
```

## Quick Start Docker Image: Github Container Registry

### Pull the litellm ghcr docker image

See the latest available ghcr docker image here:
https://github.com/berriai/litellm/pkgs/container/litellm

```shell
docker pull ghcr.io/berriai/litellm:main-v1.10.1
```

### Run the Docker Image

```shell
docker run ghcr.io/berriai/litellm:main-v1.10.1
```

#### Run the Docker Image with LiteLLM CLI args

See all supported CLI args [here](https://docs.litellm.ai/docs/proxy/cli).

Here's how you can run the docker image and pass your config to `litellm`:

```shell
docker run ghcr.io/berriai/litellm:main-v1.10.1 --config your_config.yaml
```

Here's how you can run the docker image and start litellm on port 8002 with `num_workers=8`:

```shell
docker run ghcr.io/berriai/litellm:main-v1.10.1 --port 8002 --num_workers 8
```

## Server Endpoints

- POST `/chat/completions` - chat completions endpoint to call 100+ LLMs
- POST `/completions` - completions endpoint
- POST `/embeddings` - embedding endpoint for Azure, OpenAI, Huggingface endpoints (see the sketch after this list)
- GET `/models` - available models on server
- POST `/key/generate` - generate a key to access the proxy
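As an example of the `/embeddings` route above, here is a minimal sketch with the OpenAI Python SDK. It assumes the proxy is running on `http://0.0.0.0:8000` and is configured with an embedding-capable model; the model name below is only illustrative:

```python
import openai

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:8000")

# POST /embeddings - same request shape as the OpenAI embeddings API
response = client.embeddings.create(
    model="text-embedding-ada-002",  # illustrative - use a model your proxy actually serves
    input=["litellm proxy quick start"],
)

print(len(response.data[0].embedding))
```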
## Using with OpenAI compatible projects

Set `base_url` to the LiteLLM Proxy server.

<Tabs>
<TabItem value="openai" label="OpenAI v1.0.0+">

```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:8000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages=[
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)
```

</TabItem>
<TabItem value="librechat" label="LibreChat">

#### Start the LiteLLM proxy

```shell
litellm --model gpt-3.5-turbo

#INFO: Proxy running on http://0.0.0.0:8000
```

#### 1. Clone the repo

```shell
git clone https://github.com/danny-avila/LibreChat.git
```

#### 2. Modify LibreChat's `docker-compose.yml`

LiteLLM Proxy is running on port `8000`, so set `8000` as the proxy port below:

```yaml
OPENAI_REVERSE_PROXY=http://host.docker.internal:8000/v1/chat/completions
```

#### 3. Save a fake OpenAI key in LibreChat's `.env`

Copy LibreChat's `.env.example` to `.env` and overwrite the default OPENAI_API_KEY (by default it requires the user to pass a key).

```env
OPENAI_API_KEY=sk-1234
```

#### 4. Run LibreChat

```shell
docker compose up
```

</TabItem>
<TabItem value="continue-dev" label="ContinueDev">

Continue-Dev brings ChatGPT to VSCode. See how to [install it here](https://continue.dev/docs/quickstart).

In [config.py](https://continue.dev/docs/reference/Models/openai), set this as your default model:

```python
default=OpenAI(
    api_key="IGNORED",
    model="fake-model-name",
    context_length=2048,  # customize if needed for your model
    api_base="http://localhost:8000"  # your proxy server url
),
```

Credits [@vividfog](https://github.com/jmorganca/ollama/issues/305#issuecomment-1751848077) for this tutorial.

</TabItem>
<TabItem value="aider" label="Aider">

```shell
$ pip install aider

$ aider --openai-api-base http://0.0.0.0:8000 --openai-api-key fake-key
```

</TabItem>
<TabItem value="autogen" label="AutoGen">

```shell
pip install pyautogen
```

```python
from autogen import AssistantAgent, UserProxyAgent, oai

config_list = [
    {
        "model": "my-fake-model",
        "api_base": "http://localhost:8000",  # litellm compatible endpoint
        "api_type": "open_ai",
        "api_key": "NULL",  # just a placeholder
    }
]

response = oai.Completion.create(config_list=config_list, prompt="Hi")
print(response)  # works fine

llm_config = {
    "config_list": config_list,
}

assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent("user_proxy")
user_proxy.initiate_chat(assistant, message="Plot a chart of META and TESLA stock price change YTD.", config_list=config_list)
```

Credits [@victordibia](https://github.com/microsoft/autogen/issues/45#issuecomment-1749921972) for this tutorial.

</TabItem>
<TabItem value="guidance" label="guidance">

A guidance language for controlling large language models.
https://github.com/guidance-ai/guidance

**NOTE:** Guidance sends additional params like `stop_sequences`, which can cause some models to fail if they don't support it.

**Fix**: Start your proxy using the `--drop_params` flag

```shell
litellm --model ollama/codellama --temperature 0.3 --max_tokens 2048 --drop_params
```

```python
import guidance

# set api_base to your proxy
# set api_key to anything
gpt4 = guidance.llms.OpenAI("gpt-4", api_base="http://0.0.0.0:8000", api_key="anything")

experts = guidance('''
{{#system~}}
You are a helpful and terse assistant.
{{~/system}}

{{#user~}}
I want a response to the following question:
{{query}}
Name 3 world-class experts (past or present) who would be great at answering this?
Don't answer the question yet.
{{~/user}}

{{#assistant~}}
{{gen 'expert_names' temperature=0 max_tokens=300}}
{{~/assistant}}
''', llm=gpt4)

result = experts(query='How can I be more productive?')
print(result)
```

</TabItem>
</Tabs>

## Debugging Proxy

Run the proxy with `--debug` to easily view debug logs:

```shell
litellm --model gpt-3.5-turbo --debug
```

When making requests, you should see the POST request sent by LiteLLM to the LLM in the terminal output:

```shell
POST Request Sent from LiteLLM:
curl -X POST \
https://api.openai.com/v1/chat/completions \
-H 'content-type: application/json' -H 'Authorization: Bearer sk-qnWGUIW9****************************************' \
-d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "this is a test request, write a short poem"}]}'
```

## Health Check LLMs on Proxy

Use this to health check all LLMs defined in your config.yaml.

#### Request

```shell
curl --location 'http://0.0.0.0:8000/health'
```

You can also run `litellm --health`; it makes a GET request to `http://0.0.0.0:8000/health` for you:

```shell
litellm --health
```

#### Response

```shell
{
    "healthy_endpoints": [
        {
            "model": "azure/gpt-35-turbo",
            "api_base": "https://my-endpoint-canada-berri992.openai.azure.com/"
        },
        {
            "model": "azure/gpt-35-turbo",
            "api_base": "https://my-endpoint-europe-berri-992.openai.azure.com/"
        }
    ],
    "unhealthy_endpoints": [
        {
            "model": "azure/gpt-35-turbo",
            "api_base": "https://openai-france-1234.openai.azure.com/"
        }
    ]
}
```
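If you want to script this check, here is a minimal sketch using `requests` against the response shape shown above (it assumes the proxy is running on `http://0.0.0.0:8000`):

```python
import requests

# Call the proxy's /health endpoint and flag any unhealthy deployments
health = requests.get("http://0.0.0.0:8000/health", timeout=60).json()

for endpoint in health.get("unhealthy_endpoints", []):
    print(f"UNHEALTHY: {endpoint['model']} @ {endpoint.get('api_base', 'n/a')}")

print(f"{len(health.get('healthy_endpoints', []))} healthy endpoint(s)")
```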