import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Quick Start
Quick start CLI, Config, Docker
LiteLLM Server manages:
* **Unified Interface**: Calling 100+ LLMs [Huggingface/Bedrock/TogetherAI/etc.](#other-supported-models) in the OpenAI `ChatCompletions` & `Completions` format
* **Load Balancing**: between [Multiple Models](#multiple-models---quick-start) + [Deployments of the same model](#multiple-instances-of-1-model) - LiteLLM proxy can handle 1.5k+ requests/second during load tests.
* **Cost tracking**: Authentication & Spend Tracking [Virtual Keys](#managing-auth---virtual-keys)
[**See LiteLLM Proxy code**](https://github.com/BerriAI/litellm/tree/main/litellm/proxy)
View all the supported args for the Proxy CLI [here](https://docs.litellm.ai/docs/simple_proxy#proxy-cli-arguments)
```shell
$ pip install litellm[proxy]
```
If this fails (some shells, such as zsh, treat the square brackets as a glob pattern), quote the package name:
```shell
$ pip install 'litellm[proxy]'
```
## Quick Start - LiteLLM Proxy CLI
Run the following command to start the litellm proxy
```shell
$ litellm --model huggingface/bigcode/starcoder
#INFO: Proxy running on http://0.0.0.0:8000
```
### Test
In a new shell, run the command below. It makes an `openai.chat.completions` request to the proxy. Make sure you're using openai v1.0.0+.
```shell
litellm --test
```
This will now automatically route any requests for gpt-3.5-turbo to bigcode starcoder, hosted on huggingface inference endpoints.
### Using LiteLLM Proxy - Curl Request, OpenAI Package
```shell
curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}
'
```
```python
import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:8000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ]
)
print(response)
```
### Supported LLMs
All LLMs supported by LiteLLM are supported on the Proxy. See all [supported LLMs](https://docs.litellm.ai/docs/providers)
```shell
$ export AWS_ACCESS_KEY_ID=
$ export AWS_REGION_NAME=
$ export AWS_SECRET_ACCESS_KEY=
```
```shell
$ litellm --model bedrock/anthropic.claude-v2
```
```shell
$ export AZURE_API_KEY=my-api-key
$ export AZURE_API_BASE=my-api-base
```
```shell
$ litellm --model azure/my-deployment-name
```
```shell
$ export OPENAI_API_KEY=my-api-key
```
```shell
$ litellm --model gpt-3.5-turbo
```
```shell
$ export HUGGINGFACE_API_KEY=my-api-key #[OPTIONAL]
```
```shell
$ litellm --model huggingface/ --api_base https://k58ory32yinf1ly0.us-east-1.aws.endpoints.huggingface.cloud
```
```shell
$ litellm --model huggingface/ --api_base http://0.0.0.0:8001
```
```shell
export AWS_ACCESS_KEY_ID=
export AWS_REGION_NAME=
export AWS_SECRET_ACCESS_KEY=
```
```shell
$ litellm --model sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b
```
```shell
$ export ANTHROPIC_API_KEY=my-api-key
```
```shell
$ litellm --model claude-instant-1
```
Assuming you're running vllm locally
```shell
$ litellm --model vllm/facebook/opt-125m
```
```shell
$ export TOGETHERAI_API_KEY=my-api-key
```
```shell
$ litellm --model together_ai/lmsys/vicuna-13b-v1.5-16k
```
```shell
$ export REPLICATE_API_KEY=my-api-key
```
```shell
$ litellm \
--model replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3
```
```shell
$ litellm --model petals/meta-llama/Llama-2-70b-chat-hf
```
```shell
$ export PALM_API_KEY=my-palm-key
```
```shell
$ litellm --model palm/chat-bison
```
```shell
$ export AI21_API_KEY=my-api-key
```
```shell
$ litellm --model j2-light
```
```shell
$ export COHERE_API_KEY=my-api-key
```
```shell
$ litellm --model command-nightly
```
## Quick Start - LiteLLM Proxy + Config.yaml
The config allows you to create a model list and set `api_base`, `max_tokens` (all litellm params). See more details about the config [here](https://docs.litellm.ai/docs/proxy/configs)
### Create a Config for LiteLLM Proxy
Example config
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/
      api_base:
      api_key:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key:
```
### Run proxy with config
```shell
litellm --config your_config.yaml
```
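To see what the example config expresses, the sketch below writes the same `model_list` as a Python dict and groups the entries by `model_name`. It only illustrates the shape of the config and is not LiteLLM's routing code: `group_deployments` is a hypothetical helper, and the values in angle brackets are placeholders for fields the example leaves blank.

```python
from collections import defaultdict

# Same shape as the example config, written as a Python dict.
# Values in angle brackets are placeholders, not real deployments.
config = {
    "model_list": [
        {
            "model_name": "gpt-3.5-turbo",
            "litellm_params": {
                "model": "azure/<your-deployment-name>",
                "api_base": "<your-api-base>",
            },
        },
        {
            "model_name": "gpt-3.5-turbo",
            "litellm_params": {
                "model": "azure/gpt-turbo-small-ca",
                "api_base": "https://my-endpoint-canada-berri992.openai.azure.com/",
            },
        },
    ]
}


def group_deployments(config):
    """Map each public model_name to the deployments listed under it."""
    groups = defaultdict(list)
    for entry in config["model_list"]:
        groups[entry["model_name"]].append(entry["litellm_params"]["model"])
    return dict(groups)


print(group_deployments(config))
```

Both entries share the `model_name` `gpt-3.5-turbo`, so a client asking for that name can be served by either deployment.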
## Quick Start Docker Image: Github Container Registry
### Pull the litellm ghcr docker image
See the latest available ghcr docker image here:
https://github.com/berriai/litellm/pkgs/container/litellm
```shell
docker pull ghcr.io/berriai/litellm:main-v1.10.0
```
### Run the Docker Image
```shell
docker run ghcr.io/berriai/litellm:main-v1.10.0
```
#### Run the Docker Image with LiteLLM CLI args
See all supported CLI args [here](https://docs.litellm.ai/docs/proxy/cli):
Here's how you can run the docker image and pass your config to `litellm`
```shell
docker run ghcr.io/berriai/litellm:main-v1.10.0 --config your_config.yaml
```
Here's how you can run the docker image and start litellm on port 8002 with `num_workers=8`
```shell
docker run ghcr.io/berriai/litellm:main-v1.10.0 --port 8002 --num_workers 8
```
## Server Endpoints
- POST `/chat/completions` - chat completions endpoint to call 100+ LLMs
- POST `/completions` - completions endpoint
- POST `/embeddings` - embedding endpoint for Azure, OpenAI, Huggingface endpoints
- GET `/models` - available models on server
- POST `/key/generate` - generate a key to access the proxy
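As a rough sketch of how these endpoints can be called without the OpenAI package, the snippet below builds requests with Python's standard library. It assumes the proxy is listening on `http://0.0.0.0:8000` as in the CLI quick start; `build_request` is a hypothetical helper, and the requests are only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:8000"  # assumed proxy address from the CLI quick start


def build_request(path, payload=None):
    """Build (but do not send) a request for a proxy endpoint.

    A payload makes it a POST with a JSON body; no payload makes it a GET.
    """
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET" if payload is None else "POST",
    )


chat_request = build_request(
    "/chat/completions",
    {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "what llm are you"}],
    },
)
models_request = build_request("/models")

# With the proxy running, a request can be sent like this:
# with urllib.request.urlopen(models_request) as resp:
#     print(json.load(resp))
```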
## Using with OpenAI compatible projects
Set `base_url` to the LiteLLM Proxy server
```python
import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:8000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ]
)
print(response)
```
#### Start the LiteLLM proxy
```shell
litellm --model gpt-3.5-turbo
#INFO: Proxy running on http://0.0.0.0:8000
```
#### 1. Clone the repo
```shell
git clone https://github.com/danny-avila/LibreChat.git
```
#### 2. Modify LibreChat's `docker-compose.yml`
The LiteLLM Proxy is running on port `8000`, so point LibreChat's reverse proxy setting at that port:
```yaml
OPENAI_REVERSE_PROXY=http://host.docker.internal:8000/v1/chat/completions
```
#### 3. Save fake OpenAI key in LibreChat's `.env`
Copy LibreChat's `.env.example` to `.env` and overwrite the default `OPENAI_API_KEY` (by default it requires the user to pass a key).
```env
OPENAI_API_KEY=sk-1234
```
#### 4. Run LibreChat:
```shell
docker compose up
```
Continue-Dev brings ChatGPT to VSCode. See how to [install it here](https://continue.dev/docs/quickstart).
In the [config.py](https://continue.dev/docs/reference/Models/openai) set this as your default model.
```python
default=OpenAI(
api_key="IGNORED",
model="fake-model-name",
context_length=2048, # customize if needed for your model
api_base="http://localhost:8000" # your proxy server url
),
```
Credits [@vividfog](https://github.com/jmorganca/ollama/issues/305#issuecomment-1751848077) for this tutorial.
```shell
$ pip install aider-chat
$ aider --openai-api-base http://0.0.0.0:8000 --openai-api-key fake-key
```
```shell
pip install pyautogen
```
```python
from autogen import AssistantAgent, UserProxyAgent, oai
config_list=[
{
"model": "my-fake-model",
"api_base": "http://localhost:8000", #litellm compatible endpoint
"api_type": "open_ai",
"api_key": "NULL", # just a placeholder
}
]
response = oai.Completion.create(config_list=config_list, prompt="Hi")
print(response) # works fine
llm_config={
"config_list": config_list,
}
assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent("user_proxy")
user_proxy.initiate_chat(assistant, message="Plot a chart of META and TESLA stock price change YTD.", config_list=config_list)
```
Credits [@victordibia](https://github.com/microsoft/autogen/issues/45#issuecomment-1749921972) for this tutorial.
A guidance language for controlling large language models.
https://github.com/guidance-ai/guidance
**NOTE:** Guidance sends additional params like `stop_sequences` which can cause some models to fail if they don't support it.
**Fix**: Start your proxy using the `--drop_params` flag
```shell
litellm --model ollama/codellama --temperature 0.3 --max_tokens 2048 --drop_params
```
```python
import guidance
# set api_base to your proxy
# set api_key to anything
gpt4 = guidance.llms.OpenAI("gpt-4", api_base="http://0.0.0.0:8000", api_key="anything")
experts = guidance('''
{{#system~}}
You are a helpful and terse assistant.
{{~/system}}
{{#user~}}
I want a response to the following question:
{{query}}
Name 3 world-class experts (past or present) who would be great at answering this?
Don't answer the question yet.
{{~/user}}
{{#assistant~}}
{{gen 'expert_names' temperature=0 max_tokens=300}}
{{~/assistant}}
''', llm=gpt4)
result = experts(query='How can I be more productive?')
print(result)
```
## Debugging Proxy
Run the proxy with `--debug` to easily view debug logs
```shell
litellm --model gpt-3.5-turbo --debug
```
When you make requests, the terminal output shows the POST request that LiteLLM sends to the LLM:
```shell
POST Request Sent from LiteLLM:
curl -X POST \
https://api.openai.com/v1/chat/completions \
-H 'content-type: application/json' -H 'Authorization: Bearer sk-qnWGUIW9****************************************' \
-d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "this is a test request, write a short poem"}]}'
```
## Health Check LLMs on Proxy
Use this to health check all LLMs defined in your config.yaml
#### Request
```shell
curl --location 'http://0.0.0.0:8000/health'
```
You can also run `litellm --health`, which makes a `GET` request to `http://0.0.0.0:8000/health` for you:
```shell
litellm --health
```
#### Response
```json
{
"healthy_endpoints": [
{
"model": "azure/gpt-35-turbo",
"api_base": "https://my-endpoint-canada-berri992.openai.azure.com/"
},
{
"model": "azure/gpt-35-turbo",
"api_base": "https://my-endpoint-europe-berri-992.openai.azure.com/"
}
],
"unhealthy_endpoints": [
{
"model": "azure/gpt-35-turbo",
"api_base": "https://openai-france-1234.openai.azure.com/"
}
]
}
```
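A minimal sketch of how a script might consume this response, assuming the body has the `healthy_endpoints` / `unhealthy_endpoints` shape shown above. `summarize_health` is a hypothetical helper, and the sample body is copied from the example response rather than fetched from a live proxy.

```python
import json

# Sample body copied from the example response above.
health_body = """
{
  "healthy_endpoints": [
    {"model": "azure/gpt-35-turbo",
     "api_base": "https://my-endpoint-canada-berri992.openai.azure.com/"},
    {"model": "azure/gpt-35-turbo",
     "api_base": "https://my-endpoint-europe-berri-992.openai.azure.com/"}
  ],
  "unhealthy_endpoints": [
    {"model": "azure/gpt-35-turbo",
     "api_base": "https://openai-france-1234.openai.azure.com/"}
  ]
}
"""


def summarize_health(body):
    """Return (healthy, unhealthy) lists of api_base values from a /health body."""
    report = json.loads(body)
    healthy = [e["api_base"] for e in report.get("healthy_endpoints", [])]
    unhealthy = [e["api_base"] for e in report.get("unhealthy_endpoints", [])]
    return healthy, unhealthy


healthy, unhealthy = summarize_health(health_body)
print(f"{len(healthy)} healthy, {len(unhealthy)} unhealthy")
for api_base in unhealthy:
    print("unhealthy:", api_base)
```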