Skip to main content

Quick Start

Quick start CLI, Config, Docker

LiteLLM Server manages:

See LiteLLM Proxy code

View all the supported args for the Proxy CLI here

$ pip install litellm[proxy]

If this fails try running

$ pip install 'litellm[proxy]'

Quick Start - LiteLLM Proxy CLI​

Run the following command to start the litellm proxy

$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:8000

Test​

In a new shell, run, this will make an openai.chat.completions request. Ensure you're using openai v1.0.0+

litellm --test

This will now automatically route any requests for gpt-3.5-turbo to bigcode starcoder, hosted on huggingface inference endpoints.

Using LiteLLM Proxy - Curl Request, OpenAI Package, Langchain, Langchain JS​

curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
],
}
'

Supported LLMs​

All LiteLLM supported LLMs are supported on the Proxy. Seel all supported llms

$ export AWS_ACCESS_KEY_ID=
$ export AWS_REGION_NAME=
$ export AWS_SECRET_ACCESS_KEY=
$ litellm --model bedrock/anthropic.claude-v2

Quick Start - LiteLLM Proxy + Config.yaml​

The config allows you to create a model list and set api_base, max_tokens (all litellm params). See more details about the config here

Create a Config for LiteLLM Proxy​

Example config

model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/<your-deployment-name>
api_base: <your-azure-api-endpoint>
api_key: <your-azure-api-key>
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/gpt-turbo-small-ca
api_base: https://my-endpoint-canada-berri992.openai.azure.com/
api_key: <your-azure-api-key>

Run proxy with config​

litellm --config your_config.yaml

Quick Start Docker Image: Github Container Registry​

Pull the litellm ghcr docker image​

See the latest available ghcr docker image here: https://github.com/berriai/litellm/pkgs/container/litellm

docker pull ghcr.io/berriai/litellm:main-v1.10.1

Run the Docker Image​

docker run ghcr.io/berriai/litellm:main-v1.10.0

Run the Docker Image with LiteLLM CLI args​

See all supported CLI args here:

Here's how you can run the docker image and pass your config to litellm

docker run ghcr.io/berriai/litellm:main-v1.10.0 --config your_config.yaml

Here's how you can run the docker image and start litellm on port 8002 with num_workers=8

docker run ghcr.io/berriai/litellm:main-v1.10.0 --port 8002 --num_workers 8

Server Endpoints​

  • POST /chat/completions - chat completions endpoint to call 100+ LLMs
  • POST /completions - completions endpoint
  • POST /embeddings - embedding endpoint for Azure, OpenAI, Huggingface endpoints
  • GET /models - available models on server
  • POST /key/generate - generate a key to access the proxy

Using with OpenAI compatible projects​

Set base_url to the LiteLLM Proxy server

import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:8000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
{
"role": "user",
"content": "this is a test request, write a short poem"
}
])

print(response)

Debugging Proxy​

Run the proxy with --debug to easily view debug logs

litellm --model gpt-3.5-turbo --debug

When making requests you should see the POST request sent by LiteLLM to the LLM on the Terminal output

POST Request Sent from LiteLLM:
curl -X POST \
https://api.openai.com/v1/chat/completions \
-H 'content-type: application/json' -H 'Authorization: Bearer sk-qnWGUIW9****************************************' \
-d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "this is a test request, write a short poem"}]}'