Live Demo Environment

Unified AI Control Plane Platform

100+ models across 9 providers. Route requests to OpenAI, Anthropic, Google, xAI, DeepSeek, Bedrock, Vertex AI, Azure, and Ollama with intelligent policy-based routing and cost control.

100+ LLM Models
9 Providers
40% Cost Savings
92% Cache Hit Rate
5 Workflow Patterns

Interactive Playground

Chat with models, explore API capabilities, and get code samples

AI Control Plane Playground
Chat
API Demos
Code Samples
Metrics
GPT-5 Mini $0.10 / 1M tokens
GPT-5 $2.00 / 1M tokens
Claude Haiku 4.5 $0.20 / 1M tokens
Claude Sonnet 4.5 $3.00 / 1M tokens
Gemini 3 Flash $0.05 / 1M tokens
Grok 3 $1.00 / 1M tokens
DeepSeek V3 $0.07 / 1M tokens
Llama 3.1 8B Free (local)

Cost Prediction

Estimated Input Tokens ~25
Estimated Output Tokens ~200
Direct API Cost $0.000150
With AI Control Plane $0.000124 Save 17%
Cost Predictor
Policy Router
Semantic Cache
MCP Gateway

Cost Prediction API

Estimate the cost of a request before sending it. Get input/output token estimates and a cost breakdown by model.
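The idea behind cost prediction can be sketched in a few lines: multiply estimated token counts by a per-token rate, then apply the gateway's expected saving. The prices and the flat 17% discount below are illustrative assumptions, not the gateway's actual pricing model.

```python
# Illustrative cost estimator. Per-1M-token prices and the flat 17%
# gateway saving are assumptions for this sketch, not real pricing.

PRICES_PER_1M = {  # USD per 1M tokens (assumed blended rates)
    "gpt-5-mini": 0.10,
    "claude-haiku-4.5": 0.20,
    "gemini-3-flash": 0.05,
}

def predict_cost(model: str, input_tokens: int, output_tokens: int,
                 gateway_discount: float = 0.17) -> dict:
    """Estimate direct API cost and the cost through the gateway."""
    rate = PRICES_PER_1M[model] / 1_000_000
    direct = (input_tokens + output_tokens) * rate
    return {
        "direct_cost": direct,
        "gateway_cost": direct * (1 - gateway_discount),
        "savings_pct": gateway_discount * 100,
    }

est = predict_cost("gpt-5-mini", input_tokens=25, output_tokens=200)
print(f"direct ${est['direct_cost']:.6f}, via gateway ${est['gateway_cost']:.6f}")
```

In practice the saving comes from cache hits and cheaper-model routing rather than a fixed discount, so the real predictor would report a range, not a single number.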

Intelligent Policy Routing

See how the gateway routes requests based on Cedar policies. Different prompts trigger different routing rules.
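Policy routing can be pictured as an ordered list of rules where the first match wins. The sketch below is a plain-Python stand-in for Cedar policy evaluation; the rule conditions, request attributes, and model targets are illustrative, not the demo's actual policies.

```python
# Minimal first-match routing sketch. Real deployments express these
# rules as Cedar policies; the attributes and targets here are invented
# for illustration.

ROUTING_RULES = [
    # (predicate over request attributes, target model)
    (lambda r: r.get("team") == "research" and r["est_tokens"] > 10_000,
     "claude-sonnet-4.5"),
    (lambda r: r.get("latency_sla_ms", 10_000) < 500, "gemini-3-flash"),
    (lambda r: r.get("pii", False), "llama-3.1-8b"),  # keep sensitive data local
]
DEFAULT_MODEL = "gpt-5-mini"

def route(request: dict) -> str:
    """Return the first model whose policy predicate matches; else the default."""
    for predicate, model in ROUTING_RULES:
        if predicate(request):
            return model
    return DEFAULT_MODEL

print(route({"team": "research", "est_tokens": 50_000}))  # claude-sonnet-4.5
print(route({"pii": True, "est_tokens": 100}))            # llama-3.1-8b
```

Because rules are data rather than code paths, swapping or reordering them at runtime is what makes hot-reload possible.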

Semantic Cache

Test the semantic similarity cache. Similar prompts return cached responses instantly, saving cost and latency.
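Mechanically, a semantic cache compares the embedding of an incoming prompt against stored embeddings and returns the cached response when cosine similarity clears the threshold. The toy `embed()` below is a bag-of-words stand-in for a real embedding model; only the threshold logic reflects how the cache works.

```python
# Semantic cache sketch: cosine similarity of prompt embeddings against
# a 0.92 threshold. embed() is a toy stand-in for a neural embedder.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real systems use an embedding model.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.92):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def get(self, prompt: str):
        e = embed(prompt)
        for cached_e, response in self.entries:
            if cosine(e, cached_e) >= self.threshold:
                return response  # cache hit: skip the provider call
        return None

    def put(self, prompt: str, response: str):
        self.entries.append((embed(prompt), response))

cache = SemanticCache()
cache.put("what is the capital of France", "Paris")
print(cache.get("what is the capital of France?"))  # Paris (hit)
```

A production cache would also attach a TTL to each entry and use an approximate nearest-neighbor index instead of the linear scan shown here.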

MCP Tool Federation

View available MCP servers and their tools. Agents can access databases, file systems, and external APIs through a unified interface.

PostgreSQL

Query database, list tables, execute SQL

Online
Filesystem

Read files, list directories, search

Online
GitHub

Create issues and PRs, manage repos

Online
Web Fetch

Fetch URLs, scrape content, call external APIs

Online
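The federation layer's job is to present these servers as one tool surface: given a tool call, find the online server that exposes it. The registry below mirrors the server list above, but the tool names and dispatch logic are an illustrative sketch, not the MCP protocol itself.

```python
# Sketch of unified tool dispatch over federated MCP servers. The server
# names mirror the demo list; tool names and the lookup are illustrative,
# not real MCP wire-protocol calls.

SERVERS = {
    "postgresql": {"tools": ["query", "list_tables"], "online": True},
    "filesystem": {"tools": ["read_file", "list_dir", "search"], "online": True},
    "github":     {"tools": ["create_issue", "create_pr"], "online": True},
    "web_fetch":  {"tools": ["fetch_url"], "online": True},
}

def find_server(tool: str) -> str:
    """Return the name of the first online server that exposes `tool`."""
    for name, srv in SERVERS.items():
        if srv["online"] and tool in srv["tools"]:
            return name
    raise LookupError(f"no online server exposes tool {tool!r}")

print(find_server("create_issue"))  # github
```

In the real gateway this lookup sits behind circuit breakers and rate limits, so an offline server is skipped rather than queried.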
Python
JavaScript
Go
Java
cURL
Using OpenAI SDK
from openai import OpenAI

client = OpenAI(
    api_key="sk-litellm-master-key-dev",
    base_url="http://localhost:4000/v1"
)

response = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Using Requests
import requests

response = requests.post(
    "http://localhost:4000/v1/chat/completions",
    headers={
        "Authorization": "Bearer sk-litellm-master-key-dev",
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-5-mini",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 100
    }
)

print(response.json()["choices"][0]["message"]["content"])
Using OpenAI SDK
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-litellm-master-key-dev',
  baseURL: 'http://localhost:4000/v1'
});

const stream = await client.chat.completions.create({
  model: 'gpt-5-mini',
  messages: [{ role: 'user', content: 'Hello, world!' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
Using Fetch API
const response = await fetch('http://localhost:4000/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk-litellm-master-key-dev',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-5-mini',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value));
}
Using net/http
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

func main() {
    payload := map[string]interface{}{
        "model": "gpt-5-mini",
        "messages": []map[string]string{
            {"role": "user", "content": "Hello!"},
        },
        "max_tokens": 100,
    }

    body, _ := json.Marshal(payload)
    req, _ := http.NewRequest("POST",
        "http://localhost:4000/v1/chat/completions",
        bytes.NewBuffer(body))

    req.Header.Set("Authorization", "Bearer sk-litellm-master-key-dev")
    req.Header.Set("Content-Type", "application/json")

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var result map[string]interface{}
    json.NewDecoder(resp.Body).Decode(&result)
    fmt.Println(result)
}
Using HttpClient
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GatewayExample {
    public static void main(String[] args) throws Exception {
        String json = """
            {
                "model": "gpt-5-mini",
                "messages": [{"role": "user", "content": "Hello!"}],
                "max_tokens": 100
            }
            """;

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:4000/v1/chat/completions"))
            .header("Authorization", "Bearer sk-litellm-master-key-dev")
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();

        HttpResponse<String> response = client.send(request,
            HttpResponse.BodyHandlers.ofString());

        System.out.println(response.body());
    }
}
Basic Request
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-litellm-master-key-dev" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5-mini",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
Streaming Request
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-litellm-master-key-dev" \
  -H "Content-Type: application/json" \
  -N --no-buffer \
  -d '{
    "model": "gpt-5-mini",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
List Models
curl http://localhost:4000/v1/models \
  -H "Authorization: Bearer sk-litellm-master-key-dev"

Live Metrics Dashboard

Open Grafana
Overview
Cost Analysis
Latency
Providers
Live tiles (populate as traffic flows): Requests/min, Success Rate, Avg Latency, Cost Today
Cost by Model (Today)
GPT-5 Mini
$12.45
Claude Sonnet
$8.20
Gemini Flash
$4.10
DeepSeek V3
$2.75
Total Today
$27.50
Savings
$11.20
P50 Latency
245ms
P95 Latency
890ms
P99 Latency
1.2s
Latency by Provider
OpenAI
320ms
Anthropic
420ms
Google
210ms
Local
85ms
OpenAI
Healthy
1.2k req/h 99.9%
Anthropic
Healthy
850 req/h 99.8%
Google
Healthy
620 req/h 99.7%
xAI
Healthy
340 req/h 99.5%
DeepSeek
Healthy
280 req/h 99.6%
Ollama
Healthy
150 req/h 100%
Auto-refresh: 30s
Session Metrics
Total Requests
0
This session
Money Saved
$0.00
vs direct API
Total Spent
$0.00
Through gateway
Avg Latency
0ms
Response time
Provider Usage
OpenAI
0%
Anthropic
0%
Google
0%
xAI
0%
DeepSeek
0%
Local
0%

Admin Dashboard

Manage policies, budgets, teams, and workflows

Grafana Analytics

Metrics, cost trends, and performance dashboards

Temporal Workflows

Monitor agent orchestration and execution history

Enterprise-Ready Features

Everything you need to manage AI at scale

Cedar Policy Routing

Intelligent model selection based on cost, latency SLAs, team quotas, and error rates. Hot-reload policies without restarts.

Semantic Caching

Cache responses by semantic similarity (92% threshold). Save 40%+ on repeated prompts. TTL-based expiration.

LangGraph Workflows

Pre-built templates: Research, Coding, Data Analysis. PostgreSQL checkpointing for resumable execution.

A2A Agent Orchestration

Temporal-powered agent coordination: Sequential, Parallel, Supervisor patterns. Human-in-the-loop approvals.

9 Provider Integrations

OpenAI, Anthropic, Google, xAI, DeepSeek, AWS Bedrock, Vertex AI, Azure OpenAI, Ollama. Unified interface for all.

Visual Admin UI

Policy editor, workflow designer, budget dashboards, team management. Configure everything through a modern React UI.

FinOps & Budgets

Per-request cost prediction, team budgets, soft/hard limits, alerts. Real-time spend tracking with full audit trail.
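The soft/hard limit behavior can be sketched as a three-state check: below the soft limit requests pass, between the limits they pass but raise an alert, and at the hard limit they are blocked. The dollar amounts below are invented for illustration.

```python
# Soft/hard budget limit sketch. The limit values are illustrative;
# the real gateway tracks spend per team in real time.

def check_budget(spent: float, soft_limit: float, hard_limit: float) -> str:
    """Return 'ok', 'alert' (soft limit crossed), or 'block' (hard limit hit)."""
    if spent >= hard_limit:
        return "block"  # reject further requests
    if spent >= soft_limit:
        return "alert"  # allow, but notify the team
    return "ok"

print(check_budget(spent=80.0, soft_limit=75.0, hard_limit=100.0))  # alert
```

Keeping the decision to a pure function like this makes it easy to audit: every request logs its spend snapshot and the resulting verdict.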

Full Observability

Prometheus metrics, Grafana dashboards, Jaeger tracing. OpenTelemetry instrumentation across all services.

MCP Tool Federation

Connect agents to databases, file systems, APIs via MCP. Circuit breakers, rate limiting, audit logging built-in.