Live Demo Environment

Unified AI Control Plane Platform

100+ models across 9 providers. Route requests to OpenAI, Anthropic, Google, xAI, DeepSeek, Bedrock, Vertex AI, Azure, and Ollama with intelligent policy-based routing and cost control.

100+ LLM Models
9 Providers
40% Cost Savings
92% Cache Hit Rate
5 Workflow Patterns

Interactive Playground

Chat with models, explore API capabilities, and get code samples

AI Control Plane Playground
Chat
API Demos
Code Samples
Metrics
GPT-5 Mini $0.10 / 1M tokens
GPT-5 $2.00 / 1M tokens
Claude Haiku 4.5 $0.20 / 1M tokens
Claude Sonnet 4.5 $3.00 / 1M tokens
Gemini 3 Flash $0.05 / 1M tokens
Grok 3 $1.00 / 1M tokens
DeepSeek V3 $0.07 / 1M tokens
Llama 3.1 8B Free (local)

Cost Prediction

Estimated Input Tokens ~25
Estimated Output Tokens ~200
Direct API Cost $0.000150
With AI Control Plane $0.000124 Save 17%
Cost Predictor
Policy Router
Semantic Cache
MCP Gateway

Cost Prediction API

Estimate the cost of a request before sending it. Get input/output token estimates and a cost breakdown by model.
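The idea behind cost prediction can be sketched in a few lines: multiply estimated token counts by a per-token rate, then apply the gateway's expected saving. The prices and the flat 17% discount below are illustrative assumptions, not the gateway's actual pricing model.

```python
# Illustrative cost estimator. Per-1M-token prices and the flat 17%
# gateway saving are assumptions for this sketch, not real pricing.

PRICES_PER_1M = {  # USD per 1M tokens (assumed blended rates)
    "gpt-5-mini": 0.10,
    "claude-haiku-4.5": 0.20,
    "gemini-3-flash": 0.05,
}

def predict_cost(model: str, input_tokens: int, output_tokens: int,
                 gateway_discount: float = 0.17) -> dict:
    """Estimate direct API cost and the cost through the gateway."""
    rate = PRICES_PER_1M[model] / 1_000_000
    direct = (input_tokens + output_tokens) * rate
    return {
        "direct_cost": direct,
        "gateway_cost": direct * (1 - gateway_discount),
        "savings_pct": gateway_discount * 100,
    }

est = predict_cost("gpt-5-mini", input_tokens=25, output_tokens=200)
print(f"direct ${est['direct_cost']:.6f}, via gateway ${est['gateway_cost']:.6f}")
```

In practice the saving comes from cache hits and cheaper-model routing rather than a fixed discount, so the real predictor would report a range, not a single number.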

Intelligent Policy Routing

See how the gateway routes requests based on Cedar policies. Different prompts trigger different routing rules.
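Policy routing can be pictured as an ordered list of rules where the first match wins. The sketch below is a plain-Python stand-in for Cedar policy evaluation; the rule conditions, request attributes, and model targets are illustrative, not the demo's actual policies.

```python
# Minimal first-match routing sketch. Real deployments express these
# rules as Cedar policies; the attributes and targets here are invented
# for illustration.

ROUTING_RULES = [
    # (predicate over request attributes, target model)
    (lambda r: r.get("team") == "research" and r["est_tokens"] > 10_000,
     "claude-sonnet-4.5"),
    (lambda r: r.get("latency_sla_ms", 10_000) < 500, "gemini-3-flash"),
    (lambda r: r.get("pii", False), "llama-3.1-8b"),  # keep sensitive data local
]
DEFAULT_MODEL = "gpt-5-mini"

def route(request: dict) -> str:
    """Return the first model whose policy predicate matches; else the default."""
    for predicate, model in ROUTING_RULES:
        if predicate(request):
            return model
    return DEFAULT_MODEL

print(route({"team": "research", "est_tokens": 50_000}))  # claude-sonnet-4.5
print(route({"pii": True, "est_tokens": 100}))            # llama-3.1-8b
```

Because rules are data rather than code paths, swapping or reordering them at runtime is what makes hot-reload possible.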

Semantic Cache

Test the semantic similarity cache. Similar prompts return cached responses instantly, saving cost and latency.
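Mechanically, a semantic cache compares the embedding of an incoming prompt against stored embeddings and returns the cached response when cosine similarity clears the threshold. The toy `embed()` below is a bag-of-words stand-in for a real embedding model; only the threshold logic reflects how the cache works.

```python
# Semantic cache sketch: cosine similarity of prompt embeddings against
# a 0.92 threshold. embed() is a toy stand-in for a neural embedder.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real systems use an embedding model.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.92):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def get(self, prompt: str):
        e = embed(prompt)
        for cached_e, response in self.entries:
            if cosine(e, cached_e) >= self.threshold:
                return response  # cache hit: skip the provider call
        return None

    def put(self, prompt: str, response: str):
        self.entries.append((embed(prompt), response))

cache = SemanticCache()
cache.put("what is the capital of France", "Paris")
print(cache.get("what is the capital of France?"))  # Paris (hit)
```

A production cache would also attach a TTL to each entry and use an approximate nearest-neighbor index instead of the linear scan shown here.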

MCP Tool Federation

View available MCP servers and their tools. Agents can access databases, file systems, and external APIs through a unified interface.

PostgreSQL

Query database, list tables, execute SQL

Online
Filesystem

Read files, list directories, search

Online
GitHub

Create issues and PRs, manage repos

Online
Web Fetch

Fetch URLs, scrape content, call external APIs

Online
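The federation layer's job is to present these servers as one tool surface: given a tool call, find the online server that exposes it. The registry below mirrors the server list above, but the tool names and dispatch logic are an illustrative sketch, not the MCP protocol itself.

```python
# Sketch of unified tool dispatch over federated MCP servers. The server
# names mirror the demo list; tool names and the lookup are illustrative,
# not real MCP wire-protocol calls.

SERVERS = {
    "postgresql": {"tools": ["query", "list_tables"], "online": True},
    "filesystem": {"tools": ["read_file", "list_dir", "search"], "online": True},
    "github":     {"tools": ["create_issue", "create_pr"], "online": True},
    "web_fetch":  {"tools": ["fetch_url"], "online": True},
}

def find_server(tool: str) -> str:
    """Return the name of the first online server that exposes `tool`."""
    for name, srv in SERVERS.items():
        if srv["online"] and tool in srv["tools"]:
            return name
    raise LookupError(f"no online server exposes tool {tool!r}")

print(find_server("create_issue"))  # github
```

In the real gateway this lookup sits behind circuit breakers and rate limits, so an offline server is skipped rather than queried.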
Python
JavaScript
Go
Java
cURL
Using OpenAI SDK
from openai import OpenAI

client = OpenAI(
    api_key="sk-litellm-master-key-dev",
    base_url="http://localhost:4000/v1"
)

response = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Using Requests
import requests

response = requests.post(
    "http://localhost:4000/v1/chat/completions",
    headers={
        "Authorization": "Bearer sk-litellm-master-key-dev",
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-5-mini",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 100
    }
)

print(response.json()["choices"][0]["message"]["content"])
Using OpenAI SDK
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-litellm-master-key-dev',
  baseURL: 'http://localhost:4000/v1'
});

const stream = await client.chat.completions.create({
  model: 'gpt-5-mini',
  messages: [{ role: 'user', content: 'Hello, world!' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
Using Fetch API
const response = await fetch('http://localhost:4000/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk-litellm-master-key-dev',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-5-mini',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value));
}
Using net/http
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

func main() {
    payload := map[string]interface{}{
        "model": "gpt-5-mini",
        "messages": []map[string]string{
            {"role": "user", "content": "Hello!"},
        },
        "max_tokens": 100,
    }

    body, _ := json.Marshal(payload)
    req, _ := http.NewRequest("POST",
        "http://localhost:4000/v1/chat/completions",
        bytes.NewBuffer(body))

    req.Header.Set("Authorization", "Bearer sk-litellm-master-key-dev")
    req.Header.Set("Content-Type", "application/json")

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var result map[string]interface{}
    json.NewDecoder(resp.Body).Decode(&result)
    fmt.Println(result)
}
Using HttpClient
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GatewayExample {
    public static void main(String[] args) throws Exception {
        String json = """
            {
                "model": "gpt-5-mini",
                "messages": [{"role": "user", "content": "Hello!"}],
                "max_tokens": 100
            }
            """;

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:4000/v1/chat/completions"))
            .header("Authorization", "Bearer sk-litellm-master-key-dev")
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();

        HttpResponse<String> response = client.send(request,
            HttpResponse.BodyHandlers.ofString());

        System.out.println(response.body());
    }
}
Basic Request
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-litellm-master-key-dev" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5-mini",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
Streaming Request
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-litellm-master-key-dev" \
  -H "Content-Type: application/json" \
  -N --no-buffer \
  -d '{
    "model": "gpt-5-mini",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
List Models
curl http://localhost:4000/v1/models \
  -H "Authorization: Bearer sk-litellm-master-key-dev"

Live Metrics Dashboard

Open Grafana
Overview
Cost Analysis
Latency
Providers
Live tiles (populate as traffic flows): Requests/min, Success Rate, Avg Latency, Cost Today
Cost by Model (Today)
GPT-5 Mini
$12.45
Claude Sonnet
$8.20
Gemini Flash
$4.10
DeepSeek V3
$2.75
Total Today
$27.50
Savings
$11.20
P50 Latency
245ms
P95 Latency
890ms
P99 Latency
1.2s
Latency by Provider
OpenAI
320ms
Anthropic
420ms
Google
210ms
Local
85ms
OpenAI
Healthy
1.2k req/h 99.9%
Anthropic
Healthy
850 req/h 99.8%
Google
Healthy
620 req/h 99.7%
xAI
Healthy
340 req/h 99.5%
DeepSeek
Healthy
280 req/h 99.6%
Ollama
Healthy
150 req/h 100%
Auto-refresh: 30s
Session Metrics
Total Requests
0
This session
Money Saved
$0.00
vs direct API
Total Spent
$0.00
Through gateway
Avg Latency
0ms
Response time
Provider Usage
OpenAI
0%
Anthropic
0%
Google
0%
xAI
0%
DeepSeek
0%
Local
0%

Admin Dashboard

Manage policies, budgets, teams, and workflows

Grafana Analytics

Metrics, cost trends, and performance dashboards

Temporal Workflows

Monitor agent orchestration and execution history

Enterprise-Ready Features

Everything you need to manage AI at scale

Cedar Policy Routing

Intelligent model selection based on cost, latency SLAs, team quotas, and error rates. Hot-reload policies without restarts.

Semantic Caching

Cache responses by semantic similarity (92% threshold). Save 40%+ on repeated prompts. TTL-based expiration.

LangGraph Workflows

Pre-built templates: Research, Coding, Data Analysis. PostgreSQL checkpointing for resumable execution.

A2A Agent Orchestration

Temporal-powered agent coordination: Sequential, Parallel, Supervisor patterns. Human-in-the-loop approvals.

9 Provider Integrations

OpenAI, Anthropic, Google, xAI, DeepSeek, AWS Bedrock, Vertex AI, Azure OpenAI, Ollama. Unified interface for all.

Visual Admin UI

Policy editor, workflow designer, budget dashboards, team management. Configure everything through a modern React UI.

FinOps & Budgets

Per-request cost prediction, team budgets, soft/hard limits, alerts. Real-time spend tracking with full audit trail.
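The soft/hard limit behavior can be sketched as a three-state check: below the soft limit requests pass, between the limits they pass but raise an alert, and at the hard limit they are blocked. The dollar amounts below are invented for illustration.

```python
# Soft/hard budget limit sketch. The limit values are illustrative;
# the real gateway tracks spend per team in real time.

def check_budget(spent: float, soft_limit: float, hard_limit: float) -> str:
    """Return 'ok', 'alert' (soft limit crossed), or 'block' (hard limit hit)."""
    if spent >= hard_limit:
        return "block"  # reject further requests
    if spent >= soft_limit:
        return "alert"  # allow, but notify the team
    return "ok"

print(check_budget(spent=80.0, soft_limit=75.0, hard_limit=100.0))  # alert
```

Keeping the decision to a pure function like this makes it easy to audit: every request logs its spend snapshot and the resulting verdict.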

Full Observability

Prometheus metrics, Grafana dashboards, Jaeger tracing. OpenTelemetry instrumentation across all services.

MCP Tool Federation

Connect agents to databases, file systems, APIs via MCP. Circuit breakers, rate limiting, audit logging built-in.