Complete Platform Documentation

Intelligent LLM Selection

Learn how Route42 uses machine learning to automatically route every prompt to the optimal model across seven preference modes—balancing quality, speed, cost, privacy, and coding performance with zero manual configuration.

Privacy & Terms Highlights

Before you dive into the docs, here are the key points from our Terms and Privacy policy:

  • Prompt/response handling: Prompts and AI responses are not persisted in local interaction history; routing metadata is stored for operations and improvement.
  • Local personalization: Training for personalization happens locally on your Windows machine; no training data is uploaded.
  • Windows app: Route42 is a Windows desktop application with local data control.
  • Model routing disclaimer: Automated model selection is probabilistic and not guaranteed optimal.
Executive Summary

What is Route42?

Route42 is an intelligent routing platform that automatically selects the optimal Large Language Model for each user query. The system employs machine learning-based complexity analysis, multi-criteria scoring algorithms, and user preference learning to route requests to the most appropriate model from a catalog of 870+ models across 12 providers.

  • 870+ AI models supported
  • <50ms routing decision time
  • 85% average cost reduction

The platform addresses a fundamental challenge in AI application development: different queries require different models. Simple questions benefit from fast, cost-effective models, while complex tasks demand high-capacity models. Route42 makes this decision automatically, optimizing for quality, speed, and cost based on each query's characteristics and user preferences.

The Problem We Solve

Developers waste thousands of dollars sending simple queries to premium models, while complex tasks fail when routed to insufficient models. Route42 eliminates manual model selection by using ML to match each prompt with its optimal executor.

Authentication & Security

API Authentication

Authentication is required for protected endpoints using a Bearer token. Read-only, health, and onboarding endpoints are intentionally public. Each user receives a unique API key upon registration, stored securely in the local SQLite database.

Endpoint Access Model

Public / Onboarding

Health checks, compatibility discovery, and selected onboarding-friendly routes are available without Bearer auth.

Authenticated

Chat/completions, provider key management, profile data, and user-specific routing actions require Bearer authentication.

Key Features

  • Per-user API key generation (32-character cryptographically secure tokens)
  • Bearer token authentication (Authorization: Bearer <api_key>)
  • Automatic user context injection for downstream request handling
  • Per-user provider API key management (maintain your own OpenAI, Anthropic, etc. credentials)
  • Local-only data storage with no external telemetry or key exposure
// Example authenticated request
POST /api/chat/completions
Authorization: Bearer a1b2c3d4e5f6...
Content-Type: application/json

The user ID is extracted from the Bearer token automatically—no need to pass it in the URL.
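As an illustration of the stated key format, 16 cryptographically secure random bytes hex-encode to exactly 32 characters. This is a sketch using Python's standard secrets module, not Route42's actual key-generation code:

```python
import secrets

def generate_api_key() -> str:
    """Generate a 32-character hex token from 16 random bytes.
    Illustrative only; not Route42's actual implementation."""
    return secrets.token_hex(16)

key = generate_api_key()
print(len(key))  # 32
```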

Core Features

Platform Capabilities

1. ML-Based Complexity Detection

The system employs trained machine learning models to analyze prompts and predict computational complexity on a 0-1 scale. The analysis examines textual, structural, and semantic features to determine whether a query requires a lightweight model or a high-capability model.

Low Complexity (0.0-0.3): "What is 2+2?" → routes to cost-effective models
High Complexity (0.7-1.0): "Design a microservices architecture..." → routes to premium models
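To make the 0-1 scale concrete, here is a deliberately simplified toy heuristic. Route42 uses trained ML models, not keyword matching; this sketch only illustrates how a prompt maps onto the scale and how the thresholds above partition it.

```python
def estimate_complexity(prompt: str) -> float:
    """Toy heuristic: longer prompts and design/architecture keywords
    push the score toward 1.0. Illustrative only; Route42's real
    complexity detector is a trained ML model."""
    score = min(len(prompt.split()) / 200, 0.5)  # length signal, capped
    keywords = ("design", "architecture", "refactor", "prove", "optimize")
    score += 0.2 * sum(kw in prompt.lower() for kw in keywords)
    return min(score, 1.0)

print(estimate_complexity("What is 2+2?"))                        # low band
print(estimate_complexity("Design a microservices architecture"))  # higher band
```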

2. Comprehensive Model Catalog

Route42 maintains a continuously updated database of 870+ language models across 12 providers with detailed performance metrics from industry benchmarks and provider APIs. Supported providers include:

OpenAI, Anthropic, Google/Gemini, Groq, Mistral, DeepSeek, OpenRouter, Alibaba/DashScope, Moonshot AI, NVIDIA, Z.ai (Zhipu), and Ollama

Each model entry includes:

Quality scores from MMLU, HumanEval benchmarks
Performance metrics (latency, throughput)
Real-time pricing data
Capability flags (code, vision, context length)

3. Intelligent Multi-Criteria Ranking

Model selection employs a sophisticated scoring system that balances quality, speed, and cost according to user preferences and query complexity. The ranking algorithm:

1. Analyzes each prompt for complexity and domain
2. Filters candidates by quality thresholds
3. Adjusts scoring weights dynamically
4. Selects the optimal model for execution
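The steps above can be sketched as a weighted composite score whose weights shift with complexity. The weight formulas and model metrics below are illustrative assumptions, not Route42's actual coefficients:

```python
def score_model(model: dict, complexity: float) -> float:
    """Composite score: quality weighs more as complexity rises,
    cost weighs more for simple queries. Weights are illustrative."""
    w_quality = 0.3 + 0.5 * complexity   # 0.3 .. 0.8
    w_cost = 0.5 - 0.4 * complexity      # 0.5 .. 0.1
    w_speed = 1.0 - w_quality - w_cost
    return (w_quality * model["quality"]
            + w_speed * model["speed"]
            + w_cost * (1.0 - model["cost"]))  # cheaper scores higher

candidates = [
    {"name": "small-fast", "quality": 0.6, "speed": 0.9, "cost": 0.1},
    {"name": "large-slow", "quality": 0.95, "speed": 0.4, "cost": 0.8},
]
best = max(candidates, key=lambda m: score_model(m, complexity=0.9))
print(best["name"])  # large-slow
```

With the same candidates at complexity 0.1, the cheap fast model wins instead, which is exactly the adaptive behavior the ranking algorithm describes.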

4. User Preference System

Users select from seven preference modes that define routing behavior:

Balanced Mode

Automatically adapts routing based on query complexity

Quality Mode

Prioritizes model capability over cost

Fast Mode

Emphasizes response latency and throughput

Cheap Mode

Minimizes financial cost per request

Local-First Mode

Strongly prefers local-capable models before cloud escalation

Privacy Mode

Prioritizes privacy-preserving routes and stricter provider filtering

Coder Mode

Prioritizes code-specialized and tool-capable models for dev workflows

Additional Preference Controls

Max Response Tokens — Set a cap on response tokens to control output length and cost
Cost Limit per Request — Define maximum cost per request in cents to enforce budget constraints
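A request combining a preference mode with both controls might look like the following. The field names ("preference_mode", "max_tokens", "cost_limit_cents") are hypothetical placeholders for illustration; consult the API reference for the actual parameter names.

```python
import json

# Hypothetical request body: field names are illustrative assumptions,
# not confirmed Route42 API parameters.
request_body = {
    "messages": [{"role": "user", "content": "Summarize this report"}],
    "preference_mode": "cheap",   # one of the seven modes
    "max_tokens": 512,            # cap on response tokens
    "cost_limit_cents": 2,        # budget ceiling per request
}
print(json.dumps(request_body, indent=2))
```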

5. Per-User API Key Management

Each user maintains individual API keys for preferred LLM providers. The system stores encrypted credentials in the local SQLite database and automatically filters available models based on which providers the user has configured.

Multi-tenant isolation in local deployment scenarios
Flexible provider adoption (choose which providers to enable)
Cost accountability (each user's usage bills to their own credentials)
Security through key compartmentalization
Live API key validation — Keys are tested against provider APIs before saving, showing validity status and latency

6. Privacy Judge (LLM-as-a-Judge)

Route42 includes an optional privacy judge step that evaluates routing/output behavior against privacy constraints. The judge prompt template is user-editable from settings, so teams can tune policy language to match internal standards.

Use this for higher-assurance workflows where privacy policy checks should be explicit, auditable, and configurable.

7. Local Model Integration

Route42 provides first-class Ollama integration for private inference. LM Studio and similar local runtimes can be used through OpenAI-compatible custom endpoints. Ollama models are auto-discovered and ranked with local-cost advantages.

Zero-Cost Inference

Run models on your hardware without API bills

Complete Privacy

No external API calls for local completions

Offline Operation

Works without internet connectivity

Performance Benchmarking

Integrate custom quality metrics
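Since Ollama models are auto-discovered, it can help to know what Route42 sees: Ollama's standard REST API lists installed models at GET /api/tags. The sketch below shows that discovery call and the parsing logic (run against a sample payload, since the live call needs a local Ollama server):

```python
import json
from urllib.request import urlopen

def discover_ollama_models(base_url: str = "http://localhost:11434") -> list[str]:
    """List locally installed Ollama models via its /api/tags endpoint."""
    with urlopen(f"{base_url}/api/tags") as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]

# Parsing logic demonstrated against a sample /api/tags payload:
sample = {"models": [{"name": "llama3:8b"}, {"name": "deepseek-coder:6.7b"}]}
names = [m["name"] for m in sample.get("models", [])]
print(names)  # ['llama3:8b', 'deepseek-coder:6.7b']
```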

API Reference

Chat Completion API

The primary user-facing endpoint is the chat completion API, which automatically selects the optimal model and returns OpenAI-compatible responses with full streaming support.

Client Compatibility

Route42 supports both major client protocol styles:

  • OpenAI-compatible chat endpoint: /api/chat/completions
  • Anthropic/Claude-compatible messages endpoint: /v1/messages
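For the Anthropic-style path, a request body follows the standard Messages API shape (model, max_tokens, messages). Using "auto" as the model name mirrors the Claude Code setup later in this document and is an assumption here, not a confirmed requirement:

```python
import json

# Anthropic Messages-style body for Route42's /v1/messages endpoint.
# "auto" as the model identifier is an assumption for illustration.
body = {
    "model": "auto",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
}
headers = {
    "Authorization": "Bearer a1b2c3d4e5f6...",
    "Content-Type": "application/json",
}
print(json.dumps(body))
# POST this to http://localhost:4242/v1/messages
```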

Streaming Request

Request
POST /api/chat/completions
Authorization: Bearer a1b2c3d4e5f6...
Content-Type: application/json

{
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ],
  "stream": true
}
Response
HTTP/1.1 200 OK
Content-Type: text/event-stream

data: {"id":"chatcmpl-123","model":"gemini-3-flash",...}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Imagine"}}]}
data: [DONE]

Non-Streaming Request

Request
POST /api/chat/completions
Authorization: Bearer a1b2c3d4e5f6...

{
  "messages": [{"role": "user", "content": "What is 2+2?"}],
  "stream": false
}
Response
{
  "id": "chatcmpl-456",
  "model": "gpt-4-mini",
  "choices": [{
    "message": {"content": "2+2 equals 4."}
  }],
  "usage": {"total_tokens": 18}
}

Python Integration Example

import requests
import json

API_KEY = "a1b2c3d4e5f6..."
BASE_URL = "http://localhost:4242"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Streaming completion
response = requests.post(
    f"{BASE_URL}/api/chat/completions",
    headers=HEADERS,
    json={"messages": [...], "stream": True},
    stream=True,
)

for line in response.iter_lines():
    if not line:
        continue
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":
        break  # end-of-stream sentinel, not JSON
    chunk = json.loads(payload)
    print(chunk["choices"][0]["delta"].get("content", ""), end="")
Claude Code Integration

Using Route42 with Claude Code

Claude Code is Anthropic's official CLI for agentic software development. Route42 natively supports Anthropic-compatible calling via /v1/messages, so Claude Code can call Route42 directly. Simple tasks (file edits, test generation, docstrings) run on free local models while complex multi-step reasoning automatically bursts to Claude Sonnet 4.5 or Claude Opus.

Simple Tasks → Free

Add tests, docstrings, format code → local DeepSeek-Coder or Llama (0 tokens spent)

Route42 Decides

ML scores each prompt's complexity in milliseconds (displayed on a 0-100 scale, i.e. the internal 0-1 score × 100), invisibly to Claude Code

Complex Tasks → Claude

Architecture, debugging, multi-file refactors → Claude Sonnet/Opus via Anthropic API

Setup: 2 Steps

Step 1 — Start Route42, then configure Claude Code env vars
# PowerShell (session only) — preferred
$env:ANTHROPIC_BASE_URL = "http://localhost:4242"
$env:ANTHROPIC_API_KEY = "sk-ant-..."
# Optional alternative to API key:
$env:ANTHROPIC_AUTH_TOKEN = "your-token"
$env:ANTHROPIC_MODEL = "auto"

# PowerShell (persistent)
setx ANTHROPIC_BASE_URL "http://localhost:4242"
setx ANTHROPIC_API_KEY "sk-ant-..."
setx ANTHROPIC_AUTH_TOKEN "your-token"
setx ANTHROPIC_MODEL "auto"

# Windows CMD (session only)
set ANTHROPIC_BASE_URL=http://localhost:4242
set ANTHROPIC_API_KEY=sk-ant-...
set ANTHROPIC_AUTH_TOKEN=your-token
set ANTHROPIC_MODEL=auto

# Bash/zsh (session only)
export ANTHROPIC_BASE_URL=http://localhost:4242
export ANTHROPIC_API_KEY=sk-ant-...
export ANTHROPIC_AUTH_TOKEN=your-token
export ANTHROPIC_MODEL=auto
Step 2 — Use Claude Code normally
# Works exactly as before — Route42 is transparent
claude "Add unit tests for src/auth/parser.go"
# → Complexity 18/100 → Local DeepSeek-Coder (free)

claude "Redesign the session store to support horizontal scaling"
# → Complexity 94/100 → Claude Sonnet 4.5 (auto-escalated)

Persistent Config (shell profile)

~/.bashrc or ~/.zshrc
# Route42 + Claude Code — auto-route every session
export ANTHROPIC_BASE_URL=http://localhost:4242
export ANTHROPIC_API_KEY=sk-ant-...
export ANTHROPIC_MODEL=auto

Route42 must be running locally (port 4242) before starting Claude Code. The correct variable names are ANTHROPIC_API_KEY, ANTHROPIC_AUTH_TOKEN (optional bearer token), and ANTHROPIC_MODEL (set to auto by default). Misspelled variants such as ANTHROPIC_TOKEN or ANTHROPIC_MODLE are not recognized.

What Gets Routed Where

Claude Code Task                    | Complexity | Routed To                  | Cost
Add docstrings to a function        | 12/100     | Local Llama-3              | $0.00
Generate unit tests for a helper    | 21/100     | Local DeepSeek-Coder       | $0.00
Explain a complex algorithm         | 55/100     | Cloud (cheapest qualified) | ~$0.002
Debug a distributed race condition  | 91/100     | Claude Sonnet 4.5          | ~$0.015
Architect a zero-trust auth system  | 97/100     | Claude Opus                | ~$0.075
Selection Pipeline

How Route42 Selects Models

The completion API executes an intelligent model selection pipeline for each request:

1. Prompt Analysis: The system analyzes the incoming prompt to extract relevant features and characteristics that influence model selection.

2. Complexity Detection: Machine learning models predict task complexity (0-1 scale) and domain category (chat, code, math, analysis, general).

3. Quality Filtering: Applies minimum quality thresholds based on detected complexity; simple queries can use any model, while complex tasks require high-capability models.

4. Dynamic Scoring: Scoring weights are adjusted based on user preference mode and query complexity, automatically balancing quality, speed, and cost.

5. Model Ranking: Each candidate receives a composite score combining quality, speed, and cost metrics. The highest-scoring model is selected.

6. Execution & Metadata Logging: The selected model processes the request, and routing metadata is logged for analytics and continuous improvement. Prompt/response content is not persisted in local interaction history.

Tool-Aware Routing

When tasks require tool use, Route42 can prioritize tool-capable models, filter out incompatible candidates, and retry with compatible alternatives. This behavior is especially useful for coding agents, structured tool calls, and function-calling workflows.
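The candidate-filtering half of that behavior can be sketched as a simple predicate over the catalog. The "supports_tools" capability flag is an illustrative name, not a confirmed Route42 field:

```python
def filter_tool_capable(candidates: list[dict], needs_tools: bool) -> list[dict]:
    """Drop candidates that cannot handle tool/function calls when the
    request declares tools. 'supports_tools' is an illustrative flag."""
    if not needs_tools:
        return candidates
    return [m for m in candidates if m.get("supports_tools")]

models = [
    {"name": "local-llama", "supports_tools": False},
    {"name": "gpt-4-mini", "supports_tools": True},
]
print([m["name"] for m in filter_tool_capable(models, needs_tools=True)])
# ['gpt-4-mini']
```

If the filtered pool is empty or the chosen model fails a tool call, Route42 retries with a compatible alternative rather than failing the request.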

Direct Recommendation Access

Advanced users can access the recommendation engine directly via POST /api/recommend to retrieve ranked candidates without executing a completion.
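A hypothetical exchange with this endpoint is sketched below. The request and response field names ("prompt", "candidates", "score") are assumptions for illustration; check the API reference for the actual schema:

```python
import json

# Hypothetical POST /api/recommend body (field names are assumptions):
request_body = {"prompt": "Explain quantum computing"}

# Hypothetical ranked-candidates response, parsed for the top pick:
sample_response = {
    "candidates": [
        {"model": "gemini-3-flash", "score": 0.91},
        {"model": "gpt-4-mini", "score": 0.87},
    ]
}
top = max(sample_response["candidates"], key=lambda c: c["score"])
print(top["model"])  # gemini-3-flash
```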

Deployment & Operations

Running Route42

Local Deployment

Route42 runs as a standalone application on any machine with Go 1.21+ and Node.js 18+. No external service dependencies exist—all routing decisions execute locally. This enables full offline operation for users without internet connectivity.

✓ Fully Offline
No internet required for routing
✓ Local Processing
<50ms routing decisions
✓ Minimal Telemetry
Core desktop routing does not require external telemetry

Data Persistence

User configuration and routing metadata are stored in a local SQLite database, requiring no external database service for core desktop operation. Prompt/response content is not persisted in local interaction history.

Architecture Split: Local + Cloud

Route42 desktop uses a local backend and local database for runtime routing. In Pro/account contexts, cloud APIs are used for account features such as authentication, profile state, and model catalog synchronization.

Privacy & Security

No telemetry sent to external services
All user data remains on the local machine
API keys encrypted in the local database
No prompt logging to external analytics

Error Logging

Built-in error collection captures API failures, network issues, and application crashes automatically. Users can review errors through the dashboard and optionally submit them for remote analysis.

Integration Guide

LibreChat Integration

Use LibreChat, an open-source AI chat interface, as a self-hosted chat UI for Route42. LibreChat sends all requests using a single static model identifier (route42), and Route42 handles intelligent model selection automatically.

LibreChat UI → Route42 API (model: "route42") → Best Model Selected

Prerequisites

Windows 10/11 with WSL2 enabled
Docker Desktop installed
Route42 API running locally or accessible via network
A valid Route42 API key

Step 1 — Install Docker & Enable WSL2 Integration

1. Install Docker Desktop
2. Open Docker Desktop → Settings → Resources → WSL Integration
3. Enable integration for your Ubuntu WSL distribution
4. Restart Docker Desktop

Step 2 — Clone LibreChat

Open your Ubuntu WSL terminal:

$ git clone https://github.com/danny-avila/LibreChat.git
$ cd LibreChat

Step 3 — Enable Docker Permissions (WSL)

$ sudo usermod -aG docker $USER
$ exit

Reopen the terminal and verify with docker ps

Step 4 — Configure LibreChat for Route42

Create librechat.yaml in the LibreChat root directory:

librechat.yaml
version: 1.2.1
cache: true

endpoints:
  custom:
    - name: "Route42"
      baseURL: "http://host.docker.internal:4242/api"
      apiKey: "${ROUTE42_API_KEY}"

      models:
        default:
          - "route42"
        fetch: false

      titleConvo: true
      titleModel: "route42"
      modelDisplayLabel: "Route42"
Field               | Purpose
baseURL             | Route42 API base URL. Uses host.docker.internal to reach the host from Docker.
apiKey              | Loaded securely from environment variables.
models.fetch: false | Prevents LibreChat from calling /models; not needed since Route42 handles model selection.
route42             | Static model identifier used for all requests.

Step 5 — Set Environment Variables

Create or edit .env in the LibreChat root:

.env
ROUTE42_API_KEY=your_route42_api_key_here
ENDPOINTS=custom

Remove any OPENAI_* variables if present to avoid conflicts.

Step 6 — Mount the Configuration

Create docker-compose.override.yml in the LibreChat root:

docker-compose.override.yml
services:
  api:
    volumes:
      - ./librechat.yaml:/app/librechat.yaml

Step 7 — Start LibreChat

$ docker compose down
$ docker compose up -d

Open your browser at http://localhost:3080, select Route42 as the provider, and start chatting.

What This Integration Provides

Self-hosted chat UI with full conversation history
OpenAI-compatible API connection to Route42
Static model identifier — no manual model selection
Intelligent routing handled entirely by Route42

Why Route42?

Route42 transforms model selection from a manual, one-size-fits-all decision into an intelligent, automatic process optimized for each individual task and user constraint.

ML-Powered Intelligence

Real-time complexity analysis ensures optimal model selection every time

Maximize ROI

Save 85% on API costs by routing locally when possible

Privacy-First

Core routing executes locally; account/pro features may use cloud APIs for auth, profile, and model sync