Intelligent LLM Selection
Platform Documentation
Learn how Route42 uses machine learning to automatically route every prompt to the optimal model across seven preference modes—balancing quality, speed, cost, privacy, and coding performance with zero manual configuration.
Before you dive into the docs, here are the key points from our Terms and Privacy policy:
- Prompt/response handling: Prompts and AI responses are not persisted in local interaction history; routing metadata is stored for operations and improvement.
- Local personalization: Training for personalization happens locally on your Windows machine; no training data is uploaded.
- Windows app: Route42 is a Windows desktop application with local data control.
- Model routing disclaimer: Automated model selection is probabilistic and not guaranteed optimal.
What is Route42?
Route42 is an intelligent routing platform that automatically selects the optimal Large Language Model for each user query. The system employs machine learning-based complexity analysis, multi-criteria scoring algorithms, and user preference learning to route requests to the most appropriate model from a catalog of 870+ models across 12 providers.
The platform addresses a fundamental challenge in AI application development: different queries require different models. Simple questions benefit from fast, cost-effective models, while complex tasks demand high-capacity models. Route42 makes this decision automatically, optimizing for quality, speed, and cost based on each query's characteristics and user preferences.
The Problem We Solve
Developers waste thousands of dollars sending simple queries to premium models, while complex tasks fail when routed to underpowered ones. Route42 eliminates manual model selection by using ML to match each prompt with its optimal executor.
API Authentication
Authentication is required for protected endpoints using a Bearer token. Read-only, health, and onboarding endpoints are intentionally public. Each user receives a unique API key upon registration, stored securely in the local SQLite database.
Endpoint Access Model
Health checks, compatibility discovery, and selected onboarding-friendly routes are available without Bearer auth.
Chat/completions, provider key management, profile data, and user-specific routing actions require Bearer authentication.
Key Features
- Per-user API key generation (32-character cryptographically secure tokens)
- Bearer token authentication (Authorization: Bearer <api_key>)
- Automatic user context injection for downstream request handling
- Per-user provider API key management (maintain your own OpenAI, Anthropic, etc. credentials)
- Local-only data storage with no external telemetry or key exposure
POST /api/chat/completions
Authorization: Bearer a1b2c3d4e5f6...
Content-Type: application/json
The user ID is extracted from the Bearer token automatically—no need to pass it in the URL.
Platform Capabilities
1. ML-Based Complexity Detection
The system employs trained machine learning models to analyze prompts and predict computational complexity on a 0-1 scale. The analysis examines textual, structural, and semantic features to determine whether a query requires a lightweight model or a high-capability model.
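The thresholding idea behind complexity-based routing can be sketched in a few lines. The tier names and cutoffs below are illustrative assumptions, not Route42's trained model:

```python
def route_by_complexity(score: float) -> str:
    """Map a 0-1 complexity score to a model tier (illustrative thresholds)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("complexity score must be in [0, 1]")
    if score < 0.3:
        return "lightweight"       # e.g. a small, fast local model
    if score < 0.7:
        return "mid-tier"          # balanced cloud model
    return "high-capability"       # frontier model for hard tasks

print(route_by_complexity(0.18))   # lightweight
```

In practice the score comes from a trained classifier over textual, structural, and semantic features rather than a hand-set threshold, but the downstream gating looks like this.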
2. Comprehensive Model Catalog
Route42 maintains a continuously updated database of 870+ language models across 12 providers, with detailed performance metrics drawn from industry benchmarks and provider APIs. Each model entry includes the quality, speed, and cost metrics used by the ranking pipeline.
3. Intelligent Multi-Criteria Ranking
Model selection employs a scoring system that balances quality, speed, and cost according to user preferences and query complexity. The ranking algorithm weighs each candidate's metrics against the user's preference mode and the detected complexity, then selects the highest-scoring model.
4. User Preference System
Users select from seven preference modes that define routing behavior:
- Automatically adapts routing based on query complexity
- Prioritizes model capability over cost
- Emphasizes response latency and throughput
- Minimizes financial cost per request
- Strongly prefers local-capable models before cloud escalation
- Prioritizes privacy-preserving routes and stricter provider filtering
- Prioritizes code-specialized and tool-capable models for dev workflows
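Conceptually, each preference mode maps to a set of scoring weights over quality, speed, and cost. The presets below are hypothetical values for illustration only; Route42's actual weights are internal:

```python
# Illustrative weight presets; Route42's real per-mode weights are internal.
PRESETS = {
    "quality": {"quality": 0.7, "speed": 0.15, "cost": 0.15},
    "speed":   {"quality": 0.2, "speed": 0.6,  "cost": 0.2},
    "cost":    {"quality": 0.2, "speed": 0.2,  "cost": 0.6},
}

def weights_for(mode: str) -> dict:
    """Look up the scoring weights for a preference mode."""
    try:
        return PRESETS[mode]
    except KeyError:
        raise ValueError(f"unknown preference mode: {mode}") from None
```

Keeping the weights normalized (summing to 1) makes composite scores comparable across modes.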
Additional Preference Controls
5. Per-User API Key Management
Each user maintains individual API keys for preferred LLM providers. The system stores encrypted credentials in the local SQLite database and automatically filters available models based on which providers the user has configured.
6. Privacy Judge (LLM-as-a-Judge)
Route42 includes an optional privacy judge step that evaluates routing/output behavior against privacy constraints. The judge prompt template is user-editable from settings, so teams can tune policy language to match internal standards.
7. Local Model Integration
Route42 provides first-class Ollama integration for private inference. LM Studio and similar local runtimes can be used through OpenAI-compatible custom endpoints. Ollama models are auto-discovered and ranked with local-cost advantages.
Run models on your hardware without API bills
No external API calls for local completions
Works without internet connectivity
Integrate custom quality metrics
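To see what a local Ollama server exposes, its standard REST API lists installed models at GET /api/tags. This sketch is a generic Ollama client for verifying local availability, not Route42's internal discovery code:

```python
import requests

def discover_ollama_models(base_url: str = "http://localhost:11434"):
    """List locally installed Ollama models via its /api/tags endpoint.

    Returns an empty list when no Ollama server is reachable, so
    cloud-only setups degrade gracefully instead of erroring.
    """
    try:
        resp = requests.get(f"{base_url}/api/tags", timeout=2)
        resp.raise_for_status()
        return [m["name"] for m in resp.json().get("models", [])]
    except requests.RequestException:
        return []
```

If this returns model names, Route42's auto-discovery has local candidates to rank with their zero-API-cost advantage.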
Chat Completion API
The primary user-facing endpoint is the chat completion API, which automatically selects the optimal model and returns OpenAI-compatible responses with full streaming support.
Client Compatibility
Route42 supports both major client protocol styles:
- OpenAI-compatible chat endpoint: /api/chat/completions
- Anthropic/Claude-compatible messages endpoint: /v1/messages
Streaming Request
POST /api/chat/completions
Authorization: Bearer a1b2c3d4e5f6...
Content-Type: application/json
{
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ],
  "stream": true
}
HTTP/1.1 200 OK
Content-Type: text/event-stream
data: {"id":"chatcmpl-123","model":"gemini-3-flash",...}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Imagine"}}]}
data: [DONE]
Non-Streaming Request
POST /api/chat/completions
Authorization: Bearer a1b2c3d4e5f6...
{
  "messages": [{"role": "user", "content": "What is 2+2?"}],
  "stream": false
}
{
  "id": "chatcmpl-456",
  "model": "gpt-4-mini",
  "choices": [{
    "message": {"content": "2+2 equals 4."}
  }],
  "usage": {"total_tokens": 18}
}
Python Integration Example
import requests
import json

API_KEY = "a1b2c3d4e5f6..."
BASE_URL = "http://localhost:4242"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Streaming completion
response = requests.post(
    f"{BASE_URL}/api/chat/completions",
    headers=HEADERS,
    json={"messages": [...], "stream": True},
    stream=True,
)
for line in response.iter_lines():
    if not line:
        continue
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":  # end-of-stream sentinel
        break
    chunk = json.loads(payload)
    for choice in chunk.get("choices", []):
        print(choice.get("delta", {}).get("content", ""), end="")
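A non-streaming call follows the same shape. This sketch adds a small parser for the response body format shown above; the extract_answer helper is our own convenience function, not part of any SDK:

```python
import requests

API_KEY = "a1b2c3d4e5f6..."
BASE_URL = "http://localhost:4242"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

def extract_answer(payload: dict) -> str:
    """Pull the assistant message out of a non-streaming response body."""
    return payload["choices"][0]["message"]["content"]

def complete(messages: list) -> str:
    """Send a non-streaming chat completion and return the answer text."""
    resp = requests.post(
        f"{BASE_URL}/api/chat/completions",
        headers=HEADERS,
        json={"messages": messages, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    return extract_answer(resp.json())
```

Because the response is OpenAI-compatible, extract_answer works unchanged regardless of which model Route42 selected.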
Using Route42 with Claude Code
Claude Code is Anthropic's official CLI for agentic software development. Route42 natively supports Anthropic-compatible calling via /v1/messages, so Claude Code can call Route42 directly. Simple tasks (file edits, test generation, docstrings) run on free local models while complex multi-step reasoning automatically bursts to Claude Sonnet 4.5 or Claude Opus.
Simple Tasks → Free
Add tests, docstrings, format code → local DeepSeek-Coder or Llama (0 tokens spent)
Route42 Decides
ML scores prompt complexity 0–100 in milliseconds — invisible to Claude Code
Complex Tasks → Claude
Architecture, debugging, multi-file refactors → Claude Sonnet/Opus via Anthropic API
Setup: 2 Steps
# PowerShell (session only) — preferred
$env:ANTHROPIC_BASE_URL = "http://localhost:4242"
$env:ANTHROPIC_API_KEY = "sk-ant-..."
# Optional alternative to API key:
$env:ANTHROPIC_AUTH_TOKEN = "your-token"
$env:ANTHROPIC_MODEL = "auto"
# PowerShell (persistent)
setx ANTHROPIC_BASE_URL "http://localhost:4242"
setx ANTHROPIC_API_KEY "sk-ant-..."
setx ANTHROPIC_AUTH_TOKEN "your-token"
setx ANTHROPIC_MODEL "auto"
# Windows CMD (session only)
set ANTHROPIC_BASE_URL=http://localhost:4242
set ANTHROPIC_API_KEY=sk-ant-...
set ANTHROPIC_AUTH_TOKEN=your-token
set ANTHROPIC_MODEL=auto
# Bash/zsh (session only)
export ANTHROPIC_BASE_URL=http://localhost:4242
export ANTHROPIC_API_KEY=sk-ant-...
export ANTHROPIC_AUTH_TOKEN=your-token
export ANTHROPIC_MODEL=auto
# Works exactly as before — Route42 is transparent
claude "Add unit tests for src/auth/parser.go"
# → Complexity 18/100 → Local DeepSeek-Coder (free)
claude "Redesign the session store to support horizontal scaling"
# → Complexity 94/100 → Claude Sonnet 4.5 (auto-escalated)
Persistent Config (shell profile)
# Route42 + Claude Code — auto-route every session
export ANTHROPIC_BASE_URL=http://localhost:4242
export ANTHROPIC_API_KEY=sk-ant-...
export ANTHROPIC_MODEL=auto
Route42 must be running locally (port 4242) before starting Claude Code. Correct variable names are ANTHROPIC_API_KEY, ANTHROPIC_AUTH_TOKEN (optional bearer token), and ANTHROPIC_MODEL (set to auto by default). Do not use ANTHROPIC_TOKEN or MODLE.
What Gets Routed Where
| Claude Code Task | Complexity | Routed To | Cost |
|---|---|---|---|
| Add docstrings to a function | 12/100 | Local Llama-3 | $0.00 |
| Generate unit tests for a helper | 21/100 | Local DeepSeek-Coder | $0.00 |
| Explain a complex algorithm | 55/100 | Cloud (cheapest qualified) | ~$0.002 |
| Debug a distributed race condition | 91/100 | Claude Sonnet 4.5 | ~$0.015 |
| Architect a zero-trust auth system | 97/100 | Claude Opus | ~$0.075 |
How Route42 Selects Models
The completion API executes an intelligent model selection pipeline for each request:
Prompt Analysis
The system analyzes the incoming prompt to extract relevant features and characteristics that influence model selection.
Complexity Detection
Machine learning models predict task complexity (0-1 scale) and domain category (chat, code, math, analysis, general).
Quality Filtering
Applies minimum quality thresholds based on detected complexity—simple queries can use any model, complex tasks require high-capability models.
Dynamic Scoring
Scoring weights are adjusted based on user preference mode and query complexity, automatically balancing quality, speed, and cost.
Model Ranking
Each candidate receives a composite score combining quality, speed, and cost metrics. The highest-scoring model is selected.
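A weighted-sum ranker of this kind can be sketched as follows. The metric values and weights are made up for illustration; cost is inverted so cheaper models score higher:

```python
def composite_score(model: dict, weights: dict) -> float:
    """Weighted sum of normalized quality, speed, and inverse cost (each 0-1)."""
    return (weights["quality"] * model["quality"]
            + weights["speed"] * model["speed"]
            + weights["cost"] * (1.0 - model["cost"]))  # cheaper is better

# Hypothetical candidates with normalized metrics
candidates = [
    {"name": "local-llama", "quality": 0.55, "speed": 0.9, "cost": 0.0},
    {"name": "frontier",    "quality": 0.95, "speed": 0.4, "cost": 0.8},
]
w = {"quality": 0.5, "speed": 0.25, "cost": 0.25}
best = max(candidates, key=lambda m: composite_score(m, w))
print(best["name"])  # local-llama
```

With these weights the free local model wins; shifting weight toward quality (as a complex query would) flips the ranking toward the frontier model.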
Execution & Metadata Logging
The selected model processes the request, and routing metadata is logged for analytics and continuous improvement. Prompt/response content is not persisted in local interaction history.
Tool-Aware Routing
When tasks require tool use, Route42 can prioritize tool-capable models, filter out incompatible candidates, and retry with compatible alternatives. This behavior is especially useful for coding agents, structured tool calls, and function-calling workflows.
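The candidate-filtering step can be sketched as a predicate over a capability flag. The supports_tools field name is a hypothetical placeholder for whatever capability metadata the catalog carries:

```python
def filter_tool_capable(candidates: list, needs_tools: bool) -> list:
    """Drop models that lack tool/function calling when the task needs it."""
    if not needs_tools:
        return candidates
    return [m for m in candidates if m.get("supports_tools", False)]

models = [
    {"name": "coder-x", "supports_tools": True},
    {"name": "chat-mini"},  # no tool support declared
]
print(filter_tool_capable(models, needs_tools=True))
```

Retrying with a compatible alternative then amounts to re-ranking this filtered list rather than failing the request outright.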
Direct Recommendation Access
Advanced users can access the recommendation engine directly via POST /api/recommend to retrieve ranked candidates without executing a completion.
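A minimal client for the recommendation endpoint might look like the following. The request and response schema shown here are assumptions; consult your Route42 build for the exact shape:

```python
import requests

def recommend(prompt: str, api_key: str,
              base_url: str = "http://localhost:4242"):
    """Fetch ranked model candidates without executing a completion.

    Assumes the endpoint accepts {"prompt": ...} and returns a JSON
    list of ranked entries, each with a "model" field.
    """
    resp = requests.post(
        f"{base_url}/api/recommend",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"prompt": prompt},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

def summarize(ranking: list) -> list:
    """Render a ranked candidate list as numbered strings."""
    return [f"{i + 1}. {m['model']}" for i, m in enumerate(ranking)]
```

This is useful for dry-running routing decisions, e.g. auditing which models would handle a prompt before spending tokens on it.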
Running Route42
Local Deployment
Route42 runs as a standalone application on any machine with Go 1.21+ and Node.js 18+. No external service dependencies exist—all routing decisions execute locally. This enables full offline operation for users without internet connectivity.
Data Persistence
User configuration and routing metadata are stored in a local SQLite database, requiring no external database service for core desktop operation. Prompt/response content is not persisted in local interaction history.
Architecture Split: Local + Cloud
Route42 desktop uses a local backend and local database for runtime routing. In Pro/account contexts, cloud APIs are used for account features such as authentication, profile state, and model catalog synchronization.
Privacy & Security
Error Logging
Built-in error collection captures API failures, network issues, and application crashes automatically. Users can review errors through the dashboard and optionally submit them for remote analysis.
LibreChat Integration
Use LibreChat, an open-source AI chat interface, as a self-hosted chat UI for Route42. LibreChat sends all requests using a single static model identifier (route42), and Route42 handles intelligent model selection automatically.
Prerequisites
Step 1 — Install Docker & Enable WSL2 Integration
Step 2 — Clone LibreChat
Open your Ubuntu WSL terminal:
$ git clone https://github.com/danny-avila/LibreChat.git
$ cd LibreChat
Step 3 — Enable Docker Permissions (WSL)
$ sudo usermod -aG docker $USER
$ exit
Reopen the terminal and verify with docker ps
Step 4 — Configure LibreChat for Route42
Create librechat.yaml in the LibreChat root directory:
version: 1.2.1
cache: true
endpoints:
  custom:
    - name: "Route42"
      baseURL: "http://host.docker.internal:4242/api"
      apiKey: "${ROUTE42_API_KEY}"
      models:
        default:
          - "route42"
        fetch: false
      titleConvo: true
      titleModel: "route42"
      modelDisplayLabel: "Route42"
| Field | Purpose |
|---|---|
| baseURL | Route42 API base URL. Uses host.docker.internal to reach the host from Docker. |
| apiKey | Loaded securely from environment variables. |
| models.fetch: false | Prevents LibreChat from calling /models — not needed since Route42 handles model selection. |
| route42 | Static model identifier used for all requests. |
Step 5 — Set Environment Variables
Create or edit .env in the LibreChat root:
ROUTE42_API_KEY=your_route42_api_key_here
ENDPOINTS=custom
Remove any OPENAI_* variables if present to avoid conflicts.
Step 6 — Mount the Configuration
Create docker-compose.override.yml in the LibreChat root:
services:
  api:
    volumes:
      - ./librechat.yaml:/app/librechat.yaml
Step 7 — Start LibreChat
$ docker compose down
$ docker compose up -d
Open your browser at http://localhost:3080, select Route42 as the provider, and start chatting.
What This Integration Provides
Why Route42?
Route42 transforms model selection from a manual, one-size-fits-all decision into an intelligent, automatic process optimized for each individual task and user constraint.
ML-Powered Intelligence
Real-time complexity analysis ensures optimal model selection every time
Maximize ROI
Save up to 85% on API costs by routing locally when possible
Privacy-First
Core routing executes locally; account/pro features may use cloud APIs for auth, profile, and model sync