Complete Platform Documentation

Intelligent LLM Selection

Learn how Route42 uses machine learning to automatically route every prompt to the optimal model—balancing quality, speed, and cost with zero manual configuration.

Privacy & Terms Highlights

Before you dive into the docs, here are the key points from our Terms and Privacy policy:

  • No prompt storage: Prompts and AI responses are never saved or logged—Route42 is pass-through only.
  • Local personalization: Training for personalization happens locally on your Windows machine; no training data is uploaded.
  • Windows app: Route42 is a Windows desktop application with local data control.
  • Model routing disclaimer: Automated model selection is probabilistic and not guaranteed optimal.

Executive Summary

What is Route42?

Route42 is an intelligent routing platform that automatically selects the optimal Large Language Model for each user query. The system employs machine learning-based complexity analysis, multi-criteria scoring algorithms, and user preference learning to route requests to the most appropriate model from a catalog of 870+ models across 12 providers.

  • 870+ AI models supported
  • <50ms routing decision time
  • 85% average cost reduction

The platform addresses a fundamental challenge in AI application development: different queries require different models. Simple questions benefit from fast, cost-effective models, while complex tasks demand high-capacity models. Route42 makes this decision automatically, optimizing for quality, speed, and cost based on each query's characteristics and user preferences.

The Problem We Solve

Developers waste thousands of dollars sending simple queries to premium models, while complex tasks fail when routed to models that lack the necessary capability. Route42 eliminates manual model selection by using ML to match each prompt with its optimal executor.

Authentication & Security

API Authentication

All API endpoints require authentication using a Bearer token. Each user receives a unique API key upon registration, stored securely in the local SQLite database.

Key Features

  • Per-user API key generation (32-character cryptographically secure tokens)
  • Bearer token authentication (Authorization: Bearer <api_key>)
  • Automatic user context injection for downstream request handling
  • Per-user provider API key management (maintain your own OpenAI, Anthropic, etc. credentials)
  • Local-only data storage with no external telemetry or key exposure
// Example authenticated request
POST /api/chat/completions
Authorization: Bearer a1b2c3d4e5f6...
Content-Type: application/json

The user ID is extracted from the Bearer token automatically—no need to pass it in the URL.
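
For illustration, here is a minimal sketch of how a 32-character key of the kind described above could be generated and checked in Python. The helper names are hypothetical and this is not Route42's actual implementation:

import hmac
import secrets

def generate_api_key() -> str:
    # token_hex(16) draws 16 random bytes and encodes them as 32 hex characters
    return secrets.token_hex(16)

def verify_api_key(presented: str, stored: str) -> bool:
    # constant-time comparison avoids timing side channels
    return hmac.compare_digest(presented, stored)

key = generate_api_key()
assert len(key) == 32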

Core Features

Platform Capabilities

1. ML-Based Complexity Detection

The system employs trained machine learning models to analyze prompts and predict computational complexity on a 0-1 scale. The analysis examines textual, structural, and semantic features to determine whether a query requires a lightweight model or a high-capability model.

Low Complexity (0.0-0.3)
"What is 2+2?" → Routes to cost-effective models
High Complexity (0.7-1.0)
"Design a microservices architecture..." → Routes to premium models

2. Comprehensive Model Catalog

Route42 maintains a continuously updated database of 870+ language models across 12 providers with detailed performance metrics from industry benchmarks and provider APIs. Supported providers include:

OpenAI, Anthropic, Google/Gemini, Groq, Mistral, DeepSeek, OpenRouter, Alibaba/DashScope, Moonshot AI, NVIDIA, Z.ai (Zhipu), and Ollama.

Each model entry includes:

  • Quality scores from MMLU and HumanEval benchmarks
  • Performance metrics (latency, throughput)
  • Real-time pricing data
  • Capability flags (code, vision, context length)

3. Intelligent Multi-Criteria Ranking

Model selection employs a sophisticated scoring system that balances quality, speed, and cost according to user preferences and query complexity. The ranking algorithm:

1. Analyzes each prompt for complexity and domain
2. Filters candidates by quality thresholds
3. Adjusts scoring weights dynamically
4. Selects the optimal model for execution
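
A minimal sketch of such a composite scorer, assuming metrics normalized to a 0-1 scale and illustrative weights; Route42's actual weighting logic is not shown here:

from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    quality: float   # normalized 0-1 (e.g. from MMLU/HumanEval)
    speed: float     # normalized 0-1 (higher = lower latency)
    cost: float      # normalized 0-1 (higher = cheaper; local models = 1.0)

def rank(candidates, weights, min_quality):
    # filter by quality threshold, score with dynamic weights, pick the best
    wq, ws, wc = weights
    eligible = [c for c in candidates if c.quality >= min_quality]
    return max(eligible, key=lambda c: wq * c.quality + ws * c.speed + wc * c.cost)

best = rank(
    [Candidate("premium-model", 0.95, 0.4, 0.2),
     Candidate("fast-model", 0.70, 0.9, 0.8)],
    weights=(0.3, 0.3, 0.4),   # e.g. a low-complexity query shifting weight to cost
    min_quality=0.5,
)
print(best.name)  # fast-model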

4. User Preference System

Users select from four preference modes that define routing behavior:

Balanced Mode

Automatically adapts routing based on query complexity

Quality Mode

Prioritizes model capability over cost

Fast Mode

Emphasizes response latency and throughput

Cheap Mode

Minimizes financial cost per request

Additional Preference Controls

Max Response Tokens — Set a cap on response tokens to control output length and cost
Cost Limit per Request — Define maximum cost per request in cents to enforce budget constraints (a worst-case budget check combining both controls is sketched below)
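
For example, the two controls could be combined in a worst-case budget check like the following; the parameter names are assumptions for illustration, not Route42's actual schema:

def within_budget(price_per_1k_cents: float, max_response_tokens: int,
                  cost_limit_cents: float) -> bool:
    # worst-case output cost if the model uses the full response-token cap
    worst_case = price_per_1k_cents * max_response_tokens / 1000
    return worst_case <= cost_limit_cents

# A model at 1.5 cents per 1K tokens, capped at 2,000 tokens: worst case 3 cents
print(within_budget(1.5, 2000, cost_limit_cents=5.0))  # True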

5. Per-User API Key Management

Each user maintains individual API keys for preferred LLM providers. The system stores encrypted credentials in the local SQLite database and automatically filters available models based on which providers the user has configured.

  • Multi-tenant isolation in local deployment scenarios
  • Flexible provider adoption (choose which providers to enable)
  • Cost accountability (each user's usage bills to their own credentials)
  • Security through key compartmentalization
  • Live API key validation — keys are tested against provider APIs before saving, showing validity status and latency (see the validation sketch below)
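
One way such a validation could work, shown here against OpenAI's GET /v1/models endpoint; Route42's actual validation flow may differ:

import time
import requests

def validate_openai_key(api_key: str) -> tuple[bool, float]:
    start = time.monotonic()
    resp = requests.get(
        "https://api.openai.com/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    latency_ms = (time.monotonic() - start) * 1000
    return resp.status_code == 200, latency_ms

ok, latency = validate_openai_key("sk-...")
print(f"valid={ok} latency={latency:.0f}ms")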

6. Local Model Integration

Route42 supports fully private inference through Ollama integration. The system automatically discovers locally installed Ollama models and incorporates them into the ranking algorithm with appropriate cost scoring (free = highest score).

Zero-Cost Inference

Run models on your hardware without API bills

Complete Privacy

No external API calls for local completions

Offline Operation

Works without internet connectivity

Performance Benchmarking

Integrate custom quality metrics
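
Discovery can be done through Ollama's standard REST API; a minimal sketch using its GET /api/tags endpoint (how Route42 consumes the result beyond this call is an assumption):

import requests

def discover_ollama_models(host: str = "http://localhost:11434") -> list[str]:
    resp = requests.get(f"{host}/api/tags", timeout=5)
    resp.raise_for_status()
    # Response shape: {"models": [{"name": "llama3:8b", ...}, ...]}
    return [m["name"] for m in resp.json().get("models", [])]

for name in discover_ollama_models():
    print(name)  # local models enter the ranking with the highest (free) cost score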

API Reference

Chat Completion API

The primary user-facing endpoint is the chat completion API, which automatically selects the optimal model and returns OpenAI-compatible responses with full streaming support.

Streaming Request

Request
POST /api/chat/completions
Authorization: Bearer a1b2c3d4e5f6...
Content-Type: application/json

{
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ],
  "stream": true
}
Response
HTTP/1.1 200 OK
Content-Type: text/event-stream

data: {"id":"chatcmpl-123","model":"gemini-3-flash",...}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Imagine"}}]}
data: [DONE]

Non-Streaming Request

Request
POST /api/chat/completions
Authorization: Bearer a1b2c3d4e5f6...

{
  "messages": [{"role": "user", "content": "What is 2+2?"}],
  "stream": false
}
Response
{
  "id": "chatcmpl-456",
  "model": "gpt-4-mini",
  "choices": [{
    "message": {"content": "2+2 equals 4."}
  }],
  "usage": {"total_tokens": 18}
}

Python Integration Example

import json
import requests

API_KEY = "a1b2c3d4e5f6..."
BASE_URL = "http://localhost:8080"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Streaming completion
response = requests.post(
    f"{BASE_URL}/api/chat/completions",
    headers=HEADERS,
    json={
        "messages": [{"role": "user", "content": "Explain quantum computing"}],
        "stream": True,
    },
    stream=True,
)

for line in response.iter_lines():
    if not line:
        continue
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":  # end-of-stream sentinel, not JSON
        break
    chunk = json.loads(payload)
    for choice in chunk.get("choices", []):
        print(choice["delta"].get("content", ""), end="")

Selection Pipeline

How Route42 Selects Models

The completion API executes an intelligent model selection pipeline for each request:

1. Prompt Analysis

The system analyzes the incoming prompt to extract relevant features and characteristics that influence model selection.

2. Complexity Detection

Machine learning models predict task complexity (0-1 scale) and domain category (chat, code, math, analysis, general).

3. Quality Filtering

Applies minimum quality thresholds based on detected complexity—simple queries can use any model, complex tasks require high-capability models.

4. Dynamic Scoring

Scoring weights are adjusted based on user preference mode and query complexity, automatically balancing quality, speed, and cost.

5. Model Ranking

Each candidate receives a composite score combining quality, speed, and cost metrics. The highest-scoring model is selected.

6. Execution & Logging

The selected model processes the request, and the complete interaction is logged for analytics and continuous improvement.

Direct Recommendation Access

Advanced users can access the recommendation engine directly via POST /api/recommend to retrieve ranked candidates without executing a completion.
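
A hypothetical call shape, assuming the endpoint accepts the same message format as the completion API; the request and response fields shown are illustrative, not a documented contract:

import requests

resp = requests.post(
    "http://localhost:8080/api/recommend",
    headers={"Authorization": "Bearer a1b2c3d4e5f6...",
             "Content-Type": "application/json"},
    json={"messages": [{"role": "user", "content": "Explain quantum computing"}]},
    timeout=10,
)
print(resp.json())  # expected: ranked candidate models with their scores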

Deployment & Operations

Running Route42

Local Deployment

Route42 runs as a standalone application on any machine with Go 1.21+ and Node.js 18+. No external service dependencies exist—all routing decisions execute locally. This enables full offline operation for users without internet connectivity.

✓ Fully Offline
No internet required for routing
✓ Local Processing
<50ms routing decisions
✓ Zero Telemetry
No data sent externally

Data Persistence

All user configuration and historical interaction logs are stored in a local SQLite database, requiring no external database service. Data remains on your machine and is never transmitted to external servers.
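
Because everything lives in a single SQLite file, you can inspect it with standard tooling. A quick sketch, assuming a hypothetical database file named route42.db (the actual filename and schema are not documented here):

import sqlite3

conn = sqlite3.connect("route42.db")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
print([t[0] for t in tables])  # all configuration and logs live in this one file
conn.close()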

Privacy & Security

  • No telemetry sent to external services
  • All user data remains on the local machine
  • API keys encrypted in the local database
  • No prompt logging to external analytics

Error Logging

Built-in error collection captures API failures, network issues, and application crashes automatically. Users can review errors through the dashboard and optionally submit them for remote analysis.

Integration Guide

LibreChat Integration

Use LibreChat, an open-source AI chat interface, as a self-hosted chat UI for Route42. LibreChat sends all requests using a single static model identifier (route42), and Route42 handles intelligent model selection automatically.

LibreChat UI → Route42 API (model: "route42") → Best Model Selected

Prerequisites

  • Windows 10/11 with WSL2 enabled
  • Docker Desktop installed
  • Route42 API running locally or accessible via network
  • A valid Route42 API key (the last two items can be verified with the pre-flight sketch after this list)
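
Before starting, a short pre-flight request against the completion endpoint documented above confirms both that the API is reachable and that your key works:

import requests

resp = requests.post(
    "http://localhost:8080/api/chat/completions",
    headers={"Authorization": "Bearer your_route42_api_key_here",
             "Content-Type": "application/json"},
    json={"messages": [{"role": "user", "content": "ping"}], "stream": False},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["model"])  # the model Route42 selected for this request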

Step 1 — Install Docker & Enable WSL2 Integration

1. Install Docker Desktop
2. Open Docker Desktop → Settings → Resources → WSL Integration
3. Enable integration for your Ubuntu WSL distribution
4. Restart Docker Desktop

Step 2 — Clone LibreChat

Open your Ubuntu WSL terminal:

$ git clone https://github.com/danny-avila/LibreChat.git
$ cd LibreChat

Step 3 — Enable Docker Permissions (WSL)

$ sudo usermod -aG docker $USER
$ exit

Reopen the terminal and verify with docker ps

Step 4 — Configure LibreChat for Route42

Create librechat.yaml in the LibreChat root directory:

librechat.yaml
version: 1.2.1
cache: true

endpoints:
  custom:
    - name: "Route42"
      baseURL: "http://host.docker.internal:8080/api"
      apiKey: "${ROUTE42_API_KEY}"

      models:
        default:
          - "route42"
        fetch: false

      titleConvo: true
      titleModel: "route42"
      modelDisplayLabel: "Route42"
Field reference:

  • baseURL — Route42 API base URL. Uses host.docker.internal to reach the host from inside Docker.
  • apiKey — Loaded securely from environment variables.
  • models.fetch: false — Prevents LibreChat from calling /models; not needed since Route42 handles model selection.
  • route42 — Static model identifier used for all requests.

Step 5 — Set Environment Variables

Create or edit .env in the LibreChat root:

.env
ROUTE42_API_KEY=your_route42_api_key_here
ENDPOINTS=custom

Remove any OPENAI_* variables if present to avoid conflicts.

Step 6 — Mount the Configuration

Create docker-compose.override.yml in the LibreChat root:

docker-compose.override.yml
services:
  api:
    volumes:
      - ./librechat.yaml:/app/librechat.yaml

Step 7 — Start LibreChat

$ docker compose down
$ docker compose up -d

Open your browser at http://localhost:3080, select Route42 as the provider, and start chatting.

What This Integration Provides

  • Self-hosted chat UI with full conversation history
  • OpenAI-compatible API connection to Route42
  • Static model identifier — no manual model selection
  • Intelligent routing handled entirely by Route42

Why Route42?

Route42 transforms model selection from a manual, one-size-fits-all decision into an intelligent, automatic process optimized for each individual task and user constraint.

ML-Powered Intelligence

Real-time complexity analysis ensures optimal model selection every time

Maximize ROI

Save 85% on API costs by routing locally when possible

Privacy-First

All routing decisions happen locally—zero telemetry