Intelligent LLM Selection
Platform Documentation
Learn how Route42 uses machine learning to automatically route every prompt to the optimal model—balancing quality, speed, and cost with zero manual configuration.
Before you dive into the docs, here are the key points from our Terms and Privacy policy:
- No prompt storage: Prompts and AI responses are never saved or logged; Route42 is pass-through only.
- Local personalization: Training for personalization happens locally on your Windows machine; no training data is uploaded.
- Windows app: Route42 is a Windows desktop application with local data control.
- Model routing disclaimer: Automated model selection is probabilistic and not guaranteed optimal.
What is Route42?
Route42 is an intelligent routing platform that automatically selects the optimal Large Language Model for each user query. The system employs machine learning-based complexity analysis, multi-criteria scoring algorithms, and user preference learning to route requests to the most appropriate model from a catalog of 870+ models across 12 providers.
The platform addresses a fundamental challenge in AI application development: different queries require different models. Simple questions benefit from fast, cost-effective models, while complex tasks demand high-capacity models. Route42 makes this decision automatically, optimizing for quality, speed, and cost based on each query's characteristics and user preferences.
The Problem We Solve
Developers waste thousands of dollars sending simple queries to premium models, while complex tasks fail when routed to underpowered ones. Route42 eliminates manual model selection by using machine learning to match each prompt with its optimal executor.
API Authentication
All API endpoints require authentication using a Bearer token. Each user receives a unique API key upon registration, stored securely in the local SQLite database.
Key Features
- Per-user API key generation (32-character cryptographically secure tokens)
- Bearer token authentication (Authorization: Bearer <api_key>)
- Automatic user context injection for downstream request handling
- Per-user provider API key management (maintain your own OpenAI, Anthropic, etc. credentials)
- Local-only data storage with no external telemetry or key exposure
POST /api/chat/completions
Authorization: Bearer a1b2c3d4e5f6...
Content-Type: application/json
The user ID is extracted from the Bearer token automatically—no need to pass it in the URL.
Platform Capabilities
1. ML-Based Complexity Detection
The system employs trained machine learning models to analyze prompts and predict computational complexity on a 0-1 scale. The analysis examines textual, structural, and semantic features to determine whether a query requires a lightweight model or a high-capability model.
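To make this concrete, here is a minimal sketch of the kind of feature extraction such an analysis might perform. The feature names and the final heuristic are illustrative assumptions; Route42's trained models and actual feature set are internal and will differ.
import re

# Illustrative prompt features; the real, trained feature set is an assumption here.
def extract_features(prompt: str) -> dict:
    return {
        "length_tokens": len(prompt.split()),                  # textual
        "has_code_block": "```" in prompt,                     # structural
        "question_count": prompt.count("?"),                   # structural
        "math_symbols": len(re.findall(r"[=+*/^-]", prompt)),  # rough semantic proxy
    }

features = extract_features("Prove that the sum of two even numbers is even.")
# A trained model would map features like these to a 0-1 complexity score.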
2. Comprehensive Model Catalog
Route42 maintains a continuously updated database of 870+ language models across 12 providers, with detailed performance metrics drawn from industry benchmarks and provider APIs. Each model entry records the quality, speed, and cost metrics that the ranking algorithm consumes; a rough illustration follows.
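As an illustration only, a catalog entry might look like the following dataclass. All field names and types here are assumptions for the sake of example, not Route42's actual schema.
from dataclasses import dataclass

@dataclass
class ModelEntry:                 # hypothetical schema, for illustration only
    name: str                     # e.g. "gpt-4-mini"
    provider: str                 # one of the 12 supported providers
    quality: float                # 0-1 score derived from benchmark results
    tokens_per_second: float      # measured throughput
    cost_per_1k_tokens: float     # provider pricing; 0.0 for local Ollama models
    context_window: int           # maximum context length in tokens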
3. Intelligent Multi-Criteria Ranking
Model selection employs a scoring system that balances quality, speed, and cost according to user preferences and query complexity. Each candidate receives a weighted composite score and the top-ranked model is chosen, as sketched below.
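A minimal sketch of what such a weighted composite score could look like, assuming per-model quality, throughput, and price fields. The weights, normalization constants, and field names are illustrative, not Route42's actual formula.
# Hypothetical composite scoring; weights and normalization are assumptions.
def composite_score(model: dict, weights: dict) -> float:
    quality = model["quality"]                              # 0-1 benchmark score
    speed = min(model["tokens_per_second"] / 200.0, 1.0)    # normalized throughput
    cost = 1.0 / (1.0 + model["cost_per_1k_tokens"] * 100)  # cheaper => closer to 1
    return (weights["quality"] * quality
            + weights["speed"] * speed
            + weights["cost"] * cost)

weights = {"quality": 0.4, "speed": 0.3, "cost": 0.3}       # made-up "balanced" preset
candidates = [
    {"name": "gpt-4-mini", "quality": 0.78, "tokens_per_second": 150, "cost_per_1k_tokens": 0.002},
    {"name": "local-ollama-model", "quality": 0.65, "tokens_per_second": 60, "cost_per_1k_tokens": 0.0},
]
best = max(candidates, key=lambda m: composite_score(m, weights))
print(best["name"])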
4. User Preference System
Users select from four preference modes that define routing behavior (illustrative weight presets for these modes are sketched after this list):
- Balanced: automatically adapts routing based on query complexity
- Quality: prioritizes model capability over cost
- Speed: emphasizes response latency and throughput
- Cost: minimizes financial cost per request
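Continuing the scoring sketch above, each mode could plausibly map to a different weight preset. The values below are invented for illustration and are not Route42's actual presets.
# Hypothetical per-mode weight presets (values are illustrative only).
PRESETS = {
    "balanced": {"quality": 0.4, "speed": 0.3, "cost": 0.3},
    "quality":  {"quality": 0.7, "speed": 0.15, "cost": 0.15},
    "speed":    {"quality": 0.2, "speed": 0.6, "cost": 0.2},
    "cost":     {"quality": 0.2, "speed": 0.2, "cost": 0.6},
}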
Additional Preference Controls
5. Per-User API Key Management
Each user maintains individual API keys for preferred LLM providers. The system stores encrypted credentials in the local SQLite database and automatically filters available models based on which providers the user has configured.
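Here is a sketch of how encrypted provider credentials might be kept in SQLite, assuming a Fernet key held in secure local storage. The table schema, file name, and key handling are illustrative assumptions, not Route42's actual implementation.
import sqlite3
from cryptography.fernet import Fernet  # requires the 'cryptography' package

fernet = Fernet(Fernet.generate_key())  # illustrative; a real key would persist securely

db = sqlite3.connect("example.db")      # hypothetical database file
db.execute("CREATE TABLE IF NOT EXISTS provider_keys (user_id TEXT, provider TEXT, enc_key BLOB)")
db.execute("INSERT INTO provider_keys VALUES (?, ?, ?)",
           ("user-1", "openai", fernet.encrypt(b"sk-...")))
db.commit()

# Only providers the user has configured yield routing candidates.
configured = {row[0] for row in db.execute(
    "SELECT provider FROM provider_keys WHERE user_id = ?", ("user-1",))}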
6. Local Model Integration
Route42 supports fully private inference through Ollama integration. The system automatically discovers locally installed Ollama models and incorporates them into the ranking algorithm with appropriate cost scoring (free models receive the top cost score). Local models offer several benefits, and a discovery sketch follows this list:
- Run models on your hardware without API bills
- No external API calls for local completions
- Works without internet connectivity
- Integrate custom quality metrics
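As referenced above, here is a minimal discovery sketch. Ollama's local server does list installed models at /api/tags on its default port; how Route42 actually consumes that listing is assumed here.
import requests

# Query the local Ollama server for installed models (default port 11434).
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
local_models = [m["name"] for m in resp.json().get("models", [])]

# Hypothetical catalog entries: zero cost, so the cost term of the score maxes out.
candidates = [{"name": name, "provider": "ollama", "cost_per_1k_tokens": 0.0}
              for name in local_models]
print(candidates)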
Chat Completion API
The primary user-facing endpoint is the chat completion API, which automatically selects the optimal model and returns OpenAI-compatible responses with full streaming support.
Streaming Request
POST /api/chat/completions
Authorization: Bearer a1b2c3d4e5f6...
Content-Type: application/json
{
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ],
  "stream": true
}
HTTP/1.1 200 OK
Content-Type: text/event-stream
data: {"id":"chatcmpl-123","model":"gemini-3-flash",...}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Imagine"}}]}
data: [DONE]
Non-Streaming Request
POST /api/chat/completions
Authorization: Bearer a1b2c3d4e5f6...
{
  "messages": [{"role": "user", "content": "What is 2+2?"}],
  "stream": false
}
{
  "id": "chatcmpl-456",
  "model": "gpt-4-mini",
  "choices": [{
    "message": {"content": "2+2 equals 4."}
  }],
  "usage": {"total_tokens": 18}
}
Python Integration Example
import requests
import json

API_KEY = "a1b2c3d4e5f6..."
BASE_URL = "http://localhost:8080"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Streaming completion
response = requests.post(
    f"{BASE_URL}/api/chat/completions",
    headers=HEADERS,
    json={
        "messages": [{"role": "user", "content": "Explain quantum computing"}],
        "stream": True,
    },
    stream=True,
)

for line in response.iter_lines():
    if not line:
        continue  # skip SSE keep-alive blank lines
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":
        break  # end-of-stream sentinel, not JSON
    chunk = json.loads(payload)
    for choice in chunk.get("choices", []):
        print(choice.get("delta", {}).get("content", ""), end="", flush=True)
How Route42 Selects Models
The completion API executes an intelligent model selection pipeline for each request; a condensed sketch follows the step descriptions below:
Prompt Analysis
The system analyzes the incoming prompt to extract relevant features and characteristics that influence model selection.
Complexity Detection
Machine learning models predict task complexity (0-1 scale) and domain category (chat, code, math, analysis, general).
Quality Filtering
Applies minimum quality thresholds based on detected complexity—simple queries can use any model, complex tasks require high-capability models.
Dynamic Scoring
Scoring weights are adjusted based on user preference mode and query complexity, automatically balancing quality, speed, and cost.
Model Ranking
Each candidate receives a composite score combining quality, speed, and cost metrics. The highest-scoring model is selected.
Execution & Logging
The selected model processes the request, and the complete interaction is logged for analytics and continuous improvement.
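The condensed sketch referenced above ties the steps together. estimate_complexity is a placeholder for the trained ML model, composite_score and PRESETS reuse the earlier sketches, and the threshold curve is an invented example, not Route42's actual logic.
def estimate_complexity(prompt: str) -> float:
    # Placeholder heuristic standing in for the trained ML model (step 2).
    return min(len(prompt) / 2000.0, 1.0)

def route(prompt: str, mode: str, catalog: list) -> dict:
    complexity = estimate_complexity(prompt)                    # step 2
    min_quality = 0.3 + 0.6 * complexity                        # step 3: assumed threshold
    eligible = [m for m in catalog if m["quality"] >= min_quality] or catalog
    weights = PRESETS[mode]                                     # step 4: dynamic weights
    return max(eligible, key=lambda m: composite_score(m, weights))  # step 5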
Direct Recommendation Access
Advanced users can access the recommendation engine directly via POST /api/recommend to retrieve ranked candidates without executing a completion.
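Only the endpoint path comes from this documentation; the request body and response fields below are assumptions sketched for illustration.
import requests

resp = requests.post(
    "http://localhost:8080/api/recommend",
    headers={"Authorization": "Bearer a1b2c3d4e5f6..."},
    json={"messages": [{"role": "user", "content": "Explain quantum computing"}]},
)
for candidate in resp.json().get("candidates", []):  # hypothetical response field
    print(candidate)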
Running Route42
Local Deployment
Route42 runs as a standalone application on any machine with Go 1.21+ and Node.js 18+. All routing decisions execute locally with no external service dependencies; paired with local Ollama models, the platform can operate fully offline.
Data Persistence
All user configuration and historical interaction logs are stored in a local SQLite database, requiring no external database service. Data remains on your machine and is never transmitted to external servers.
Privacy & Security
Error Logging
Built-in error collection captures API failures, network issues, and application crashes automatically. Users can review errors through the dashboard and optionally submit them for remote analysis.
LibreChat Integration
Use LibreChat, an open-source AI chat interface, as a self-hosted chat UI for Route42. LibreChat sends all requests using a single static model identifier (route42), and Route42 handles intelligent model selection automatically.
Prerequisites
Step 1 — Install Docker & Enable WSL2 Integration
Step 2 — Clone LibreChat
Open your Ubuntu WSL terminal:
$ git clone https://github.com/danny-avila/LibreChat.git
$ cd LibreChat
Step 3 — Enable Docker Permissions (WSL)
$ sudo usermod -aG docker $USER
$ exit
Reopen the terminal, then verify Docker access:
$ docker ps
Step 4 — Configure LibreChat for Route42
Create librechat.yaml in the LibreChat root directory:
version: 1.2.1
cache: true
endpoints:
  custom:
    - name: "Route42"
      baseURL: "http://host.docker.internal:8080/api"
      apiKey: "${ROUTE42_API_KEY}"
      models:
        default:
          - "route42"
        fetch: false
      titleConvo: true
      titleModel: "route42"
      modelDisplayLabel: "Route42"
| Field | Purpose |
|---|---|
| baseURL | Route42 API base URL. Uses host.docker.internal to reach the host from inside Docker. |
| apiKey | Loaded securely from environment variables. |
| models.fetch: false | Prevents LibreChat from calling /models; not needed since Route42 handles model selection. |
| route42 | Static model identifier used for all requests. |
Step 5 — Set Environment Variables
Create or edit .env in the LibreChat root:
ROUTE42_API_KEY=your_route42_api_key_here
ENDPOINTS=custom
Remove any OPENAI_* variables if present to avoid conflicts.
Step 6 — Mount the Configuration
Create docker-compose.override.yml in the LibreChat root:
services:
  api:
    volumes:
      - ./librechat.yaml:/app/librechat.yaml
Step 7 — Start LibreChat
$ docker compose down
$ docker compose up -d
Open your browser at http://localhost:3080, select Route42 as the provider, and start chatting.
What This Integration Provides
Why Route42?
Route42 transforms model selection from a manual, one-size-fits-all decision into an intelligent, automatic process optimized for each individual task and user constraint.
- ML-Powered Intelligence: real-time complexity analysis selects a strong model for every query
- Maximize ROI: save up to 85% on API costs by routing locally when possible
- Privacy-First: all routing decisions happen locally, with zero telemetry