Complete Platform Documentation

Intelligent LLM Selection

Learn how Route42 uses machine learning to automatically route every prompt to the optimal model—balancing quality, speed, and cost with zero manual configuration.

Privacy & Terms Highlights

Before you dive into the docs, here are the key points from our Terms and Privacy policy:

  • No prompt storage: Prompts and AI responses are never saved or logged—Route42 is pass-through only.
  • Local personalization: Training for personalization happens locally on your Windows machine; no training data is uploaded.
  • Windows app: Route42 is a Windows desktop application with local data control.
  • Model routing disclaimer: Automated model selection is probabilistic and not guaranteed optimal.

Executive Summary

What is Route42?

Route42 is an intelligent routing platform that automatically selects the optimal Large Language Model for each user query. The system employs machine learning-based complexity analysis, multi-criteria scoring algorithms, and user preference learning to route requests to the most appropriate model from a catalog of 870+ models across 12 providers.

  • 870+ AI models supported
  • <50ms routing decision time
  • 85% average cost reduction

The platform addresses a fundamental challenge in AI application development: different queries require different models. Simple questions benefit from fast, cost-effective models, while complex tasks demand high-capacity models. Route42 makes this decision automatically, optimizing for quality, speed, and cost based on each query's characteristics and user preferences.

The Problem We Solve

Developers waste thousands of dollars sending simple queries to premium models, while complex tasks fail when routed to models that lack the necessary capability. Route42 eliminates manual model selection by using ML to match each prompt with its optimal executor.

Authentication & Security

API Authentication

All API endpoints require authentication using a Bearer token. Each user receives a unique API key upon registration, stored securely in the local SQLite database.

Key Features

  • Per-user API key generation (32-character cryptographically secure tokens)
  • Bearer token authentication (Authorization: Bearer <api_key>)
  • Automatic user context injection for downstream request handling
  • Per-user provider API key management (maintain your own OpenAI, Anthropic, etc. credentials)
  • Local-only data storage with no external telemetry or key exposure
// Example authenticated request
POST /api/chat/completions
Authorization: Bearer a1b2c3d4e5f6...
Content-Type: application/json

The user ID is extracted from the Bearer token automatically—no need to pass it in the URL.
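
For illustration, here is a minimal sketch of how a 32-character key of the kind described above could be generated and checked in Python. The helper names are hypothetical and this is not Route42's actual implementation:

import hmac
import secrets

def generate_api_key() -> str:
    # token_hex(16) draws 16 random bytes and encodes them as 32 hex characters
    return secrets.token_hex(16)

def verify_api_key(presented: str, stored: str) -> bool:
    # constant-time comparison avoids timing side channels
    return hmac.compare_digest(presented, stored)

key = generate_api_key()
assert len(key) == 32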

Core Features

Platform Capabilities

1. ML-Based Complexity Detection

The system employs trained machine learning models to analyze prompts and predict computational complexity on a 0-1 scale. The analysis examines textual, structural, and semantic features to determine whether a query requires a lightweight model or a high-capability model.

Low Complexity (0.0-0.3)
"What is 2+2?" → Routes to cost-effective models
High Complexity (0.7-1.0)
"Design a microservices architecture..." → Routes to premium models

2. Comprehensive Model Catalog

Route42 maintains a continuously updated database of 870+ language models across 12 providers with detailed performance metrics from industry benchmarks and provider APIs. Supported providers include:

OpenAI, Anthropic, Google/Gemini, Groq, Mistral, DeepSeek, OpenRouter, Alibaba/DashScope, Moonshot AI, NVIDIA, Z.ai (Zhipu), and Ollama.

Each model entry includes:

  • Quality scores from MMLU and HumanEval benchmarks
  • Performance metrics (latency, throughput)
  • Real-time pricing data
  • Capability flags (code, vision, context length)

3. Intelligent Multi-Criteria Ranking

Model selection employs a sophisticated scoring system that balances quality, speed, and cost according to user preferences and query complexity. The ranking algorithm:

1. Analyzes each prompt for complexity and domain
2. Filters candidates by quality thresholds
3. Adjusts scoring weights dynamically
4. Selects the optimal model for execution
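
A minimal sketch of such a composite scorer, assuming metrics normalized to a 0-1 scale and illustrative weights; Route42's actual weighting logic is not shown here:

from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    quality: float   # normalized 0-1 (e.g. from MMLU/HumanEval)
    speed: float     # normalized 0-1 (higher = lower latency)
    cost: float      # normalized 0-1 (higher = cheaper; local models = 1.0)

def rank(candidates, weights, min_quality):
    # filter by quality threshold, score with dynamic weights, pick the best
    wq, ws, wc = weights
    eligible = [c for c in candidates if c.quality >= min_quality]
    return max(eligible, key=lambda c: wq * c.quality + ws * c.speed + wc * c.cost)

best = rank(
    [Candidate("premium-model", 0.95, 0.4, 0.2),
     Candidate("fast-model", 0.70, 0.9, 0.8)],
    weights=(0.3, 0.3, 0.4),   # e.g. a low-complexity query shifting weight to cost
    min_quality=0.5,
)
print(best.name)  # fast-model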

4. User Preference System

Users select from four preference modes that define routing behavior:

Balanced Mode

Automatically adapts routing based on query complexity

Quality Mode

Prioritizes model capability over cost

Fast Mode

Emphasizes response latency and throughput

Cheap Mode

Minimizes financial cost per request

Additional Preference Controls

Max Response Tokens — Set a cap on response tokens to control output length and cost
Cost Limit per Request — Define maximum cost per request in cents to enforce budget constraints (a worst-case budget check combining both controls is sketched below)
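
For example, the two controls could be combined in a worst-case budget check like the following; the parameter names are assumptions for illustration, not Route42's actual schema:

def within_budget(price_per_1k_cents: float, max_response_tokens: int,
                  cost_limit_cents: float) -> bool:
    # worst-case output cost if the model uses the full response-token cap
    worst_case = price_per_1k_cents * max_response_tokens / 1000
    return worst_case <= cost_limit_cents

# A model at 1.5 cents per 1K tokens, capped at 2,000 tokens: worst case 3 cents
print(within_budget(1.5, 2000, cost_limit_cents=5.0))  # True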

5. Per-User API Key Management

Each user maintains individual API keys for preferred LLM providers. The system stores encrypted credentials in the local SQLite database and automatically filters available models based on which providers the user has configured.

  • Multi-tenant isolation in local deployment scenarios
  • Flexible provider adoption (choose which providers to enable)
  • Cost accountability (each user's usage bills to their own credentials)
  • Security through key compartmentalization
  • Live API key validation — keys are tested against provider APIs before saving, showing validity status and latency (see the validation sketch below)
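
One way such a validation could work, shown here against OpenAI's GET /v1/models endpoint; Route42's actual validation flow may differ:

import time
import requests

def validate_openai_key(api_key: str) -> tuple[bool, float]:
    start = time.monotonic()
    resp = requests.get(
        "https://api.openai.com/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    latency_ms = (time.monotonic() - start) * 1000
    return resp.status_code == 200, latency_ms

ok, latency = validate_openai_key("sk-...")
print(f"valid={ok} latency={latency:.0f}ms")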

6. Local Model Integration

Route42 supports fully private inference through Ollama integration. The system automatically discovers locally installed Ollama models and incorporates them into the ranking algorithm with appropriate cost scoring (free = highest score).

Zero-Cost Inference

Run models on your hardware without API bills

Complete Privacy

No external API calls for local completions

Offline Operation

Works without internet connectivity

Performance Benchmarking

Integrate custom quality metrics
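
Discovery can be done through Ollama's standard REST API; a minimal sketch using its GET /api/tags endpoint (how Route42 consumes the result beyond this call is an assumption):

import requests

def discover_ollama_models(host: str = "http://localhost:11434") -> list[str]:
    resp = requests.get(f"{host}/api/tags", timeout=5)
    resp.raise_for_status()
    # Response shape: {"models": [{"name": "llama3:8b", ...}, ...]}
    return [m["name"] for m in resp.json().get("models", [])]

for name in discover_ollama_models():
    print(name)  # local models enter the ranking with the highest (free) cost score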

API Reference

Chat Completion API

The primary user-facing endpoint is the chat completion API, which automatically selects the optimal model and returns OpenAI-compatible responses with full streaming support.

Streaming Request

Request
POST /api/chat/completions
Authorization: Bearer a1b2c3d4e5f6...
Content-Type: application/json

{
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ],
  "stream": true
}
Response
HTTP/1.1 200 OK
Content-Type: text/event-stream

data: {"id":"chatcmpl-123","model":"gemini-3-flash",...}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Imagine"}}]}
data: [DONE]

Non-Streaming Request

Request
POST /api/chat/completions
Authorization: Bearer a1b2c3d4e5f6...

{
  "messages": [{"role": "user", "content": "What is 2+2?"}],
  "stream": false
}
Response
{
  "id": "chatcmpl-456",
  "model": "gpt-4-mini",
  "choices": [{
    "message": {"content": "2+2 equals 4."}
  }],
  "usage": {"total_tokens": 18}
}

Python Integration Example

import json
import requests

API_KEY = "a1b2c3d4e5f6..."
BASE_URL = "http://localhost:8080"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Streaming completion
response = requests.post(
    f"{BASE_URL}/api/chat/completions",
    headers=HEADERS,
    json={
        "messages": [{"role": "user", "content": "Explain quantum computing"}],
        "stream": True,
    },
    stream=True,
)

for line in response.iter_lines():
    if not line:
        continue
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":  # end-of-stream sentinel, not JSON
        break
    chunk = json.loads(payload)
    for choice in chunk.get("choices", []):
        print(choice["delta"].get("content", ""), end="")

Selection Pipeline

How Route42 Selects Models

The completion API executes an intelligent model selection pipeline for each request:

1. Prompt Analysis

The system analyzes the incoming prompt to extract relevant features and characteristics that influence model selection.

2. Complexity Detection

Machine learning models predict task complexity (0-1 scale) and domain category (chat, code, math, analysis, general).

3. Quality Filtering

Applies minimum quality thresholds based on detected complexity—simple queries can use any model, complex tasks require high-capability models.

4. Dynamic Scoring

Scoring weights are adjusted based on user preference mode and query complexity, automatically balancing quality, speed, and cost.

5. Model Ranking

Each candidate receives a composite score combining quality, speed, and cost metrics. The highest-scoring model is selected.

6. Execution & Logging

The selected model processes the request, and the complete interaction is logged for analytics and continuous improvement.

Direct Recommendation Access

Advanced users can access the recommendation engine directly via POST /api/recommend to retrieve ranked candidates without executing a completion.
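
A hypothetical call shape, assuming the endpoint accepts the same message format as the completion API; the request and response fields shown are illustrative, not a documented contract:

import requests

resp = requests.post(
    "http://localhost:8080/api/recommend",
    headers={"Authorization": "Bearer a1b2c3d4e5f6...",
             "Content-Type": "application/json"},
    json={"messages": [{"role": "user", "content": "Explain quantum computing"}]},
    timeout=10,
)
print(resp.json())  # expected: ranked candidate models with their scores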

Deployment & Operations

Running Route42

Local Deployment

Route42 runs as a standalone application on any machine with Go 1.21+ and Node.js 18+. No external service dependencies exist—all routing decisions execute locally. This enables full offline operation for users without internet connectivity.

✓ Fully Offline
No internet required for routing
✓ Local Processing
<50ms routing decisions
✓ Zero Telemetry
No data sent externally

Data Persistence

All user configuration and historical interaction logs are stored in a local SQLite database, requiring no external database service. Data remains on your machine and is never transmitted to external servers.
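
Because everything lives in a single SQLite file, you can inspect it with standard tooling. A quick sketch, assuming a hypothetical database file named route42.db (the actual filename and schema are not documented here):

import sqlite3

conn = sqlite3.connect("route42.db")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
print([t[0] for t in tables])  # all configuration and logs live in this one file
conn.close()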

Privacy & Security

  • No telemetry sent to external services
  • All user data remains on the local machine
  • API keys encrypted in the local database
  • No prompt logging to external analytics

Error Logging

Built-in error collection captures API failures, network issues, and application crashes automatically. Users can review errors through the dashboard and optionally submit them for remote analysis.

Integration Guide

LibreChat Integration

Use LibreChat, an open-source AI chat interface, as a self-hosted chat UI for Route42. LibreChat sends all requests using a single static model identifier (route42), and Route42 handles intelligent model selection automatically.

LibreChat UI → Route42 API (model: "route42") → Best Model Selected

Prerequisites

  • Windows 10/11 with WSL2 enabled
  • Docker Desktop installed
  • Route42 API running locally or accessible via network
  • A valid Route42 API key (the last two items can be verified with the pre-flight sketch after this list)
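
Before starting, a short pre-flight request against the completion endpoint documented above confirms both that the API is reachable and that your key works:

import requests

resp = requests.post(
    "http://localhost:8080/api/chat/completions",
    headers={"Authorization": "Bearer your_route42_api_key_here",
             "Content-Type": "application/json"},
    json={"messages": [{"role": "user", "content": "ping"}], "stream": False},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["model"])  # the model Route42 selected for this request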

Step 1 — Install Docker & Enable WSL2 Integration

1. Install Docker Desktop
2. Open Docker Desktop → Settings → Resources → WSL Integration
3. Enable integration for your Ubuntu WSL distribution
4. Restart Docker Desktop

Step 2 — Clone LibreChat

Open your Ubuntu WSL terminal:

$ git clone https://github.com/danny-avila/LibreChat.git
$ cd LibreChat

Step 3 — Enable Docker Permissions (WSL)

$ sudo usermod -aG docker $USER
$ exit

Reopen the terminal and verify with docker ps

Step 4 — Configure LibreChat for Route42

Create librechat.yaml in the LibreChat root directory:

librechat.yaml
version: 1.2.1
cache: true

endpoints:
  custom:
    - name: "Route42"
      baseURL: "http://host.docker.internal:8080/api"
      apiKey: "${ROUTE42_API_KEY}"

      models:
        default:
          - "route42"
        fetch: false

      titleConvo: true
      titleModel: "route42"
      modelDisplayLabel: "Route42"
Field reference:

  • baseURL — Route42 API base URL. Uses host.docker.internal to reach the host from inside Docker.
  • apiKey — Loaded securely from environment variables.
  • models.fetch: false — Prevents LibreChat from calling /models; not needed since Route42 handles model selection.
  • route42 — Static model identifier used for all requests.

Step 5 — Set Environment Variables

Create or edit .env in the LibreChat root:

.env
ROUTE42_API_KEY=your_route42_api_key_here
ENDPOINTS=custom

Remove any OPENAI_* variables if present to avoid conflicts.

Step 6 — Mount the Configuration

Create docker-compose.override.yml in the LibreChat root:

docker-compose.override.yml
services:
  api:
    volumes:
      - ./librechat.yaml:/app/librechat.yaml

Step 7 — Start LibreChat

$ docker compose down
$ docker compose up -d

Open your browser at http://localhost:3080, select Route42 as the provider, and start chatting.

What This Integration Provides

  • Self-hosted chat UI with full conversation history
  • OpenAI-compatible API connection to Route42
  • Static model identifier — no manual model selection
  • Intelligent routing handled entirely by Route42

Why Route42?

Route42 transforms model selection from a manual, one-size-fits-all decision into an intelligent, automatic process optimized for each individual task and user constraint.

ML-Powered Intelligence

Real-time complexity analysis ensures optimal model selection every time

Maximize ROI

Save 85% on API costs by routing locally when possible

Privacy-First

All routing decisions happen locally—zero telemetry