Why We Built Our Own Model Router Instead of Buying — And When You Shouldn't

Tool Crucible evaluation of building versus buying a model router: real-world testing, tradeoffs, and our current stack.

Published 2026-06-07

TL;DR: Multi-model routing saves 60-80% on AI spend, but the hidden cost is eval infrastructure. We open-sourced our router (TypeScript, 400 LOC) after spending 3 weeks building confidence scoring. Buy instead if you lack eval capacity.

The Context

Our stack hit 9 tools (Copilot, Cursor, Claude, GPT-4o, Perplexity, v0, Replit, Warp, local Ollama) at $435/mo. Seventy percent of queries were simple (syntax, boilerplate, docs lookup) but were still routed to models priced at $15 per million tokens. We needed a pipeline: route by complexity → cheapest capable model → fallback on failure → log everything for eval.
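The four requirements above can be sketched as a small function pair. This is a minimal illustration under stated assumptions, not the code of our router: `ModelSpec`, `CallLog`, `candidatesFor`, `routeWithFallback`, and the `call` callback are hypothetical names, and the cost and capability fields are placeholders you would fill from your own pricing and eval data.

```typescript
// Hypothetical model descriptor; all values are supplied by the caller.
interface ModelSpec {
  name: string;
  costPerMTokens: number; // USD per million tokens (illustrative)
  maxComplexity: number;  // highest complexity (1-5) this model is trusted with
}

// One log entry per attempt, so failed calls are visible in later evals.
interface CallLog {
  model: string;
  complexity: number;
  ok: boolean;
  error?: string;
}

// Stand-in for whatever client actually talks to a model.
type ModelCall = (model: string, prompt: string) => Promise<string>;

// Capable models, cheapest first: the head is the primary, the rest are fallbacks.
function candidatesFor(complexity: number, models: ModelSpec[]): ModelSpec[] {
  return models
    .filter((m) => m.maxComplexity >= complexity)
    .sort((a, b) => a.costPerMTokens - b.costPerMTokens);
}

// Try each candidate in order, log every attempt, return the first success.
async function routeWithFallback(
  prompt: string,
  complexity: number,
  models: ModelSpec[],
  call: ModelCall,
  log: CallLog[],
): Promise<string> {
  for (const m of candidatesFor(complexity, models)) {
    try {
      const answer = await call(m.name, prompt);
      log.push({ model: m.name, complexity, ok: true });
      return answer;
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      log.push({ model: m.name, complexity, ok: false, error });
    }
  }
  throw new Error(`no capable model succeeded for complexity ${complexity}`);
}
```

Keeping the ordering in `candidatesFor` separate from the retry loop means the cost policy can be tested without making any network calls.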

What We Tested

Tool | Use Case | Verdict | Why
Alibaba $3/mo multi-model key | All queries via single endpoint | | No routing logic; sends everything to strongest model; privacy red flags for client code
OpenRouter | Unified API, 100+ models | ⚠️ | Good for prototyping; latency variance (200ms-8s); no built-in complexity detection
LiteLLM (self-hosted) | Proxy with routing rules, fallbacks | | YAML config for routing; supports budget limits; adds 50ms latency
Custom TypeScript router | Complexity scoring → model selection | | Full control; 400 LOC; integrates with our eval harness; zero marginal cost
Portkey | Enterprise gateway, analytics | ⚠️ | Great observability; pricing scales with team; overkill for <10 devs

The Pivot Point

We logged 2,000 real queries with human-rated complexity (1-5). A 7B local model handled 68% of complexity 1-2 queries at GPT-4o quality. The router paid for its dev time in week 2.
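A figure like the 68% above comes down to a per-bucket match rate over the rated logs. The sketch below shows one way to compute it. The `RatedQuery` shape, the boolean judgment field, and the `matchRate` helper are assumptions for illustration, not the schema of our eval harness.

```typescript
// One human-rated log entry; field names are hypothetical.
interface RatedQuery {
  complexity: number;            // human rating, 1-5
  localMatchesBaseline: boolean; // rater judged the local answer as good as the baseline's
}

// Fraction of queries rated within [minLevel, maxLevel] where the local model
// matched the baseline. Returns NaN when the bucket is empty.
function matchRate(rows: RatedQuery[], minLevel: number, maxLevel: number): number {
  const bucket = rows.filter(
    (r) => r.complexity >= minLevel && r.complexity <= maxLevel,
  );
  if (bucket.length === 0) return NaN;
  const matched = bucket.filter((r) => r.localMatchesBaseline).length;
  return matched / bucket.length;
}
```

Calling `matchRate(rows, 1, 2)` gives the share for the simple bucket; running it per level shows where the local model stops being good enough.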

What We Use Now

We run the custom router with LiteLLM as the fallback. The router scores each query from token count, file count, and keyword density (refactor, architect, debug) to get a complexity of 1-5. The mapping: 1-2 → Ollama (qwen2.5-coder), 3 → Qwen API, 4-5 → Claude 3.5 Sonnet. All calls are logged to SQLite for weekly eval.
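To make the scoring step concrete, here is a minimal sketch of a scorer that uses the same three signals and the same mapping. The thresholds, the characters-per-token estimate, and the target labels are assumptions chosen for illustration; they are not the weights in our router, and the labels are not real API model identifiers.

```typescript
// Keywords named in the article; matching by prefix also catches "refactoring", "debugging".
const KEYWORDS = ["refactor", "architect", "debug"];

interface Query {
  text: string;
  fileCount: number; // files attached to or referenced by the query
}

// Hypothetical labels for the three tiers.
type Target = "ollama/qwen2.5-coder" | "qwen-api" | "claude-3.5-sonnet";

// Score a query from 1 (trivial) to 5 (hard) using token count, file count,
// and keyword density. All thresholds are assumed and should be tuned on rated logs.
function scoreComplexity(q: Query): number {
  const words = q.text.toLowerCase().split(/\s+/).filter(Boolean);
  const approxTokens = Math.ceil(q.text.length / 4); // crude estimate, not a tokenizer
  const hits = words.filter((w) => KEYWORDS.some((k) => w.startsWith(k))).length;
  const density = words.length > 0 ? hits / words.length : 0;

  let points = 0;
  if (approxTokens > 200) points += 1;
  if (approxTokens > 1000) points += 1;
  if (q.fileCount > 1) points += 1;
  if (q.fileCount > 5) points += 1;
  if (density > 0.02) points += 1;
  return Math.min(5, 1 + points);
}

// Mapping from the article: 1-2 local, 3 mid-tier API, 4-5 strongest model.
function pickTarget(complexity: number): Target {
  if (complexity <= 2) return "ollama/qwen2.5-coder";
  if (complexity === 3) return "qwen-api";
  return "claude-3.5-sonnet";
}
```

A rule-based scorer like this is cheap to run on every request, and because each signal is a plain threshold, a misrouted query in the weekly eval can be traced to the rule that caused it.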

When You’d Choose Differently

  • No eval infrastructure: OpenRouter/Portkey give you routing without building scoring/feedback loops
  • Compliance-heavy: Portkey’s audit trails and data residency options matter
  • Team <3 devs: ROI on custom router doesn’t pencil out — just use OpenRouter with manual model selection

Tool Crucible Rating

Overall | Ease | Value | Support
4.3/5 | 3.0/5 | 5/5 | 2.5/5

This is part of our AI infrastructure evaluation series. See full comparison: Multi-Model Routing 2026

Last reviewed 2026-06-07. See our methodology and affiliate policy.