Why We Built Our Own Model Router Instead of Buying — And When You Shouldn't
Tool Crucible evaluation of multi-model routing: real-world testing, tradeoffs, and our current stack.
Published 2026-06-07
TL;DR: Multi-model routing saves 60-80% on AI spend, but the hidden cost is eval infrastructure. We open-sourced our router (TypeScript, 400 LOC) after spending 3 weeks building confidence scoring. Buy if you lack eval capacity. Full comparison below.
The Context
Our stack had grown to 9 tools (Copilot, Cursor, Claude, GPT-4o, Perplexity, v0, Replit, Warp, local Ollama) at $435/mo. Simple queries (syntax, boilerplate, docs lookup) made up 70% of the total, yet they were routed to models priced at $15 per million tokens. We needed a router that would score each query by complexity, send it to the cheapest capable model, fall back on failure, and log everything for eval.
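The requirement above (cheapest capable model, fallback on failure, log everything) can be sketched in a few lines. This is a minimal illustration, not the code from our published router: the `Model`, `LogEntry`, and `CallFn` types, the `maxComplexity` field, and the `routeWithFallback` name are all hypothetical, and the call is synchronous only to keep the example short (a real model call would be async).

```typescript
// Hypothetical types for illustration; none of these names come from the published router.
type Model = {
  name: string;
  costPerMTok: number; // USD per million tokens (illustrative values only)
  maxComplexity: number; // highest complexity (1-5) this model is trusted with
};

type LogEntry = { query: string; complexity: number; model: string; ok: boolean };

// Stand-in for a real model call; expected to throw on failure.
type CallFn = (model: Model, query: string) => string;

function routeWithFallback(
  query: string,
  complexity: number,
  models: Model[],
  call: CallFn,
  log: LogEntry[],
): string {
  // Keep only models rated for this complexity, cheapest first.
  const candidates = models
    .filter((m) => m.maxComplexity >= complexity)
    .sort((a, b) => a.costPerMTok - b.costPerMTok);

  for (const model of candidates) {
    try {
      const answer = call(model, query);
      log.push({ query, complexity, model: model.name, ok: true });
      return answer;
    } catch {
      // Record the failure, then fall through to the next cheapest model.
      log.push({ query, complexity, model: model.name, ok: false });
    }
  }
  throw new Error(`no capable model succeeded for complexity ${complexity}`);
}
```

Logging failed attempts as well as successes is a deliberate choice here: the fallback rate per model is one of the signals a weekly eval would likely want.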
What We Tested
| Tool | Use Case | Verdict | Why |
|---|---|---|---|
| Alibaba $3/mo multi-model key | All queries via single endpoint | ❌ | No routing logic; sends everything to strongest model; privacy red flags for client code |
| OpenRouter | Unified API, 100+ models | ⚠️ | Good for prototyping; latency variance (200ms-8s); no built-in complexity detection |
| LiteLLM (self-hosted) | Proxy with routing rules, fallbacks | ✅ | YAML config for routing; supports budget limits; adds 50ms latency |
| Custom TypeScript router | Complexity scoring → model selection | ✅ | Full control; 400 LOC; integrates with our eval harness; zero marginal cost |
| Portkey | Enterprise gateway, analytics | ⚠️ | Great observability; pricing scales with team; overkill for <10 devs |
The Pivot Point
We logged 2,000 real queries with human-rated complexity (1-5). A 7B local model handled 68% of complexity 1-2 queries at GPT-4o quality. The router paid for its dev time in week 2.
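A figure like the 68% above is a simple ratio over the rated log. The sketch below shows one way such a number could be computed; the `RatedQuery` shape and the `localMatchesReference` field are assumptions made for illustration, not the schema of our actual eval data.

```typescript
// Hypothetical row shape for a human-rated query log.
type RatedQuery = {
  complexity: 1 | 2 | 3 | 4 | 5; // human rating
  localMatchesReference: boolean; // rater judged the local output as good as the reference model's
};

// Share of queries at or below maxComplexity where the local model matched the reference.
function parityRate(rows: RatedQuery[], maxComplexity: number): number {
  const inScope = rows.filter((r) => r.complexity <= maxComplexity);
  if (inScope.length === 0) return 0; // avoid dividing by zero on an empty slice
  const matched = inScope.filter((r) => r.localMatchesReference).length;
  return matched / inScope.length;
}
```

Calling `parityRate(rows, 2)` restricts the ratio to complexity 1-2 queries, which is the slice the routing decision depends on.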
What We Use Now
Custom router + LiteLLM fallback. The router scores each query on token count, file count, and keyword density (refactor, architect, debug) to produce a complexity of 1-5, then maps that score to a model: 1-2 → Ollama (qwen2.5-coder), 3 → Qwen API, 4-5 → Claude 3.5 Sonnet. All calls are logged to SQLite for weekly eval.
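To make the scoring step concrete, here is a rough sketch of how three such signals could be combined into a 1-5 score and mapped to a tier. The signals and the tier mapping come from the description above; the weights, the caps, the whitespace-based token estimate, and the tier label strings are assumptions chosen for illustration and are not the values in our router.

```typescript
// Tier labels are placeholders, not real API model identifiers.
type Tier = "ollama:qwen2.5-coder" | "qwen-api" | "claude-3.5-sonnet";

const KEYWORDS = ["refactor", "architect", "debug"];

function scoreComplexity(query: string, fileCount: number): number {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  const tokenCount = words.length; // crude stand-in for a real tokenizer
  const keywordHits = words.filter((w) => KEYWORDS.some((k) => w.startsWith(k))).length;
  const keywordDensity = tokenCount === 0 ? 0 : keywordHits / tokenCount;

  // Assumed weights: each signal is capped so that the maximum raw score is 5.
  const raw =
    1 +
    Math.min(tokenCount / 200, 1.5) +
    Math.min(fileCount / 2, 1.5) +
    Math.min(keywordDensity * 10, 1);

  // Round and clamp to the 1-5 scale.
  return Math.max(1, Math.min(5, Math.round(raw)));
}

function pickModel(complexity: number): Tier {
  if (complexity <= 2) return "ollama:qwen2.5-coder";
  if (complexity === 3) return "qwen-api";
  return "claude-3.5-sonnet";
}
```

With these assumed weights, a short query with no attached files and no keywords scores 1 and stays local, while a long, multi-file refactoring request reaches 5. Any real weighting should be tuned against human-rated queries rather than taken from this sketch.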
When You’d Choose Differently
- No eval infrastructure: OpenRouter/Portkey give you routing without building scoring/feedback loops
- Compliance-heavy: Portkey’s audit trails and data residency options matter
- Team <3 devs: ROI on a custom router doesn’t pencil out; just use OpenRouter with manual model selection
Tool Crucible Rating
| Overall | Ease | Value | Support |
|---|---|---|---|
| 4.3/5 | 3.0/5 | 5/5 | 2.5/5 |
This is part of our AI infrastructure evaluation series. See full comparison: Multi-Model Routing 2026
Last reviewed 2026-06-07. See our methodology and affiliate policy.