Why We Built Our Own Model Router Instead of Buying — And When You Shouldn't

Tool Crucible evaluation of building versus buying a multi-model router: real-world testing, tradeoffs, and our current stack.

Published 2026-06-07

TL;DR: Multi-model routing can save 60-80% on AI spend, but the hidden cost is eval infrastructure. We open-sourced our router (TypeScript, 400 LOC) after spending 3 weeks building confidence scoring. Buy instead of building if you lack eval capacity. The full comparison follows.

The Context

Our stack hit 9 tools (Copilot, Cursor, Claude, GPT-4o, Perplexity, v0, Replit, Warp, local Ollama) at $435/mo. Of our queries, 70% were simple (syntax, boilerplate, docs lookup), yet they were routed to models priced at $15 per million tokens. We needed a pipeline that would route by complexity, pick the cheapest capable model, fall back on failure, and log everything for eval.
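The requirement chain in the previous paragraph (route by complexity, pick the cheapest capable model, fall back on failure, log everything) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code of our published router: the `ModelSpec` fields, the `maxComplexity` capability ceiling, the `CallLog` shape, and the injected `call` function are all hypothetical names chosen for the example.

```typescript
// Hypothetical model descriptor: field names and values are illustrative only.
interface ModelSpec {
  name: string;
  costPerMTok: number;   // USD per million tokens (assumed pricing field)
  maxComplexity: number; // highest complexity score (1-5) the model is trusted with
}

// One log entry per attempt, so failed calls are visible to later eval.
interface CallLog {
  model: string;
  ok: boolean;
  error?: string;
}

// Keep only models capable of this complexity, ordered cheapest first.
// Later entries in the returned list act as fallbacks.
function candidateChain(complexity: number, models: ModelSpec[]): ModelSpec[] {
  return models
    .filter((m) => m.maxComplexity >= complexity)
    .sort((a, b) => a.costPerMTok - b.costPerMTok);
}

// Try each candidate in order and record every attempt.
// `call` is injected so the sketch stays independent of any provider SDK.
async function routeWithFallback(
  prompt: string,
  complexity: number,
  models: ModelSpec[],
  call: (model: string, prompt: string) => Promise<string>,
  log: CallLog[],
): Promise<string> {
  for (const m of candidateChain(complexity, models)) {
    try {
      const out = await call(m.name, prompt);
      log.push({ model: m.name, ok: true });
      return out;
    } catch (err) {
      log.push({ model: m.name, ok: false, error: String(err) });
    }
  }
  throw new Error(`no capable model succeeded for complexity ${complexity}`);
}
```

Injecting `call` keeps the routing decision testable without network access. In a real deployment the log array would be replaced by a persistent store.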

What We Tested

Tool	Use Case	Verdict	Why
Alibaba $3/mo multi-model key	All queries via single endpoint	❌	No routing logic; sends everything to strongest model; privacy red flags for client code
OpenRouter	Unified API, 100+ models	⚠️	Good for prototyping; latency variance (200ms-8s); no built-in complexity detection
LiteLLM (self-hosted)	Proxy with routing rules, fallbacks	✅	YAML config for routing; supports budget limits; adds 50ms latency
Custom TypeScript router	Complexity scoring → model selection	✅	Full control; 400 LOC; integrates with our eval harness; zero marginal cost
Portkey	Enterprise gateway, analytics	⚠️	Great observability; pricing scales with team; overkill for <10 devs

The Pivot Point

We logged 2,000 real queries with human-rated complexity (1-5). A 7B local model handled 68% of complexity 1-2 queries at GPT-4o quality. The router paid for its dev time in week 2.
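A figure like the 68% above comes from a simple aggregation over the rated log. The sketch below shows one way to compute it, assuming each logged query carries a human complexity rating and a yes/no judgment on whether the local model's answer matched the reference model. The `RatedQuery` shape and the `parityRate` name are hypothetical, not our harness's actual schema.

```typescript
// Hypothetical shape of one logged, human-rated query.
interface RatedQuery {
  complexity: number;       // human rating, 1-5
  localAcceptable: boolean; // rater judged the local answer on par with the reference model
}

// Share of queries with complexity in [minC, maxC] where the local answer was rated on par.
// Returns 0 for an empty bucket to avoid dividing by zero.
function parityRate(rows: RatedQuery[], minC: number, maxC: number): number {
  const bucket = rows.filter((r) => r.complexity >= minC && r.complexity <= maxC);
  if (bucket.length === 0) return 0;
  return bucket.filter((r) => r.localAcceptable).length / bucket.length;
}
```

Calling `parityRate(rows, 1, 2)` gives the parity rate for the low-complexity bucket, which is the number that decides whether a cheap tier is worth routing to.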

What We Use Now

Custom router plus a LiteLLM fallback. The router scores each query on token count, file count, and keyword density (refactor, architect, debug) to produce a complexity of 1-5. The mapping: 1-2 → Ollama (qwen2.5-coder), 3 → Qwen API, 4-5 → Claude 3.5 Sonnet. All calls are logged to SQLite for weekly eval.
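To make the scoring step concrete, here is a minimal sketch of how the three signals could be combined into a 1-5 score and mapped to a tier. Only the signal names, the keyword list, and the tier mapping come from the description above. The weights, caps, characters-per-token estimate, and function names are assumptions made for illustration, and the returned model labels are placeholders, not provider model IDs.

```typescript
// Keywords named in the article; all weights and caps below are illustrative guesses.
const COMPLEXITY_KEYWORDS = ["refactor", "architect", "debug"];

interface Query {
  text: string;
  fileCount: number; // number of files attached to or referenced by the query
}

// Rough token estimate (~4 characters per token); a real tokenizer would be more accurate.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Fraction of words that start with a complexity keyword,
// so "refactoring" and "debugging" count as hits.
function keywordDensity(text: string): number {
  const words = text.toLowerCase().split(/\W+/).filter(Boolean);
  if (words.length === 0) return 0;
  const hits = words.filter((w) => COMPLEXITY_KEYWORDS.some((k) => w.startsWith(k)));
  return hits.length / words.length;
}

// Combine the three signals into a 1-5 score. Every constant here is an assumption.
function scoreComplexity(q: Query): number {
  const tokenPart = Math.min(estimateTokens(q.text) / 1000, 2);   // contributes 0 to 2
  const filePart = Math.min(q.fileCount / 3, 1.5);                // contributes 0 to 1.5
  const keywordPart = Math.min(keywordDensity(q.text) * 10, 1.5); // contributes 0 to 1.5
  const raw = 1 + tokenPart + filePart + keywordPart;
  return Math.max(1, Math.min(5, Math.round(raw)));
}

// Tier mapping as described in the text; labels are placeholders.
function pickModel(complexity: number): string {
  if (complexity <= 2) return "ollama:qwen2.5-coder";
  if (complexity === 3) return "qwen-api";
  return "claude-3.5-sonnet";
}
```

A heuristic like this is cheap to run on every request, but the constants only mean something once they are tuned against rated queries. That tuning is the eval cost the rest of this piece keeps returning to.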

When You’d Choose Differently

No eval infrastructure: OpenRouter and Portkey give you routing without building scoring and feedback loops.
Compliance-heavy: Portkey’s audit trails and data residency options matter.
Team <3 devs: ROI on a custom router doesn’t pencil out; just use OpenRouter with manual model selection.

Tool Crucible Rating

Overall	Ease	Value	Support
4.3/5	3.0/5	5/5	2.5/5

This is part of our AI infrastructure evaluation series. See full comparison: Multi-Model Routing 2026

Last reviewed 2026-06-07. See our methodology and affiliate policy.