Why We Stopped Recommending LangChain for Production RAG — and What We Use Instead
LangChain's abstraction layer adds complexity without reliability; we switched to custom RAG pipelines with direct vector DB + LLM calls for production workloads.
Published 2026-06-10
TL;DR: LangChain’s abstractions leak at scale, so we moved our Hermes cron agents to custom RAG pipelines (Qdrant + direct Anthropic/OpenAI calls) and cut debugging time by 80%.
The Context
Hermes runs 18 scheduled cron jobs on a single Mac, and each one needs reliable retrieval over our brain/ documentation corpus (50+ markdown files, ~200k tokens). Agent runs were failing silently whenever LangChain’s RetrievalQA chain swallowed vector DB timeouts, returned empty contexts, or hallucinated citations. Team: 1 operator. Constraints: local-first, no external infrastructure beyond the LLM APIs, and debuggable in under 5 minutes from local logs.
What We Tested
| Tool / Approach | Use Case | Verdict | Why |
|---|---|---|---|
| LangChain RetrievalQA + Chroma | Production RAG for cron agents | ❌ | Silent failures on timeout; abstraction hides retry logic; version churn breaks chains monthly |
| LangGraph (agent orchestration) | Multi-step agent workflows | ❌ | Over-engineered for linear cron jobs; state management adds latency; debug complexity high |
| Custom pipeline: Qdrant + direct API calls | Production RAG for cron agents | ✅ | Explicit retry/timeout control; <100 lines Python; full visibility into every retrieval + generation step |
| LlamaIndex (lightweight) | Alternative framework test | 🟡 | Better than LangChain, but still adds an abstraction layer we don’t need for a fixed schema |
The Pivot Point
Three consecutive weekly synthesis runs (cron job weekly-synthesis-001) returned empty answers even though the relevant docs were in Chroma. LangChain’s RetrievalQA caught the vector DB timeout, logged nothing, and returned a generic “I don’t know” response. After 4 hours of adding debug prints inside the chain, we found the retriever was returning [] on transient network blips: no retry, no circuit breaker, no observability. That’s when we stripped it out.
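The core problem was that an empty retrieval was treated as a valid input to generation. A minimal sketch of the opposite policy, assuming each hit arrives as an (id, score, text) record: log what was retrieved as one JSON line, and raise when nothing comes back so the cron job fails visibly instead of answering “I don’t know”. `Hit`, `checked_context`, and `EmptyRetrievalError` are hypothetical names for illustration, not Hermes code.

```python
import json
import logging
from dataclasses import dataclass
from typing import List

log = logging.getLogger("hermes.rag")  # hypothetical logger name


class EmptyRetrievalError(RuntimeError):
    """Raised when retrieval comes back empty, instead of letting the LLM answer blind."""


@dataclass
class Hit:
    """Hypothetical stand-in for one vector DB result."""
    id: str
    score: float
    text: str


def checked_context(query: str, hits: List[Hit], min_hits: int = 1) -> str:
    """Log the retrieval as one JSON line; raise if too few chunks came back."""
    record = {
        "query": query,
        "chunk_ids": [h.id for h in hits],
        "scores": [round(h.score, 3) for h in hits],
    }
    if len(hits) < min_hits:
        # Fail loudly: an empty context should stop the run, not reach the LLM.
        log.error(json.dumps({**record, "error": "empty_retrieval"}))
        raise EmptyRetrievalError(f"retrieval returned {len(hits)} chunks for {query!r}")
    log.info(json.dumps(record))
    return "\n---\n".join(h.text for h in hits)
```

Letting the exception propagate means a run that would have produced a generic answer instead exits with an error and a log line naming the query and the (empty) chunk list.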
What We Use Now
Custom RAG pipeline (~80 lines of Python; the core retrieve-and-generate path is below):
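```python
# rag_pipeline.py (core path only; retries, circuit breaking, and logging omitted)
from anthropic import Anthropic
from qdrant_client import QdrantClient

# Open the embedded (local-mode) Qdrant store once and reuse it;
# local mode holds a lock on the storage folder.
qdrant = QdrantClient(path="./qdrant")
llm = Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def retrieve_and_generate(query: str, collection: str = "hermes-brain") -> str:
    # embed() is the pipeline's embedding helper (not shown); it must use the
    # same model that indexed the collection. query_points() supersedes the
    # deprecated search() in recent qdrant-client releases.
    hits = qdrant.query_points(collection_name=collection, query=embed(query), limit=5).points
    context = "\n---\n".join(h.payload["text"] for h in hits)
    response = llm.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2000,
        system="Answer using ONLY the provided context. Cite sources inline.",
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}],
    )
    return response.content[0].text
```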
Around that core, the pipeline adds:
- Exponential-backoff retries on Qdrant (3 attempts, 2 s base delay)
- A circuit breaker that trips after 3 consecutive failures
- Per-call logging of the query, retrieved chunk IDs, token counts, latency, and errors
- `hermes debug-rag <job-id>`, which shows the last 20 retrievals with their scores
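The retry and circuit-breaker policies above can be sketched in plain Python. This is a minimal sketch, not the Hermes implementation: `with_backoff` and `CircuitBreaker` are hypothetical names, the 60-second cooldown is an assumed value (the post doesn’t state one), and it assumes the breaker wraps the whole retried call, so one “failure” means all three attempts failed.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponential backoff (2 s, 4 s, ... by default)."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the real error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be >= 1")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling through while the breaker is open."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; refuses calls for `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 60.0) -> None:
        self.threshold = threshold
        self.cooldown = cooldown  # assumed value, for illustration
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise CircuitOpenError("circuit open: skipping call until cooldown ends")
            self.opened_at = None  # cooldown over: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip, or re-trip after a failed trial
            raise
        self.failures = 0  # any success resets the consecutive-failure count
        return result
```

In the pipeline, the Qdrant query would then run as `breaker.call(lambda: with_backoff(run_search))`, where `run_search` is a hypothetical zero-argument function around the query. Injecting `sleep` keeps the backoff testable without real waits.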
When You’d Choose Differently
- Use LangChain/LangGraph if you’re building a customer-facing chat product with non-technical PMs who need visual flow builders, or you need rapid prototyping with 50+ out-of-the-box integrations (Slack, Notion, SQL, etc.).
- Use LlamaIndex if you want a lighter framework with better structured-data handling (SQL, Pandas) and can accept some abstraction leakage.
- Stay custom if you control the full stack, your retrieval schema is fixed, and observability matters more than velocity.
Tool Crucible Rating: LangChain
| Dimension | Score (1–5) | Notes |
|---|---|---|
| Overall | 2 | Works for demos; fails silently in production; version churn is hostile to stability |
| Ease of Adoption | 4 | Excellent docs, huge ecosystem, fast to “hello world” |
| Value | 2 | Negative value at scale — debugging abstraction leaks costs more than building custom |
| Support/Ecosystem | 5 | Largest community, but GitHub issues about the same production gaps sit unanswered for months |
This is part of our AI Agent Frameworks evaluation series. See full comparison: Tool Crucible Agent Framework Comparison
Last reviewed 2026-06-10. See our methodology and affiliate policy.