Why We Stopped Recommending LangChain for Production RAG — and What We Use Instead

LangChain's abstraction layer adds complexity without reliability; we switched to custom RAG pipelines with direct vector DB + LLM calls for production workloads.

Published 2026-06-10

TL;DR: LangChain’s abstractions leak at scale. We moved our Hermes cron agents to custom RAG pipelines (Qdrant + direct Anthropic/OpenAI calls) and cut debugging time by 80%.

The Context

Hermes runs 18 scheduled cron jobs on a single Mac, each needing reliable retrieval over our brain/ documentation corpus (50+ markdown files, ~200k tokens). Our agent runs were failing silently when LangChain’s RetrievalQA chain swallowed vector DB timeouts, returned empty contexts, or hallucinated citations. Team: 1 operator. Constraint: local-first, zero external dependencies, debuggable in <5 minutes via local logs.

What We Tested

Tool / Approach	Use Case	Verdict	Why
LangChain `RetrievalQA` + Chroma	Production RAG for cron agents	❌	Silent failures on timeout; abstraction hides retry logic; version churn breaks chains monthly
LangGraph (agent orchestration)	Multi-step agent workflows	❌	Over-engineered for linear cron jobs; state management adds latency; debug complexity high
Custom pipeline: Qdrant + direct API calls	Production RAG for cron agents	✅	Explicit retry/timeout control; <100 lines Python; full visibility into every retrieval + generation step
LlamaIndex (lightweight)	Alternative framework test	🟡	Better than LangChain but still adds abstraction layer we don’t need for fixed schema

The Pivot Point

Three consecutive weekly synthesis runs (cron job weekly-synthesis-001) returned empty answers despite relevant docs in Chroma. LangChain’s RetrievalQA caught the vector DB timeout, logged nothing, and returned a generic “I don’t know” response. We added debug prints inside the chain and, four hours later, found that the retriever was returning [] on transient network blips. No retry, no circuit breaker, no observability. That’s when we stripped it out.
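The root problem was that an empty retrieval was treated as a valid result. Below is a minimal sketch of the kind of guard that turns it into a loud failure instead. The names EmptyRetrievalError and require_hits are hypothetical and chosen for illustration; this is not the exact code in our pipeline.

```python
class EmptyRetrievalError(RuntimeError):
    """Raised when retrieval returns no chunks for a query."""


def require_hits(hits, query):
    """Return hits unchanged, or raise if the retriever came back empty."""
    if not hits:
        # An empty list is a failure to surface, not a context to pass to the LLM.
        raise EmptyRetrievalError(f"no chunks retrieved for query: {query!r}")
    return hits
```

Wrapping the retriever output in a check like this means a cron run fails with a named error in the logs instead of producing an “I don’t know” answer.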

What We Use Now

Custom RAG pipeline (~80 lines of Python; the core function is shown here):

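# rag_pipeline.py
from qdrant_client import QdrantClient
from anthropic import Anthropic

# embed() is not shown here; it is assumed to return a vector whose size
# matches the vectors stored in the collection.

def retrieve_and_generate(query: str, collection: str = "hermes-brain") -> str:
    # Local on-disk Qdrant storage: no server process required.
    client = QdrantClient(path="./qdrant")
    hits = client.search(collection_name=collection, query_vector=embed(query), limit=5)
    # Join the retrieved chunks into one context string, separated by "---".
    context = "\n---\n".join(h.payload["text"] for h in hits)

    # Anthropic() reads ANTHROPIC_API_KEY from the environment.
    response = Anthropic().messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2000,
        system="Answer using ONLY the provided context. Cite sources inline.",
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}]
    )
    return response.content[0].text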

Exponential backoff retry on Qdrant (3 attempts, 2s base)
Circuit breaker trips after 3 consecutive failures
Every call logs: query, retrieved chunk IDs, token counts, latency, errors
hermes debug-rag <job-id> shows last 20 retrievals with scores
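
The retry, circuit breaker, and per-call logging listed above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not our production code: CircuitBreaker, CircuitOpenError, and call_with_retry are hypothetical names, the breaker is assumed to count calls that exhaust all their retries (not individual attempts), and the cool-down that would eventually close the breaker again is left out.

```python
import json
import logging
import time

logger = logging.getLogger("rag")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are refused."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failed calls; any success resets it."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    @property
    def is_open(self):
        return self.consecutive_failures >= self.threshold

    def record_success(self):
        self.consecutive_failures = 0

    def record_failure(self):
        self.consecutive_failures += 1


def call_with_retry(fn, breaker, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fn() with exponential backoff, logging each attempt as a JSON line."""
    if breaker.is_open:
        raise CircuitOpenError("circuit open: skipping call")
    for attempt in range(1, attempts + 1):
        start = time.monotonic()
        try:
            result = fn()
        except Exception as exc:  # broad on purpose for a sketch; narrow in real code
            latency = round(time.monotonic() - start, 3)
            logger.warning(json.dumps(
                {"attempt": attempt, "latency_s": latency, "error": repr(exc)}))
            if attempt == attempts:
                # Assumption: only a call that exhausts its retries counts
                # as one failure toward the breaker threshold.
                breaker.record_failure()
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 2s, then 4s
        else:
            latency = round(time.monotonic() - start, 3)
            logger.info(json.dumps({"attempt": attempt, "latency_s": latency}))
            breaker.record_success()
            return result
```

In the pipeline above, the Qdrant call would be wrapped as call_with_retry(lambda: client.search(...), breaker), with one breaker instance shared across calls in a job. The sleep parameter exists so tests can replace time.sleep and avoid real waits.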

When You’d Choose Differently

Use LangChain/LangGraph if: you are building a customer-facing chat product with non-technical PMs who need visual flow builders, or you need rapid prototyping with 50+ integrations (Slack, Notion, SQL, etc.) out of the box.
Use LlamaIndex if: you want a lighter framework with better structured data handling (SQL, Pandas) and can accept some abstraction leakage.
Stay custom if: you control the full stack, your retrieval schema is fixed, and observability matters more to you than development velocity.

Tool Crucible Rating: LangChain

Dimension	Score (1–5)	Notes
Overall	2	Works for demos; fails silently in production; version churn is hostile to stability
Ease of Adoption	4	Excellent docs, huge ecosystem, fast to “hello world”
Value	2	Negative value at scale — debugging abstraction leaks costs more than building custom
Support/Ecosystem	5	Largest community, but GitHub issues show the same production gaps going unanswered for months

This is part of our AI Agent Frameworks evaluation series. See full comparison: Tool Crucible Agent Framework Comparison

Last reviewed 2026-06-10. See our methodology and affiliate policy.