Why We Chose Cron + File State Over LangGraph/Temporal for Agent Orchestration
Our 18 cron jobs need deterministic scheduling, not DAGs. LangGraph and Temporal add complexity for problems we don't have — cron + terminal + JSONL logs handles 95% of needs with zero infrastructure.
Published 2026-06-10
TL;DR: LangGraph and Temporal solve distributed, multi-actor, long-running workflow problems. Our 18 single-machine cron jobs need scheduling + isolation + audit logs — not a workflow engine. We kept cron and hardened it.
The Context
Hermes runs 18 scheduled cron jobs on a single Mac (default profile): 3 research (15-30 min), 4 synthesis (10-20 min), 6 publishing (2-5 min), and 5 maintenance (1-3 min). Each job spawns an agent session that calls tools, writes files, and updates shared memory. There is no human in the loop, no multi-machine coordination, and no complex DAG: jobs are independent and scheduled at different times. The team is one operator. Constraints: local-first, zero containers, debuggable in under 5 minutes from local logs, and zero external dependencies.
What We Tested
| Tool / Approach | Use Case | Verdict | Why |
|---|---|---|---|
| LangGraph (LangChain) | Agent workflow orchestration | ❌ | Designed for conversational agents with human-in-the-loop steps; state graph adds latency; overkill for linear cron jobs |
| Temporal.io | Durable execution platform | ❌ | Requires a running Temporal server plus worker processes, even in dev; 50MB+ binary; designed for microservices; not built for 18 cron jobs |
| Prefect | Workflow orchestration | ❌ | Cloud-first; server + agent model; adds scheduling layer we already have (cron) |
| Custom Python orchestrator (2025) | Full agent platform | ❌ Abandoned | Reinvented cron poorly; added failure modes; cron + terminal covers 95% |
| Cron + background terminal + file state | Production | ✅ Current | Native scheduling; zero deps; isolation via workspaces; JSONL logs = audit trail |
| Grok Build v0.2.20 (worktrees + MCP + compaction) | Reference architecture | 🟡 | Validates our direction: worktrees = isolation, MCP = tool protocol, compaction = long runs |
The Pivot Point
We built a custom Python orchestrator in 2025: job registry, DAG runner, retry policies, state persistence to SQLite, and webhook callbacks. It worked until one particular chain of events: a cron job triggered the orchestrator, which triggered an agent, which called a tool that wrote a file. The orchestrator then tried to read that file but could not acquire the file lock, because the agent still held it. Three layers of scheduling (cron → orchestrator → agent) had created a deadlock. We deleted 2,000 lines of orchestrator code and went back to cron → terminal(background=true) → agent. Zero deadlocks since.
What We Use Now
Cron + terminal + file state — hardened:
- Scheduling: System cron (launchd on macOS), battle-tested; handles sleep/wake, time zones, and conflicts
- Isolation: Per-job `HERMES_WORKSPACE=/tmp/hermes-<job-id>`; writes stay private, `brain/` is read-only
- Execution: `terminal(background=true, notify_on_complete=true)`, one Hermes session per job
- State: File-based. Shared memory = `brain/memory/<profile>.json`; job output = `artifacts/`
- Observability: JSONL logs per session (`logs/hermes-<job-id>-<timestamp>.jsonl`) with tool calls, durations, errors, and token attribution
- Verification: `sigstore` keyless signing on logs; `hermes verify-logs <job-id>` checks integrity
- Policy: Per-job allowlist (`config/hermes-policy/<job-id>.yaml`) covering paths, commands, APIs, and limits
- Health: Pre-flight checks (npm audit, pip-audit, ollama verify); post-flight verification
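The hardened setup can be sketched as a small per-job wrapper that combines isolation, policy, and JSONL observability. This is a minimal Python sketch under stated assumptions, not Hermes' implementation: `run_job`, `log_event`, the `allowed_commands` policy key, and the log record fields are all hypothetical; the YAML policy file is replaced by an in-memory dict to stay stdlib-only; a plain subprocess stands in for the agent session; and `root` plays the role of `/tmp`.

```python
import json
import os
import shlex
import subprocess
import time
from pathlib import Path


def log_event(log_path: Path, **fields) -> None:
    """Append one JSON object per line (JSONL) to the session log."""
    record = {"ts": time.time(), **fields}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def run_job(job_id: str, command: str, policy: dict, root: Path) -> int:
    """Hypothetical wrapper: private workspace, allowlist check, JSONL log."""
    # Private per-job workspace, mirroring HERMES_WORKSPACE=/tmp/hermes-<job-id>.
    workspace = root / f"hermes-{job_id}"
    workspace.mkdir(parents=True, exist_ok=True)
    log_dir = root / "logs"
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"hermes-{job_id}-{int(time.time())}.jsonl"

    argv = shlex.split(command)
    # Policy: refuse any executable that is not on this job's allowlist.
    if argv[0] not in policy.get("allowed_commands", []):
        log_event(log_path, event="policy_denied", job=job_id, command=argv[0])
        return 126

    # The child sees its workspace via the environment and runs inside it.
    env = {**os.environ, "HERMES_WORKSPACE": str(workspace)}
    start = time.monotonic()
    proc = subprocess.run(argv, cwd=workspace, env=env, capture_output=True, text=True)
    log_event(
        log_path,
        event="tool_call",
        job=job_id,
        command=argv[0],
        exit_code=proc.returncode,
        duration_s=round(time.monotonic() - start, 3),
        stderr=proc.stderr[-2000:],  # keep only the tail of stderr
    )
    return proc.returncode
```

The allowlist check runs before anything executes, and a denial is itself logged, so the JSONL file records refused actions as well as completed ones.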
What we borrow from “real” orchestrators:
- Idempotency keys: Every job run gets `IDEMPOTENCY_KEY=<job-id>-<timestamp>`, so re-runs are safe
- Dead letter queue: Failed jobs write to `artifacts/failed/<job-id>/` with full context for manual replay
- SLA monitoring: `hermes sla-report --since 7d` shows success rate, average duration, and p95 latency per job
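The three borrowed patterns fit in a few dozen lines. This Python sketch assumes the `<timestamp>` in the key is the *scheduled* slot time (a wall-clock start time would give every retry a fresh key and defeat idempotency). The `artifacts/done/` success markers, the `run_once` and `p95` names, and the nearest-rank percentile definition are illustrative choices, not necessarily what `hermes sla-report` uses.

```python
import json
import time
from pathlib import Path
from typing import Callable


def idempotency_key(job_id: str, scheduled_at: int) -> str:
    # Assumption: scheduled_at is the cron slot (epoch seconds), not the
    # wall-clock start time, so every retry of one slot shares a key.
    return f"{job_id}-{scheduled_at}"


def run_once(job_id: str, scheduled_at: int, work: Callable[[], dict], root: Path) -> str:
    """Run `work` at most once per key; park failures in a dead-letter dir."""
    key = idempotency_key(job_id, scheduled_at)
    done_marker = root / "artifacts" / "done" / f"{key}.json"  # hypothetical layout
    if done_marker.exists():
        return "skipped"  # this slot already succeeded, so the re-run is a no-op
    try:
        result = work()
    except Exception as exc:
        # Dead letter: record enough context to replay the slot by hand.
        dlq = root / "artifacts" / "failed" / job_id
        dlq.mkdir(parents=True, exist_ok=True)
        context = {"key": key, "error": repr(exc), "failed_at": time.time()}
        (dlq / f"{key}.json").write_text(json.dumps(context))
        return "failed"
    done_marker.parent.mkdir(parents=True, exist_ok=True)
    done_marker.write_text(json.dumps(result))  # marker written only on success
    return "ok"


def p95(durations: list[float]) -> float:
    """Nearest-rank 95th percentile (one of several common definitions)."""
    if not durations:
        raise ValueError("no durations recorded")
    ordered = sorted(durations)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]
```

The check-then-write in `run_once` is not atomic, so it relies on the scheduler never running two copies of the same slot at the same time; a failed slot leaves no success marker and can simply be re-run.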
When You’d Choose Differently
- Use LangGraph if: building a conversational product with human-in-the-loop steps, branching conversations, a need for visual debugging, or a team of non-technical PMs designing flows.
- Use Temporal if: multi-service architecture, need durable execution (workflow state survives crashes and restarts), long-running (days/weeks) workflows, team manages Kubernetes, SLA requires durability guarantees.
- Use Prefect/Dagster if: data pipeline workflows, complex DAGs with branching/merging, need web UI for monitoring, team collaboration on workflow definitions.
- Stay with cron + files if: single machine, <50 jobs, independent scheduled tasks, operator is technical, debuggability > features, zero external deps.
Tool Crucible Rating
| Dimension | Score (1–5) | Notes |
|---|---|---|
| Overall | 2 (for our use case) | Excellent tools for their target problems; our problem is simpler |
| Ease of Adoption | 2 | Significant learning curve; infrastructure overhead; not “add to existing stack” |
| Value | 1 (for us) | Negative — solving problems we don’t have; adding failure modes we didn’t have |
| Support/Ecosystem | 5 | All three are production-grade with strong communities; just not for cron operators |
This is part of our Agent Orchestration evaluation series. See full comparison: Tool Crucible Orchestration Patterns 2026
Last reviewed 2026-06-10. See our methodology and affiliate policy.