Why We Chose Cron + File State Over LangGraph/Temporal for Agent Orchestration

Our 18 cron jobs need deterministic scheduling, not DAGs. LangGraph and Temporal add complexity for problems we don't have; cron + terminal + JSONL logs cover 95% of our needs with zero infrastructure.

Published 2026-06-10


TL;DR: LangGraph and Temporal solve distributed, multi-actor, long-running workflow problems. Our 18 single-machine cron jobs need scheduling, isolation, and audit logs, not a workflow engine. We kept cron and hardened it.

The Context

Hermes runs 18 scheduled cron jobs on a single Mac (default profile): 3 research jobs (15-30 min each), 4 synthesis jobs (10-20 min), 6 publishing jobs (2-5 min), and 5 maintenance jobs (1-3 min). Each job spawns an agent session that calls tools, writes files, and updates shared memory. There is no human in the loop, no second machine, and no complex DAG: the jobs are independent and scheduled at different times. The team is one operator. Constraints: local-first, no containers, no external dependencies, and every failure debuggable from local logs in under 5 minutes.

What We Tested

Tool / Approach	Use Case	Verdict	Why
LangGraph (LangChain)	Agent workflow orchestration	❌	Designed for conversational agents with human-in-the-loop steps; the state graph adds latency; overkill for linear cron jobs
Temporal.io	Durable execution platform	❌	Requires a running Temporal server (even locally, via the dev server); 50MB+ binary; designed for microservices, not 18 cron jobs
Prefect	Workflow orchestration	❌	Cloud-first; server + agent model; adds scheduling layer we already have (cron)
Custom Python orchestrator (2025)	Full agent platform	❌ Abandoned	Reinvented cron poorly; added failure modes; cron + terminal covers 95%
Cron + background terminal + file state	Production	✅ Current	Native scheduling; zero deps; isolation via workspaces; JSONL logs = audit trail
Grok Build v0.2.20 (worktrees + MCP + compaction)	Reference architecture	🟡	Validates our direction: worktrees = isolation, MCP = tool protocol, compaction = long runs

The Pivot Point

We built a custom Python orchestrator in 2025: job registry, DAG runner, retry policies, state persistence to SQLite, and webhook callbacks. It worked until one chain of events broke it: cron triggered the orchestrator, the orchestrator triggered an agent, the agent called a tool that wrote a file, and the orchestrator tried to read that file while the agent still held the lock, so the lock acquisition failed. Three layers of scheduling (cron → orchestrator → agent) created a deadlock. We deleted 2,000 lines of orchestrator code and went back to cron → terminal(background=true) → agent. We have had zero deadlocks since.

What We Use Now

Cron + terminal + file state — hardened:

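Scheduling: the system scheduler (launchd on macOS, which supersedes cron there): battle-tested; handles sleep/wake (unlike plain cron, it runs a job missed during sleep once on wake), timezones, and conflicts
Isolation: per-job HERMES_WORKSPACE=/tmp/hermes-<job-id>; each job writes only to its own workspace and reads brain/ read-only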
Execution: terminal(background=true, notify_on_complete=true) — Hermes session per job
State: File-based. Shared memory = brain/memory/<profile>.json; job output = artifacts/
Observability: JSONL logs per session (logs/hermes-<job-id>-<timestamp>.jsonl) with tool calls, durations, errors, token attribution
Verification: Sigstore keyless signing of logs; hermes verify-logs <job-id> checks integrity
Policy: Per-job allowlist (config/hermes-policy/<job-id>.yaml) — paths, commands, APIs, limits
Health: Pre-flight checks (npm audit, pip-audit, ollama verify); post-flight verification
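To make the scheduling, isolation, policy, and observability items above concrete, here is a minimal standard-library Python sketch. It is an illustration under assumptions, not Hermes's actual code: the label prefix `com.example.hermes.`, the policy key `writable_paths`, and the log field names are hypothetical, and the per-job YAML policy is represented as an already-parsed dict (YAML parsing needs a third-party library). It renders a launchd job definition carrying the per-job `HERMES_WORKSPACE`, checks a write target against the workspace allowlist, and appends one JSONL record per event.

```python
import json
import plistlib
import time
from pathlib import Path


def launchd_plist(job_id: str, command: list[str], hour: int, minute: int,
                  log_dir: str) -> bytes:
    """Render a launchd job definition (plist XML) for one scheduled job.

    Use an absolute path for log_dir.
    """
    spec = {
        "Label": f"com.example.hermes.{job_id}",  # hypothetical label prefix
        "ProgramArguments": command,
        # Daily at hour:minute local time. Per launchd.plist(5), a run missed
        # while the Mac sleeps fires once on wake instead of being skipped.
        "StartCalendarInterval": {"Hour": hour, "Minute": minute},
        # Per-job private workspace, so jobs never write into each other's state.
        "EnvironmentVariables": {"HERMES_WORKSPACE": f"/tmp/hermes-{job_id}"},
        "StandardOutPath": f"{log_dir}/{job_id}.out",
        "StandardErrorPath": f"{log_dir}/{job_id}.err",
    }
    return plistlib.dumps(spec)


def write_allowed(target: str, policy: dict) -> bool:
    """True if target resolves inside one of the policy's writable roots.

    `policy` stands in for config/hermes-policy/<job-id>.yaml after parsing;
    "writable_paths" is a hypothetical key name. Resolving first defeats
    ".." tricks such as <workspace>/../brain/...
    """
    resolved = Path(target).resolve()
    return any(resolved.is_relative_to(Path(root).resolve())
               for root in policy.get("writable_paths", []))


def append_jsonl(log_path: Path, **fields) -> None:
    """Append one event as a single JSON line.

    Appending (never rewriting) keeps the file usable as an audit trail.
    """
    record = {"ts": time.time(), **fields}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

Saving the rendered plist under ~/Library/LaunchAgents/ and loading it with `launchctl` would register the job; the allowlist check would sit in front of every tool write, and the JSONL append after every tool call.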

What we borrow from “real” orchestrators:

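Idempotency keys: every job run gets IDEMPOTENCY_KEY=<job-id>-<timestamp>, so re-runs are safe as long as <timestamp> is the scheduled slot rather than the actual start time (a fresh start time would mint a new key on every retry)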
Dead letter queue: Failed jobs write to artifacts/failed/<job-id>/ with full context for manual replay
SLA monitoring: hermes sla-report --since 7d shows success rate, avg duration, p95 latency per job
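The three borrowed patterns fit in a few dozen lines of standard-library Python. This sketch is an assumption for illustration, not the implementation behind `hermes sla-report`: the state directory, the `.done` marker files, and the per-run record fields (`job_id`, `ok`, `duration_s`, which would be summarized from the JSONL logs) are hypothetical. The key is derived from the scheduled slot; a done-marker is written only after success, so failed slots stay replayable from the dead-letter directory; and p95 uses the nearest-rank method.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def idempotency_key(job_id: str, scheduled_slot: datetime) -> str:
    """Derive the key from the scheduled slot (not the wall-clock start),
    so a re-run of the same slot reproduces the same key."""
    slot = scheduled_slot.astimezone(timezone.utc)
    return f"{job_id}-{slot:%Y%m%dT%H%MZ}"


def already_done(state_dir: Path, key: str) -> bool:
    return (state_dir / f"{key}.done").exists()


def mark_done(state_dir: Path, key: str) -> None:
    """Record success. Failures are never marked, so they can be replayed."""
    state_dir.mkdir(parents=True, exist_ok=True)
    (state_dir / f"{key}.done").touch()


def dead_letter(artifacts: Path, job_id: str, key: str, context: dict) -> Path:
    """Write full failure context to artifacts/failed/<job-id>/<key>.json."""
    target_dir = artifacts / "failed" / job_id
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / f"{key}.json"
    path.write_text(json.dumps(context, indent=2, sort_keys=True), encoding="utf-8")
    return path


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list: the ceil(0.95 * n)-th smallest."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil, avoids float rounding
    return ordered[rank - 1]


def sla_report(records: list[dict]) -> dict[str, dict]:
    """Per-job success rate, mean duration, and p95 duration over all runs."""
    by_job: dict[str, list[dict]] = {}
    for rec in records:
        by_job.setdefault(rec["job_id"], []).append(rec)
    report = {}
    for job_id, runs in sorted(by_job.items()):
        durations = [r["duration_s"] for r in runs]
        report[job_id] = {
            "runs": len(runs),
            "success_rate": sum(1 for r in runs if r["ok"]) / len(runs),
            "avg_s": sum(durations) / len(durations),
            "p95_s": p95(durations),
        }
    return report
```

A wrapper would call `already_done` before launching the agent, `mark_done` on success, and `dead_letter` on failure; replaying a dead-lettered slot then re-runs the job under the same key.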

When You’d Choose Differently

Use LangGraph if: you are building a conversational product with human-in-the-loop steps and branching conversations, you need visual debugging, or non-technical PMs design the flows.
Use Temporal if: you have a multi-service architecture, need exactly-once semantics, run long-lived (days/weeks) workflows, already operate Kubernetes, or have SLAs that require durability guarantees.
Use Prefect/Dagster if: you run data pipelines with complex DAGs (branching/merging), need a web UI for monitoring, or collaborate as a team on workflow definitions.
Stay with cron + files if: you run on a single machine with fewer than 50 independent scheduled jobs, the operator is technical, you value debuggability over features, and you want zero external dependencies.

Tool Crucible Rating

Dimension	Score (1–5)	Notes
Overall	2 (for our use case)	Excellent tools for their target problems; our problem is simpler
Ease of Adoption	2	Significant learning curve; infrastructure overhead; not “add to existing stack”
Value	1 (for us)	Net negative: solves problems we don’t have and adds failure modes we didn’t have
Support/Ecosystem	5	All three are production-grade with strong communities, just not a fit for cron operators

This is part of our Agent Orchestration evaluation series. See full comparison: Tool Crucible Orchestration Patterns 2026

Last reviewed 2026-06-10. See our methodology and affiliate policy.