Why We're Building Custom Agent Orchestration Instead of Using Cursor's Native Autonomous Mode

Cursor's autonomous workflows work for interactive coding but lack the durability, observability, and policy controls our cron agents need — we're keeping our terminal+file-state architecture with incremental hardening.

Published 2026-06-10


TL;DR: Cursor’s auto-mode is optimized for human-in-the-loop IDE work. Our 18 cron agents need deterministic scheduling, per-job allowlists, tamper-evident logs, and zero GUI dependency, so we’re hardening our cron+terminal stack instead.

The Context

Hermes runs 18 scheduled cron jobs on a single Mac (default profile). Each job spawns agent sessions that run for 5 to 30 minutes, call tools, write files, and update shared memory. Cursor’s autonomous workflows (Composer agent mode, background agents) are built for interactive developers who can nudge stuck agents. Our cron jobs run at 3 AM with no human present. Constraints: single machine, local-first, no container orchestration, and the operator must be able to verify any failure in under 5 minutes from local logs.

What We Tested

Tool / Approach	Use Case	Verdict	Why
Cursor Composer agent mode (auto)	Autonomous coding tasks	🟡 Reference	Strong for interactive work; “forgets” running apps; needs human nudge; no scheduling
Cursor background agents	Long-running tasks	🟡 Reference	Better persistence; still IDE-coupled; no policy engine; Mac-only until recently
Grok Build v0.2.20 (worktrees + MCP + compaction)	Reference for local agent infra	🟡 Reference	Ships the exact primitives we need: worktrees, MCP lifecycle, context compaction
Hermes current (cron → terminal → file state)	Production cron agents	✅ Current	Works for 18 jobs; simple; debuggable; no framework lock-in
Custom Python orchestrator (2025 attempt)	Full-featured agent platform	❌ Abandoned	Added complexity; cron + terminal + files covers 95% of needs

The Pivot Point

We tried adopting Cursor’s background agents for our overnight research job (deep-research-001). The agent started well but at turn 12 “forgot” the research brief context — it had been pushed out by tool call history. Cursor’s compaction wasn’t triggered because the session was technically “active” (just slow). The run produced 40 pages of hallucinated citations. We realized: Cursor’s autonomy assumes a human watches the sidebar. Our cron jobs have no sidebar.
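The failure above, where the brief was pushed out by tool call history, can be illustrated with a small sketch. This is not how Cursor manages context, and it is not part of Hermes today; it only shows one way a headless runner could trim history while keeping a pinned brief. The names `Message`, `estimate_tokens`, and `trim_history` are hypothetical, and the four-characters-per-token estimate is a rough assumption.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str            # hypothetical labels: "brief", "tool", "assistant"
    text: str
    pinned: bool = False  # pinned messages are never trimmed


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def trim_history(messages: list[Message], budget: int) -> list[Message]:
    """Drop the oldest unpinned messages until the history fits the budget.

    Pinned messages (for example the research brief) always survive.
    Original ordering is preserved in the returned list.
    """
    pinned_cost = sum(estimate_tokens(m.text) for m in messages if m.pinned)
    if pinned_cost > budget:
        # Fail loudly instead of silently running without the brief.
        raise ValueError("pinned context alone exceeds the token budget")

    remaining = budget - pinned_cost
    keep_unpinned = set()
    # Walk newest to oldest so the most recent tool output is kept first.
    for index in range(len(messages) - 1, -1, -1):
        msg = messages[index]
        if msg.pinned:
            continue
        cost = estimate_tokens(msg.text)
        if cost > remaining:
            break  # everything older than this point is dropped
        remaining -= cost
        keep_unpinned.add(index)

    return [m for i, m in enumerate(messages) if m.pinned or i in keep_unpinned]
```

In an unattended run, raising when the pinned brief no longer fits is arguably the safer design choice: a job that aborts at 3 AM leaves a clear error in the log, while a job that quietly loses its brief keeps producing output.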

What We Use Now

Cron + background terminal + file state — with incremental hardening:

Per-job workspace: Each cron job gets a HERMES_WORKSPACE=/tmp/hermes-<job-id> env var; writes stay isolated in that directory, and the shared brain/ directory is read-only
Session health logging: Every background terminal writes logs/hermes-<job-id>-<timestamp>.jsonl with tool calls, durations, errors, token attribution
Manual compaction: Weekly synthesis cron includes explicit “summarize prior context” step in prompt; not automatic yet
MCP readiness: n8n-mcp and metricool-mcp configured read-only; health check cron validates connectivity
Allowlist per job: config/hermes-policy/<job-id>.yaml — paths (read/write), commands (allow/deny), APIs (allow/deny), max_duration, max_tokens
Sigstore keyless signing: cosign sign-blob on log files post-job; hermes verify-logs <job-id> checks integrity
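As a rough illustration of the allowlist and session-log items above, here is a minimal sketch in Python. It is not the Hermes implementation. The policy dictionary stands in for a parsed config/hermes-policy/<job-id>.yaml, and its key names, the glob-based path matching, and the log record fields are all assumptions derived from the field names listed above. The functions `path_allowed`, `command_allowed`, and `log_event` are hypothetical.

```python
import json
import posixpath
import time
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Optional

# Hypothetical policy, shaped like a parsed per-job YAML file.
# Key names and values are assumptions for illustration only.
POLICY = {
    "paths": {
        "read": ["brain/*"],
        "write": ["/tmp/hermes-deep-research-001/*"],
    },
    "commands": {"allow": ["git", "rg"], "deny": ["rm", "curl"]},
    "max_duration": 1800,   # seconds
    "max_tokens": 200_000,
}


def path_allowed(policy: dict, path: str, mode: str) -> bool:
    """Return True if the normalized path matches a glob for this mode."""
    # Normalize first so "brain/../x" cannot slip past a "brain/*" pattern.
    normalized = posixpath.normpath(path)
    patterns = policy["paths"].get(mode, [])
    return any(fnmatchcase(normalized, pattern) for pattern in patterns)


def command_allowed(policy: dict, argv: list[str]) -> bool:
    """Deny wins over allow; anything not explicitly allowed is rejected."""
    if not argv:
        return False
    program = Path(argv[0]).name
    rules = policy["commands"]
    if program in rules.get("deny", []):
        return False
    return program in rules.get("allow", [])


def log_event(log_path: Path, tool: str, duration_s: float,
              error: Optional[str], tokens: int) -> None:
    """Append one JSON object per line (JSONL) to the job's log file."""
    record = {
        "ts": time.time(),
        "tool": tool,
        "duration_s": duration_s,
        "error": error,
        "tokens": tokens,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
```

A default-deny command check keeps the policy file short: a job only lists what it needs, and an unlisted binary fails closed. Appending one JSON object per line means a partially written log from a crashed job is still readable up to the last complete line.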

When You’d Choose Differently

Adopt Cursor / Grok Build / similar IDE agents if: you want coding and agent infrastructure in one tool, work across Mac/Linux/Windows, prefer a GUI over cron and a terminal, and can keep a human in the loop.
Build a custom orchestrator (Temporal, Prefect, Kubernetes operators) if: you need multi-machine execution, complex DAGs, SLA-grade reliability, or team collaboration on agent workflows.
Stay with cron + files if: you run on a single machine with fewer than 50 jobs, the operator is technical, debuggability matters more than features, and you need zero external dependencies.

Tool Crucible Rating

Dimension	Score (1–5)	Notes
Overall	3	Great for interactive dev; not built for headless production cron
Ease of Adoption	5	Zero setup if you already use Cursor; “it just works” for interactive tasks
Value	2	Negative value for our use case — fighting the IDE assumptions costs more than custom
Support/Ecosystem	4	Active development; Anysphere responsive; but roadmap prioritizes IDE users, not cron operators

This is part of our Agent Orchestration evaluation series. See full comparison: Tool Crucible Local Agent Infrastructure

Last reviewed 2026-06-10. See our methodology and affiliate policy.