
AI Agent Loops Explained: Triggers, Goals, Sub-Agents, and When to Stop

Everybody is suddenly talking about loops. The phrase can sound like another AI hype cycle, but the practical idea is simple: stop asking an agent for one answer at a time and start designing the system that makes the agent work, check, and retry.

In the episode, Andrew Warner learns the topic from Matthew Berman. They cover thumbnail loops, Codex automations, Claude Code goals, scheduled routines, sub-agents, token budgets, Zapier MCP permissions, and the very real question underneath all of this: when is a loop worth the cost?

Video source: Andrew Warner with Matthew Berman. This post uses the episode transcript as the basis for commentary and checks the loop mechanics against official Codex, Claude Code, and Forward Future sources.

JQ AI SYSTEMS take

Loop engineering is workflow design. The winning skill is not writing a prettier prompt. It is defining the trigger, goal, budget, tools, evidence, and stop condition.


Source note

This post credits Andrew Warner's interview with Matthew Berman and the loop work Matthew publishes through the Forward Future Loop Library. The library describes itself as practical AI-agent prompts with clear checks and stopping conditions. When I checked the page, it listed 64 loops and included an installable Loop Library skill for coding agents.

I also checked the official Codex automations documentation and the official Claude Code /goal documentation. That matters because "loops" can mean several different surfaces: manual loops, scheduled loops, goal loops, stop hooks, sub-agent loops, and workflow automations.


What is a loop?

A loop is an agent workflow that repeats. The agent does something, checks the result, decides whether the goal is met, and either stops or takes another pass.

That means a loop needs more structure than a normal prompt. A good loop answers:

  • What starts it? A manual command, schedule, webhook, PR, bug report, new file, or status change.
  • What is the goal? A measurable end state or a judgeable quality bar.
  • What can it touch? Files, browser, APIs, plugins, database, local app, or only drafts.
  • How does it check itself? Tests, metrics, screenshots, logs, rubrics, human review, or real-world data.
  • When does it stop? Success, failure, budget, repeated no-progress, or approval needed.

Without those pieces, "run a loop" becomes "let the model wander." That is how you get token burn, vague progress, and edits that are hard to review.
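Stripped to its control flow, the work, check, stop-or-repeat cycle defined at the top of this section fits in a few lines. This is a minimal sketch, not how any particular product implements loops: `act` and `check` are hypothetical callables you would supply, and `max_passes` is an assumed safety cap.

```python
from typing import Callable, TypeVar

State = TypeVar("State")


def run_loop(
    state: State,
    act: Callable[[State], State],   # hypothetical: one unit of agent work
    check: Callable[[State], bool],  # hypothetical: is the goal met?
    max_passes: int = 5,             # assumed cap so the loop cannot run forever
) -> tuple[State, bool, int]:
    """Act, check, then either stop or take another pass.

    Returns the final state, whether the goal was met, and the passes used.
    """
    for passes in range(1, max_passes + 1):
        state = act(state)
        if check(state):
            return state, True, passes
    return state, False, max_passes


# Usage: a toy "agent" that increments a counter until it reaches 3.
final, done, used = run_loop(0, act=lambda n: n + 1, check=lambda n: n >= 3)
print(final, done, used)  # 3 True 3
```

Everything else in this post (triggers, budgets, evidence, approval gates) is about filling in `act`, `check`, and the stop conditions responsibly.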


Trigger plus goal

Matthew's cleanest explanation is that a loop needs two things: a trigger and a goal.

Trigger

The event that starts the loop. It can be manual, scheduled, or event-based, such as a pull request, failed check, new support ticket, or recurring weekly review.

Goal

The condition the loop is working toward. It can be deterministic, like passing tests, or judgment-based, like improving a thumbnail according to a rubric.

This is why loops are not limited to coding. A loop can run on page speed, thumbnails, docs, onboarding, product bugs, stale CRM records, ad copy, support replies, or a weekly market scan. The common pattern is not the domain. The common pattern is repeated work plus a checkable target.
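The trigger/goal split maps naturally onto two small types. A sketch under stated assumptions: the class and field names are hypothetical, and reducing every goal to a score plus a threshold is a simplification for illustration. The `deterministic` flag records which kind of goal you are dealing with.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class TriggerKind(Enum):
    MANUAL = "manual"
    SCHEDULED = "scheduled"
    EVENT = "event"  # e.g. a pull request, failed check, or new ticket


@dataclass
class Goal:
    description: str
    check: Callable[[dict], float]  # returns a score for the current result
    threshold: float
    deterministic: bool  # True for tests/metrics, False for rubric judgments

    def is_met(self, result: dict) -> bool:
        return self.check(result) >= self.threshold


@dataclass
class Loop:
    trigger: TriggerKind
    goal: Goal


# Deterministic goal: every test passed (score is the pass fraction).
tests_goal = Goal("all tests pass", lambda r: r["passed"] / r["total"], 1.0, True)

# Judgment-based goal: a rubric score of at least 8/10 from a reviewer.
thumbnail_goal = Goal("rubric score >= 8", lambda r: r["rubric_score"], 8.0, False)

ci_loop = Loop(trigger=TriggerKind.EVENT, goal=tests_goal)
print(ci_loop.goal.is_met({"passed": 41, "total": 42}))  # False
```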


Verifiable goals win

The video contrasts two kinds of goals: verifiable goals and LLM-as-judge goals.

The thumbnail example is easy to understand. You can ask an agent to generate ten YouTube thumbnail concepts, score them against a rubric, improve the top three, and keep iterating. That can be useful. But the model is still grading its own work unless you connect the loop to actual human feedback or real click-through-rate (CTR) data.
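One round of that thumbnail loop is score, select, improve. A minimal sketch: `score` and `improve` stand in for hypothetical model calls, and the fixed `rounds` count is an assumed budget. The `needs_human_review` flag encodes the caveat above, since the model is grading its own output.

```python
from typing import Callable


def judge_round(
    candidates: list[str],
    score: Callable[[str], float],  # hypothetical rubric scorer (often an LLM call)
    improve: Callable[[str], str],  # hypothetical rewrite step
    keep: int = 3,
) -> list[str]:
    """Score all candidates, keep the top `keep`, and return improved versions."""
    ranked = sorted(candidates, key=score, reverse=True)
    return [improve(c) for c in ranked[:keep]]


def run_judge_loop(
    candidates: list[str],
    score: Callable[[str], float],
    improve: Callable[[str], str],
    rounds: int = 2,
) -> dict:
    for _ in range(rounds):
        candidates = judge_round(candidates, score, improve)
    # Self-graded output: route it to a human or real CTR data before trusting it.
    return {"finalists": candidates, "needs_human_review": True}


# Usage with toy stand-ins: "score" is string length, "improve" appends "!".
result = run_judge_loop(["a", "bbb", "cc", "dddd"], score=len, improve=lambda s: s + "!")
print(result["finalists"])  # ['dddd!!', 'bbb!!', 'cc!!']
```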

A verifiable goal is stronger:

  • all tests in `test/auth` pass;
  • every page loads in under 50 ms under identical test conditions;
  • the open bug queue is empty or every item has a clear owner;
  • documentation matches the current codebase and the pull request is reviewable;
  • no critical SEO/GEO crawl issues remain after rerunning the same benchmark.

This lines up with Claude Code's official `/goal` docs. Anthropic says `/goal` sets a completion condition and Claude keeps working across turns until that condition is met. The useful examples are measurable: tests pass, acceptance criteria hold, a file is split under a size budget, or a labeled backlog is empty.
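Most of the verifiable goals listed above reduce to a command whose exit status answers "done or not." A sketch assuming the project uses pytest for `test/auth`: pytest exits with code 0 only when the collected tests pass, and with a nonzero code (5) when it collects no tests, so an empty test directory does not count as success. The timeout value is an assumption.

```python
import subprocess
import sys


def command_passes(cmd: list[str], timeout_s: float = 600.0) -> bool:
    """Verifiable goal check: the command exits with status 0 within the timeout."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


# Hypothetical usage for the first goal above (assumes pytest is installed):
# command_passes([sys.executable, "-m", "pytest", "test/auth"])
print(command_passes([sys.executable, "-c", "pass"]))  # True
```

Because the check is a process exit code rather than a model's opinion, the loop cannot talk itself into "done."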

Rule of thumb

If the loop can prove the result with a test, metric, screenshot, log, benchmark, or external data point, it is a better loop. If it only asks another LLM whether it "feels done," keep a human review gate.
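The rule of thumb can be enforced mechanically at the point where the loop reports success. A sketch with hypothetical evidence labels: a verdict from another model (`"llm_judge"`) never clears the gate on its own.

```python
# Assumed labels for evidence a loop can attach to a success claim.
VERIFIABLE_EVIDENCE = {"test", "metric", "screenshot", "log", "benchmark", "external_data"}


def needs_human_review(evidence_kinds: set[str]) -> bool:
    """Require a human gate unless at least one piece of evidence is verifiable."""
    return not (evidence_kinds & VERIFIABLE_EVIDENCE)


print(needs_human_review({"llm_judge"}))          # True
print(needs_human_review({"llm_judge", "test"}))  # False
```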


Sub-agents and parallel work

Sub-agents are useful when the loop can split work into parallel pieces. The main agent becomes the orchestrator. Child agents investigate separate files, test separate hypotheses, generate separate candidates, or review different parts of the system. Then the main agent rolls the results back up.

That is useful for:

  • testing several fixes for one bug;
  • reviewing several modules for the same pattern;
  • generating ten design or thumbnail candidates;
  • running several research paths before summarizing;
  • using one builder agent and one reviewer agent in a loop.

It is not useful when the job is small, stateful, or delicate. If the work needs one careful thread of reasoning, parallel agents can create more merge work than value.
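The orchestrator pattern is fan-out, then fan-in. A sketch assuming each child agent is an independent call with no shared state, which is exactly the condition the previous paragraph sets. `sub_agent` is a hypothetical stand-in, and threads are used on the assumption that agent calls are I/O-bound network requests.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fan_out(
    tasks: list[str],
    sub_agent: Callable[[str], dict],  # hypothetical: runs one child agent on one task
    max_workers: int = 4,
) -> list[dict]:
    """Run independent tasks in parallel; results come back in task order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(sub_agent, tasks))


def roll_up(results: list[dict]) -> dict:
    """Orchestrator step: keep only the hypotheses whose fix was verified."""
    verified = [r for r in results if r.get("verified")]
    return {"tried": len(results), "verified": verified}


# Usage with a toy stand-in agent that only "verifies" fix-b.
def fake_agent(task: str) -> dict:
    return {"task": task, "verified": task.endswith("b")}


summary = roll_up(fan_out(["fix-a", "fix-b", "fix-c"], fake_agent))
print(summary["tried"], [r["task"] for r in summary["verified"]])  # 3 ['fix-b']
```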


Scheduled loops

Codex automations are the concrete OpenAI version of scheduled loops. The official docs say automations can be created from a regular Codex thread, can use skills and plugins, and can wake on minute-based, daily, or weekly schedules depending on the type.

The practical idea is simple: first run the workflow manually, tune the prompt, confirm the output is useful, then schedule it.
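The manual-first rule can be written down as an explicit promotion gate. A hypothetical sketch: the `ManualRun` record and the three-run threshold are assumptions, not part of Codex automations, and the actual scheduling still happens in whatever tool you use.

```python
from dataclasses import dataclass


@dataclass
class ManualRun:
    prompt_version: str
    reviewed: bool  # a human looked at the output
    useful: bool    # and judged it worth acting on


def ready_to_schedule(runs: list[ManualRun], required: int = 3) -> bool:
    """Promote to a schedule only after `required` consecutive reviewed,
    useful runs on the current prompt version (assumed threshold)."""
    if len(runs) < required:
        return False
    current = runs[-1].prompt_version
    return all(
        r.reviewed and r.useful and r.prompt_version == current
        for r in runs[-required:]
    )


runs = [
    ManualRun("v1", True, False),  # first prompt missed; tuned to v2
    ManualRun("v2", True, True),
    ManualRun("v2", True, True),
    ManualRun("v2", True, True),
]
print(ready_to_schedule(runs))      # True
print(ready_to_schedule(runs[:3]))  # False
```

Requiring the same prompt version across the window means any prompt edit restarts the trial period.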

Good scheduled loops:

  • daily bug triage from logs, GitHub, Linear, or Sentry;
  • weekly docs drift review;
  • morning inbox or Slack action summary;
  • nightly changelog draft from commits and merged PRs;
  • weekly AI visibility or SEO crawl audit;
  • recurring competitor or customer pain-point research.

Bad scheduled loops are the ones that take irreversible action without approval. A loop that drafts is safer than a loop that sends. A loop that opens a pull request is safer than a loop that deploys. A loop that produces evidence is safer than a loop that claims success.
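The draft-versus-send distinction is easiest to enforce as a policy checked before every tool call. A sketch with hypothetical action names; the important design choice is failing closed, so an action the policy has never seen waits for approval.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"


# Assumed action names. Reversible, reviewable outputs may run unattended;
# irreversible or externally visible actions wait for a human.
SAFE_ACTIONS = {"draft", "open_pull_request", "write_report", "comment"}
GATED_ACTIONS = {"send", "deploy", "publish", "delete", "spend", "update_record"}


def gate(action: str) -> Decision:
    if action in GATED_ACTIONS:
        return Decision.NEEDS_APPROVAL
    if action in SAFE_ACTIONS:
        return Decision.ALLOW
    return Decision.NEEDS_APPROVAL  # unknown action: fail closed


print(gate("draft").value, gate("send").value)  # allow needs_approval
```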


Token budgets and stop rules

The criticism of loops is fair: they can waste money fast. A loop that generates, judges, rewrites, tests, spawns sub-agents, and repeats can burn through a plan limit without producing a better result.

So a production loop needs budgets:

  • Iteration cap: stop after three or five passes unless a human extends it.
  • Token or time cap: stop after a known budget and summarize progress.
  • No-progress rule: stop if the same blocker appears twice.
  • Risk gate: stop before sending, spending, deleting, publishing, deploying, or changing source-of-record data.
  • Evidence requirement: every success claim needs a test, log, screenshot, metric, or diff.

This is where loop engineering starts to look like operations. The clever prompt is the small part. The bigger part is bounding autonomy so it remains useful.
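The iteration cap, time cap, and no-progress rule are simple counters. A sketch with assumed defaults (five passes, thirty minutes); the risk gate and evidence requirement belong at the tool-call and reporting layers, so they are left out here. Blockers are compared as exact strings, which is a simplification: a real loop would need some normalization to recognize "the same blocker."

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Budget:
    max_iterations: int = 5
    max_seconds: float = 1800.0
    started: float = field(default_factory=time.monotonic)
    iterations: int = 0
    blockers_seen: set[str] = field(default_factory=set)

    def record(self, blocker: Optional[str]) -> Optional[str]:
        """Record one pass; return a stop reason, or None to keep going."""
        self.iterations += 1
        if blocker is not None:
            if blocker in self.blockers_seen:
                return f"no progress: '{blocker}' seen twice"
            self.blockers_seen.add(blocker)
        if self.iterations >= self.max_iterations:
            return "iteration cap reached"
        if time.monotonic() - self.started > self.max_seconds:
            return "time cap reached"
        return None


b = Budget()
print(b.record(None))          # None
print(b.record("flaky test"))  # None
print(b.record("flaky test"))  # no progress: 'flaky test' seen twice
```

When `record` returns a reason, the loop should stop and summarize progress rather than silently exit.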


Loop Library

The most useful resource from the episode is the Forward Future Loop Library. It is a public collection of repeatable agent loops for engineering, evaluation, operations, content, and design.

The companion GitHub skill lets your agent find or adapt loop templates directly:

npx skills add Forward-Future/loop-library --skill loop-library -g

The skill lives in the Forward-Future/loop-library repository on GitHub, under the name loop-library.

Loop templates worth trying first

  • Docs sweep: keep docs aligned with the current codebase and open a reviewable PR.
  • Sub-50 ms page-load loop: optimize pages until a repeatable performance target is met.
  • Production error sweep: review logs, trace actionable issues, fix, verify, and report.
  • 100% test coverage loop: add meaningful tests until the target coverage passes.
  • SEO/GEO visibility loop: audit AI and search visibility gaps, fix the highest-impact issue, then rerun the benchmark.
  • Full product evaluation loop: test every user-facing surface and fix verified bugs with evidence.

The pattern across those examples is exactly what I like: narrow enough to verify, useful enough to repeat, and explicit enough to stop.




Builder checklist

Before you ask an agent to loop, write this down:

  1. Trigger: manual, scheduled, or event-based?
  2. Goal: what concrete end state should be true?
  3. Verification: what test, metric, screenshot, log, or review proves it?
  4. Inputs: what sources, files, systems, or examples can the agent use?
  5. Tools: what is the agent allowed to call?
  6. Budget: max iterations, max time, or max spend.
  7. Stop rule: when should it stop as done, blocked, or not worth continuing?
  8. Review gate: what must a human approve before the loop takes action?

The best first loop is not the flashiest. It is a loop where failure is cheap, success is measurable, and the output makes your next decision easier.
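Writing the checklist down can be as literal as a record that reports which items are still blank, so a runner can refuse to start until nothing is missing. A hypothetical sketch: the field names mirror the eight checklist items, and free-text strings are a placeholder for whatever structure your tooling actually uses.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LoopSpec:
    trigger: Optional[str] = None       # manual, scheduled, or event-based
    goal: Optional[str] = None          # concrete end state
    verification: Optional[str] = None  # test, metric, screenshot, log, or review
    inputs: Optional[str] = None        # sources, files, systems, examples
    tools: Optional[str] = None         # what the agent may call
    budget: Optional[str] = None        # iterations, time, or spend
    stop_rule: Optional[str] = None     # done, blocked, or not worth continuing
    review_gate: Optional[str] = None   # what a human must approve

    def missing(self) -> list[str]:
        """Checklist items that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


spec = LoopSpec(trigger="weekly", goal="docs match code", verification="link check passes")
print(spec.missing())  # ['inputs', 'tools', 'budget', 'stop_rule', 'review_gate']
```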

CTA

Do not ask an agent to loop until you have a trigger, a verifiable goal, a budget, and a stop condition. Autonomy without a stop rule is just expensive improvisation.



The short version: loops are worth learning, but the real skill is not infinite iteration. It is designing a bounded system where the agent knows what to try, how to verify it, and when to stop.

Common questions

What is an AI agent loop?
An AI agent loop is a repeated cycle where an agent works, checks the result, and continues until a goal is met, a budget is exhausted, or a human stops it. Good loops have a trigger, a goal, verification criteria, and a stop condition.
What makes a loop different from a prompt?
A prompt asks for one response. A loop gives the agent a repeatable system: when to start, what to improve, how to check progress, and when to stop. That is why loop design is closer to workflow design than copywriting.
Are LLM-as-judge loops safe to use?
They can be useful for subjective work like thumbnails, copy, design, or prioritization, but they are weaker than loops with measurable tests. Use LLM-as-judge loops for drafts and ranking, then add human review or real-world data.
When should I use sub-agents in a loop?
Use sub-agents when the work can be split into independent tasks: several files, several hypotheses, several candidates, or several reviews. Do not add sub-agents just because it sounds advanced.
What is the best first loop to try?
Start with a reviewable, low-risk loop: docs sync, test coverage, page-speed cleanup, stale issue triage, or a weekly research sweep. Avoid loops that send messages, spend money, delete data, or publish without approval.