Learn

/ Where /goal fits in the loop

Insight · /goal

How /goal slots into the loop.

I've been running /goal in Claude Code and Codex. Same word, different command. The difference matters, and it changes where this primitive lives in the five-phase loop this site is built around.

The loop you already know.

Every productive AI interaction runs the same five phases, load → clarify → execute → record → reflect, orbiting the protocol. Once it's moving, one question hangs over every iteration. When do we stop?

Most agentic work doesn't fail because the model couldn't do the work. It fails because nobody told it when the work is finished. The model decides, and the model is biased toward "yes, that's done."

The protocol has always tried to solve this by making you write the stop condition down, in TODO.md, as acceptance criteria.

TODO.md

## Active
- [ ] Build task list feature
- AC: Users can create, edit, delete tasks
- AC: Tasks persist in Supabase
- AC: Drag-and-drop reordering

Those AC lines are the protocol's version of a completion_condition. Until last month, the only thing checking them was the same model doing the work. That's the part /goal just changed.

/goal does two jobs in that loop.

Both Claude Code (v2.1.139) and Codex shipped a /goal primitive in the same window. They put it in roughly the same place, between execute and the next loop. But /goal doesn't just decide when to stop. It also keeps the loop pointed at the right target while it's running.

Fig. 01

where /goal sits in the five-phase loop

protocol

→

load

→

clarify

→

execute

→

/goal

→

record

→

reflect

↑ /goal lives here

↺ loop until /goal says stop

/goal lives on the seam between execute and record. It does two jobs from there. It holds the target across iterations, and it decides when the loop exits. Same loop. Two new roles.

Role 01 Intent-holder.

Across iterations, agents drift. They re-read the original prompt and quietly decide that "done" might mean something slightly different than it did three turns ago. /goal pins the target. The completion_condition you stated up front becomes the compass on every iteration. The agent doesn't get to redefine it mid-run because the harness keeps re-checking the same condition.

This is the role that used to require workarounds. Protocols like ours hand-rolled it by re-loading the goal at the top of every loop. /goal makes the re-check a primitive the agent can't route around.

Role 02 Terminator.

And after every turn, /goal asks, is the condition met yet? If yes, exit. If no, loop again. This is the role where the two implementations diverge. They disagree about who gets to answer that question.

claude code /goal

external judge

A second model reads the transcript and grades it against the completion_condition you wrote up front.

stops when, "was the work actually right?"

codex /goal

budget_limit

A token meter watches the run and wraps up when spend crosses the configured ceiling.

stops when, "did the work fit the budget?"

Same loop. Different exit door. Same primitive, two engineering bets about which failure mode is the one worth designing out.

Why Claude's choice maps to the protocol.

This is the part that matters if you're already running uno or duo.

The protocol has pushed one habit harder than all the others. Write the done-condition down. TODO.md doesn't say "build the task list." It says, AC: users can create, edit, delete tasks. AC: tasks persist in Supabase. AC: drag-and-drop reordering.

Those acceptance criteria are the same intellectual object as Claude's completion_condition.

Until /goal, the only thing checking the AC against the work was the agent that did the work. The grading happened implicitly at the RECORD step at end-of-loop, and the grader was the same model that had just spent the loop convincing itself the work was good.

Anti-pattern

Self-grading at end of long loops.

The agent that did the work is also the agent judging whether the work is done. Over a long run, the bias toward "yes, that's done" wins. The grader needs to be a different model.

Claude's /goal swaps in a different model as the grader. The intellectual move the protocol always wanted at RECORD, with a real second pair of eyes.

Codex isn't wrong, it's solving a different problem, long runs that quietly burn through context and turn to mush. Real failure. Different failure. If your worry is "the agent will declare victory too early," you want the external judge. If your worry is "the agent will run away with the bill," you want budget_limit. Most teams will eventually want both.

What it changes for the protocol.

→Your AC are already the input. If you've been keeping TODO.md the way the protocol prescribes, you don't have to rewrite anything to use /goal. Your AC translate directly into a completion_condition.
→The RECORD step gets a real judge. The implicit self-check that used to happen at end-of-loop now hands off to /goal's external judge. RECORD still auto-commits. But the judging at the seam is a different model.
→The loop gets quieter. Less self-grading drift. Less "I think we're done?" Less restating outcomes a turn later in reflect.
→Codex still has a job. /goal with budget_limit is a sensible default for unattended runs, batch jobs, anything where the cost ceiling is the actual constraint.

The through-line

The protocol's bet has always been that "done" needs to be written down before the work starts. /goal is the first time the platform agrees with us out loud. The rest of the protocol, the AC, the RECORD/REFLECT seam at end-of-loop, was already in the right shape for it.

Where to Go Next

How /goal slots into the loop.

The loop you already know.

/goal does two jobs in that loop.

Role 01 Intent-holder.

Role 02 Terminator.

Why Claude's choice maps to the protocol.

Self-grading at end of long loops.

What it changes for the protocol.

Keep going with the protocol.

Why we double down on Claude Code →

uno · operate →

duo · construct →

AI Basics 101 →

AI Readiness Check →