Command-Based vs Relationship-Based Agents

Most of the "AI agent" debate is arguing about two different products as if they were one. One kind waits for a task and does it. The other kind already knows things about you and acts on that. They fail differently, they earn trust differently, and the question "which agent should I use" has no answer until you know which of the two you're actually holding. This piece sorts a personal AI agent from a task-running one — what each remembers, when each is the right call, and the one trust question that decides whether memory is a feature or a liability.

The confusion is fair. The word "agent" got stretched to cover a coding tool, a web-task runner, and a thing that texts you about your sister's birthday. Those aren't variations on a theme. They're separate categories that happen to share a noun.

A friend put it to me mid-debrief last week: "So is Macaron's two-list rule just the same as letting Codex run wild?" No, and that gap is the whole article. When a big comparison keeps coming out muddy, I break it into smaller tests. This one stayed muddy everywhere until I split agents into two buckets by where they start.

Why "AI Agent" Now Means Several Different Things

An agent, loosely, is software you hand a goal to instead of a click-by-click instruction. That's the only thing the examples below share. After that, they diverge hard.

The cleanest way I've found to cut them apart is the starting condition. Some agents start from a task you describe at that moment. Others start from the context they've accumulated about you over time. Everything else — the memory question, the trust question, the "when is this useful" question — falls out of that one split.

Command-Based Agents Start With a Task

You give it a job. It does the job. It forgets you the moment it's done. That's not a limitation — for the work these tools do, forgetting is correct.

Clear Instructions

Command-based agents want a well-formed request. "Refactor this module," "book a flight under $400 arriving before noon." The more precise you are, the better they perform. Vague input is where they wander.

OpenAI's own guidance for ChatGPT Agent says this plainly. Its help center warns against open-ended prompts like "check my email and handle everything," which is exactly the kind of instruction a command agent has no good way to scope, and it lists avoiding vague prompts and stopping any task that looks suspicious as part of using the agent safely. The tool is sharpest when the target is sharp.

Defined Outputs

These agents produce a deliverable you can inspect. Codex, OpenAI's coding agent, is the obvious case. Each Codex task runs in a separate cloud environment preloaded with your repository, where the agent reads and edits files, runs tests, and proposes code changes for review. Most tasks take between 1 and 30 minutes, and Codex returns command logs and test results so you can inspect what it did. Across command agents, the result is a concrete artifact: a pull request, a spreadsheet, a booked reservation.

ChatGPT Agent works the same way on the web side. According to OpenAI's ChatGPT agent announcement, it combines Operator's ability to interact with websites, deep research's skill at synthesizing information, and ChatGPT's conversational fluency, carrying out tasks on its own virtual computer while shifting between reasoning and action. The point is always a finished output, not an ongoing relationship.

User Supervision

You stay in the loop, and the loop is the safety mechanism. ChatGPT Agent requests permission before consequential actions and lets you interrupt, take over the browser, or stop a task at any point. That's the right design when an agent might spend your money or send mail in your name. Supervision is the feature, not friction.
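To make "supervision is the feature" concrete, here is a minimal sketch of a confirm-before-acting gate. It is not how ChatGPT Agent is built. The action kinds, the `Action` and `SupervisedRunner` names, and the `confirm` callback are all hypothetical, chosen only to show the shape of the idea: routine steps run, consequential ones wait for a yes, and the user can stop everything.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical set of action kinds treated as consequential (money, mail).
# This is an illustration, not any real agent's policy.
CONSEQUENTIAL = {"purchase", "send_email", "submit_form"}


@dataclass
class Action:
    kind: str
    description: str


@dataclass
class SupervisedRunner:
    confirm: Callable[[Action], bool]  # asks the user; True means "go ahead"
    log: list = field(default_factory=list)
    stopped: bool = False

    def stop(self) -> None:
        # The user can halt the task at any point; later actions are skipped.
        self.stopped = True

    def run(self, action: Action) -> str:
        if self.stopped:
            self.log.append(("skipped", action.kind))
            return "stopped"
        # Consequential actions only proceed on an explicit confirmation.
        if action.kind in CONSEQUENTIAL and not self.confirm(action):
            self.log.append(("declined", action.kind))
            return "declined"
        self.log.append(("done", action.kind))
        return "done"


# Usage: a user who approves everything except purchases.
runner = SupervisedRunner(confirm=lambda a: a.kind != "purchase")
print(runner.run(Action("browse", "open airline site")))   # done
print(runner.run(Action("purchase", "buy $380 ticket")))   # declined
runner.stop()
print(runner.run(Action("send_email", "email itinerary"))) # stopped
```

The design choice worth noticing: nothing here persists between tasks. The safety comes entirely from the per-action check, which is exactly the transaction model.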

Here's the thing about command-based agents: they're built around a single transaction. Nothing carries forward. And for code review or a one-off booking, nothing should.

Relationship-Based Agents Start With Context

Now flip the starting condition. A relationship-based agent doesn't begin with a task. It begins with what it already knows about you — and that changes everything about how help arrives.

Memory Across Conversations

This is the dividing line. AI memory is what lets the second category exist at all. A command agent that "remembered" your last refactor would be slightly creepy and not more useful. A life assistant that forgot you're lactose intolerant every time you asked about dinner would be useless.

I notice the friction before I notice the feature, and the friction with most tools is exactly this: re-explaining yourself. The first few minutes of every interaction spent rebuilding context the thing should have kept. Macaron sits in this category deliberately — a personal AI agent whose entire premise is continuity rather than transaction. It's not trying to be a coding tool that forgets you. It's trying to be the thing that doesn't.

Preferences and Patterns

Relationship-based agents work by accumulation. They notice you skip breakfast on deadline weeks, that you ask for gentle reminders rather than blunt ones, that "the usual" means something specific. None of that is task-shaped. It's pattern-shaped, and patterns only show up over time.

This is the part most write-ups skip, so I'll say it directly: the value isn't in any single answer. It's in the answer being calibrated to you without you re-specifying the calibration each time. That's a different kind of usefulness than a clean pull request.

Help That Adapts Over Time

A life assistant built this way gets more useful the longer you use it, which is the opposite curve from a command tool — a coding agent is as good on day one as day ninety. A relationship-based AI is supposed to be worse on day one and meaningfully better by month three. If it isn't, the memory isn't doing anything, and you should treat it like a command agent that's pretending.

Codex, ChatGPT Agent, and Personal AI Are Not the Same Category

Lining them up side by side makes the category error obvious. These tools don't compete. They barely overlap.

Coding and Work Execution

Codex is a coding agent and doesn't pretend to be anything else. It's OpenAI's umbrella name for a family of surfaces (a terminal CLI, an IDE extension, cloud delegation through ChatGPT, a GitHub bot, and computer-use) that share a single underlying model and account context. Its job is shipping code, and memory of you would add nothing to that.

Online Task Completion

ChatGPT Agent occupies the web-task lane. It can plan meetings, analyze documents, browse for content, interact with secure sites, and create deliverables like slide decks or spreadsheets. It grew out of Operator, OpenAI's earlier browser-driving agent. It sits closer to daily life than Codex but is still transactional: it executes a task and hands the result back.

Agent type | Where it starts | What it optimizes for | Memory of you
Coding agent (Codex) | A coding task | A finished, inspectable code change | Not needed
Web-task agent (ChatGPT Agent) | An online task | A completed action or deliverable | Not the point
Personal AI (life assistant) | Accumulated context | Help calibrated to you over time | The whole point

Life Assistance and Emotional Context

The third lane is the one the other two don't touch: ongoing life support where being understood is the deliverable. Remembering what you're worried about. Knowing the reminder you wanted last week. That's not a task you complete and close. It's a relationship-based AI doing the thing a command agent structurally can't.

When Command-Based Agents Are the Better Fit

Reach for a command agent when the work is bounded and inspectable. A bug to fix, a form to fill, a flight to book, a deck to draft. Anything with a clear definition of done. You want the output, you'll review it, and you don't want the tool carrying anything forward. The transaction model is a strength here, full stop.

When Relationship-Based Agents Feel More Useful

Reach for a personal AI when the value is in continuity, not completion. Daily decisions where re-explaining your context every time is the actual cost. Emotional or situational support where being remembered is the point. The blurry, recurring stuff that never resolves into a single task with a checkbox. If you find yourself re-introducing yourself to a tool, you're using the wrong category for the job.

The Trust Question: What Should an Agent Remember?

Here's where it gets specific. Memory is the source of a relationship agent's usefulness and its biggest liability. The right default isn't "remember everything" — it's "remember the right things, with permission."

A useful split: ambient preferences (how you like reminders worded, your usual coffee order) are low-stakes and fine to retain. Sensitive context (health details, relationship specifics, anything you'd hesitate to write down) should be retained only on an explicit yes. The distinction that matters is convenience versus consent. The tempting prescription, "more memory is always better," is a trap.
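The ambient-versus-sensitive split can be sketched as a memory store that keeps low-stakes preferences quietly and parks sensitive items until the user says yes. This is a hypothetical illustration, not Macaron's or any product's actual design; the `MemoryStore` class, the topic labels, and the fixed `SENSITIVE_TOPICS` set are assumptions, and a real assistant would need far more careful classification than a keyword set.

```python
from dataclasses import dataclass, field

# Hypothetical topic labels; a real system would need a much better classifier.
SENSITIVE_TOPICS = {"health", "finances", "relationships", "location"}


@dataclass
class MemoryStore:
    items: dict = field(default_factory=dict)    # key -> (topic, value), retained
    pending: dict = field(default_factory=dict)  # sensitive items awaiting consent

    def observe(self, key: str, topic: str, value: str) -> str:
        """Retain ambient preferences quietly; hold sensitive ones for a yes."""
        if topic in SENSITIVE_TOPICS:
            self.pending[key] = (topic, value)
            return "ask"
        self.items[key] = (topic, value)
        return "kept"

    def answer(self, key: str, consent: bool) -> None:
        """Keep a pending item only on an explicit yes; otherwise discard it."""
        entry = self.pending.pop(key, None)
        if entry is not None and consent:
            self.items[key] = entry


# Usage: tone is kept automatically, the dietary detail only after consent.
mem = MemoryStore()
print(mem.observe("reminder_tone", "style", "gentle"))         # kept
print(mem.observe("diet", "health", "lactose intolerant"))     # ask
mem.answer("diet", consent=True)
print(sorted(mem.items))                                       # ['diet', 'reminder_tone']
```

Declining is the default path here: anything sensitive that never gets an answer simply never makes it into `items`.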

Notice the contrast with command agents, where the safety model is supervision, not memory. OpenAI's guidance for its web-task agent recommends enabling only the apps a task needs and clearing browser data after sensitive sessions, and its deep research documentation similarly tells users to only connect sources they're authorized to access. A relationship agent can't lean on per-task supervision the same way, because the whole point is that it persists. So the trust burden moves from "watch each action" to "control what's kept." Different category, different defense.

FAQ

Can one AI agent be both command-based and relationship-based?

In principle, yes — a system could take precise one-off commands and also carry context forward. In practice the design pressures pull in opposite directions: command tools optimize for clean, forgettable transactions, relationship tools for accumulation. Most products are clearly weighted toward one. Ask which one starts the interaction — a task, or what it already knows about you.

What should a relationship-based agent remember only with permission?

Anything sensitive: health, finances, relationships, location patterns, anything you'd want to delete later. Low-stakes preferences (tone, defaults, your usual order) are reasonable to retain quietly. The test: would you be uncomfortable if this surfaced unprompted? If yes, it should be opt-in, not assumed.

When should I give an agent a specific command instead of letting it infer?

When the task is bounded and the cost of a wrong guess is real — spending money, sending a message, changing code. Inference is fine for low-stakes, reversible help. For consequential actions, be explicit, and pick a command-based tool that asks before it acts.

How can I tell if an agent is overstepping?

It references things you didn't tell it in this context, acts without confirming on something consequential, or "helps" in a direction you didn't ask for. A well-built relationship agent should be able to show you what it remembers and let you edit or delete it. If you can't inspect its memory, that's the warning sign.
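"Show you what it remembers and let you edit or delete it" amounts to a small interface. A minimal sketch follows, assuming a simple key-value memory. The `InspectableMemory` class and its method names are hypothetical and not drawn from any real product; the point is only that every remembered fact is listable, editable, and removable by the user.

```python
from __future__ import annotations


class InspectableMemory:
    """Hypothetical agent memory the user can list, edit, and delete."""

    def __init__(self) -> None:
        self._facts: dict[str, str] = {}

    def remember(self, key: str, value: str) -> None:
        self._facts[key] = value

    def show(self) -> list[tuple[str, str]]:
        # Everything the agent knows is visible, sorted by key.
        return sorted(self._facts.items())

    def edit(self, key: str, value: str) -> None:
        # Editing only applies to something already remembered.
        if key not in self._facts:
            raise KeyError(f"nothing remembered under {key!r}")
        self._facts[key] = value

    def forget(self, key: str) -> bool:
        # Returns True if something was actually deleted.
        return self._facts.pop(key, None) is not None


# Usage: correct one preference and delete another.
mem = InspectableMemory()
mem.remember("coffee", "oat flat white")
mem.remember("reminders", "blunt")
mem.edit("reminders", "gentle")
mem.forget("coffee")
print(mem.show())  # [('reminders', 'gentle')]
```

If a tool offers nothing equivalent to `show()`, you have no way to check what it is carrying forward, which is the warning sign described above.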

Can a life assistant work without deep memory?

It can function, but it won't do the thing that justifies the category. Strip out memory and you've got a command agent with a friendlier tone — useful, but it'll never get more useful the longer you use it. The adapting-over-time curve is the entire payoff. No memory, no curve.

So the question was never "which agent is best." It's which category your problem belongs to — a task to finish, or a context to carry. If your needs are all bounded, one-off, inspectable work, a relationship-based AI is overhead you don't need; a command agent will serve you better and ask less of your trust. If you keep re-explaining yourself to a tool that should already know, that's the signal you're in the other category. Worth sitting with which one you actually reach for most.