Use Codex and Opus Together: A Lightweight Plan → Build → Sanity-Check Workflow

Hey, I'm Anna.I didn't plan to "use Codex and Opus together." I just wanted my little weekend script to stop breaking every time I renamed a folder. I opened a new chat out of mild annoyance, not ambition. And then something quiet clicked: when I let a reasoning‑first model shape the plan and a code‑first model do the typing, the work felt… lighter. Not faster at first, but less mentally heavy. Enough to keep going.

Why using both can be normal

I'm not building a system or chasing perfect automation. Most days, I'm smoothing small frictions: a script that files PDFs, a nudge for a weekly review, a one‑off parser so I don't do the same copy‑paste dance again.

Here's what I noticed, after a few short runs across a handful of tasks: pairing a strong reasoning model (Claude Opus via the Anthropic console) with a code‑centric model (what many of us still shorthand as "Codex," though OpenAI's original Codex is retired, these days it's a code‑forward GPT or Copilot) felt natural. Not a grand workflow. More like passing a note across the table.

Opus handled the fuzzy parts: constraints, weird edge cases, "what am I missing?" It didn't rush me. The code model handled the literal parts: function signatures, import paths, unit tests. When I tried to make either do both, I spent more time correcting tone shifts, either overthinking in code, or under‑thinking in planning.

Projects have phases

I'm stating the obvious, but it matters: tiny projects have phases even when you do them in one sitting.

  • Before: What am I actually trying to do? What assumptions are shaky? Where can this break?
  • During: What's the fastest draft that isn't brittle? What's the minimum scaffolding so future‑me doesn't swear at past‑me?
  • After: Did it behave with real data? What did it quietly skip?

Opus was best before and after. The code model was best during. When I respected those roles, I noticed fewer "why did I do it that way?" moments later.

The lightweight handoff: Plan → Build → Sanity-check

I kept this simple: one chat with Opus for planning, one thread with the code model for building, then back to Opus for a sanity pass. No fancy orchestration, no plugins I had to babysit.

A real example: I had a folder of monthly statements named semi‑chaotically (Jan_2025, 2025-02, "March 2025 final"). I wanted a script to normalize names and move files into a year/month structure without touching any PDFs that were already tagged "reconciled."

What happened:

  • Plan (Opus): I pasted a short description and 4 example filenames. Opus asked about OS differences, duplicate handling, and logging, things I hadn't thought about. Mildly humbling, but useful.
  • Build (code model): I dropped Opus's bullet points into a fresh thread and asked for a Python script with dry‑run mode and idempotence. The first version worked on 70% of cases: two tweaks later, it handled all my samples. Time saved? Maybe 20 minutes. Mental load reduced? A lot more.
  • Sanity‑check (Opus): I returned with the final script and a short log of what it changed. Opus flagged one risky assumption around non‑UTF8 filenames. I added a try/except and moved on.

None of this felt "automated." It felt like pairing with two coworkers who are good at different things and don't mind short, specific asks.

Our Macaron is designed to help you solve the problem of dealing with such fragmented tasks: Every note, every small script reminder, can be directly transformed into an actionable next step. Even temporary ideas won't be left in folders or chat records.

Try Macaron for free now!

What to pass between models (3 items only)

  • A crisp intent: one paragraph describing the job, constraints, and any deal‑breakers.
  • Minimal real data: 3–5 true examples that capture the messy edges (not just the easy ones).
  • Acceptance checks: the 2–4 ways you'll decide it worked (e.g., "dry‑run shows only renames, no deletes").

That's it. I tried pasting entire folder trees and long rants: it didn't help. Tight inputs, better outputs.

How to avoid “two-model chaos”

The chaos shows up when I blur roles or lose the thread. A few things that kept me sane:

  • One source of truth. I keep a tiny spec at the top of the code file, three lines, max. When I change scope, I update that first. Both models get the same spec.
  • Name decisions out loud. If Opus suggests logging to a file and I decide on stdout, I write "Decision: stdout, because…" in the chat before asking for code changes. It prevents the model from resurrecting old ideas.
  • Don't "teach." I stopped trying to make either model remember my style. I copy the two or three rules that matter into each fresh thread. Less vibe, more clarity.

Also, small note on reality: OpenAI's classic Codex is gone: what I'm calling "Codex" here is just the code‑forward model in tools like GPT or GitHub Copilot.

When one model is enough

Plenty of days, I don't split it. If I'm writing a tiny bash alias or renaming ten files, the code model alone is fine. If I'm sense‑making, drafting a weekly note, deciding what "done" looks like for a mini‑project, Opus alone is calmer.

I'll keep using Codex and Opus together, casually. When the plan is fuzzy, the code is literal, and my attention is thin, the handoff helps. I'm curious whether this stays true the next time I forget to save a dry‑run and have to explain to myself why three files mysteriously vanished.

Hi, I'm Anna, an AI exploration blogger! After three years in the workforce, I caught the AI wave—it transformed my job and daily life. While it brought endless convenience, it also kept me constantly learning. As someone who loves exploring and sharing, I use AI to streamline tasks and projects: I tap into it to organize routines, test surprises, or deal with mishaps. If you're riding this wave too, join me in exploring and discovering more fun!

Apply to become Macaron's first friends