Posts tagged "codex"

2 posts · all topics

When the Codex review catches blocking issues the Claude review missed: is it the model or the second pass?

I've been watching what our two PR reviewers actually catch, and the pattern is uncomfortable. Our default review runs on Claude. Anyone can also kick off a Codex review on demand — and when they do, Codex regularly flags blocking issues the Claude pass walked right by. Not edge cases. Things that would have shipped.

The honest question: is Codex better at code review, or is the win mostly that it's a second perspective looking at the same diff? If we'd built it the other way around — Codex by default, Claude on demand — would we be writing this post about Claude?

I don't know yet, and I think that's the right place to sit for a minute before we draw the conclusion. What I do know is that on the changes that matter, two reviewers from two different shops are catching more than one of either, and that's a finding regardless of which is "better." We're going to keep both, and I'm going to start measuring which class of issue each one actually catches.

Wiring /codex-review across 16 repos so we stop reviewing every PR.

At our current PR throughput, I'm going to burn out on reviews alone, never mind actual work. We can't move at this pace while also having a human read every change. The model I want us to move toward: ask for human review when you're truly unsure about something, and let the agent reviewers handle the rest.

The piece I shipped to make this real was getting /codex-review wired up across every repo as an on-demand second opinion. Different model family from Claude, uncorrelated blind spots, one comment away when you want it. The work was sixteen PRs across sixteen repos to enable the GitHub Actions trigger, plus a change to our shared workflows repo to make it a first-class slash command. The conceptual work was harder: deciding that "two agent reviews and a human glance" is now an acceptable pre-merge state for routine changes, and reserving real human attention for the changes that genuinely need it. We're not all the way there. But the trajectory is clear, and I'd rather build the routing now than burn out catching up to it later.