When the Codex review catches blocking issues the Claude review missed: is it the model or the second pass?
I've been watching what our two PR reviewers actually catch, and the pattern is uncomfortable. Our default review runs on Claude. Anyone can also kick off a Codex review on demand — and when they do, Codex regularly flags blocking issues the Claude pass walked right by. Not edge cases. Things that would have shipped.
The honest question: is Codex better at code review, or is the win mostly that it's a second perspective looking at the same diff? If we'd built it the other way around — Codex by default, Claude on demand — would we be writing this post about Claude?
I don't know yet, and I think that's the right place to sit for a minute before we draw the conclusion. What I do know is that on the changes that matter, two reviewers from two different shops are catching more than one of either, and that's a finding regardless of which is "better." We're going to keep both, and I'm going to start measuring which class of issue each one actually catches.
The honest question: is Codex better at code review, or is the win mostly that it's a second perspective looking at the same diff? If we'd built it the other way around — Codex by default, Claude on demand — would we be writing this post about Claude?
I don't know yet, and I think that's the right place to sit for a minute before we draw the conclusion. What I do know is that on the changes that matter, two reviewers from two different shops are catching more than one of either, and that's a finding regardless of which is "better." We're going to keep both, and I'm going to start measuring which class of issue each one actually catches.