Dario Kondratiuk

sre + infra

I can't keep a chat window open for every PR, so I wrote pr-memory

I had a handful of PRs open at the same time last week, and the bottleneck wasn't the code — it was me remembering what each session was doing by the time I came back to it. You can't keep a chat window open for every PR. So I wrote a small skill: save dumps the current session to a file keyed by a pull request, and load brings it back when I return to that PR. If one piece of work spans three repos, it's stored once and I can reload it by any of the three PR numbers. When every PR in a memory gets merged, it quietly deletes itself in the background — I'm not going to garbage-collect my own notes. The whole store is local and gitignored, because this is my mess to keep, not the team's.

Reviewing agent diffs in hunk so my comments land where the agent reads

I plan a change with the agent, it opens a PR, and then I do what everyone does: go to GitHub to actually read the diff. GitHub is still the best place to read code. The problem is the round trip. I'd leave a comment there, switch back to the terminal, and then… what? Tell the agent to go read my comment on line 40? Paste it back myself? That's not reviewing, that's being a courier for my own feedback.

So I started using Hunk (not Hulk — Hunk: https://github.com/modem-dev/hunk). It opens a diff viewer that the agent and I are both looking at. I leave inline comments and the agent reads them right where I left them. No copy-paste, no relaying through chat.

The part I didn't expect: if a Hunk session is open, mauro-reviewer drops its review notes straight into it instead of the chat. So now I don't have to leave the diff to talk to the agent about the diff.

When a build fails, an agent reads the GitHub logs and runs /mabl-debug for you

A build fails. Now what? You go to GitHub, stare at a wall of actions, hunt for the red one, open it, and dig around to figure out what actually happened. And if it turns out to be a mabl deployment, you click the link and start all over again inside mabl. Forget about it. Nobody wants to do that.

So I made it stop. I built a github-build-explorer agent that runs on Haiku — cheap and fast, exactly right for digging through logs. It finds the red build and tells you what broke. And if it's a mabl deployment, /mabl-debug kicks in: it pulls the deployment, the failure analysis, the recovery sessions, goes hunting for the cause in the code, and reproduces the failure in a real browser with our new local debugger for agents.

It gets better. You can ask Claude to open the PR, wait for the red, and fix it — all on its own. And the best part: before it pushes anything back, the debugger re-runs the test to confirm it actually fixed the issue. Not "probably fixed." Fixed.

A CLAUDE.md rule that turns "let's try X" into "are you sure?"

I noticed Claude was building a lot of things I wasn't sure I wanted built yet. The pattern was always the same: I'd type something like "can we make X do Y?" and come back to a plan, a diff, and sometimes a PR description. Which is fine when "can we" really means "do it" — and a problem when I was still figuring out if it was the right move.

So I added a rule to my CLAUDE.md. If a prompt sounds like a suggestion or an open question — "let's do X", "what if we tried Y?", "should we Z?" — Claude has to push back first. Is this the right problem to solve? Is there a simpler approach? What's the downside? Only after I confirm does the model get to write code. Imperative phrasing — "add X", "fix Y", "implement Z" — still goes straight to work.

The first time it triggered I almost edited the rule back out, because the critique was annoying. Then I realized that was the point. I'd been paying for code I hadn't really asked for.

I had 20 worktrees and no idea what was in terminal five

Many of us have been struggling with rate limits lately. I spent part of last weekend thinking about why, and realized something embarrassing: I was using the agent to fix merge conflicts and bump a CLI version into the execution engine. That's not what the agent is for. The honest reason I kept doing it — I had roughly 20 worktrees open and couldn't tell you where the code in terminal five actually lived.

I scaled back to four. Named wt1 through wt4, multi-purpose — I decide what each one is for. They share the same color across my terminal, Chrome tab groups, VSCode, and Finder. Each tab gets a Planner session for feature design, a Terminal for deterministic tasks, and one panel per repo.

I know sub-agents could do something similar. But this is more transparent to me — and a CLI task inside a UI session eats context window, while a UI session in a CLI shell doesn't have the right skills loaded. First day. Maybe it helps someone else too.

Codifying our best human reviewer's habits into a code-review subagent.

I built a code-review subagent and named it after Mauro. This is not a joke about Mauro — Mauro really is our best reviewer. He reads the code with his eyeballs. He suggests an enum every time he sees three magic numbers in a row. He reads the strings inside the code, notices when "Error fetching MauroAgent data" should have been a template literal, and tells you. He won't accept eslint-disable-next-line without a reason. When he sees a prompt he can't follow, he says "if I can't understand it, the LLM won't either."

I wrote those rules down. That was the agent. It took an afternoon.

Mauro's reaction was that I replaced him because I got tired of waiting for his reviews. That part is also true. The interesting thing is how little of his review style I had to invent — most of it was already a list of habits he applies in the same order to every PR. The reviewers we trust most are the ones whose taste is the most legible. Turns out legible taste compiles.

When to write a script instead of letting the agent reason: a shared scripts directory for cross-repo ops.

I keep watching Claude reinvent the same shell pipeline three different ways across sessions. Routine cross-repo operations like dependency bumps are the canonical example: an engineer asks Claude to do it, and Claude figures out a slightly different approach each time — usually right, sometimes wrong, always slow.

What I've been pushing for is a shared scripts directory for the things that are deterministic. Bump a CLI version into a downstream repo? Script. Generate a new connector skeleton? Script. Snapshot a runner config? Script. When the work has a known shape, the agent shouldn't be reasoning it out — it should be calling the script. We pay for the agent's reasoning when we need reasoning. We shouldn't pay for it when we just need the right command in the right order. The cleaner the line we draw between "this is a deterministic operation" and "this needs the model," the better the system gets at both.