Joe Lust

release ops

Sorry Claude, Gonna Need You to Come in on Saturday

7 days in a week, 7 days in a token budget. Why is your agent at the beach on Saturday? Think of all the chunky tech debt projects nobody ever has time for. That's what agents are for.

I had an API with hundreds of endpoints, and I wanted to refactor every one of them to a more modern, robust, faster framework. Who has time to refactor controllers by rote and re-validate that _nothing_ broke? Claude does, with a /goal.

The whole thing is unlocked by tests we wrote years ago. Thousands of API-level validation tests and end-to-end suites for the web apps that consume the API — that's the feedback signal a /goal actually needs. "Get the suites green without changing the clients or the interfaces, only the server implementation." That's it. From there our CI does the rest: every PR spins up a deploy preview, fires the full cloud regression suite, and reports back. The agent runs permutations across branches in parallel and validates each one on its own.

While you were at the beach worrying about how much sand your kids would track into the car, Claude burned down a major chunk of the tech debt backlog. LFG.

Tracing the token burn through ubiquitous labels

You can't optimize what you can't measure. You've rolled out a myriad of recent AI features, but how do you analyze them in your illegibly dense cloud bill? The monthly cloud provider invoice is a tome, and the obvious question — who burned the tokens, and on what — isn't answerable unless you decided it was answerable months ago.

We did, mostly by habit. We've been labeling cloud resources since labels became a feature in GCP, and we've labeled enough of them over the years to find the edges of what their billing system will take. The payoff: a labeled thing becomes its own traceable line item on the bill. Nobody reads a bill that size by hand; BigQuery and agents do.

So we label every prompt with the details that matter: customer_id, agent_name, model_name, model_version. It's baked into our services by default. That's enough to attribute usage, cost, and tokens down to the feature, the customer, and the hour. GCP's Cloud Billing export drops all this into BigQuery throughout the day, and now we just query it any way we fancy. Where did those billion tokens go? Got it. How much did that new prompt you shipped cost? $42, obviously.
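Here's a minimal sketch of the rollup side, assuming rows shaped like the billing export's `labels` column (a list of key/value pairs) next to a `cost` field. The sample rows and the helper names `label_dict` and `cost_by` are made up for illustration; this is not our actual query or schema.

```python
from collections import defaultdict


def label_dict(row):
    """Flatten a row's list of {'key': ..., 'value': ...} pairs into a dict."""
    return {pair["key"]: pair["value"] for pair in row.get("labels", [])}


def cost_by(rows, *keys):
    """Sum cost per combination of label values; missing labels group as 'unlabeled'."""
    totals = defaultdict(float)
    for row in rows:
        labels = label_dict(row)
        group = tuple(labels.get(k, "unlabeled") for k in keys)
        totals[group] += row["cost"]
    return dict(totals)


# Made-up rows for illustration only.
rows = [
    {"cost": 30.0, "labels": [{"key": "customer_id", "value": "acme"},
                              {"key": "agent_name", "value": "summarizer"}]},
    {"cost": 12.0, "labels": [{"key": "customer_id", "value": "acme"},
                              {"key": "agent_name", "value": "router"}]},
    {"cost": 5.0, "labels": []},  # an unlabeled resource still shows up
]

by_customer = cost_by(rows, "customer_id")
by_agent = cost_by(rows, "customer_id", "agent_name")
```

The "unlabeled" bucket is the useful part: it shows how much spend you still can't attribute.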

Your agent is now a FinOps ninja. Buy it some cufflinks, and have it send along the billing report in the morning over coffee. Now time to get back to shipping features. LFG.

Eliminate the small stuff: automatic code styling FTW

Millions of PR comments a year get burned on style — tabs, spaces, imports, line breaks. That's a closed-form, solved problem. Why are you wasting your keystrokes and valuable context on style? We don't. No need to fill our CLAUDE.md files with 10 pages of format rules, no need for a new hire to spend a week learning our way to type. Don't fill your physical and virtual context windows with rules a CPU can apply.

So Spotless landed across our Java codebase this week, with a pre-commit hook and a CI check. I picked it specifically because it auto-fixes. Other tools will scold you about wildcard imports ("thou shalt not!"); Spotless will simply fix them, auto-magically. A tool that produces a report is the wrong pattern — at scale, we're shipping code, not reports. Calling "fix" is table stakes. The pre-commit hook means both human and agent operator styles are fixed before the PR opens. Neither has to grok the rules.
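A rough sketch of what a fix-first pre-commit hook can look like, assuming the Spotless Gradle plugin and its `spotlessApply` task (a Maven setup would call a different command). The function names and the injectable `run` parameter are illustrative choices, not our actual hook.

```python
import subprocess


def staged_java_files(run=subprocess.run):
    """Return staged .java paths (added, copied, or modified) reported by git."""
    result = run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return [p for p in result.stdout.splitlines() if p.endswith(".java")]


def main(run=subprocess.run):
    """Format staged Java files and re-stage them; returns a process exit code."""
    files = staged_java_files(run)
    if not files:
        return 0  # nothing for the formatter to do
    # Assumption: Gradle wrapper at the repo root; spotlessApply rewrites files in place.
    run(["./gradlew", "spotlessApply", "--quiet"], check=True)
    # Re-stage so the commit contains the formatted versions.
    run(["git", "add", "--"] + files, check=True)
    return 0
```

To wire it up, a `.git/hooks/pre-commit` script would end with `sys.exit(main())`. One caveat with this simple version: re-staging whole files also picks up any unstaged edits in those files.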

The goal isn't to sweat the small stuff. It's to eliminate it. I don't care how you use tabs or spaces. I care that your feature does what the customer needs. Leave the rest to the CPU. LFG.

Rockout to the stockout: 117,000 CI jobs in 30 days

Now that devs can readily integrate 10 PRs on a slow Monday, you'd better be serious about CI/CD (says the DevOps guy). My coworker just kicked off a CI job that used 3,000 cores. Did she bat an eyelash? Nah — it's $4, it'll get us some useful answers. Our compute provider hit a regional stockout (wasn't me) and we auto-routed around it. Our modest eng team ran 117,000 CI jobs in the last 30 days. About 4,000 jobs per contributor. All worth it when you've got a half-dozen agents coding, fixing, and validating on your behalf. Rockout to the stockout. Bits are cheap, light is fast, life is short. LFG.

Where we're going, we don't need IDEs

I haven't opened IntelliJ Ultimate in months (best tool, btw). I say this at conferences and people look at me in disbelief. You only need an IDE if you're reading or writing the code yourself. That's very 25Q4. My setup now: tricked-out tmux and eight Claude Code sessions running in parallel. The reason this works at all: years of investing in CI automation, linting, test coverage, and reviewer tooling. Those bets are paying off. Without that scaffolding, eight parallel agents would just be 8x the ways to break main. My job is approving the PRs, challenging the assumptions and the designs, keeping the agents honest. I'm here to spot the square wheels, catch the BS, avoid the footguns, and keep this machine a cohesive whole. Type 2K lines yourself, then spend all day reviewing them? No, it's 2026, y'all. We've got tools for that. LFG.

It's time for Beast Mode: be uncomfortably motivated

I'm an efficiency addict. I eschew slowness. First tech job out of college in 2010, I brought my own pair of widescreen LCDs into the office because the standard 17" square was unworkable — facilities was annoyed. I bought 3x the RAM with my own cash and upgraded the machine; IT warned I might burn the building down. I used Cygwin and scp instead of CMD and drag-and-drop, and management called me "uncomfortably motivated." Sixteen years on, we have agents with whale-size brains running parallel jobs while we sleep. Hardware helps — 128GiB of RAM, four monitors, 2-gig fiber, 32 cores. But the rig is just one example. Buy your own gear if you have to. Install the better tool. Ignore the polite limit. This is the time to literally be a 100x engineer. LFG.

Recovering human data vacuum: scheduled agents on alerts, opex, logs, weather

I'm a recovering human data vacuum. There's never enough time to watch dashboards, sift the overnight 5xx spike, scroll service logs, eyeball opex, and then go build something. So I stopped doing it. I have scheduled Claude agents running against monitoring alerts, opex, service logs, and yes, today's weather (boots for my tot?). They run on a cron, do the boring analysis, and only ping me if something is actually sliding sideways. Slack DMs and @mentions land in my client; email is for dinosaurs. The point is to protect my own context window: every minute I spend triaging a chart that turned out to be fine is a minute I'm not building the next thing. Agents are unreasonably good at the "skim a wall of telemetry, surface the one weird thing" job. So I let them. LFG.
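The "only ping me if something is actually sliding sideways" rule can be as simple as a gate in front of the notification. Below is a minimal sketch, assuming nightly 5xx counts are available as plain numbers; the function name `spike_alert`, the 3-sigma cutoff, and the `min_count` floor are all illustrative assumptions, and the real analysis is done by the agents, not by this check.

```python
from statistics import mean, pstdev


def spike_alert(history, latest, sigmas=3.0, min_count=10):
    """Return a short alert string if `latest` is an outlier vs. `history`, else None."""
    if latest < min_count or len(history) < 2:
        return None  # too small or too little data to be worth a ping
    baseline = mean(history)
    # Floor the spread at 1 so a perfectly flat history doesn't alert on +1.
    spread = max(pstdev(history), 1.0)
    threshold = baseline + sigmas * spread
    if latest > threshold:
        return f"5xx spike: {latest} vs. baseline {baseline:.0f} (threshold {threshold:.0f})"
    return None


# Made-up counts for illustration.
recent_nights = [4, 6, 5, 5, 4, 6]
quiet = spike_alert(recent_nights, 7)
loud = spike_alert(recent_nights, 120)
```

A `None` result means no Slack message at all, which is the whole point.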

Why our merge queue stops: a four-way race between matrix builds, runner names, GCP quota, and Pub/Sub

Our productivity is up roughly 4x quarter over quarter. The thing I keep working on is making sure the build infrastructure can actually keep up. CLI builds intermittently failing on a datastore emulator issue. Self-hosted runners missing Docker. JDK downloads from Adoptium failing at random. The worst was a pernicious interplay between simultaneous matrix builds, GitHub's pre-registered runner names, GCP's 5-VM-per-call rate limit, and Pub/Sub retries. When all four collide, the merge queue stops moving, and each outage has cost us a day.
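One way to picture that collision, as a sketch under assumptions rather than our actual fix: a matrix build asks for many runners at once, a retried Pub/Sub message asks for some of them again, and each create call can take at most five VMs. The hypothetical helper below (`plan_vm_calls`, with made-up runner names) drops names that were already handled and splits the rest into calls of five or fewer.

```python
def plan_vm_calls(requested, already_created, max_per_call=5):
    """Split runner names into create calls of at most `max_per_call` VMs.

    Names that were already created, or that repeat within the request,
    are skipped so a redelivered message never asks for the same runner twice.
    """
    seen = set(already_created)
    pending = []
    for name in requested:
        if name not in seen:
            seen.add(name)
            pending.append(name)
    return [pending[i:i + max_per_call]
            for i in range(0, len(pending), max_per_call)]


# Made-up example: 12 matrix jobs, one duplicate from a retry, one VM already up.
requested = [f"runner-{i}" for i in range(12)] + ["runner-3"]
calls = plan_vm_calls(requested, already_created={"runner-0"})
```

Keying everything on the runner name is what makes the retry harmless: the second delivery finds nothing left to create.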

None of this work is glamorous. Cache config, runner audits, log diving, hunting down Pub/Sub retry semantics. But the math is straightforward: every minute the merge queue is broken is a minute the rest of the agentic pipeline is sitting idle. AI-native throughput needs reliable builds underneath it for the upstream gains to compound. Agents can write code as fast as they want — if main can't merge, none of it ships. So I keep investing here. It's the least visible work and the highest-leverage.