We hit the monthly Claude cap again, and the conversation that followed was more interesting than the cap. The bill is still small relative to other line items, but it's a meaningful fraction of our non-prod GCP spend now, so the questions are starting to matter. A few we don't have answers to yet. Per-seat with overages, or move high-volume work to direct API consumption for better visibility? Should overages be equitable across the team, or should the people pushing hardest get more headroom by default? Are low caps actually a feature, in that they force a conversation about how someone's using their tokens? How do we get any real visibility into what's driving consumption — right now we mostly can't see it. And the one I keep coming back to: relative to the productivity gains we just earned, should we really be tight on costs at all? I don't think the answer is the same for any two of those. But it's worth being explicit that we're choosing, not optimizing.
I built a `mabl debug` command suite for investigating test failures, and the user isn't a human — it's an AI agent picking through a failure. That changes the design in ways I didn't expect. Pretty terminal output is wasted. The agent wants structured JSON and classification up front so it doesn't have to read 10K tokens to guess at root cause. Large artifacts (HAR captures, DOM snapshots, screenshots) go to disk so they don't blow the context window — the CLI prints a path, not the contents. The biggest shift was realizing the CLI should ship its own skill: an `install-skill` subcommand drops a Markdown tutorial into the agent's workspace so it learns how to use the tool from the tool itself. No docs site to find, no examples to dig up. The CLI is the tutorial. The lesson: when the consumer is an agent, the highest-leverage work is the analysis the agent can't do quickly itself — classification, fingerprinting, deployment correlation. A human debugging skims and pattern-matches. An agent needs you to do the pattern-match upstream and hand it the conclusion.
Our default Anthropic seats include $150 of overage per month per premium seat. We're way past that for a lot of people, and we're going to be further past it next quarter. The real cost is the extra usage, not the seat — you should think of the seats as a promotional teaser and overage as the true cost. So we're treating it like a portfolio. We pre-purchase 1,000 credits at a 30% discount. We're looking at moving high-volume workloads to direct API consumption for better visibility. By the end of the year I doubt any of us will sit within seat allocation for full-time work — at least not without giving up the productivity gains we just earned. The companies that figure out the cost structure of agentic work as a separate discipline from "give engineers more tools" are going to have a real edge. The ones that don't are going to be surprised by their bill.