Two Months with Claude Code

Jan 20, 2026 · 6 min read

In November a friend demoed something he’d built with Claude Code, 100% AI-generated. Despite using LLMs daily, I was caught off guard by the quality. It looked great and implemented something that would have been challenging to build myself, in a fraction of the time. I realized I’d been sleeping on this wave of agentic tools. I’ve been using Claude Code since, and wanted to catalogue what I’ve learned.

This is a snapshot in time, not gospel.

I’ve tried to approach these tools the way a new developer might. A career’s worth of habits can be a liability when tools change this fast.

Caging the Machine

“Claude Code running in this mode genuinely feels like a completely different product from regular, default Claude Code.” — Simon Willison on YOLO mode

He’s right. But these tools can also destroy things. So the first thing I did was look into containerizing it.

Anthropic includes a Dockerfile for Claude Code. I forked it into a script called robotcage that spins up Claude Code in YOLO mode with only the current folder mounted and network access locked down. It can only destroy what’s in that folder, and it can’t exfiltrate anything.
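The core of such a wrapper is a one-liner around docker run. Here’s a minimal sketch of what a robotcage-style script might look like; the claude-sandbox image name is a placeholder, and --dangerously-skip-permissions is Claude Code’s actual YOLO flag:

```shell
# Write the wrapper to a file; actually running it requires Docker and
# a local Claude Code image (here called "claude-sandbox", a placeholder).
cat > robotcage <<'EOF'
#!/usr/bin/env sh
# Mount only the current directory; cut off the network entirely.
exec docker run --rm -it \
  --network none \
  --volume "$PWD:/workspace" \
  --workdir /workspace \
  claude-sandbox \
  claude --dangerously-skip-permissions
EOF
chmod +x robotcage
```

With --network none the container gets no network at all; in practice you would swap that for a custom Docker network with an egress firewall, which is where the whitelist friction below comes from.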

In practice, this approach has too much friction:

  • Claude often needs network access for legitimate purposes. I added a dynamic whitelist, but since Claude itself can invoke it, that defeats the purpose.
  • The mounted filesystem differs from my native one. Claude would misinterpret pasted commands, or node_modules and uv environments would drift out of sync.
  • I’d often need access to folders outside the Docker root, which means running from a shared ancestor, which, again, defeats the purpose.

These frictions added up. I’ve moved away from containerization and now run Claude Code natively on a remote machine with everything version controlled. It’s not bulletproof, but it’s the best mitigation I’ve found.

One thing I want to keep from this experiment: a dedicated Claude profile with its own SSH keys and GitHub account. Git blame tells me instantly whether I wrote something or Claude did, and when I’m reviewing old code and wondering why something is the way it is, that matters: if it’s Claude, I scrutinize differently. PRs show authorship the same way. It also separates credentials: Claude’s SSH keys can be revoked without touching mine, so if something goes wrong, the blast radius is contained.
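One way to wire this up, sketched with illustrative names, email, and key path (the repo name and identity here are invented; user.name, user.email, and core.sshCommand are real git settings):

```shell
# Hypothetical setup: a repo where the agent commits under its own
# identity and pushes with a dedicated, individually revocable SSH key.
git init -q agent-sandbox && cd agent-sandbox
git config user.name "claude-agent"
git config user.email "claude-agent@example.com"
# Route pushes through the agent's key only (key path is illustrative):
git config core.sshCommand "ssh -i ~/.ssh/id_ed25519_claude -o IdentitiesOnly=yes"
```

An includeIf block in ~/.gitconfig can apply the same identity automatically to every repo under a given directory, so the agent never accidentally commits as you.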

Embracing Undo

The common thread across the next few sections: I stopped trying to prevent damage and started embracing recovery. Assume some damage will happen and make it cheap to undo.

Notes as Artifacts

I use Claude Code to capture notes, plans, and decisions as markdown files in a notes/ folder.

I don’t want to commit these files because a lot of the text is transient, throwaway, or slop. But they’re still important to track. I had my own version of catastrophic data loss in December when I asked Claude to “clean up” a folder and it deleted a month’s worth of notes. I hadn’t realized how valuable they were until they were gone.

I wanted something like infinite undo, or a git repo that auto-committed on every change. Claude helped me set up a btrfs filesystem with once-a-minute snapshots: lightweight, cheap to store, easy to restore.
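A cron-able sketch of such a snapshot job; the paths and seven-day retention are illustrative, and it assumes the notes live on a btrfs subvolume:

```shell
# Write the snapshot job to a file; actually running it requires root
# on a btrfs volume. Paths and retention below are illustrative.
cat > snap-notes <<'EOF'
#!/usr/bin/env sh
SRC=/data/notes
DST="/data/.snapshots/notes-$(date +%Y%m%d-%H%M)"
# Read-only snapshots are cheap: btrfs stores only changed blocks.
btrfs subvolume snapshot -r "$SRC" "$DST"
# Prune snapshots older than 7 days (10080 minutes):
find /data/.snapshots -mindepth 1 -maxdepth 1 -name 'notes-*' -mmin +10080 \
  -exec btrfs subvolume delete {} \;
EOF
chmod +x snap-notes
```

A crontab line like `* * * * * /usr/local/bin/snap-notes` gives the once-a-minute cadence; restoring is just copying files back out of a snapshot directory.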

Notes stay colocated with projects, backed up but not committed. What’s missing: semantic understanding of what a change is.

Chief of Staff

I have a “Chief of Staff” agent: a folder with a CLAUDE.md and some skills to help organize my projects.
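Concretely, such a folder might look something like this (the layout and skill names are invented for illustration):

```
chief-of-staff/
├── CLAUDE.md          # role, priorities, pointers to each project folder
└── skills/
    ├── weekly-review/
    │   └── SKILL.md   # summarize progress and open questions across projects
    └── triage/
        └── SKILL.md   # pick the next task given current priorities
```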

Most Claude Code users are probably familiar with plan vs execute mode, but this serves as a sort of “meta” planning mode across projects.

The value here is cross-project synthesis. It surfaces patterns across projects that I’ve missed; it also provides a single interface to pick off tasks and plan work across projects.

Coding Style

I started out as an incredibly pedantic reviewer with an incredibly pedantic workflow: an issue for each feature, a PR for each issue, detailed comments on every minor thing. I would never be this pedantic with a human colleague. I was also strict about which technologies Claude could use.

Over time I’ve loosened the reins in two ways.

Technology Choice

I’m letting Claude gravitate towards technologies it knows best. React instead of web components, for example, even though I’d prefer the latter.

My hypothesis: LLMs perform best where training data is richest. I don’t have evidence. It feels true. When Claude struggles with a technology I prefer, I ask whether the battle is worth fighting. Usually no.

Review Cycle

The pedantic cycle costs a lot of time and tokens. Reviewing a PR might take hours of back and forth. And Claude isn’t good at extrapolating: if I ask it to split three functions into their own files, it won’t apply that pattern to similar code elsewhere in the same PR.

Is it necessary? I want my code a certain way, but with a robust test suite, who cares what’s on the inside?

This points to preference learning, or evals. More on that later.

Not Just Code

I run LibreChat locally: Anthropic, Google, and OpenAI models in a single chat interface. I was wary of uploading personal information, but the benefits are so immense I couldn’t help myself, and I’m surprised by how much of my personal life I now throw at an LLM.

The trend here is clear: LLMs are surprisingly capable outside of code. I’ve used them for tax strategy, legal fine print, and financial planning. Research that would take hours and cost hundreds in professional fees now takes minutes.

And Claude Code specifically is a step up from how I was using LLMs before. More powerful models combined with tool use and file system access fit into my workflow more seamlessly than a web UI ever could.

A Loosening of the Cages

This is meant to be a snapshot in time. I’m far from the promised land of knowing how to work effectively with agents. But I can draw some conclusions:

One, don’t swim upstream. I started out exerting a high degree of control that loosened over time. Instead of running Claude in a container, I embraced undo. Instead of being a pedantic reviewer, I let it do its thing. Give it rope, but protect yourself with good rollback.

Two, my personal ambition has increased significantly. The range of projects I’m considering has ballooned beyond what would have been reasonable last year, because projects that would’ve otherwise taken two weeks now take a day.

Three, interact to learn. By using Claude Code heavily I’m learning how to work with agents, but also finding that a tighter interactive loop helps when designing software. More akin to play. More akin to a REPL.

Four, the capability threshold has shifted. If you still believe LLMs are incapable of doing X, where X is some large component of your job, take the latest generation out for a spin. My priors from six months ago are already stale.

Further Reading

Others are reaching similar conclusions:

Thoughts on Claude Code (Slava Akhmechet) — Built a programming language in two weeks; the most detailed practitioner account I’ve found.

When AI Writes Almost All Code (Gergely Orosz) — Frames recent models as crossing a capability threshold.

Opus 4.5 is not the normal AI agent experience (Hacker News) — Practitioner thread on mature Claude setups: custom skills, code review agents, automated workflows.

I don’t know what being a developer means anymore. I’m figuring it out in real time.