Two Months with Claude Code

Jan 6, 2026 · 6 min read

In November a friend demoed something he’d built with Claude Code, 100% AI-generated. Despite using LLMs daily, I was caught off guard by the quality. It looked great and implemented something that would have been challenging for me to build, in a fraction of the time. I realized I’d been sleeping on this wave of agentic tools. I’ve been using Claude Code since, and wanted to catalogue what I’ve learned.

This is a snapshot in time, not gospel.

I’ve tried to approach these tools the way a new developer might. A career’s worth of habits can be a liability when tools change this fast.

Caging the Machine

“Claude Code running in this mode genuinely feels like a completely different product from regular, default Claude Code.” — Simon Willison on YOLO mode

He’s right. But these tools can also destroy things. So the first thing I did was look into containerizing it.

Anthropic includes a Dockerfile for Claude Code. I forked it into a script called clankercage that spins up Claude Code in YOLO mode with only the current folder mounted and network access locked down. It can only destroy what’s in that folder, and it can’t exfiltrate anything.
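
For concreteness, here’s a minimal sketch of the idea. This is not the actual clankercage script: the image name is assumed, and it presumes you’ve built an image from Anthropic’s Dockerfile.

```python
#!/usr/bin/env python3
"""Illustrative clankercage-style wrapper (a sketch, not the real script)."""
import os
import subprocess

IMAGE = "claude-sandbox"  # assumed: an image built from Anthropic's Dockerfile

def main() -> None:
    cwd = os.getcwd()
    subprocess.run(
        [
            "docker", "run", "--rm", "-it",
            "--network=none",             # no network, so nothing to exfiltrate
            "-v", f"{cwd}:/workspace",    # blast radius: the current folder only
            "-w", "/workspace",
            IMAGE,
            "claude", "--dangerously-skip-permissions",  # YOLO mode
        ],
        check=True,
    )

if __name__ == "__main__":
    main()
```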

In practice, this approach has too much friction:

  • Claude often needs network access for legitimate purposes. I added a dynamic whitelist, but since Claude can run the whitelisting script itself, that defeats the purpose.
  • The mounted filesystem differs from my native one. Claude would misinterpret pasted commands, or node_modules and uv environments would drift out of sync.
  • I’d often need access to folders outside the Docker root, which means running from a shared ancestor, which, again, defeats the purpose.

These frictions added up. I’ve moved away from containerization and now run Claude Code natively on a remote machine with everything under version control. It’s not bulletproof, but it’s the best mitigation I’ve found.

One thing I want to keep from this experiment: a dedicated Claude profile with its own SSH keys and GitHub account. At a glance I can tell what’s Claude’s work vs mine. I love this.

Taking Notes

I use Claude Code to capture notes, plans, and decisions as markdown files in a notes/ folder.

I don’t want to commit these files because a lot of the text is transient, throwaway, or slop. But they’re still important to track. I had my own version of catastrophic data loss in December when I asked Claude to “clean up” a folder and it deleted a month’s worth of notes. I hadn’t realized how valuable they were until they were gone.

I wanted something like infinite undo, or a git repo that auto-committed on every change. Claude helped me set up a btrfs filesystem with once-a-minute snapshots. Lightweight, cheap to store, easy to restore.
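
The snapshot job itself is tiny. A sketch, with illustrative paths and retention, assuming notes/ sits on a btrfs subvolume and a cron entry or systemd timer fires this every minute as root:

```python
#!/usr/bin/env python3
"""Once-a-minute snapshot job (sketch; paths and retention are examples)."""
import subprocess
import time
from pathlib import Path

SUBVOL = Path("/data/notes")         # assumed btrfs subvolume holding notes/
SNAP_DIR = Path("/data/.snapshots")  # assumed snapshot directory
KEEP = 24 * 60                       # roughly a day of minutely snapshots

def snapshot() -> None:
    SNAP_DIR.mkdir(exist_ok=True)
    name = time.strftime("notes-%Y%m%d-%H%M%S")
    # Read-only snapshots are copy-on-write: near-instant to take,
    # and only the deltas consume space.
    subprocess.run(
        ["btrfs", "subvolume", "snapshot", "-r", str(SUBVOL), str(SNAP_DIR / name)],
        check=True,
    )
    # Prune anything older than the retention window.
    for old in sorted(SNAP_DIR.glob("notes-*"))[:-KEEP]:
        subprocess.run(["btrfs", "subvolume", "delete", str(old)], check=True)

if __name__ == "__main__":
    snapshot()
```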

Notes stay colocated with projects, backed up but not committed. What’s missing: semantic understanding of what a change is.

Chief of Staff

I have a “Chief of Staff” agent: a folder with a CLAUDE.md and some skills to help organize my projects.

Most Claude Code users are probably familiar with plan vs execute mode, but this serves as a sort of “meta” planning mode across projects.

It helped me categorize my projects into buckets, which surfaced shared primitives I could abstract.

An example: I’ve explored a tool that monitors a subreddit for relevant posts, an email classifier, a writing assistant, an RSS reader. What they all share is a need for “preference learning”: some way to capture my preferences and feed them back into the system.
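
To make that concrete, here’s a hypothetical sketch of the shape such a primitive might take. None of these names exist in my projects; it’s just the abstraction.

```python
"""Hypothetical "preference learning" primitive (a sketch, not a real library)."""
import json
from pathlib import Path

class PreferenceStore:
    """Records my verdicts on items a project surfaced, then renders them
    as few-shot context so the next LLM call knows what I want."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.examples = json.loads(path.read_text()) if path.exists() else []

    def record(self, item: str, liked: bool, reason: str = "") -> None:
        # An "item" could be a reddit post, an email subject, a draft paragraph.
        self.examples.append({"item": item, "liked": liked, "reason": reason})
        self.path.write_text(json.dumps(self.examples, indent=2))

    def as_prompt(self, limit: int = 20) -> str:
        # Fold the most recent verdicts into a prompt fragment.
        lines = [
            f"- {'LIKE' if e['liked'] else 'SKIP'}: {e['item']} ({e['reason']})"
            for e in self.examples[-limit:]
        ]
        return "Past preferences:\n" + "\n".join(lines)
```

The same store could back the subreddit monitor, the email classifier, and the writing assistant; only the items differ.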

Coding Style

I started out as an incredibly pedantic reviewer with an incredibly pedantic workflow: an issue for each feature, a PR for each issue, detailed comments on every minor thing. I would never be this pedantic with a human colleague. I was also strict about which technologies Claude could use.

Over time I’ve loosened the reins in two ways.

Technology Choice

I’m letting Claude gravitate towards technologies it knows best. React instead of web components, for example, even though I’d prefer the latter.

It would make sense that LLMs perform best where training data is richest. I don’t have hard evidence to back this up, but it feels true. And fighting it feels pointless.

Review Cycle

The pedantic cycle costs a lot of time and tokens. Reviewing a PR might take hours of back and forth. Claude isn’t good at extrapolating: if I ask it to split three functions into their own files, it won’t apply the same treatment to similar code elsewhere in the same PR.

Is it necessary? I want my code a certain way, but with a robust test suite, who cares what’s on the inside?

This points to preference learning, or evals. More on that later.

Not Just Code

I run LibreChat locally: Anthropic, Google, and OpenAI models in a single chat interface. I was already comfortable throwing a lot of my personal life at an LLM, but I’m surprised by how much more I’m throwing at it now. I was wary of uploading personal information; the benefits are so immense I couldn’t help myself.

The trend here is clear: LLMs matching or exceeding human performance. It’s not just code. I’ve used it for tax strategy, legal fine print, financial planning. Research that would take hours and cost hundreds in professional fees now takes minutes. Nothing seems immune.

And Claude Code specifically is a step up from how I was using LLMs before. More powerful models combined with tool use and file system access fit into my workflow more seamlessly than a web UI ever could.

A Loosening of the Cages

This is meant to be a snapshot in time. I’m far from the promised land of knowing how to work effectively with agents. But I can draw some conclusions:

One, don’t swim upstream. I started out exerting a high degree of control that loosened over time. Instead of running Claude in a container, I embraced undo. Instead of being a pedantic reviewer, I let it do its thing. Give it rope, but protect yourself with good rollback.

Two, my personal ambition has increased significantly. The range of projects I am considering has ballooned beyond what would have been reasonable last year, and it’s because projects that would’ve otherwise taken 2 weeks now take a day.

Three, interact to learn. By using Claude Code heavily I’m learning how to work with agents, but also finding that a tighter interactive loop helps when designing software. More akin to play. More akin to a REPL.

Four, LLMs are matching or exceeding human performance. Others are saying this, and I’d dismiss it as hyperbole if I weren’t seeing the same thing firsthand. If you still believe LLMs are incapable of doing X, where X is some large component of your job, take the latest generation out for a spin.

I don’t know what being a developer means anymore. I’m figuring it out in real time.