How I ship code I can't read

When I joined Lindy's engineering team this spring, I was pretty sure I'd drown.

I came straight out of running community. No CS degree, no years of TypeScript behind me. And the people around me are some of the best engineers I've ever met. Even by the San Francisco startup bar, where the average is already high, we have genuine A-players on the team, and being surrounded by them every day is wild. They hold a whole system in their heads and spot the bug before they finish reading the function. I read code slowly, and even when I read it carefully, I still miss things.

I want to be precise about what I'm bad at, because it matters for the rest of this. I don't think I'm a bad engineer. I think I'm a pretty good one. What I'm slow at is reading and understanding code, the specific act of looking at a diff and knowing what it does and what it'll break. Those are different skills, and most of this post is about building a workflow around the one I lack.

The closest analogy I have is music. I learned to read sheet music as a kid and then lost it. I can't sight-read a staff anymore. But I still play guitar, and I read tabs perfectly. Tabs are the stripped-down version, just where to put your fingers and when. The notation went and the playing stayed. You can make real music off the simplified version, as long as something faithful is feeding you the part that matters.

For a while I felt like a fraud. I was opening pull requests every day, but I didn't feel like a real engineer, because I wasn't really the one writing the code, and I couldn't always tell you exactly why it worked.

Months in, I'm still here, shipping growth features alongside them. Not because I caught up on their terms. I didn't, and probably won't. I built a workflow that handles the parts I'm bad at, so they stop being the thing that holds me up.

Here's what one growth experiment actually looks like going through that workflow.

The change is the easy part

Say I want to test a new cancellation flow. The idea comes from the growth side of my setup, the part that watches our channels and proposes experiments, which is its own post. An agent and I write the change together. The model writes most of the TypeScript, I steer, we get it green locally.

Then the actual work starts. Not writing the change. Getting it into a state where a busy senior engineer can look at it and understand it quickly, because their time is the scarce thing here, not mine.

Most of that work lives in a handful of skills I built. Each one is a folder of instructions the agent runs the same way every time.

~/.claude/skills/
├── pre-pr-review/    # adversarially review the diff before anyone sees it
├── pr-description/   # write the PR so a reviewer gets it in 30s
├── babysit/          # walk one PR from opened to green, fix review findings
├── pr-status/        # the whole board: CI, reviews, comments
├── pm/               # translate the work for non-engineers, and for me
└── prc/              # review other people's PRs in my voice

The change runs the same gauntlet every time:

write the change → /pre-pr-review → /pr-description → /babysit → /pr-status → human review → shipped

One PR, every time. The skills handle everything up to the human, and a lot of what used to need one.
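The order matters because each stage assumes the one before it passed. Here's a minimal TypeScript sketch of that idea, assuming each skill can be modeled as a function that either passes or stops the run. The `Stage` type, `runGauntlet`, and the placeholder stages are hypothetical stand-ins, not how the skills are actually wired together.

```typescript
// Hypothetical model: each skill is a stage that either passes or stops the run.
type StageResult = { ok: true } | { ok: false; reason: string };

interface Stage {
  name: string;
  run: () => StageResult;
}

interface GauntletOutcome {
  reached: string[]; // stages that started, in order
  stoppedAt?: string; // first stage that failed, if any
  reason?: string;
}

// Run stages strictly in order; the first failure halts everything after it.
function runGauntlet(stages: Stage[]): GauntletOutcome {
  const reached: string[] = [];
  for (const stage of stages) {
    reached.push(stage.name);
    const result = stage.run();
    if (!result.ok) {
      return { reached, stoppedAt: stage.name, reason: result.reason };
    }
  }
  return { reached };
}

// Placeholder stages standing in for the real skills.
const pass = (): StageResult => ({ ok: true });
const gauntlet: Stage[] = [
  { name: "/pre-pr-review", run: pass },
  { name: "/pr-description", run: pass },
  { name: "/babysit", run: () => ({ ok: false, reason: "e2e failing" }) },
  { name: "/pr-status", run: pass },
];

const outcome = runGauntlet(gauntlet);
// outcome.stoppedAt is "/babysit"; /pr-status never starts.
```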

/pre-pr-review: the gate I can't skip

This is the one that makes the rest safe, so I'll start here.

A PR can't reach a human until it clears /pre-pr-review. When the review passes it drops a marker file, and the command that opens the PR won't run without it. Past-me built a gate present-me can't skip on a tired Friday.
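The gate itself is small. Here's a minimal TypeScript sketch, assuming the marker records the commit SHA the review passed on, so any new commit after review invalidates it. The `.pre-pr-review-passed` filename, the SHA keying, and the `writeMarker`/`assertReviewed` names are illustrative assumptions, not the skill's actual internals.

```typescript
import { mkdtempSync, readFileSync, writeFileSync, existsSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Hypothetical marker filename; the real skill's name and location may differ.
const MARKER = ".pre-pr-review-passed";

// Called by the review step once every check and reviewer has passed.
function writeMarker(repoDir: string, headSha: string): void {
  writeFileSync(join(repoDir, MARKER), headSha, "utf8");
}

// Called by the open-PR command before it does anything else.
function assertReviewed(repoDir: string, headSha: string): void {
  const path = join(repoDir, MARKER);
  if (!existsSync(path)) {
    throw new Error("refusing to open PR: /pre-pr-review has not passed");
  }
  // A marker left over from an older commit doesn't count.
  if (readFileSync(path, "utf8").trim() !== headSha) {
    throw new Error("refusing to open PR: review passed on a different commit");
  }
}

// Usage in a scratch directory.
const repo = mkdtempSync(join(tmpdir(), "gate-"));
let blocked = false;
try {
  assertReviewed(repo, "abc123");
} catch {
  blocked = true; // no marker yet, so the gate holds
}
writeMarker(repo, "abc123");
assertReviewed(repo, "abc123"); // passes now
```

Keying the marker to the commit is what makes it hard to skip: pushing one more "quick fix" after the review means running the review again.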

It runs cheap checks first: lint, types, breaking-contract scans, and the convention checks the team has built up from past mistakes. Then comes the part that matters. It sends the diff to several review agents in parallel, two of which run a different model in a different harness. I'm in Claude Code, so it runs Codex for the correctness and pattern passes.

The different model is the whole point. The model that wrote the code is confidently wrong about the exact things it got wrong, and a same-model reviewer nods along. A different one doesn't share the blind spot, so it pushes back. That disagreement is where the bugs fall out.
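The shape of that step, cheap deterministic checks first and then independent reviewers fanned out in parallel, can be sketched like this. It's an assumption-heavy illustration: the `Check` and `Reviewer` types and the `same-model`/`cross-model` reviewers are hypothetical placeholders, and nothing here actually calls Codex or any other model.

```typescript
type Severity = "high" | "medium" | "low";
interface Finding {
  reviewer: string;
  severity: Severity;
  message: string;
}
interface Reviewer {
  name: string;
  review: (diff: string) => Promise<Finding[]>;
}
interface Check {
  name: string;
  run: (diff: string) => boolean;
}

async function preReview(diff: string, checks: Check[], reviewers: Reviewer[]): Promise<Finding[]> {
  // Cheap, deterministic checks first; stop before paying for model reviews.
  for (const check of checks) {
    if (!check.run(diff)) {
      return [{ reviewer: check.name, severity: "high", message: `${check.name} failed` }];
    }
  }
  // Reviewers run in parallel and never see each other's output,
  // so one model's blind spot can't nudge the other into agreeing.
  const results = await Promise.all(reviewers.map((r) => r.review(diff)));
  return ([] as Finding[]).concat(...results);
}

// Hypothetical reviewers: the same-model pass nods along, the cross-model pass objects.
const sameModel: Reviewer = { name: "same-model", review: async () => [] };
const crossModel: Reviewer = {
  name: "cross-model",
  review: async () => [
    { reviewer: "cross-model", severity: "high", message: "placeholder ID breaks the billing fallback" },
  ],
};
const lint: Check = { name: "lint", run: () => true };

preReview("<diff>", [lint], [sameModel, crossModel]).then((findings) => {
  for (const f of findings) console.log(`[${f.severity}] ${f.reviewer}: ${f.message}`);
});
```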

One of my PRs, which set up throwaway test accounts, looked clean to me. The Codex pass disagreed:

Round 1 caught three findings:
- a placeholder ID that broke the billing fallback (high)
- a reset that left orphaned sub-records behind (medium)
- a non-transactional create that could orphan a workspace on a race (medium)
All fixed. Round 2 caught a duplicate-key race. Fixed.

I couldn't have caught those by reading the diff. The review did, and I understood each one once it was pointed at me. Different skill from spotting them cold.
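The round-by-round log above is really a loop: review, fix, review again until a round comes back clean, with a cap so it can't spin forever. A sketch, assuming hypothetical `review` and `fix` callbacks and an arbitrary default of three rounds. The scripted findings mirror the log; the log doesn't give every severity, so this version leaves severity out.

```typescript
interface Finding {
  message: string;
}

interface RoundOutcome {
  rounds: number;
  clean: boolean;
  leftover: Finding[]; // findings still open when the cap was hit
}

// Keep reviewing until a round finds nothing, or stop at maxRounds
// and hand whatever is left to a human.
function reviewUntilClean(
  review: (round: number) => Finding[],
  fix: (f: Finding) => void,
  maxRounds = 3,
): RoundOutcome {
  for (let round = 1; round <= maxRounds; round++) {
    const findings = review(round);
    if (findings.length === 0) return { rounds: round, clean: true, leftover: [] };
    if (round === maxRounds) return { rounds: round, clean: false, leftover: findings };
    findings.forEach(fix);
  }
  return { rounds: 0, clean: false, leftover: [] }; // only reached if maxRounds < 1
}

// Scripted rounds modeled on the test-accounts PR.
const scripted: Finding[][] = [
  [
    { message: "placeholder ID broke the billing fallback" },
    { message: "reset left orphaned sub-records" },
    { message: "non-transactional create could orphan a workspace" },
  ],
  [{ message: "duplicate-key race" }],
  [],
];
const fixed: string[] = [];
const result = reviewUntilClean((round) => scripted[round - 1] ?? [], (f) => fixed.push(f.message));
```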

It compounds, too. When something slips past and a human or our cloud reviewer catches it later, that miss becomes a written learning the review reads on every run. I'm not getting better at reading code. The checklist is, and it doesn't forget.
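The compounding part is mostly bookkeeping. A sketch, assuming learnings live in a plain Markdown file that gets prepended to the review prompt on every run. The file name, the one-line entry format, the example lesson, and the `recordLearning`/`buildReviewPrompt` helpers are all hypothetical.

```typescript
import { appendFileSync, existsSync, readFileSync, mkdtempSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Record a miss as one line the reviewer will read on every future run.
function recordLearning(file: string, source: string, lesson: string): void {
  appendFileSync(file, `- (${source}) ${lesson}\n`, "utf8");
}

// Each review starts from the accumulated checklist, not from scratch.
function buildReviewPrompt(file: string, diff: string): string {
  const learnings = existsSync(file) ? readFileSync(file, "utf8") : "";
  return [
    "Review this diff. Check each past learning explicitly:",
    learnings.trim() || "(no learnings yet)",
    "---",
    diff,
  ].join("\n");
}

// Hypothetical miss caught later by the cloud reviewer.
const file = join(mkdtempSync(join(tmpdir(), "learn-")), "learnings.md");
recordLearning(file, "cloud review", "flag reads inside loops: hoist above the map");
const prompt = buildReviewPrompt(file, "<diff>");
```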

/pr-description: making the change legible

We use Graphite internally, and one quirk of it is that when you submit a stack or a PR, it doesn't take a description up front. You write it after the PR is already open. So the default path is: the agent opens the PR, and either there's no description, or the agent backfills one on its own. Left alone, an agent writes a very thorough description of what it changed, file by file, that still somehow doesn't tell a reviewer the thing they need. It's all "what" and no "why." Accurate, dense, and useless for deciding whether to approve:

Refactored CancellationModal to read trialDaysRemaining from billing
context. Added cancel-reminder feature flag (boolean, default false).
Updated useBilling hook to expose trialDaysRemaining. Touched 6 files
across apps/web and packages/core. Added 2 unit tests. Build green.

A reviewer reads that and still has no idea why this PR exists or what they're protecting when they approve it. /pr-description writes mine in a house style I worked out from reading good PRs on the team: a tight What, a Why that leads with user impact, then How with the technical detail, and the verification at the bottom:

## What
Users who hit "cancel" on a paid plan now see a one-line reminder of how
many trial days they have left before they actually lose access.

## Why
A chunk of cancels look low-intent, people forgetting they still have
runway. This tests whether one honest reminder, no dark pattern, keeps
some of them. GROW ticket linked.

## How
- New copy on CancellationModal, behind `cancel-reminder` flag (50/50).
- Copy is sitevar-driven so we can tune wording without a deploy.
- Billing behaviour unchanged; the hook just exposes `trialDaysRemaining`,
  which the modal reads.

## Verification
- Unit tests for the day-count math (0 days, 1 day, expired).
- Local smoke: both flag arms render, reminder only shows with days left.
- adversarial-review (Codex): approve, no findings.
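The day-count math and the "reminder only shows with days left" rule are small enough to sketch. This assumes the trial end is a timestamp, that a partial day rounds up, and that the 50/50 split is a deterministic hash of the user ID. The real `useBilling` hook and flag system may work differently, and `inReminderArm` and `shouldShowReminder` are hypothetical helpers.

```typescript
import { createHash } from "node:crypto";

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Trial days left, rounding a partial day up; 0 once the trial has ended.
function trialDaysRemaining(trialEndsAt: Date, now: Date): number {
  const ms = trialEndsAt.getTime() - now.getTime();
  return ms <= 0 ? 0 : Math.ceil(ms / MS_PER_DAY);
}

// Deterministic 50/50 bucketing: the same user always lands in the same arm.
function inReminderArm(userId: string, flag = "cancel-reminder"): boolean {
  const digest = createHash("sha256").update(`${flag}:${userId}`).digest();
  return digest.readUInt8(0) % 2 === 0;
}

// Show the reminder only in the treatment arm and only with days left.
function shouldShowReminder(userId: string, trialEndsAt: Date, now: Date): boolean {
  return inReminderArm(userId) && trialDaysRemaining(trialEndsAt, now) > 0;
}

const now = new Date("2024-06-01T12:00:00Z");
const oneDayLeft = trialDaysRemaining(new Date("2024-06-02T12:00:00Z"), now); // 1
const expired = trialDaysRemaining(new Date("2024-05-30T12:00:00Z"), now); // 0
```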

A reviewer should understand why this change exists, and how to trust it, before they read a single line of the diff. Community Marvin spent two years learning how to make someone care about something in the first sentence. That skill transferred straight to PR descriptions.
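The house style is basically a template with a rule per section. A hypothetical sketch of how that structure could be enforced: a `PrDescription` shape and a renderer that refuses to produce a description missing What, Why, or Verification. The type, the renderer, and the refusal rules are assumptions for illustration, not what /pr-description actually does.

```typescript
// Hypothetical shape for the house style: What, Why, How, Verification, in that order.
interface PrDescription {
  what: string; // one or two sentences, the user-visible effect
  why: string; // lead with user impact, link the ticket
  how: string[]; // technical detail, one bullet each
  verification: string[]; // what was actually run
}

function renderDescription(d: PrDescription): string {
  // Refuse to render a description that skips what reviewers need most.
  if (!d.what.trim() || !d.why.trim()) throw new Error("What and Why are required");
  if (d.verification.length === 0) throw new Error("say how this was verified");
  const bullets = (items: string[]) => items.map((i) => `- ${i}`).join("\n");
  return [
    "## What", d.what.trim(), "",
    "## Why", d.why.trim(), "",
    "## How", bullets(d.how), "",
    "## Verification", bullets(d.verification),
  ].join("\n");
}

const example: PrDescription = {
  what: 'Users who hit "cancel" on a paid plan now see how many trial days they have left.',
  why: "A chunk of cancels look low-intent. This tests whether one honest reminder keeps some of them.",
  how: ["New copy on CancellationModal, behind `cancel-reminder` flag (50/50)."],
  verification: ["Unit tests for the day-count math (0 days, 1 day, expired)."],
};
console.log(renderDescription(example));
```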

/babysit: walking it to green

A PR is never done when you open it. CI flakes. A check fails for a reason that has nothing to do with your change. A linter trips.

$ /babysit
→ opened PR #19594, watching 7 checks
✓ lint  ✓ typecheck  ✓ unit
✗ e2e (flaky: known timeout in unrelated suite) → retrying
✓ e2e  passed on retry
✓ all green. ready for review.

/babysit is per-PR shipping discipline. It opens the PR safely, watches CI, tells real failures apart from flaky ones, and fixes the obvious stuff itself. It walks a single PR, or a whole Graphite stack, from "opened" to "actually green and ready."
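Telling a flake from a real failure is mostly a lookup plus a retry budget. A sketch, assuming a hand-maintained list of known-flaky checks and a single retry. The check names, `KNOWN_FLAKY`, and `MAX_RETRIES` are illustrative; a real /babysit would pull check results from CI rather than from hard-coded objects.

```typescript
type CheckState = "pass" | "fail";
interface CheckRun {
  name: string;
  state: CheckState;
  attempt: number; // 1 for the first run
}

// Hypothetical list of checks with known, unrelated flakiness.
const KNOWN_FLAKY = new Set(["e2e"]);
const MAX_RETRIES = 1;

type Verdict = "green" | "retry" | "real-failure";

function classify(run: CheckRun): Verdict {
  if (run.state === "pass") return "green";
  // Only retry checks already known to be flaky, and only within the budget;
  // any other failure is treated as a real problem with the change.
  if (KNOWN_FLAKY.has(run.name) && run.attempt <= MAX_RETRIES) return "retry";
  return "real-failure";
}

// "Ready for review" means every check is green, not just most of them.
function readyForReview(runs: CheckRun[]): boolean {
  return runs.every((r) => classify(r) === "green");
}
```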

But CI green is only half of it. We have a Cloud Review bot on every PR by default, and it's good. It regularly flags something none of the earlier passes caught. The old way to handle that was: read the comment, go back into the branch, make the fix, push, repeat for each finding. A weird, fiddly game. So /babysit also watches the PR's comments. When the Cloud Review leaves findings, it pulls them in, pressure-tests each one rather than blindly obeying, and if the finding is valid, implements the fix and pushes, without me sitting on it.

Same for human reviews. If a teammate leaves comments while I'm babysitting the PR, the agent does the same thing: reads the finding, checks whether it actually holds, and if it does, makes the change. I still see what happened. I'm just not the bottleneck between "good reviewer caught something" and "it's fixed."
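The important word there is "pressure-tests." A sketch of that triage, assuming each comment goes through a validator before anything changes. `validate` and `applyFix` are hypothetical callbacks standing in for the agent's judgment and its edit, and the two comments are invented examples. Rejected comments are kept with a reason so they stay visible instead of silently disappearing.

```typescript
interface ReviewComment {
  id: number;
  author: "cloud-review" | "human";
  body: string;
}
interface Triage {
  fixed: number[];
  rejected: { id: number; reason: string }[];
}

function triageComments(
  comments: ReviewComment[],
  validate: (c: ReviewComment) => { valid: boolean; reason: string },
  applyFix: (c: ReviewComment) => void,
): Triage {
  const result: Triage = { fixed: [], rejected: [] };
  for (const c of comments) {
    const verdict = validate(c);
    if (verdict.valid) {
      applyFix(c);
      result.fixed.push(c.id);
    } else {
      // Not blindly obeyed: an invalid finding is recorded with why, not dropped.
      result.rejected.push({ id: c.id, reason: verdict.reason });
    }
  }
  return result;
}

// Invented comments and a toy validator, for illustration only.
const comments: ReviewComment[] = [
  { id: 1, author: "cloud-review", body: "missing null check before reading trialDaysRemaining" },
  { id: 2, author: "human", body: "rename this variable?" },
];
const applied: number[] = [];
const triage = triageComments(
  comments,
  (c) => (c.body.includes("null check") ? { valid: true, reason: "reproduced" } : { valid: false, reason: "style preference, no bug" }),
  (c) => applied.push(c.id),
);
```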

/pr-status: seeing the whole board

When I have a few PRs in flight, "what's blocking me right now" gets expensive to answer by clicking through GitHub.

$ /pr-status
PR       branch             CI         review          comments   next
───────────────────────────────────────────────────────────────────────
#19594   cancel-reminder    ✓ green    ● 1 requested   2 open     ping reviewer
#19596   parameterize-copy  ✓ green    ✓ approved      0 open     merge
#19597   sitevar-cleanup    ✗ failing  pending         1 cloud    fix billing.ts
#19601   trial-banner-copy  ◐ 4/7      pending         0 open     wait on CI

failing: #19597 - type error in billing.ts (real, not flake)
unanswered: #19594 - 2 Cloud Review comments, both look valid
approved + clean: #19596 - safe to merge now

/pr-status gives me the whole board at once: CI broken down by check, who's been requested and who's approved, how many comments are still open and whether they're from a human or the Cloud Review, and the next move for each one. It's the difference between feeling busy and knowing exactly where the next thirty minutes should go.
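The "next" column is the useful part, and most of it can be derived mechanically from the other columns. A sketch, assuming a priority order inferred from the board (fix a real CI failure first, then wait on running CI, then chase review, then merge). The `PrState` shape and `nextMove` are hypothetical, and the real skill also reads comment content, which this ignores.

```typescript
// Hypothetical per-PR state; the real skill pulls this from GitHub and Graphite.
type Ci =
  | { state: "green" }
  | { state: "running"; done: number; total: number }
  | { state: "failing"; hint: string };

interface PrState {
  number: number;
  ci: Ci;
  approved: boolean;
  openComments: number;
}

// Assumed priority: a real CI failure beats everything, then in-flight CI,
// then missing approval, then open comments, and only then merge.
function nextMove(pr: PrState): string {
  const ci = pr.ci;
  if (ci.state === "failing") return `fix ${ci.hint}`;
  if (ci.state === "running") return "wait on CI";
  if (!pr.approved) return "ping reviewer";
  return pr.openComments > 0 ? "address comments" : "merge";
}

// The four rows from the board above.
const board: PrState[] = [
  { number: 19594, ci: { state: "green" }, approved: false, openComments: 2 },
  { number: 19596, ci: { state: "green" }, approved: true, openComments: 0 },
  { number: 19597, ci: { state: "failing", hint: "billing.ts" }, approved: false, openComments: 1 },
  { number: 19601, ci: { state: "running", done: 4, total: 7 }, approved: false, openComments: 0 },
];

for (const pr of board) console.log(`#${pr.number}  ${nextMove(pr)}`);
```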

/pm: getting the non-engineers to yes

Half of shipping a growth experiment is not engineering at all. It's getting the people who care about the number, not the diff, to sign off.

/pm translates the technical work into plain language at whatever depth I need. A Slack-pasteable message. A one-pager. A summary of an RFC. I paste in a diff or a branch and get back something a head of growth reads in thirty seconds, without me booking a meeting to explain it.

But honestly, the person it helps most is me. I haven't spent twenty years speaking software engineering. So when an agent hands me a wall of dense explanation, or stops to ask me to make a call I don't fully understand, I need it broken down into terms I actually follow before I can decide anything. I noticed I was typing some version of "explain that to me simply, like I haven't been doing this for twenty years" over and over. So I built it into a skill. And because my agents have memory now, they reach for it on their own more often than not, explaining their own work back to me in plain language without my having to ask. The skill I built to translate for other people turned into the one that lets me keep up with my own.

/prc: reviewing other people's code

Here's the part I underrated at first. If I want my PRs reviewed, I have to review other people's. Review is a debt you pay into the team, and I owe more of it than most because people keep unblocking me.

/prc draft comment

curious why we read the flag inside the loop here, won't this eval it on every iteration? could hoist it above the map if so

/prc is for commenting on PRs I don't own, in my actual voice. Lowercase, curious, two lines, draft first so I always see it before it posts. It reads the diff, finds the real issues, and drafts comments that sound like me asking a genuine question, not a bot leaving "consider extracting this."
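For anyone outside TypeScript, here's roughly what that drafted comment is pointing at. A hypothetical before/after, assuming a flag lookup (`isEnabled`, invented here) that's worth not calling repeatedly: reading it inside `.map` evaluates it once per item, while hoisting it above the map evaluates it once. The `flagReads` counter exists only to make the difference visible.

```typescript
let flagReads = 0;

// Hypothetical flag lookup; imagine it hits a cache or remote config.
function isEnabled(flag: string): boolean {
  flagReads++;
  return flag === "cancel-reminder";
}

const items = ["a", "b", "c"];

// Before: the flag is evaluated on every iteration.
const before = items.map((item) => (isEnabled("cancel-reminder") ? item.toUpperCase() : item));
const readsBefore = flagReads; // 3

// After: hoisted above the map, evaluated once, same output.
flagReads = 0;
const showReminder = isEnabled("cancel-reminder");
const after = items.map((item) => (showReminder ? item.toUpperCase() : item));
const readsAfter = flagReads; // 1
```

The hoist is only equivalent because the flag can't change partway through the map. If it could, the two versions would behave differently, which is exactly why the comment is phrased as a question.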

Agents made my pull requests cheap to produce and cheap to get ready. They did not make them cheap for a human to review. That gap is the thing the whole team is wrestling with right now.

The part nobody has solved

Review is still the bottleneck, mine included.

I can get a PR to shippable quickly now. But a human still has to read it, and human attention does not scale the way agent output does. You can feel it on the team. The queue of open PRs grows faster than the queue of reviews, and I am part of the reason the queue grows.

The team's response has been interesting. We now have internal reviewer leaderboards, to make one thing explicit: reviewing other people's code is a core part of an engineer's job right now, maybe the most important part. The scarce resource these days is people willing to read code carefully. Some weeks the most useful thing I do is clear someone else's PR so they don't sit blocked.

I don't think that stays true forever. The review layer will get good agents too. But today the constraint is review, and pretending my speed at producing PRs is the whole story would be missing where the real work is.

Why I trust any of this to ship

A fair question: if I can't really read the code, how do I know the change is safe?

I don't, on my own. The system does. Unit tests catch the dumb stuff. Local smoke tests catch the "did this whole flow even run" stuff. The end-to-end suite in our repo catches the "works in isolation, breaks the real path" stuff. On top of that, offline and online evals tell me whether the thing behaves the way it's supposed to once real users touch it.

Work moves through the system phase by phase. Sometimes it comes in a burst, sometimes a trickle, but it always runs the same track.

No single one of those would make me comfortable shipping a change I didn't write line by line. Together they build enough trust that I can. What I lean on is that whole net, not my own eyes on the diff.

What actually moved

I came in expecting to spend a year catching up to the engineers on their terms, learning to read code as fast as they do.

That was the wrong goal. The leverage moved. It's less about "can you write the code" than it used to be, and increasingly less about "can you read every line" too. More of it is "can you turn an intention into a change, make it legible, get it reviewed, and trust the system to catch you when you're wrong, then get it in front of real users, measure what it did, and iterate on what you learn." Shipping is where the loop starts. What decides whether an experiment was worth it happens after the merge, in what the numbers say and what you do about them.

That's a game someone from community can play. It's the reason I can keep up with people who read code far faster than I ever will. The best of them are already moving the same way, building workflows around the parts that used to be slow.