Harness engineering


Hey Reader,

An interesting thought keeps popping up in conversations around AI agents: the environment around the thing matters more than the thing itself. Last week I read about harness engineering and it felt very familiar, as if it were tapping into instincts I already had.

If you’ve ever debugged a flaky test only to find the problem was in the setup, not the assertion, that instinct will feel natural. At the same time, it’s unfamiliar territory: harness engineering borrows the same principle, but the challenges are different - managing context windows, orchestrating tool access, building guardrails for non-deterministic outputs.

This week I’m exploring this new discipline, and why the mindset quality engineers already have might give them a real head start.

The model is almost irrelevant

​Rohit’s article on harness engineering puts forward a compelling argument: the harness, meaning the complete designed environment around a language model (its tools, context management, guardrails, scaffolding), is what separates teams shipping real software from teams fighting their agent pipelines.

What Rohit wrote makes a lot of sense to me. I’ve spent years building test environments where the goal wasn’t to make the tests smarter, but to make the environment around them so well-structured that even simple tests could produce reliable results. A good test harness handles setup, teardown, data isolation, and retry logic so that the actual test code can stay clean and focused. What Rohit describes is the same principle applied to AI agents. The model is the test. The harness is everything else. And “everything else” is where the real engineering happens.

Your CLAUDE.md is probably being ignored (and how to fix it)

Speaking of environments that shape agent behavior, Dex Horthy wrote a great article about how Claude Code handles your CLAUDE.md file. Claude wraps your CLAUDE.md contents in a tag that explicitly tells the model the contents “may or may not be relevant.” (I didn’t know that!) The longer your file gets, the more Claude treats individual sections as optional.

The fix is surprisingly elegant: wrap domain-specific sections in XML-style tags. Something like an opening <testing> tag, followed by your testing conventions, followed by the closing tag.
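A rough sketch of what that might look like in practice - the tag names here are purely illustrative, not a documented schema:

```markdown
# CLAUDE.md

General instructions that apply to the whole repo...

<testing_conventions>
- Use Playwright for e2e tests; keep them out of the unit test folder.
- Every new component gets a data-testid attribute.
</testing_conventions>

<api_conventions>
- All endpoints return structured errors, never bare strings.
</api_conventions>
```

The idea is that a clearly delimited, named section is harder for the model to dismiss as "may or may not be relevant" than one more paragraph in a long file.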

Speaking of things I didn’t know, Lydia Hallie from Anthropic shared a cool tip: dynamic skills, where you embed executable commands directly in SKILL.md files that Claude runs at invocation time. It’s the same idea as using “!” to run shell commands in Claude Code.
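Here’s a rough sketch of what such a dynamic skill might look like. The frontmatter fields and the !`...` syntax are my assumption, extrapolated from how Claude Code slash commands embed command output, so double-check the docs before copying:

```markdown
---
name: review-prep
description: Gather repo context before a code review
---

Current branch: !`git branch --show-current`
Recent changes: !`git diff --stat main`
```

At invocation time, the command output gets injected into the skill’s context, so Claude starts with fresh, real data instead of a stale description of your repo.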

I also enjoyed these 50 Claude Code tips from Vishwas. One tip that’s not on the list, but that I stumbled upon, is setting up the attribution for your commits - in case you ever feel like “Co-Authored-By: Claude Sonnet 4.6” makes you look less cool 😃
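If you want to try that, my understanding is that Claude Code exposes a setting for it in your settings.json; the flag name below is from the docs as I remember them, so verify against the current documentation:

```json
{
  "includeCoAuthoredBy": false
}
```

With that set, commits made through Claude Code should no longer carry the co-author trailer.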

The IDE might not be where the work happens anymore

It seems like one of the hot debates of last week was about the use of IDEs. It started with Andrej Karpathy’s tweet and rippled through the internet. Brace Sproul argues that your IDE is actively slowing you down. His case for agent-first workflows is that we should be opening our editors less, not more. He points to LangChain’s Open SWE as an example of what happens when you let agents handle the mechanical parts of development while humans focus on direction and review.

​A similar sentiment was shared by Theo, who went through a whole history of IDEs and where they are now. I recommend watching it, as he shares some great points on how coding has changed from single workspace coding environments to multi-workspace agent-first environments.

What’s been keeping me busy

On a personal note, I was part of a panel discussion in Slovak. We discussed how many of the rules we’ve been teaching for years might no longer apply in testing.

I also did a webinar with Tricentis on effective AI workflows for quality engineers, which touches on many of the themes from this newsletter, specifically how to build the right environment for AI-assisted testing rather than just hoping the model figures it out.

As I said in the intro, designing the environment around a model matters more than the model itself, and most of what I read this week confirms it. It should also resonate with quality engineers out there - it is the same instinct that makes them obsess over test infrastructure.

“Harness engineering” isn’t just test setup with a new name. It’s a new discipline with its own challenges. The good news is that the mindset transfers. The instinct to treat the environment as a first-class engineering problem puts you ahead of most. The unfamiliar part is everything else - and that’s where the interesting work is. I’d love to hear how you’re approaching it.

Filip Hric

Sign up for weekly tips on testing, development, and everything related. Unsubscribe anytime you feel like you had enough 😊
