Harness engineering


Hey Reader,

An interesting thought keeps popping up in conversations around AI agents: the environment around the thing matters more than the thing itself. Last week I read about harness engineering and it felt very familiar, as if it were tapping into instincts I already had.

If you’ve ever debugged a flaky test only to find the problem was in the setup, not the assertion, that instinct will feel natural. At the same time, it’s unfamiliar territory: harness engineering borrows the same principle, but the challenges are different - managing context windows, orchestrating tool access, building guardrails for non-deterministic outputs.

This week I’m exploring this new discipline, and why the mindset quality engineers already have might give them a real head start.

The model is almost irrelevant

​Rohit’s article on harness engineering puts forward a compelling argument: the harness, meaning the complete designed environment around a language model (its tools, context management, guardrails, scaffolding), is what separates teams shipping real software from teams fighting their agent pipelines.

What Rohit wrote makes a lot of sense to me. I’ve spent years building test environments where the goal wasn’t to make the tests smarter, but to make the environment around them so well-structured that even simple tests could produce reliable results. A good test harness handles setup, teardown, data isolation, and retry logic so that the actual test code can stay clean and focused. What Rohit describes is the same principle applied to AI agents. The model is the test. The harness is everything else. And “everything else” is where the real engineering happens.

Your CLAUDE.md is probably being ignored (and how to fix it)

Speaking of environments that shape agent behavior, Dex Horthy wrote a great article about how Claude Code handles your CLAUDE.md file. Claude wraps your CLAUDE.md contents in a tag that explicitly tells the model the contents “may or may not be relevant.” (I didn’t know that!) The longer your file gets, the more Claude treats individual sections as optional.

The fix is surprisingly elegant: wrap domain-specific sections in XML-style tags. Something like an opening <testing> tag, followed by your testing conventions, followed by the closing tag.
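A rough sketch of what that might look like in practice - the tag names here are purely illustrative, not a documented schema:

```markdown
# CLAUDE.md

General instructions that apply to the whole repo...

<testing_conventions>
- Use Playwright for e2e tests; keep them out of the unit test folder.
- Every new component gets a data-testid attribute.
</testing_conventions>

<api_conventions>
- All endpoints return structured errors, never bare strings.
</api_conventions>
```

The idea is that a clearly delimited, named section is harder for the model to dismiss as "may or may not be relevant" than one more paragraph in a long file.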

Speaking of things I didn’t know, Lydia Hallie from Anthropic shared a cool tip: dynamic skills, where you embed executable commands directly in SKILL.md files that Claude runs at invocation time. It’s the same idea as using “!” to run shell commands in Claude Code.
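Here’s a rough sketch of what such a dynamic skill might look like. The frontmatter fields and the !`...` syntax are my assumption, extrapolated from how Claude Code slash commands embed command output, so double-check the docs before copying:

```markdown
---
name: review-prep
description: Gather repo context before a code review
---

Current branch: !`git branch --show-current`
Recent changes: !`git diff --stat main`
```

At invocation time, the command output gets injected into the skill’s context, so Claude starts with fresh, real data instead of a stale description of your repo.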

I also enjoyed these 50 Claude Code tips from Vishwas. One tip that’s not on the list, but that I stumbled upon, is setting up the attribution for your commits - in case you ever feel like “Co-Authored-By: Claude Sonnet 4.6” makes you look less cool 😃
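If you want to try that, my understanding is that Claude Code exposes a setting for it in your settings.json; the flag name below is from the docs as I remember them, so verify against the current documentation:

```json
{
  "includeCoAuthoredBy": false
}
```

With that set, commits made through Claude Code should no longer carry the co-author trailer.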

The IDE might not be where the work happens anymore

It seems like one of the hot debates of last week was about the use of IDEs. It started with Andrej Karpathy’s tweet and rippled through the internet. Brace Sproul argues that your IDE is actively slowing you down. His case for agent-first workflows is that we should be opening our editors less, not more. He points to LangChain’s Open SWE as an example of what happens when you let agents handle the mechanical parts of development while humans focus on direction and review.

​A similar sentiment was shared by Theo, who went through a whole history of IDEs and where they are now. I recommend watching it, as he shares some great points on how coding has changed from single workspace coding environments to multi-workspace agent-first environments.

What’s been keeping me busy

On a personal note, I was part of a panel discussion in Slovak. We discussed how many of the rules we’ve been teaching for years might no longer apply in testing.

I also did a webinar with Tricentis on effective AI workflows for quality engineers, which touches on many of the themes from this newsletter, specifically how to build the right environment for AI-assisted testing rather than just hoping the model figures it out.

As I said in the intro, designing the environment around a model matters more than the model itself, and most of what I read this week confirms it. It should also resonate with quality engineers out there - it is the same instinct that makes them obsess over test infrastructure.

“Harness engineering” isn’t just test setup with a new name. It’s a new discipline with its own challenges. The good news is that the mindset transfers. The instinct to treat the environment as a first-class engineering problem puts you ahead of most. The unfamiliar part is everything else - and that’s where the interesting work is. I’d love to hear how you’re approaching it.

Filip Hric

Sign up for weekly tips on testing, development, and everything related. Unsubscribe anytime you feel like you had enough 😊
