Quality in the age of AI


Hey Reader,

If you’re reading this, chances are you care about quality. Coming from QA, I never stop looking at the apps I build and systems I use through the lens of quality. Now, with more and more code being written by AI, this question matters more than ever. Many people wonder whether AI is even capable of delivering quality.

I think it is. Though it’s worth remembering that quality is multidimensional. You can always have more or less of it.

When it comes to AI and specifically LLMs, the output is non-deterministic, or as my friend Richard Bradshaw aptly put it - probabilistic.

The reason I believe we can shift that probability in our favor is that there are many tools at our disposal to help. So let’s talk about them.

Skills - just markdown files?

Chances are you’ve heard of skills (if not, Debbie O’Brien made a great intro video on this). A cynic might say they’re just markdown files with instructions that get appended to your prompt. While partially true, that’s a rather narrow view. Once saved to your repository, they become instructions for everyone using the project, which can either enforce quality standards or point to specific project tools.
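To make that concrete, here’s a minimal sketch of what a skill file might look like. The shape (a SKILL.md with YAML frontmatter holding a name and description, followed by free-form instructions) follows Anthropic’s published convention; the project details below are invented for illustration.

```markdown
---
name: api-testing-conventions
description: Conventions for writing API tests in this repository
---

# API testing conventions

- Use the shared `apiClient` fixture instead of raw fetch calls.
- Every new endpoint needs at least one happy-path and one error-path test.
- For the full checklist, see `docs/testing.md`.
```

Once this lives in the repo, every agent (and every teammate) working on the project picks up the same rules, which is exactly the quality-enforcement angle above.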

I’ve also found them really useful for referencing documentation and examples. A favourite of mine comes from Kedasha, who uses Remotion skills to create neat animations for her videos. I used this in my latest video too.

There’s a debate on whether the simplicity of skills will eventually remove the need for MCPs. Kent C. Dodds published a really thoughtful breakdown of MCP versus skills. His take is that they’re not competing - they’re complementary. Skills are low-friction markdown files you drop into a project to teach an agent how to do something. MCP is about service interoperability, authentication, and distribution. You’d use a skill to say ā€œuse the Sentry MCP server to look up this issue,ā€ and MCP handles the actual connection. I think he’s right that the framing of ā€œone or the otherā€ misses the point entirely.

A quiet but important update has been made to the skill-creator skill from Anthropic. It now includes eval creation and benchmarking. In my previous newsletter I mentioned a study that found that developer-written AGENTS.md files improved agent task completion by just 4% on average, while LLM-generated ones actually made things worse. I think this is a strong signal for quality engineering - unvalidated context is useless (or potentially harmful). Evals bring a very concrete way of validating, benchmarking, and improving your skill files.
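To give a sense of what ā€œevals for a skillā€ can look like in practice, here’s a minimal sketch of an eval harness. The `run_agent` function is a hypothetical placeholder (a real version would call a model with the skill in context); the structure is what carries over: fixed cases, a grading rule, and a pass rate you can compare with and without the skill loaded.

```python
def run_agent(prompt, skill=None):
    """Placeholder for a real agent call.

    A real implementation would send the skill file plus the prompt
    to a model; this stub just returns a canned answer so the harness
    structure is runnable on its own.
    """
    if "users" in prompt:
        return "SELECT id, name FROM users;"
    return ""

# Fixed eval cases: a prompt and a simple string-match expectation.
EVAL_CASES = [
    {"prompt": "List id and name for all users", "must_contain": "FROM users"},
    {"prompt": "List all orders from last week", "must_contain": "FROM orders"},
]

def grade(answer, must_contain):
    """Grading rule: did the answer contain the expected fragment?"""
    return must_contain in answer

def run_evals(skill=None):
    """Run every case and return the fraction that passed."""
    passed = sum(
        grade(run_agent(case["prompt"], skill), case["must_contain"])
        for case in EVAL_CASES
    )
    return passed / len(EVAL_CASES)

if __name__ == "__main__":
    print(f"pass rate: {run_evals():.0%}")
```

Swap the stub for a real model call, run the harness with and without your skill file, and the pass-rate difference tells you whether the skill actually helps - the same kind of signal the AGENTS.md study was measuring.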

The code review is (not) the new bottleneck

swyx called code reviews the Final Boss of Agentic Engineering. The argument, laid out in detail in a guest post on Latent Space, is quite compelling. Teams with high AI adoption complete 21% more tasks and merge 98% more pull requests, but PR review time increases 91%. Two things are scaling exponentially - the number of changes and the size of changes. We cannot consume this much code manually. The proposed solution is to move the human checkpoint upstream: review specs, plans, and acceptance criteria instead of 500-line diffs. Code becomes an artifact of the spec, not the thing you review directly.

Itamar Friedman from Qodo AI pushed back slightly, arguing that reviews aren’t dead, they’re just the beginning of something bigger - AI code governance. I personally like this take better. While reviewing intent makes intuitive sense, it leaves a lot of room for unintended issues. The power of AI is in its multiplying effects. I imagine that multiple AI agents focusing on specific areas of quality (think one agent for security, another for compliance) could potentially do a better job than a manual review done by two overworked developers.

AI and quality

I suspect there are people who disagree with my initial statement that AI is capable of delivering quality. An incident from last week, where Claude Code wiped out a production database, is something that will likely be thrown around in the coming weeks.

But I personally liked what Scott Hanselman had to say about this. His correction was sharp: ā€œYOU wiped our production database.ā€ If a junior engineer takes down prod, it’s never the junior’s fault - it’s an access and SDLC issue.

This perfectly captures my sentiment about quality engineering in the AI era. We’re giving these tools more power without updating the guardrails. The speed at which AI can move means you need to be more knowledgeable than the AI, not less. As Hanselman put it: ā€œAI is an accelerator for talented engineers. In the wrong hands it’s a risk multiplier.ā€

Related to this statement, Dan Hockenmaier shared a 2x2 matrix that I keep coming back to. People with good judgment get 10x better with AI, but people without it get 10x worse. AI doesn’t make everyone better. It amplifies whatever you already are. The need for excellence is not going anywhere.

In fact, it may be needed more than ever. Over the past couple of months we’ve been hearing about AI taking away jobs in tech. And with so many layoffs, many are inclined to believe this is actually the case. But a recent tweet by Rohan Paul, showing data from Citadel Securities, shows that job postings for software engineers are actually spiking.

Rohan interprets this as the Jevons paradox. When AI makes coding cheaper, companies don’t hire fewer engineers - they want to build more software. In the same Twitter thread, Uber CEO Dara Khosrowshahi explains that if his average engineer becomes 25% more efficient, he’ll hire more engineers because he wants to go faster. Though he also predicts that in five years, the ROI of a human engineer will be surpassed by AI agents and GPU power.

So we might be hiring more engineers, but those engineers are doing more work, faster, with higher stakes. The tooling is getting better. The SDLC is being rewritten. And the gap between the people who have good judgment and those who don’t is getting wider. I’m not sure whether to find that exciting or unsettling. A little bit of both.

I’d love to hear what you think. Especially if you’re already seeing these shifts in your own team. Just hit reply.

Filip Hric

