Hello Reader,

AI can generate code, arguably with pretty decent quality. That's not news anymore. The question that's been forming in my head all week is different: how do we decide what should go into production? Writing code is not the hard part (arguably, it never was). The hard part is making sure the right code ships and the wrong code doesn't. And right now, that selection problem is becoming the defining challenge of AI-assisted development. Last week definitely showed this.

Focus on code reviews

Last week, Anthropic announced that Claude Code now has a Code Review feature. When a PR opens, Claude dispatches a team of agents to hunt for bugs. Many people point out the irony of Claude reviewing its own code (insert Spider-Man meme). Boris Cherny, who created Claude Code, responded to one of the tweets in an interesting manner: the more tokens you throw at a coding problem, the better the result. In other words, one agent can cause bugs while another catches them.

A feature release like this one is always an interesting signal. As I mentioned in my last newsletter, code reviews seem to be the next big challenge in AI adoption. With code velocity at an all-time high, manual reviews just don't cut it anymore. Code review systems look like what comes next.

In fact, Claude is not the first tool to tackle this problem. Martian recently released a benchmark of various code review tools. The tools were put on trial against real codebases and ranked on how thorough and precise each one is. I highly recommend checking out the results.

I think that code reviews are indeed an interesting problem space. They do seem to be the current bottleneck and potentially a great way to keep bugs at bay. But I also feel that there's more. Code review is not just about catching bugs. It has always been about knowledge transfer, about mentorship, about building a shared understanding of the codebase.
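Boris's "more tokens, better results" point can be sketched as a generate-then-verify loop: one imperfect agent proposes patches, an independent imperfect agent filters them, and the shipped bug rate drops. This is a toy simulation, not the Claude Code mechanism; the probabilities and agent functions are my own illustrative assumptions.

```python
import random

random.seed(42)

BUG_PROB = 0.4    # assumed: chance a generated patch contains a bug
CATCH_PROB = 0.8  # assumed: chance the reviewer agent catches a bug

def generate_patch():
    """Hypothetical generator agent: proposes a patch that may be buggy."""
    return {"has_bug": random.random() < BUG_PROB}

def passes_review(patch):
    """Hypothetical reviewer agent: an independent but imperfect check."""
    if patch["has_bug"]:
        return random.random() > CATCH_PROB  # some bugs slip through
    return True

def shipped_bug_rate(with_review, trials=5000):
    """Fraction of shipped patches that still contain a bug."""
    buggy = 0
    for _ in range(trials):
        patch = generate_patch()
        if with_review:
            # Spend more tokens: regenerate until a patch survives review.
            while not passes_review(patch):
                patch = generate_patch()
        buggy += patch["has_bug"]
    return buggy / trials

no_review = shipped_bug_rate(with_review=False)
reviewed = shipped_bug_rate(with_review=True)
print(f"without review: {no_review:.2f}, with review: {reviewed:.2f}")
```

With these assumed numbers, the unreviewed bug rate sits near 40%, while generate-then-review drives it down to roughly 12%: the reviewer doesn't need to be perfect, only independent of the generator.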
I wonder what happens to all of this when an AI reviews your code. Will the knowledge stay in the model's context window and then be gone? I wonder how the part where the team gets smarter happens in the AI era.

Is code evolving, or just mutating?

Itamar Friedman, CEO of Qodo, published a piece that reframes the entire AI coding conversation through the lens of evolution. His argument is simple but profound: code generation is just mutation. Models write functions, agents generate pull requests, systems produce entire features, but from an evolutionary perspective, that's just creating variation. What creates progress is selection.

Evolution requires three ingredients: mutation, selection, and persistence. Without selection, mutations accumulate. With selection, improvement compounds. He points out that software engineering has always had selection loops: tests, code review, CI pipelines, governance mechanisms. We just never described them that way. And now AI is dramatically increasing the mutation rate. Agents can understand unfamiliar codebases, propose architectural refactors, implement entire features. The rate of code production is skyrocketing. But the selection layer is not scaling at the same speed. The bottleneck in software development is moving from writing code to verifying it and selecting what survives.

This landed hard for me. In my previous newsletter I talked about how quality engineering is evolving, and Itamar's framing gives it a language I've been missing. We're not just "testers" or "quality engineers". We're the selection layer. And if that layer doesn't keep up with the mutation rate, systems don't evolve; they drift.

The hot dog problem

Mo Bitar posted a video called "I was a 10x engineer. Now I'm useless" and it hit me harder than I expected. Mo describes what happened when he used ChatGPT to deploy his entire product without looking at the code. It worked. And he hates it. His analogy is perfect: he made a hot dog.
It looks like food, it tastes like food, the transaction is complete. But he can't sell it because he has no emotional connection to it. He didn't earn it. He didn't suffer for it. And that suffering, that struggle, is what used to make us better at our craft.

Mo's video is honest, and it asks an important question: what do you do when you love to code? The activity and craft of coding doesn't seem to be in as high demand as it used to be. This new AI era takes something away from those who loved it. On the other hand, I believe there is a path forward; the goalposts have just moved. This tweet by Franziska seems to suggest an interesting problem space for engineers: instead of making your own work faster, you engineer AI systems.

The problem with AI demos

Vidhya Ranganathan wrote a piece called "Production Telemetry Is the Spec That Survived" that I think should be required reading for anyone deploying AI agents on existing codebases. She introduces a framework that distinguishes between greenfield systems (new, clean, well-specified), brownfield (evolving, messy), and what she calls "blackfield": legacy systems under heavy load where the original intent is lost, documentation has rotted, and business rules hide in undocumented conditionals.

AI coding tools are great at greenfield. They struggle with brownfield. And they fail at blackfield, because they infer specifications from code patterns, creating implicit specs that fail silently when they contradict accumulated production behavior. The only honest specification left in these systems lives in production telemetry: traffic patterns, error rates, usage data.

I think this has always been a great pointer for testers on which tests should be written first. But it is also a very smart way to evaluate new tools and to test PoCs for services that provide nice demos but leave you curious about real-world usage.
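The "telemetry as spec" idea translates directly into test prioritization: if traffic and error rates are the only honest specification, they can also rank where regression tests pay off first. A minimal sketch, with endpoints and numbers that are entirely made up for illustration:

```python
# Rank endpoints by how much pain a regression there would cause,
# using production telemetry as the "spec": traffic volume x error rate.
telemetry = [
    # (endpoint, requests_per_day, error_rate) -- illustrative numbers
    ("POST /checkout",      50_000, 0.020),
    ("GET /search",        400_000, 0.001),
    ("POST /login",        120_000, 0.005),
    ("GET /admin/report",      300, 0.150),
]

def risk_score(entry):
    """Expected failing requests per day: where tests pay off first."""
    _, requests, error_rate = entry
    return requests * error_rate

# Highest expected failure volume first: write those tests first.
ranked = sorted(telemetry, key=risk_score, reverse=True)
for endpoint, requests, error_rate in ranked:
    print(f"{endpoint:20s} ~{requests * error_rate:6.0f} failing req/day")
```

Note how the ranking disagrees with both raw traffic and raw error rate: /search has the most traffic and /admin/report the worst error rate, but /checkout tops the list because volume and failure rate together predict the most daily pain.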
OpenAI acquires Promptfoo

This connects to OpenAI acquiring Promptfoo, an AI security startup that specializes in red-teaming and vulnerability testing for AI systems. Promptfoo serves about 25% of Fortune 500 companies and has 130,000 developers using it monthly. OpenAI is integrating it into their Frontier platform to make security testing a built-in part of how teams ship AI agents.

The fact that OpenAI felt the need to buy a company whose entire job is testing whether AI systems are safe tells you something about where we are. We're building agents that write code, review code, and deploy code, and we're only now starting to seriously ask: who tests the agents?

Great questions to be asked about AI

Hank Green and Cal Newport sat down for a conversation about AI that I think captures the current moment better than most. Hank's approach is to catalog every legitimate concern (addiction, manipulation, hallucination, labor displacement, economic bubbles, children's exposure) and resist the urge to collapse them into a single narrative. Each concern has its own severity and its own likelihood. They're separate problems.

Cal Newport introduced a concept: "progress laundering." Advances in one AI technology, like language models, get unfairly attributed to completely different domains like protein folding or robotics. These are separate technologies with separate trajectories, but the narrative treats them as one unstoppable wave. It's a useful framing because it explains why the discourse feels so overwhelming. We're not dealing with one problem. We're dealing with dozens of separate problems being marketed as one.

The whole conversation is great, but what surprised me (and makes perfect sense) was Cal's take on current AI models. He claims we'll probably end up with smaller, specialized systems that do specific things well, which, in a way, loops back to where we started. Specialized models. Specialized agents.
Selection systems that keep the good mutations and discard the rest. Instead of one know-it-all model like GPT-5.4, many will focus on models that are really good at specialized tasks. But that's a prediction, not a certainty, so we'll see where we eventually end up.

I'd love to hear how this is landing for you. Has your team started using AI code reviews, or are you still doing them manually? Do you see yourself as the selection layer, or does that framing feel off? And if you're someone who loves the craft of coding, how are you making peace with the hot dog era? Hit reply, I'm genuinely curious where everyone is at right now.