Hey Reader,

A few years ago Cursor was a nicer place to type code. This week it announced Origin, its own Git competitor built for agent workloads, and got acquired by SpaceX for sixty billion dollars. Somewhere in between, it stopped being an editor and started becoming the whole stack: the place you write, review, merge, and increasingly test your software. I come from QA, so my first instinct isn't excitement, it's a question: when one tool owns every step of the loop, who's the independent check on quality?

Everyone builds with Cursor. Almost nobody tests with it.

This is the gap that's been bugging me, and it's why I put together a workshop on AI-powered quality engineering with Cursor. Everyone reaches for it to generate features. Far fewer people point it at the thing that tells you whether those features actually work. The catch is that an agent will happily write you 10,000 lines of test slop: tests that pass, look thorough, and verify almost nothing. So the goal isn't "Cursor, generate my tests." It's the opposite: you steer it with smart decisions about what behavior actually matters, and the test becomes the way you encode that.

Rizèl Scarlett pointed to a conversation with Angie Jones that says this better than I can: tests matter more in the age of agents, not less, because a test is how you teach an agent how your software is supposed to behave. Angie's been one of the clearest voices in testing for years, and her full episode is worth the listen. The oldest idea in QA turns out to be the leash, and learning to hold it well is the 2.5 hours I want to spend with people.

When the same tool writes and grades the homework

Origin is the part I keep turning over, because resolving merge conflicts with an agent means the same product now writes, hosts, and proposes to merge your code. Addy Osmani put numbers on why that should worry you in a sharp thread on agentic code review.
Agents produce roughly 4x more output but only about 10% more real value, and the gap is review work nobody's keeping up with. His phrase for it: "We made writing cheap, and understanding stayed exactly as expensive." Code churn up 861%, defects climbing from 9% to 54%, zero-review merges up 31%.

I don't agree with all of it, though. Addy suggests running multiple AI reviewers on the same PR, and I pushed back on that. More tools flag more issues, sure, but the cost is alert fatigue, and a wall of low-signal flags is just another way of not really reviewing. Reconstructing intent is worth doing; there are more focused ways to do it than piling on reviewers.

My colleague Nnenna makes the sharper structural point: code quality is a governance question before it's a tooling one. Risk-profile your repos so a payment flow and a throwaway internal script don't get the same scrutiny, push deterministic checks earlier in the workflow, and treat AI review as independent verification tied to explicit rules. That's the "blast radius" idea made operational, and it's the opposite of letting one tool decide everything by default.

This is the half of the job I think gets too little attention, which is why we put together a free Code Review Academy at Qodo, where I work, for people who'd rather get good at reviewing than outsource it entirely.

None of the tools here are the villain. Cursor is good enough that owning the write-review-merge-test loop feels like convenience rather than a trap, and that's exactly what makes it worth watching. When one product can do every step, quality stops being something the workflow hands you and becomes something you have to insist on: the test you actually steer, the risk map for what deserves real scrutiny, the review that isn't graded by the thing that wrote it. None of that shows up by default. You put it there on purpose, or it isn't there. So draw that line before the tooling draws it for you.
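
To make the risk-map idea a little more concrete, here is a minimal Python sketch of what "risk-profile your repos" could look like as plain data. Everything in it is an assumption made up for illustration: the tier names, the path patterns, the check lists, and the `tier_for` and `review_plan` helpers are hypothetical, not something Nnenna, Qodo, or Cursor prescribes.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Iterable, Tuple


@dataclass(frozen=True)
class RiskTier:
    """How much scrutiny a change gets. All values here are illustrative."""
    name: str
    rank: int                              # higher rank = stricter tier
    deterministic_checks: Tuple[str, ...]  # checks that run before any AI review
    independent_ai_review: bool            # review by a tool that did not write the code
    human_reviewers: int


# Hypothetical tiers: a payment flow and a throwaway script get different rules.
HIGH = RiskTier("high", 2, ("lint", "unit", "integration"), True, 2)
MEDIUM = RiskTier("medium", 1, ("lint", "unit"), True, 1)
LOW = RiskTier("low", 0, ("lint",), False, 0)

# Hypothetical path patterns, checked in order; the first match wins.
RISK_MAP = [
    ("services/payments/*", HIGH),
    ("services/*", MEDIUM),
    ("scripts/internal/*", LOW),
]


def tier_for(path: str) -> RiskTier:
    """Return the tier for one changed file; unknown paths default to MEDIUM."""
    for pattern, tier in RISK_MAP:
        if fnmatch(path, pattern):
            return tier
    return MEDIUM


def review_plan(changed_paths: Iterable[str]) -> RiskTier:
    """A change is as risky as its riskiest file; an empty change is LOW."""
    return max((tier_for(p) for p in changed_paths),
               key=lambda t: t.rank, default=LOW)


if __name__ == "__main__":
    plan = review_plan(["scripts/internal/cleanup.py",
                        "services/payments/charge.py"])
    print(plan.name, plan.deterministic_checks, plan.human_reviewers)
```

The design choice worth noticing is that the rules live in plain data, outside whichever tool wrote the code, so the scrutiny a change gets doesn't depend on that tool's defaults.
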

If your team already lives inside one tool, what's the one quality check you'd refuse to give up, and would it survive the next convenient update? That's the part I'm still working out, and I'd genuinely like to hear what you're holding onto.