Original title: Thoughts on slowing the fuck down
Original author: Mario Zechner
Translation: Peggy, BlockBeats
Editor’s note: At a moment when generative AI is accelerating into software engineering, industry sentiment is shifting from awe at its capability to efficiency anxiety. Not writing fast enough, not using AI enough, not automating enough: any of it, apparently, can create the pressure of being left behind. But as coding agents actually enter production environments, more sobering problems are surfacing: errors get amplified, complexity spins out of control, systems gradually become incomprehensible, and gains in efficiency don’t translate into proportional gains in quality.
Based on hands-on practice in the field, this article offers a calm reflection on this wave of “agentic coding.” The author points out that agents don’t learn from errors the way humans do; in the absence of bottlenecks and feedback mechanisms, small issues get quickly magnified. And in complex codebases, their local perspective and limited recall further intensify the system’s structural chaos. The essence of these problems isn’t the technology itself—it’s that, under anxiety-driven behavior, humans hand over judgment and control too early.
So instead of getting stuck in the anxiety of “whether we have to fully embrace AI,” we should recalibrate the relationship between humans and tools: have agents take on local, controllable tasks, while keeping system design, quality gates, and key decisions firmly in your own hands. In this process, “slowing down” becomes a kind of capability—it means you still understand the system, can make trade-offs, and still retain a sense of control over the work.
In an era where tools keep evolving, what may be truly scarce isn’t faster generation capability, but the ability to judge complexity—and the resolve to choose between efficiency and quality.
Below is the original text:
The turtle face: that’s the expression I wear when I look at this industry.
About a year ago, coding agents that can truly help you “complete a whole project end to end” began to appear. Before that, there were tools like Aider and early Cursor, but they were more like assistants than “agents.” The new generation of tools is highly appealing, and many people have spent a large amount of their spare time turning into reality all the projects they’d always wanted to do but never had time for.
I don’t think there’s anything wrong with that. It’s fun to build things with your spare time, and most of the time you don’t really need to worry about code quality and maintainability. This also gives you a path to learn a new tech stack.
During the Christmas holidays, Anthropic and OpenAI also released some “free credits,” pulling people in like slot machines. For many people, it’s the first time they’ve truly experienced the magic of “agents writing code.” More and more people join in.
Nowadays, coding agents are also starting to enter production codebases. Twelve months later, we’re starting to see the consequences of this “progress.” Here are my current thoughts.
Even though most of this is anecdotal, software today really does feel like it could fall apart at any moment. 98% uptime is shifting from exception to norm, even for large services. User interfaces are riddled with absurd bugs, the kind a QA team should catch at a glance.
I admit this situation already existed before agents. But now, the problems are clearly accelerating.
We can’t see what’s really happening inside companies, but information occasionally leaks out, like the rumored “AI-caused AWS outage.” Amazon Web Services promptly “corrected” the claim, then immediately launched a 90-day internal restructuring plan.
Satya Nadella (Microsoft CEO) has also been emphasizing lately that an increasing amount of code in the company is written by AI. While there’s no direct evidence, you do get the feeling that Windows quality is declining. Even from some of Microsoft’s own blog posts, they seem to assume this is the case.
Companies that claim their product is “100% AI-generated code” almost always ship the worst products imaginable. This isn’t aimed at anyone in particular, but gigabyte-scale memory leaks, chaotic UIs, missing features, and frequent crashes are not the quality endorsement they believe they’re providing, and they’re certainly not a positive example of “letting an agent do everything for you.”
Off the record, you hear more and more people, at large companies and small teams alike, saying the same thing: agent-written code has pushed them into a dead end. No code review, design decisions handed to agents, a pile of features nobody needs stacked on top: the outcome isn’t going to be good.
We’ve basically given up on engineering discipline and subjective judgment and fallen into an “addictive” way of working: there’s only one goal—to generate the most code in the shortest amount of time, with absolutely no consideration for what the consequences will be.
You’re building an orchestration layer to command an automated army of agents. You install “Beads” without realizing that it is, in essence, malware you can’t remove. Just because “everyone does it.” Because if you don’t, you’re “not gonna make it” (ngmi).
You keep burning yourself out in an endless nesting-doll loop.
Look—Anthropic used a bunch of agents to make a C compiler. Even though it still has problems now, the next-generation model will definitely fix it, right?
And look—Cursor used a large group of agents to make a browser. Even though it’s basically unusable for now and still needs occasional manual intervention, the next-generation model will definitely be able to handle it, right?
“Distributed,” “divide and conquer,” “autonomous systems,” “dark factories,” “solving software problems in six months,” “SaaS is dead; my grandma just built a Shopify with Claw”…
These narratives sound great.
Of course, this approach might indeed “still work” for your kind of side project that almost nobody uses (including you). Maybe there really is a genius who can use this method to produce a non-junky, truly used software product. If you’re that person, I genuinely admire you.
But at least in the developer circles around me, I haven’t seen any real success stories that this method actually works. Of course, maybe it’s just that we’re all too incompetent.
The problem with agents is that they make mistakes. That in itself isn’t special; humans make mistakes too. They might be correctness errors, the kind that are easy to spot and easy to fix, and that a regression test can then pin down for good. They might be code smells the linter can’t catch: an unused method here, a dubious type there, duplicated code, and so on. Taken individually, none of these are that harmful; human developers make the same small mistakes.
But machines aren’t people. After repeating the same mistake a few times, humans usually learn not to repeat it, whether because they get scolded or because they genuinely learn.
And agents don’t have that kind of learning ability—at least not by default. They will repeat the same mistakes again and again, and they can even “create” fascinating combinations of different errors based on the training data.
Of course, you can try to “train” them: write rules in AGENTS.md so it won’t make those mistakes again; design a complex memory system so it can query past errors and best practices. This does work for certain specific types of problems. But the prerequisite is—you must first observe that it made that mistake.
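As a concrete illustration, such rules typically live in a plain-markdown file at the repository root. The rules below are hypothetical examples of my own, not from the original author, and the file paths and class names in them are made up:

```markdown
# AGENTS.md

## Rules learned from observed mistakes

- Before adding a helper, search `src/utils/` for an existing
  implementation; do not duplicate one that already exists.
- Never introduce a new HTTP client; use the existing `ApiClient` wrapper.
- Run the linter and the full test suite before declaring a task done.
- Do not loosen types or silence type errors to make a build pass.
```

Note that every one of these rules exists only because a human first caught the corresponding mistake and wrote it down.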
A more crucial difference is this: humans have bottlenecks, and agents don’t.
Humans can’t spit out twenty thousand lines of code in a few hours. Even if their error rate isn’t low, a person can only introduce so many errors in a single day, and they accumulate slowly. Usually, once the pain those errors cause builds up to a certain level, humans (out of a visceral hatred of pain) stop and fix things. Or the person gets replaced, and someone else fixes it. Either way, the problem gets handled.
But when you orchestrate a whole “army” of agents, there’s no bottleneck and no sense of “pain.” Those tiny mistakes that were originally negligible will stack up at an unsustainable speed. You’ve been removed from the loop, and you don’t even know that these seemingly harmless issues have already grown into a huge monster. By the time you actually feel the pain, it’s often too late.
Until one day, you want to add a new feature and find that the current system architecture (essentially already a pile of accumulated mistakes) can’t support modification at all; or users start complaining like crazy because the most recent release has broken things—maybe even losing data.
That’s when you realize: you can no longer trust this codebase.
Worse still, the thousands of unit tests, snapshot tests, and end-to-end tests generated by agents are no longer trustworthy either. The only remaining way to determine whether “the system is working properly” is manual testing.
Congratulations—you’ve dug a hole for yourself (and your company).
You have no idea what’s happening in the system, because you’ve handed control to the agents. And agents, in essence, are selling complexity. They’ve seen countless terrible architectural decisions in their training data, and reinforcement learning keeps reinforcing those patterns. Let them design your systems, and the outcome is predictable.
What you end up with is an extremely complex system, mixed with poor imitations of “industry best practices,” built with no constraints imposed before things spiraled out of control.
But the problem doesn’t end there. Your agents don’t share an execution process with each other; they can’t see the full codebase; and they don’t know what you or other agents decided previously. As a result, their decisions are always “local.”
This directly leads to the problems mentioned earlier: lots of duplicated code, abstractions for abstraction’s sake, and all kinds of inconsistencies. These issues keep stacking up, eventually forming an irreversibly complex system.
This is actually very similar to enterprise-level codebases written by humans. The complexity there usually results from years of accumulation: the pain is distributed across many people, no one reaches the critical point of “it has to be fixed,” and the organization’s tolerance is high—so complexity and the organization “co-evolve” in a symbiotic way.
But with the combination of humans + agents, this process gets accelerated massively. Two people plus a bunch of agents can reach that kind of complexity in a matter of weeks.
You might hope that agents can “clean up the mess”—help you refactor, optimize, and make the system tidy. But the problem is: they simply can’t do that anymore.
Because the codebase is too large and the complexity is too high, and agents can only ever see locally. It’s not just that the context window isn’t big enough, or that long-context mechanisms fail when faced with millions of lines of code. The issue is more subtle.
Before an agent tries to fix a system, it must first find all the code that needs changes, as well as any existing implementations that can be reused. We call this step agentic search.
How agents do this depends on what tools you give them: Bash + ripgrep, a queryable code index, LSP services, vector databases…
But no matter what tools you use, the core is the same: the larger the codebase, the lower the recall. Low recall means the agent can’t find all the relevant code, and naturally it can’t make correct modifications.
That’s also why those small “code smell” mistakes show up at the beginning: it didn’t find an existing implementation, so it reinvented the wheel and introduced inconsistencies. In the end, these issues keep spreading and stacking, blooming into an extremely complex “mess of broken patterns.”
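The recall problem above can be made concrete with a toy measurement. This is an illustrative sketch of my own, not the author’s code or any real agent harness; the file names are invented. We treat search as returning a set of files and compare it against the set of files the change actually touches:

```python
# Toy illustration of search recall: the fraction of truly relevant
# files that a search actually surfaces.

def recall(found: set[str], relevant: set[str]) -> float:
    """recall = |found ∩ relevant| / |relevant|"""
    if not relevant:
        return 1.0
    return len(found & relevant) / len(relevant)

# Everything the change actually touches...
relevant = {"billing.py", "invoice.py", "tax_utils.py", "legacy/tax.py"}

# ...versus what a grep-style search for "tax" surfaced. The duplicate
# implementation hiding in legacy/ is exactly what gets missed.
found = {"billing.py", "tax_utils.py"}

print(recall(found, relevant))  # 0.5 — half the relevant code was never seen
```

At 50% recall, the agent never sees `legacy/tax.py`, so it happily writes a third tax implementation, and the inconsistency compounds from there.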
So how should we avoid all of this?
Coding agents are like sirens—using lightning-fast code generation speed and that kind of “inconsistent, but occasionally dazzling” intelligence to draw you in. They can often complete some simple tasks with astonishing speed and high quality. The real problems start when you develop the thought that “this thing is too powerful—computer, do the work for me!”
Handing tasks to agents isn’t a problem in itself. Good agent tasks usually share a few characteristics: the scope can be tightly constrained, without needing to understand the entire system; the task is closed-loop, meaning the agent can evaluate its own result; the output isn’t on the critical path, just temporary tooling or internal software that won’t affect real users or revenue; or you only need a “rubber duck” to help you think, essentially bouncing your ideas off the compressed knowledge of the internet and synthetic data.
If those conditions are met, then this is the kind of task that’s suitable to give to agents—provided that you, as a human, remain the final quality gatekeeper.
For example, optimizing app startup time with Andrej Karpathy’s auto-research method? Sure. But the prerequisite is that you understand the code it outputs is absolutely not production-ready. Auto-research works because you give the agent an evaluation function so it can optimize a particular metric (like startup time or loss). But that evaluation function covers only a very narrow dimension. The agent will confidently ignore every metric that isn’t in it: code quality, system complexity, and, if your evaluation function is flawed, even correctness.
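To see why a narrow evaluation function is dangerous, here is a deliberately simple sketch of my own (not Karpathy’s actual setup): a search loop that scores candidates purely on a speed metric. The candidate list and its numbers are invented, and correctness is tracked but deliberately invisible to the evaluator:

```python
# Illustrative sketch of metric-driven search: candidates are scored
# only by the evaluation function, so anything the function doesn't
# measure (here: correctness) is invisible to the optimizer.

candidates = [
    # (name, startup_ms, is_correct) — is_correct is NOT in the eval
    ("cache everything, skip validation", 40, False),
    ("lazy-load modules",                 90, True),
    ("baseline",                         300, True),
]

def evaluate(candidate):
    """The eval function sees only startup time. Lower is better."""
    _, startup_ms, _ = candidate
    return startup_ms

best = min(candidates, key=evaluate)
print(best[0])  # the fast-but-incorrect candidate wins
```

The optimizer dutifully picks the candidate that skips validation, because nothing in `evaluate` penalizes it. That is exactly the failure mode: the agent is not malicious, the metric is just blind.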
The core idea is actually simple: have the agent do those boring things that won’t teach you anything new, or the exploratory work you simply don’t have time to try. Then you evaluate the results, pick out the parts that are truly reasonable and correct, and complete the final implementation. Of course, for the last step, you can also use the agent.
But what I want to emphasize more is: yeah, it really is time to slow down a bit.
Give yourself time to think about what you’re doing and why you’re doing it. Give yourself a chance to say “no”—“no, we don’t need this.” Set a clear ceiling for the agent: how many lines of code it’s allowed to generate per day should match what you can realistically review. All parts that determine the system’s “overall shape,” such as architecture and APIs, should be written by you personally. You can use autocomplete to get some of that “hand-written code” feel, or pair program with the agent—but the key is: the code must be in your hands.
Because personally writing the code, or watching it get built step by step, creates a kind of “friction” by itself. It’s precisely that friction that helps you understand more clearly what you want to do, how the system operates, and what the overall “feel” is like. This is where experience and “taste” come into play—something the most advanced models today still can’t replace. Slowing down, enduring a bit of friction—that’s exactly how you learn and grow.
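One way to make that daily ceiling concrete is a trivial gate on the diff before you accept agent output. This is a minimal sketch of my own; the 400-line budget is an arbitrary assumption, and the diff is a made-up example:

```python
# Minimal sketch of a "review budget" gate: count added lines in a
# unified diff and refuse anything beyond what you can actually read
# today. The 400-line budget is an arbitrary example value.

DAILY_REVIEW_BUDGET = 400  # lines you can realistically review per day

def added_lines(unified_diff: str) -> int:
    """Count '+' lines in a unified diff, excluding '+++' file headers."""
    return sum(
        1
        for line in unified_diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    )

def within_budget(unified_diff: str, spent_today: int = 0) -> bool:
    """Accept the diff only if it still fits today's review budget."""
    return spent_today + added_lines(unified_diff) <= DAILY_REVIEW_BUDGET

diff = """--- a/app.py
+++ b/app.py
@@ -1,2 +1,4 @@
 import os
+import sys
+print(sys.argv)
"""
print(added_lines(diff), within_budget(diff))  # 2 True
```

The point isn’t the tooling; it’s the constraint. A hard cap on unreviewed lines forces the friction back into the loop.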
In the end, you’ll get a system that remains maintainable—at least no worse than it would have been before agents showed up. Yes, the old systems weren’t perfect either. But your users will thank you, because your product is “useful,” not a pile of junk cobbled together.
You’ll build fewer features, but they’ll be more correct. Learning to say “no” is itself a capability. You can also sleep soundly, because you still know what’s happening in the system and you still retain the initiative. It’s precisely this understanding that lets you compensate for the recall problems of agentic search, making agent outputs more reliable and requiring less patching.
When the system goes wrong, you can step in to fix it yourself. And when the design is fundamentally unreasonable from the start, you can understand what’s wrong and refactor it into a better shape. As for whether there are agents, it’s honestly not that important.
All of this requires discipline. None of it works without people.
[Original link]