
Faster horses, not trains

I’ve been trying to work out why successive advances in GenAI models don’t feel particularly different to me, even as others react with genuine excitement.

I use these tools constantly and have done since GPT-4 was released nearly 3 years ago. I couldn’t imagine a world without them. In that sense, they already feel as transformative as the web. Perhaps it’s simply that once they become ambient, the magic fades. You get used to them and stop noticing improvements. But the more I’ve thought about it, the more I think there are deeper structural reasons why the experience has plateaued, for me at least.

The lossy interface

All meaningful work starts in a physical, social, constraint-filled environment. We reason with space, time, bodies, artefacts, relationships, incentives, and history. Much of this understanding is tacit. We sense it before we can explain it.

To involve a computer, that reality has to be translated into symbols. Text, files, data models, diagrams, prompts. Every translation step compresses context and/or throws information away. There is loss from brain to keyboard. Loss from keyboard to prompt. And loss again when the output comes back and has to be interpreted.

GenAI only ever sees what makes it across that boundary. It reasons over compressed representations of reality that humans have already filtered, simplified, and distorted.

Better models reduce friction within that interface, but they don’t change its dimensionality. In that respect it doesn’t really matter how “smart” the models get, or how well they do on the latest benchmarks. The boundary stays the same.

Because of that, GenAI works best where the world is already well-represented in digital form. As soon as outcomes depend on things outside its boundary, its usefulness drops sharply.

That is why GenAI helps with slices of work, not whole systems. It is powerful, but fundamentally bounded.

Some real world examples:

  • In software development, generating code hasn’t been the main bottleneck since we moved away from punch cards. The far bigger constraints are understanding the problem, communicating with stakeholders, working effectively with other people, designing the system, managing risks and trade-offs, and operating systems in complex social environments over time.
  • In healthcare, GenAI can assist with diagnosis or documentation, but outcomes are dominated by staff, facilities, funding, and coordination across complex human systems. Better reasoning does not create more nurses or hospital beds.

In both cases, GenAI accelerates parts of the work without shifting the underlying constraint.

Faster horses, not trains

In that respect, GenAI feels like faster horses rather than trains. It makes us more effective at things we were already doing – writing, coding, analysis, planning, and sense-making – but it operates on only parts of systems, thin slices at a time.

Trains didn’t just make transport faster. They removed a hard upper bound on the movement of people and goods. Once that constraint moved, everything else reorganised around it. Supply chains, labour markets, cities, timekeeping, and even how people understood distance and work all changed. Railways were not just a tool inside the system, they became the system.

GenAI doesn’t yet do that. It works through a narrow, virtual interface and plugs into existing workflows. But as often as not, the real systemic constraints lie elsewhere.

What actually changed the world

A recent conversation reminded me of Vaclav Smil’s How the World Really Works, which I read last year.

Smil highlights that modern civilisation rests on a small number of physical pillars: energy, food production (especially nitrogen), materials like steel and cement, and transport. Changes in these pillars are what led to the biggest transformations in human life. Information technology barely registers at that level in his analysis. He doesn’t deny its importance, but treats it as secondary, an optimiser of systems whose limits are set elsewhere.

Through that lens, GenAI doesn’t (yet) register as a civilisation-shaping force. It doesn’t produce energy, grow food, create new materials, or move mass. It operates almost entirely above those pillars, improving coordination, design, and decision-making around systems whose hard limits are set elsewhere.

That doesn’t make it trivial. But it explains why, so far, it looks closer to previous waves of information technology than to steam or electricity. It optimises within existing constraints rather than breaking them.

The big if

Smil’s framing doesn’t say GenAI cannot matter at an industrial scale. It says where it would have to show up. GenAI becomes civilisation-shaping only if it materially accelerates breakthroughs in those physical pillars – things that change what the world can physically sustain.

This is where “superintelligence” comes in. If GenAI can explore hypothesis spaces humans cannot, design and run experiments, or compress decades of scientific iteration into years, resulting in major scientific breakthroughs, it moves from optimising within constraints to changing them.

This is also where my own doubts sit. Many think that simply scaling what we have now will get us there. Those who don’t believe that, but are still optimistic about AI’s potential, turn instead to world models, embodiment, or agents that can act in the real world. There are sketches and hopes for how this might happen, but as yet, not much more than that.

So while superintelligence is the path by which AI could plausibly become industrial-scale transformative, it’s a long and uncertain one.

What kind of change are we talking about?

If we mean web-scale change, then GenAI is already there. But if we mean the kind of change associated with the industrial revolution (as it’s often compared to) – longer lives, better health, radically different working conditions, step changes in material living standards – then what we have today does not qualify. Historically, those shifts followed from breaking physical constraints, not from better information or reasoning alone.

This, for me, is why successive model improvements don’t feel like much. It isn’t that GenAI lacks value; it’s that those improvements don’t change the shape of what’s possible. They operate within the same narrow, lossy interface, so they barely register in practical terms. GenAI still adds value, and already feels web-scale transformative. But until that boundary moves, or something else breaks the underlying constraints, model improvements don’t feel like steps toward an industrial-revolution-scale shift.

More with less, or is it more with the same?

The crude clickbait narrative is that AI means job cuts, replacing roles. But when I look at how AI is actually being used in real organisations, it seems more likely to expand capacity than to reduce headcount. Many organisations may end up doing more with the same long before they can credibly do the same (or more) with less.

This thought started for me with an observation – AI is not substituting for whole roles; we’re getting micro-specialists that can do slices of work. In software you see agents for tests, code review, planning. Other sectors look much the same. Legal teams using AI for drafting. Sales teams for outreach. Finance for reconciliation. Tools handling tasks, not outcomes, and someone still has to stitch the pieces together.

There are (at least) three forces I can think of that matter when asking whether organisations will genuinely be able to do more with less:

1. How automatable the work already is.
Where the work is rules based, high volume, and low variation, AI may replace labour in the same way classic automation has. Think claims processing, simple customer support, structured back office workflows. These functions already lived close to the automation frontier. AI just expands the frontier a bit.
This will reduce headcount, but mostly in places where headcount has been under pressure for decades anyway.

2. How much the organisation can absorb increased output.
Most professional work is not constrained by how fast someone types or drafts. It is constrained by coordination, sequencing, ambiguity, stakeholder alignment, and quality. Software is a good example. So is legal, consulting, product, sales. If you cut the number of lawyers because drafting is faster, you will simply overload the remaining lawyers with negotiation, risk, and client work.

3. The cost and consequences of mistakes.
In many industries, the limiting factor is not productivity, but risk. Healthcare, aviation, finance, law. Increased throughput also increases the risk surface area. If AI increases the probability or cost of an error, you cannot shrink the team. You often need more human oversight, not less.

If you put these together, the more likely outcome is this:

  • Some operational functions will shrink, but these were already at risk of automation.
  • Most knowledge work will shift toward more with the same, not less.
  • Some domains will accidentally create more with more, because oversight and correction absorb the gains.

AI is still making code worse: a new CMU study confirms it

In early 2025 I wrote about GitClear’s analysis of the impact of GenAI on code quality, based on 2024 data, which showed a significant degradation in code quality and maintainability. I recently came across a new study from Carnegie Mellon, “Does AI-Assisted Coding Deliver? A Difference-in-Differences Study of Cursor’s Impact on Software Projects” that looks at a more recent period, tracking code quality in projects using GenAI tools up to mid-2025. So has code quality improved as the models and tools have matured?

The answer appears to be no. The study finds that AI briefly accelerates code generation, but the underlying code quality trends continue to move in the wrong direction.

How the study was run

Researchers at Carnegie Mellon University analysed 807 open source GitHub repositories that adopted Cursor between January 2024 and March 2025, and tracked how those projects changed through to August 2025. Adoption was identified by looking for Cursor configuration files committed to the repo.
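
I haven’t seen the researchers’ scripts, so purely as an illustration, here is a rough sketch of how adoption might be inferred from committed Cursor configuration files. The marker file names and the git plumbing are my assumptions, not details taken from the paper.

```python
# Hypothetical sketch: find the first commit in a repo that added a Cursor
# configuration file. Marker paths are assumptions, not from the study.
import subprocess

CURSOR_MARKERS = (".cursorrules", ".cursorignore", ".cursor/")

def first_cursor_commit_date(repo_path: str) -> str | None:
    """Return the date (YYYY-MM-DD) of the earliest commit adding a Cursor config file, if any."""
    # List commits oldest-first, showing only files that were added (--diff-filter=A),
    # with a "<hash> <date>" header line per commit.
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--reverse", "--diff-filter=A",
         "--name-only", "--format=%H %cs"],
        capture_output=True, text=True, check=True,
    ).stdout

    current_date = None
    for line in log.splitlines():
        parts = line.split()
        if len(parts) == 2 and len(parts[0]) == 40:          # commit header: "<hash> <date>"
            current_date = parts[1]
        elif line and any(line.startswith(m) for m in CURSOR_MARKERS):
            return current_date                               # first commit introducing a marker
    return None
```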

For comparison, the researchers built a control group of 1,380 similar GitHub repositories that didn’t adopt Cursor (see caveats below).

For code quality, they used SonarQube, a widely used and well respected code analysis tool that scans code for quality and security issues. The researchers ran SonarQube monthly to track how each codebase evolved, focusing on static analysis warnings, code duplication and code complexity.

Finally, they attempted to filter out toy or throwaway repositories by only including projects with at least 10 GitHub stars.
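
As a reader aid rather than anything lifted from the paper, here is a minimal sketch of the difference-in-differences idea the study’s title refers to: compare how a metric changes in the adopting repos against how it changes in the control repos over the same months, so that trends affecting everyone are netted out. The numbers below are invented, and the paper uses a proper regression rather than this bare arithmetic.

```python
# Minimal difference-in-differences sketch with made-up monthly warning counts.
from statistics import mean

# Hypothetical static-analysis warning counts, aligned so month 0 is the
# Cursor adoption month for the treated group.
treated_pre  = [120, 118, 121]   # adopting repos, months before adoption
treated_post = [150, 158, 162]   # adopting repos, months after adoption
control_pre  = [119, 120, 118]   # matched control repos, same calendar months
control_post = [122, 124, 125]

# DiD: change in the treated group minus change in the control group,
# which nets out trends that affect all repositories.
treated_change = mean(treated_post) - mean(treated_pre)
control_change = mean(control_post) - mean(control_pre)
did_estimate = treated_change - control_change

print(f"Treated change: {treated_change:+.1f} warnings")
print(f"Control change: {control_change:+.1f} warnings")
print(f"DiD estimate:   {did_estimate:+.1f} warnings attributable to adoption")
```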

Key findings

Compared to the control group:

  • A short-lived increase in code generated: Activity spikes in the first one or two months after adoption. Commits rise and lines added jump sharply, with the biggest increase in the first month.
  • The increase does not persist: By month three, activity returns to baseline. There is no sustained increase in code generated.
  • Static analysis warnings increase and remain elevated: Warnings rise by around 30 percent post-adoption and stay high for the rest of the observation window.
  • Code complexity increases significantly: Complexity rose by more than 40 percent, more than could reasonably be accounted for by the growth in codebase size alone.

Caveats/Limitations

The study only looked at open source projects, which aren’t really comparable to production codebases. Also, adoption is inferred from committed Cursor configuration files, which I would say is a reasonably reliable signal of usage within those projects. However, the control group is not necessarily free of AI usage: code in those repositories may still have been created using Copilot, Claude Code or other tools.

My Takeaways

A notable period for AI assisted development

What’s notable is the period this study tracks. In December 2024 Cursor released a major upgrade to their IDE and introduced its agent mode. It was the first time I heard experienced developers I respect describe AI coding assistants as genuinely useful. Cursor adoption climbed quickly and most developers I knew were using Claude Sonnet for day-to-day coding. Then in February 2025 Anthropic released Claude 3.7 Sonnet, their first reasoning model, followed in May by Claude Sonnet 4 and Opus 4.

If improvements in models or tooling were going to reverse the code quality issues seen previously, you’d expect it to show up during this period. This study shows no reversal. The pattern is broadly the same as GitClear observed for 2024.

It’s not just “user error”

A common argument is that poor AI-generated code is the user’s fault, not the tool’s. If developers wrote clearer prompts, gave better instructions or reviewed more carefully, quality wouldn’t suffer. This study disagrees. Even across hundreds of real projects, and even after accounting for how much code was added, complexity increased faster in the AI-assisted repos than in the control group. The tools are contributing to the problem, not merely reflecting user behaviour.

Model collapse playing out in real time

Organisations training LLMs probably use similar signals to this study to decide which open source repositories to train on: popularity, activity and signs of being “engineered” rather than experimental. This study shows more than 800 popular GitHub projects with code quality degrading after adopting AI tools. It’s hard not to see a form of model collapse playing out in real time. If the public code that future models learn from is becoming more complex and less maintainable, there’s a real risk that newer models will reinforce and amplify those trends, producing even worse code over time.

Things are continuing to evolve quickly, but…

Of course, things have continued to move quickly since the period this study covers. Claude Code is currently the poster child for GenAI assisted development. Developers are learning how to instruct these tools more effectively through patterns like CLAUDE.md and AGENTS.md, and support for these conventions is improving within the IDEs.

In my recent experience at least, these improvements mean you can generate good quality code, with the right guardrails in place. However without them (or when it ignores them, which is another matter) the output still trends towards the same issues: long functions, heavy nesting of conditional logic, unnecessary comments, repeated logic – code that is far more complex than it needs to be.
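
To make that concrete, here is a small, entirely hypothetical illustration of the pattern I mean: the kind of deeply nested, repetitive function an unguided assistant tends to produce, next to the flatter version I’d actually want to keep. The function and field names are invented for the example.

```python
# Illustrative only: nested, repetitive output versus a flatter equivalent.

# The shape of code an unguided assistant often produces: deep nesting, redundant branches.
def discount_v1(order):
    if order is not None:
        if order.get("total") is not None:
            if order["total"] > 100:
                if order.get("loyalty_member"):
                    return order["total"] * 0.9
                else:
                    return order["total"] * 0.95
            else:
                return order["total"]
        else:
            return 0
    else:
        return 0

# The same logic with guard clauses: shorter, flatter, easier to review.
def discount_v2(order):
    if not order or order.get("total") is None:
        return 0
    total = order["total"]
    if total <= 100:
        return total
    return total * (0.9 if order.get("loyalty_member") else 0.95)

# Both versions behave the same; only the structure differs.
order = {"total": 120, "loyalty_member": True}
assert discount_v1(order) == discount_v2(order) == 108.0
```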

No doubt the tools will continue to improve, and much of the meaningful progress is happening in the IDE layer rather than in the models themselves. However this study suggests the underlying code quality issues aren’t shifting. The structural problems remain, and they aren’t helped by the fact that the code these models are trained on is likely getting worse. The work of keeping code simple, maintainable and healthy still sits with the human, at least for the foreseeable future.