Author Archives: Rob

Faster horses, not trains

I’ve been trying to work out why successive advances in GenAI models don’t feel particularly different to me, even as others react with genuine excitement.

I use these tools constantly and have done since GPT-4 was released nearly 3 years ago. I couldn’t imagine a world without them. In that sense, they already feel as transformative as the web. I’ve wondered whether it’s simply that once they become ambient, the magic fades. You get used to them and stop noticing improvements. But the more I’ve thought about it, the more I think there are deeper structural reasons why the experience has plateaued, for me at least.

The lossy interface

All meaningful work starts in a physical, social, constraint-filled environment. We reason with space, time, bodies, artefacts, relationships, incentives, and history. Much of this understanding is tacit. We sense it before we can explain it.

To involve a computer, that reality has to be translated into symbols. Text, files, data models, diagrams, prompts. Every translation step compresses context and/or throws information away. There is loss from brain to keyboard. Loss from keyboard to prompt. And loss again when the output comes back and has to be interpreted.

GenAI only ever sees what makes it across that boundary. It reasons over compressed representations of reality that humans have already filtered, simplified, and distorted.

Better models reduce friction within that interface, but they don’t change its dimensionality. In that respect it doesn’t really matter how “smart” the models get, or how well they do on the latest benchmarks. The boundary stays the same.

Because of that, GenAI works best where the world is already well-represented in digital form. As soon as outcomes depend on things outside its boundary, its usefulness drops sharply.

That is why GenAI helps with slices of work, not whole systems. It is powerful, but fundamentally bounded.

Some real world examples:

  • In software development, generating code hasn’t been the main bottleneck since we moved away from punch cards. The far bigger constraints are understanding the problem, communicating with stakeholders, working effectively with other people, designing the system, managing risks and trade-offs, and operating systems in complex social environments over time.
  • In healthcare, GenAI can assist with diagnosis or documentation, but outcomes are dominated by staff, facilities, funding, and coordination across complex human systems. Better reasoning does not create more nurses or hospital beds.

In both cases, GenAI accelerates parts of the work without shifting the underlying constraint.

Faster horses, not trains

In that respect, GenAI feels like faster horses rather than trains. It makes us more effective at things we were already doing – writing, coding, analysis, planning, and sense-making – but it operates only on thin slices of systems.

Trains didn’t just make transport faster. They removed a hard upper bound on the movement of people and goods. Once that constraint moved, everything else reorganised around it. Supply chains, labour markets, cities, timekeeping, and even how people understood distance and work all changed. Railways were not just a tool inside the system, they became the system.

GenAI doesn’t yet do that. It works through a narrow, virtual interface and plugs into existing workflows. But as often as not the real systemic constraints lie elsewhere.

What actually changed the world

A recent conversation reminded me of Vaclav Smil’s How the World Really Works, which I read last year.

Smil highlights that modern civilisation rests on a small number of physical pillars: energy, food production (especially nitrogen), materials like steel and cement, and transport. Changes in these pillars are what led to the biggest transformations in human life. Information technology barely registers at that level in his analysis. He doesn’t deny its importance, but treats it as secondary, an optimiser of systems whose limits are set elsewhere.

Through that lens, GenAI doesn’t (yet) register as a civilisation-shaping force. It doesn’t produce energy, grow food, create new materials, or move mass. It operates almost entirely above those pillars, improving coordination, design, and decision-making around systems whose hard limits are set elsewhere.

That doesn’t make it trivial. But it explains why, so far, it looks closer to previous waves of information technology than to steam or electricity. It optimises within existing constraints rather than breaking them.

The big if

Smil’s framing doesn’t say GenAI cannot matter at an industrial scale. It says where it would have to show up. GenAI becomes civilisation-shaping only if it materially accelerates breakthroughs in those physical pillars – things that change what the world can physically sustain.

This is where “superintelligence” comes in. If GenAI can explore hypothesis spaces humans cannot, design and run experiments, or compress decades of scientific iteration into years, resulting in major scientific breakthroughs, it moves from optimising within constraints to changing them.

This is also where my own doubts sit. Many think simply scaling what we have now will get us there. Those who don’t believe that, but are still optimistic about AI’s potential, turn to world models, embodiment, or agents that can act in the real world. There are sketches and hopes for how this may happen, but as yet, not much more than that.

So while superintelligence is the path by which AI could plausibly become industrial-scale transformative, it’s a long and uncertain one.

What kind of change are we talking about?

If we mean web-scale change, then GenAI is already there. But if we mean the kind of change associated with the industrial revolution (the comparison most often made) – longer lives, better health, radically different working conditions, step changes in material living standards – then what we have today does not qualify. Historically, those shifts followed from breaking physical constraints, not from better information or reasoning alone.

For me, this is why successive model improvements don’t really register. It isn’t that GenAI lacks value – it already feels web-scale transformative. It’s that those improvements don’t change the shape of what’s possible. They operate within the same narrow, lossy interface, so they barely move the needle in practical terms. Until that boundary moves, or something else breaks the underlying constraints, they don’t feel like steps toward an industrial-revolution-scale shift.

More with less, or is it more with the same?

The crude clickbait narrative is that AI means job cuts, replacing roles. But when I look at how AI is actually being used in real organisations, it seems more likely to expand capacity than to reduce headcount. Many organisations may end up doing more with the same long before they can credibly do the same (or more) with less.

This thought started for me with an observation – AI is not substituting for whole roles; we’re getting micro-specialists that can do slices of work. In software you see agents for tests, code review, planning. Other sectors look much the same. Legal teams using AI for drafting. Sales teams for outreach. Finance for reconciliation. Tools handling tasks, not outcomes, and someone still has to stitch the pieces together.

There are (at least) three forces I can think of that matter when asking whether organisations will genuinely be able to do more with less:

1. How automatable the work already is.
Where the work is rules based, high volume, and low variation, AI may replace labour in the same way classic automation has. Think claims processing, simple customer support, structured back office workflows. These functions already lived close to the automation frontier. AI just expands the frontier a bit.
This will reduce headcount, but mostly in places where headcount has been under pressure for decades anyway.

2. How much increased output the organisation can absorb.
Most professional work is not constrained by how fast someone types or drafts. It is constrained by coordination, sequencing, ambiguity, stakeholder alignment, and quality. Software is a good example. So are legal, consulting, product, and sales. If you cut the number of lawyers because drafting is faster, you will simply overload the remaining lawyers with negotiation, risk, and client work.

3. The cost and consequences of mistakes.
In many industries, the limiting factor is not productivity, but risk. Healthcare, aviation, finance, law. Increased throughput also increases the risk surface area. If AI increases the probability or cost of an error, you cannot shrink the team. You often need more human oversight, not less.

If you put these together, the more likely outcome is this:

  • Some operational functions will shrink, but these were already at risk of automation.
  • Most knowledge work will shift toward more with the same, not less.
  • Some domains will accidentally create more with more, because oversight and correction absorb the gains.

AI is still making code worse, a new CMU study confirms

In early 2025 I wrote about GitClear’s analysis of the impact of GenAI on code quality, based on 2024 data, which showed a significant degradation in code quality and maintainability. I recently came across a new study from Carnegie Mellon, “Does AI-Assisted Coding Deliver? A Difference-in-Differences Study of Cursor’s Impact on Software Projects” that looks at a more recent period, tracking code quality in projects using GenAI tools up to mid-2025. So has code quality improved as the models and tools have matured?

The answer appears to be no. The study finds that AI briefly accelerates code generation, but the underlying code quality trends continue to move in the wrong direction.

How the study was run

Researchers at Carnegie Mellon University analysed 807 open source GitHub repositories that adopted Cursor between January 2024 and March 2025, and tracked how those projects changed through to August 2025. Adoption was identified by looking for Cursor configuration files committed to the repo.

For comparison, the researchers built a control group of 1,380 similar GitHub repositories that didn’t adopt Cursor (see caveats below).

For code quality, they used SonarQube, a widely used and well respected code analysis tool that scans code for quality and security issues. The researchers ran SonarQube monthly to track how each codebase evolved, focusing on static analysis warnings, code duplication and code complexity.

Finally, they attempted to filter out toy or throwaway repositories by only including projects with at least 10 GitHub stars.
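
For anyone unfamiliar with the difference-in-differences design named in the study’s title, here’s a minimal sketch of the idea with invented numbers (the real analysis runs over monthly SonarQube metrics for the 807 adopter and 1,380 control repositories). The estimate is how much a metric changed in the Cursor repos after adoption, over and above how it changed in the control repos over the same period.

```python
import pandas as pd

# Toy illustration of a difference-in-differences (DiD) comparison.
# All numbers are invented; the real study tracks monthly SonarQube
# metrics across 807 Cursor-adopting and 1,380 control repositories.
data = pd.DataFrame({
    "group":  ["cursor", "cursor", "control", "control"],
    "period": ["pre", "post", "pre", "post"],
    "warnings": [100, 130, 100, 102],  # e.g. static analysis warnings per repo
})

# Average metric per group and period.
means = data.pivot_table(index="group", columns="period", values="warnings")

change_cursor = means.loc["cursor", "post"] - means.loc["cursor", "pre"]
change_control = means.loc["control", "post"] - means.loc["control", "pre"]

# The DiD estimate is the extra change in the adopter group beyond
# whatever trend the control group experienced over the same window.
did_estimate = change_cursor - change_control
print(f"DiD estimate: {did_estimate:+.1f} warnings per repo")  # +28.0 here
```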

Key findings

Compared to the control group:

  • A short-lived increase in code generated: Activity spikes in the first one or two months after adoption. Commits rise and lines added jump sharply, with the biggest increase in the first month.
  • The increase does not persist: By month three, activity returns to baseline. There is no sustained increase in code generated.
  • Static analysis warnings increase and remain elevated: Warnings rise by around 30 percent post-adoption and stay high for the rest of the observation window.
  • Code complexity increases significantly: Code complexity rose by more than 40 percent, more than could reasonably be accounted for by just the growth in codebase size.

Caveats/Limitations

The study only looked at open source projects, which aren’t really comparable to production codebases. Also, adoption is inferred from committed Cursor configuration files, which I would say is a reasonably reliable signal of usage within those projects. However, the control group is not necessarily free of AI usage: code in those repositories may still have been created using Copilot, Claude Code or other tools.

My Takeaways

A notable period for AI assisted development

What’s notable is the period this study tracks. In December 2024 Cursor released a major upgrade to their IDE and introduced its agent mode. It was the first time I heard experienced developers I respect describe AI coding assistants as genuinely useful. Cursor adoption climbed quickly and most developers I knew were using Claude Sonnet for day-to-day coding. Then in February 2025 Anthropic released Claude 3.7 Sonnet, its first hybrid reasoning model, followed in May by Claude Sonnet 4 and Opus 4.

If improvements in models or tooling were going to reverse the code quality issues seen previously, you’d expect it to show up during this period. This study shows no reversal. The pattern is broadly the same as GitClear observed for 2024.

It’s not just “user error”

A common argument is that poor AI-generated code is the user’s fault, not the tool’s. If developers wrote clearer prompts, gave better instructions or reviewed more carefully, quality wouldn’t suffer. This study disagrees. Even across hundreds of real projects, and even after accounting for how much code was added, complexity increased faster in the AI-assisted repos than in the control group. The tools are contributing to the problem, not merely reflecting user behaviour.

Context collapse playing out in real time

Organisations training LLMs probably use similar signals to this study to decide which open source repositories to train on: popularity, activity and signs of being “engineered” rather than experimental. This study shows more than 800 popular GitHub projects with code quality degrading after adopting AI tools. It’s hard not to see a form of context collapse playing out in real time. If the public code that future models learn from is becoming more complex and less maintainable, there’s a real risk that newer models will reinforce and amplify those trends, producing even worse code over time.

Things are continuing to evolve quickly, but…

Of course, things have continued to move quickly since the period this study covers. Claude Code is currently the poster child for GenAI-assisted development. Developers are learning how to instruct these tools more effectively through patterns like CLAUDE.md and AGENTS.md, and support for these conventions is improving within the IDEs.

In my recent experience at least, these improvements mean you can generate good quality code with the right guardrails in place. However, without them (or when the tool ignores them, which is another matter) the output still trends towards the same issues: long functions, heavy nesting of conditional logic, unnecessary comments, repeated logic – code that is far more complex than it needs to be.

No doubt the tools will continue to improve, and much of the meaningful progress is happening in the IDE layer rather than in the models themselves. However this study suggests the underlying code quality issues aren’t shifting. The structural problems remain, and they aren’t helped by the fact that the code these models are trained on is likely getting worse. The work of keeping code simple, maintainable and healthy still sits with the human, at least for the foreseeable future.

Findings from DX’s 2025 report: AI won’t save you from your engineering culture

The DX AI-assisted engineering: Q4 (2025) impact report offers one of the most substantial empirical views yet of how AI coding assistants are affecting software development. It largely corroborates the key findings of the 2025 DORA State of AI-assisted Software Development Report: quality outcomes vary dramatically based on existing engineering practices, and both the biggest limitation and the biggest benefit come from adopting modern software engineering best practices – which remain rare even in 2025. AI accelerates whatever culture you already have.

Who are DX and why the report matters

DX is probably the leading and best-regarded developer intelligence platform. They sell productivity measurement tools to engineering organisations. They combine telemetry from development tools with periodic developer surveys to help engineering leaders track and improve productivity.

This creates potential bias – DX’s business depends on organisations believing productivity can be measured. But it also means they have access to data most researchers don’t.

Data collection

The report examines data collected between July and October 2025. Drawing on data from 135,000 developers across 435 companies, the data set is substantially larger than most productivity research. It combines:

  • System telemetry from AI coding assistants (GitHub Copilot, Cursor, Claude Code) and source control systems (GitHub, GitLab, Bitbucket).
  • Self-reported surveys asking about time savings, AI-authored code percentage, maintainability perception, and enablement quality.

Update: However, they aren’t particularly transparent about what data they used to create their findings. They mention how they calculate AI usage (empirical data) and time savings (self-reported surveys), but nothing on how they calculated metrics like CFR, which is a notable one in the report.

Key Findings

Existing bottlenecks dwarf AI time savings

This should be the headline: meetings, interruptions, review delays, and CI wait times cost developers more time than AI saves. Meeting-heavy days are reported as the single biggest obstacle to productivity, followed by interruption frequency (context switching). Individual task-level gains from AI are being swamped by organisational dysfunction. This corroborates 2025 DORA State of AI-assisted Software Development Report findings that systemic constraints limit AI impact.


You can save 4 hours writing code faster, but if you lose 6 hours to slow builds, context switching and poorly-run meetings, the net effect is negative.

Quality impact varies dramatically

The report tracks Change Failure Rate (CFR) – the percentage of changes causing production issues. Results split sharply: some organisations see CFR improvements, others see degradation. The report calls this “varied,” but I’d argue it’s the most important signal in the entire dataset.

What differentiates organisations seeing improvement from those seeing degradation? The report doesn’t fully unpack this.

Modest time savings claimed, but seem to have hit a wall

Developers report saving 3.6 hours per week on average, with daily users reporting 4.1 hours. But this is self-reported, not measured (see limitations).

More interesting: time savings have plateaued around 4 hours even as adoption climbed from ~50% to 91%. The report initially presents this as a puzzle, but the data actually explains it. The biggest finding, buried on page 20, is – as above – that non-AI bottlenecks dwarf AI gains.

Throughput gains measured, but problematic

Daily AI users merge 60% more PRs per week than non-users (2.3 vs 1.4). That’s a measurable difference in activity. Whether it represents productivity is another matter entirely. (More on this in the limitations section.)

Traditional enterprises show higher adoption

Non-tech companies show higher adoption rates than tech-native organisations. The report attributes this to deliberate, structured rollouts with strong governance.

There’s likely a more pragmatic explanation: traditional enterprises are aggressively rolling out AI tools in hopes of compensating for weak underlying engineering practices. The question is whether this works. If the goal is to shortcut or leapfrog organisational dysfunction without fixing the root causes, the quality degradation data suggests it won’t. AI can’t substitute for modern engineering practices; it can only accelerate whatever practices already exist.

Other findings

  • Adoption is near-universal: 91% of developers now use AI coding assistants, matching DORA’s 2025 findings. The report also reveals significant “shadow AI” usage: developers using tools they pay for themselves, even when their organisation provides approved alternatives.
  • Onboarding acceleration: Time to 10th PR dropped from 91 days to 49 days for daily AI users. The report cites Microsoft research showing early output patterns predict long-term performance.
  • Junior devs use AI most, senior devs save most time: Junior developers have the highest adoption, but Staff+ engineers report the biggest time savings (4.4 hours/week). Staff+ engineers also have the lowest adoption rates. Why aren’t senior engineers adopting as readily? Scepticism about quality? Lack of compelling use cases for complex architectural work?

Limitations and Flaws

Pull requests as a productivity metric

The report treats “60% more PRs merged” as evidence of productivity gains. This is where I need to call out a significant problem – and interestingly, DX themselves have previously written about why this is flawed.

PRs are a poor productivity metric because:

  • They measure motion, not progress. Counting PRs shows how many code changes occurred, not whether they improved product quality, reliability, or customer value.
  • They’re highly workflow-dependent. Some teams merge once per feature, others many times daily. Comparing PR counts between teams or over time is meaningless unless workflows are identical.
  • They’re easily gamed and inflated. Developers (or AI) can create more, smaller, or trivial PRs without increasing real output. “More PRs” often just means more noise.
  • They’re actively misleading in mature Continuous Delivery environments. Teams practising trunk-based development integrate continuously with few or no PRs. Low PR counts in that model actually indicate higher productivity.

Self-reported time savings can’t be trusted

The “3.6 hours saved per week” is self-reported, not measured. People overestimate time savings. As an example, the METR study “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity” found developers estimated they’d got a 20% speedup from AI but were actually 19% slower.

Quality findings under-explored

The varied CFR results are the most important finding, but they’re presented briefly and then the report moves on. What differentiates organisations seeing improvement from those seeing degradation? Code review practices? Testing infrastructure? Team maturity?

The enablement data hints at answers but doesn’t fully investigate. This is a missed opportunity to identify the practices that make AI a quality accelerator rather than a debt accelerator.

Missing DORA Metrics

The report covers Lead Time (poorly, approximated via PR throughput) and Change Failure Rate. But it doesn’t measure deployment frequency or Mean Time to Recovery.

That means we’re missing the end-to-end delivery picture. We know code is written and merged faster, but we don’t know if it’s deployed faster or if failures are resolved more quickly. Without deployment frequency and MTTR, we can’t assess full delivery-cycle productivity.

Conclusion

This is one of the better empirical datasets on AI’s impact, corroborating DORA 2025’s key findings. But the real story isn’t in the headline numbers about time saved or PRs merged. It’s in two findings:

Non-AI bottlenecks still dominate.

Meetings, interruptions, review delays, and slow CI pipelines cost more than AI saves. Individual productivity tools can’t fix organisational dysfunction.

As with DORA’s findings, the biggest limitation and the biggest opportunity both come from adopting modern engineering practices. Small batch sizes, trunk-based development, automated testing, fast feedback loops. AI makes their presence more valuable and their absence more costly.

AI is an accelerant, not a fix

It reveals and amplifies existing engineering culture. Strong quality practices get faster. Weak practices accumulate debt faster. The variation in CFR outcomes isn’t noise – it’s the signal. The organisations seeing genuine gains are those already practising modern software engineering. Those practices remain rare.

My advice for engineering leaders:

  1. Tackle system-level friction first. Four hours saved writing code doesn’t matter if you lose six to meetings, context switching and poor CI infrastructure and tooling.
  2. Adopt modern engineering practices. The gains from adopting a continuous delivery approach dwarf what AI alone can deliver.
  3. Don’t expect AI to fix broken processes. If review is shallow, testing is weak, or deployment is slow, AI amplifies those problems.
  4. Invest in structured enablement. The correlation between training quality and outcomes is strong.
  5. Track throughput properly alongside quality. More PRs merged isn’t a win if it doesn’t actually result in shipping faster, or if your CFR goes up. Measure end-to-end cycle times, CFR, MTTR, and maintainability (a rough sketch of computing CFR and MTTR follows below).
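
To make that last point concrete, here’s a minimal sketch of computing CFR and MTTR from a simple deployment log. The data structure, field names and numbers are invented for the example – the point is just how little data you need to start tracking these.

```python
from datetime import datetime, timedelta

# Hypothetical deployment log: each entry records a deployment, whether it
# caused a production failure, and how long recovery took if it did.
deployments = [
    {"deployed_at": datetime(2025, 11, 3, 10, 0), "failed": False, "recovery": None},
    {"deployed_at": datetime(2025, 11, 3, 15, 30), "failed": True,
     "recovery": timedelta(minutes=45)},
    {"deployed_at": datetime(2025, 11, 4, 9, 15), "failed": False, "recovery": None},
    {"deployed_at": datetime(2025, 11, 5, 11, 0), "failed": True,
     "recovery": timedelta(hours=3)},
]

# Change Failure Rate: share of deployments that caused a production issue.
failures = [d for d in deployments if d["failed"]]
cfr = len(failures) / len(deployments)

# Mean Time to Recovery: average time to restore service after a failure.
mttr = sum((d["recovery"] for d in failures), timedelta()) / len(failures)

print(f"Change Failure Rate: {cfr:.0%}")  # 50% in this toy example
print(f"MTTR: {mttr}")                    # 1:52:30 in this toy example
```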

You’re probably listening to the wrong people about AI Coding

Unsurprisingly, there are a lot of strong opinions on AI assisted coding. Some engineers swear by it. Others say it’s dangerous. And of course, as is the way with the internet, nuanced positions get flattened into simplistic camps where everyone’s either on one side or the other.

A lot of the problem is that people aren’t arguing about the same thing. They’re reporting different experiences from different vantage points.

I’ve sketched a chart to illustrate the pattern I’m seeing. It’s not empirical, just observational – and before the camps start arguing about it, yes, reality is more nuanced than this. It’s an oversimplified generalisation by design.

The yellow line shows perceived usefulness of AI coding tools. The blue line shows the distribution of engineering competence. The green dotted line shows what the distribution would look like if we went by how experienced people say they are.

Different vantage points

Look at the first peak on the yellow line. A lot of less experienced and mediocre engineers likely think these tools are brilliant. They’re producing more code, feeling productive. The problem is they don’t see the quality problems they’re creating. Their code probably wasn’t great before AI came along. Most code is crap. Most developers are mediocre, so it’s not surprising this group is enthusiastic about tools that help them produce more (crap) code faster.

Then there’s a genuinely experienced cohort. They’ve lived with the consequences of bad code and learnt what good code looks like. When they look at AI-generated code, they see technical debt being created at scale. Without proper guidance, AI-generated code is pretty terrible. Their scepticism is rational. They understand that typing isn’t the bottleneck, and that speed without quality just creates expensive problems.

Calling these engineers resistant to change is lazy and unfair. They’re not Luddites. They’re experienced enough to recognise what they’re seeing, and what they’re seeing is a problem.

But there’s another group at the far end of the chart. Highly experienced engineers working with modern best practices – comprehensive automated tests, continuous delivery, disciplined small changes. Crucially, they’ve also learned how to work with AI tools using those practices. They are getting productivity gains without impacting quality. They’re also highly aware that typing is not the bottleneck, so they’re not quite as enthusiastic as our first cohort.

Interestingly, I’ve regularly seen sceptical experienced engineers change their view once they’ve been shown how you can blend modern/XP practices with AI assisted coding.

Why the discourse is broken

When someone from that rare disciplined expert group writes enthusiastically about AI tools, it’s easy to assume their experience is typical. It isn’t. Modern best practices are rare. Most teams don’t deploy to production multiple times per day. Most codebases don’t have comprehensive automated tests. Most engineers don’t work in small validated steps with tight feedback loops.

Meanwhile, the large mediocre majority is also writing enthusiastically about these tools, but they’re amplifying dysfunction. They’re creating problems that others will need to clean up later. That’s most of the industry.

And the experienced sceptics – the people who can actually see the problems clearly – are a small group whose warnings get dismissed as resistance to change.

The problem of knowing who to listen to

When you read enthusiastic takes on AI tools, is that coming from someone with comprehensive tests and tight feedback loops, or from someone who doesn’t know what good code looks like? Both sound confident. Both produce content.

When someone expresses caution, are they seeing real problems or just resistant to change?

The capability perception gap – that green dotted line versus reality – means there are probably far fewer people with the experience and practices to make reliable claims than are actually making them. And when you layer on the volume of hype around AI tools, it becomes nearly impossible to filter for signal.

The loudest voices aren’t necessarily the most credible ones. The most credible voices – experienced engineers with rigorous practices – are drowned out by sheer volume from both the mediocre majority and the oversimplified narratives that AI tools are either revolutionary or catastrophic.

We’re not just having different conversations. We’re having them in conditions where it’s genuinely hard to know whose experience is worth learning from.

After the AI boom: what might we be left with?

Some argue that even if the current AI boom leads to an overbuild, it might not be a bad thing – just as the dotcom bubble left behind the internet infrastructure that powered later decades of growth.

It’s a tempting comparison, but the parallels only go so far.

The dotcom era’s overbuild created durable, open infrastructure – fibre networks and interconnects built on open standards like TCP/IP and HTTP. Those systems had multi-decade lifespans and could be reused for whatever came next. Much of the fibre laid in the 1990s still carries traffic today, upgraded simply by swapping out the electronics at each end. That overinvestment became the backbone of broadband, cloud computing, and the modern web.

Most of today’s AI investment, by contrast, is flowing into proprietary, vertically integrated systems rather than open, general-purpose infrastructure. Most of the money is being spent on incredibly expensive GPUs with a 1–3 year lifespan: they become obsolete quickly and wear out under constant, high-intensity use. These chips aren’t general-purpose compute engines; they’re purpose-built for training and running generative AI models, tuned to the specific architectures and software stacks of a few major vendors such as Nvidia, Google, and Amazon.

These chips live inside purpose-built AI data centres – engineered for extreme power density, advanced cooling, and specialised networking. Unlike the general-purpose facilities of the early cloud era, these sites are tightly coupled to the hardware and software of whoever built them. Together, they form a closed ecosystem optimised for scale but hard to repurpose.

That’s why, if the AI bubble bursts, we could just be left with a pile of short-lived, highly specialised silicon and silent cathedrals of compute – monuments from a bygone era.

The possible upside

Still, there’s a more positive scenario.

If investment outruns demand, surplus capacity could push prices down, just as the post-dotcom bandwidth glut did in the early 2000s. Cheap access to this kind of compute might open the door for new experimentation – not just in generative AI, but in other high-compute domains such as simulation, scientific research, and data-intensive analytics. Even if the hardware is optimised for GenAI, falling prices could still make large-scale computation more accessible overall. A second-hand market in AI hardware could emerge, spreading access to powerful compute much more widely.

The supporting infrastructure – power grid upgrades, networking, and edge facilities – will hopefully remain useful regardless. And even if some systems are stranded, the talent, tooling, and operational experience built during the boom will persist, as it did after the dotcom crash.

Without openness, the benefits stay locked up

The internet’s long-term value came not just from cheap capacity, but from open standards and universal access. Protocols like TCP/IP and HTTP meant anyone could build on the same foundations, without permission or platform lock-in. That openness turned surplus infrastructure into a shared public platform, unlocking decades of innovation far beyond what the original investors imagined.

The AI ecosystem is the opposite: powerful but closed. Its compute, models, and APIs are owned and controlled by a handful of vendors, each defining their own stack and terms of access. Even if hardware becomes cheap, it won’t automatically become open. Without shared standards or interoperability, any overbuild risks remaining a private surplus rather than a public good.

So the AI boom may not leave behind another decades-long backbone like the internet’s fibre networks. But it could still seed innovation if the industry finds ways to open up what it’s building – turning today’s private infrastructure into tomorrow’s shared platform.

Update: This post has received quite a lot of attention on Hacker News. Link to comments if you enjoy that sort of thing. Also, hi everyone 👋 – I’ve written a fair bit of other stuff on AI, among other things, if you’re interested.

On “Team dynamics after AI” and the Illusion of Efficiency

This is one of the most important pieces of writing I’ve read on AI – and that’s not the kind of thing I say lightly. If you’re leading in a business right now and looking at AI adoption, it’s worth your full attention.

Duncan Brown’s Team dynamics after AI isn’t about model performance or the usual surface-level debates. It’s about the potential for AI to quietly reshape the structure and dynamics of teams – how work actually gets done.

He shows how the promise of AI enabling smaller teams (“small giants”) and individuals taking on hybrid roles can lead organisations to blur boundaries, remove friction and assume they can do more with less. But when that happens, you lose feedback loops and diversity of perspective, and start to erode the structural foundations that quietly hold alignment together and make teams effective.

He also points to something I’ve been saying for a while – that AI doesn’t necessarily make us more productive, it can just make us busier. More output, more artefacts, more noise – but not always more value.

Here lies the organisational risk. The system starts to drift. Decisions narrow. Learning slows. More artefacts get produced, but they create more coordination and interpretation work, not less. The subtle structures that keep context and coherence together begin to thin out. Everything looks efficient – right up until it isn’t.

A bit like what happened with Nike: they optimised for the short-term and de-emphasised the harder, slower work that built long-term brand strength. It seemed to work at first, but the damage wasn’t visible until it was too late and it’ll now take them years to build back.

It’s also written by someone who’s been deep in the trenches – leading engineering at the UK Gov’s AI incubator, so not your usual ill-informed AI commentator.

And as a massive Ian MacKaye/Fugazi fan and a lapsed skateboarder, it honestly feels like another me wrote it.

Essential reading. It’s a long read – get a brew and a quiet 15 minutes.

Why AI won’t work as a software development abstraction

The idea of LLMs as a new abstraction layer for software development keeps coming up. On the surface it sounds appealing. Just as compilers turn source into binaries, AI could turn prompts into systems. You store the prompts, they become the source of truth, the AI generates the code, and the code itself becomes just an artefact.

Let’s assume, for the sake of the argument, things like non-determinism and hallucination are solved. There is still a big problem.

Complexity.

Software is never static. Requirements change, and each change adds complexity. Even the best engineers in the world struggle with this – whole disciplines around refactoring, code composition and architecture exist to contain it, and still complexity piles up.

Unless we reach some form of AI superintelligence, well beyond anything today, AI will run into the same problems, probably faster. Entropy builds up, not down.

The only way I can think of around that would be to regenerate the entire codebase (or at least large parts of it) from prompts each time, like a compiler rebuilding from source.

However, that just hits another wall.

By my rough calculations, regenerating a mid-size 500k LOC codebase with today’s LLMs and compute would take days and cost thousands (a rough sketch of that arithmetic is below).
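
Here’s a back-of-envelope sketch of that estimate. Every number in it is an assumption I’ve picked for illustration – tokens per line, throughput, per-token prices, and the amplification factor for agentic context re-reading – so treat it as showing the shape of the arithmetic rather than a precise figure. Repeated full regeneration passes, verification runs, retries and pricier models multiply both totals quickly.

```python
# Back-of-envelope estimate for regenerating a codebase from prompts.
# All figures below are illustrative assumptions, not measurements.

LINES_OF_CODE = 500_000
TOKENS_PER_LINE = 10                              # assumed average
OUTPUT_TOKENS = LINES_OF_CODE * TOKENS_PER_LINE   # ~5M tokens of generated code

# Agentic regeneration re-reads context, retries, and reasons, so total
# tokens processed are assumed to be a large multiple of the output.
INPUT_AMPLIFICATION = 40
INPUT_TOKENS = OUTPUT_TOKENS * INPUT_AMPLIFICATION

# Assumed prices and serial generation speed (vary by model and provider).
PRICE_PER_M_INPUT = 3.0      # USD per million input tokens
PRICE_PER_M_OUTPUT = 15.0    # USD per million output tokens
TOKENS_PER_SECOND = 50       # assumed serial output throughput

cost = (INPUT_TOKENS / 1e6) * PRICE_PER_M_INPUT + \
       (OUTPUT_TOKENS / 1e6) * PRICE_PER_M_OUTPUT
serial_hours = OUTPUT_TOKENS / TOKENS_PER_SECOND / 3600

print(f"Single-pass cost: ~${cost:,.0f}")                  # ~$675 with these assumptions
print(f"Serial generation time: ~{serial_hours:.0f} hours") # ~28 hours, before any retries
# Multiple regeneration passes, build/test verification loops and failed
# attempts push this into days of wall-clock time and thousands of dollars.
```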

Software development depends on feedback loops measured in seconds or minutes, not hours or days.

And this points to a natural physical law: processing information always carries an energy cost. You can’t avoid it, only shift it – in this case, from human cognitive effort to machine compute cycles. And today, the machine version would be far less efficient.

tl;dr You can’t beat the 2nd law of thermodynamics.

DORA 2025 AI assisted dev report: Some Benefit, Most Don’t

The recent DORA 2025 State of AI-Assisted Software Development report suggests that, today, only a small minority of the industry are likely to benefit from AI-assisted coding – and more importantly, avoid doing themselves harm.

The report groups teams into seven clusters to show how AI-assisted coding is shaping delivery. Only two – 6 (“Pragmatic performers”) and 7 (“Harmonious high-achievers”) – are currently benefitting.

They’re increasing throughput without harming stability – that is, without an increase in change failure rate (CFR). In other words, they’re not seeing significantly more production bugs, which would otherwise hurt customers and create additional (re)work.

For the other clusters, AI mostly amplifies existing problems. Cluster 5 (Stable and methodical) will only benefit if they change how they work. Clusters 1–4 (the majority of the industry) are likely to see more harm than good – any gains in delivery speed are largely cancelled out by a rise in the change failure rate (CFR), as the report explains.

The report shows 40% of survey respondents fall into clusters 6 and 7. Big caveat though: DORA’s data comes from teams already familiar with DORA and modern practices (even if not applying them fully). Across the wider industry, the real proportion is likely *half that or less*.

That means around three-quarters of the industry are not yet in a position to realistically benefit from AI-assisted coding.

For leaders, it’s less about whether to adopt AI-assisted coding, and more about whether your ways of working are good enough to turn it into an asset, rather than a liability.

Does the “lethal trifecta” kill the idea of fully autonomous AI Agents anytime soon?

I don’t think people fully appreciate yet how much agentic AI use cases are restricted by what Simon Willison coined the “lethal trifecta”. His article is a bit technical, so I’ll try to break it down in layman’s terms.

An AI agent becomes very high risk when these three things come together:

  • Private data access – the agent can see sensitive information, like customer records, invoices, HR files or source code.
  • Untrusted inputs – it also reads things you don’t control, like emails from customers, supplier documents, 3rd party/open source code or content on the web.
  • The ability to communicate externally – it has a channel to send data out, like email, APIs or other external systems.

Each of those has risks on its own, but when you put all three together it creates a structural vulnerability we don’t yet know how to contain. That’s what makes the trifecta “lethal”. If someone wants to steal your data, you have no effective way to stop them. Malicious instructions hidden in untrusted inputs can trick the agent into exfiltrating (sending out) whatever private data it can see.
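
To make the combination concrete, here’s a minimal sketch of the kind of gate you could put in front of an agent deployment. The capability flags are invented for illustration (they’re not from Simon’s article); the point is simply that any one or two of these capabilities can be manageable on their own, while the full combination is the structural problem.

```python
from dataclasses import dataclass

@dataclass
class AgentCapabilities:
    """Invented capability flags for illustrating the 'lethal trifecta'."""
    reads_private_data: bool          # e.g. customer records, HR files, source code
    ingests_untrusted_input: bool     # e.g. inbound email, web content, 3rd-party docs
    can_communicate_externally: bool  # e.g. outbound email, arbitrary API calls

def is_lethal_trifecta(caps: AgentCapabilities) -> bool:
    # One or two of these can be contained; all three together mean a
    # prompt-injected instruction can exfiltrate whatever the agent can see.
    return (caps.reads_private_data
            and caps.ingests_untrusted_input
            and caps.can_communicate_externally)

support_agent = AgentCapabilities(
    reads_private_data=True,          # looks up customer accounts
    ingests_untrusted_input=True,     # reads inbound customer emails
    can_communicate_externally=True,  # replies and can call refund APIs
)

if is_lethal_trifecta(support_agent):
    print("High risk: break the loop, e.g. require human approval for external actions")
```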

If you broaden that last point from “communicate externally” to “take external actions” (like sending payments, updating records in systems, or deploying code) the risk extends even further – not just leaking data, but also doing harmful things like hijacking payments, corrupting information, or changing how systems behave.

It’s all a bit like leaving your car running with the keys in the ignition and hoping no one crashes it.

Where this matters most is in the types of “replace a worker” examples people get excited about. Think of:

  • an AI finance assistant that reads invoices, checks supplier sites, and then pays them
  • a customer support agent that reads emails, looks up answers on an online system and then issues refunds
  • a DevOps helper that scans logs, searches the web for known vulnerabilities or issues, and then pushes config changes

All of those tick all three boxes – private data, untrusted input, and external actions – and that makes them unsafe right now.

There are safer uses, but they all involve breaking the loop – for example:

  • our finance bot only drafts payments for human approval
  • our support agent can suggest, but doesn’t issue refunds
  • our DevOps helper only runs in a sandbox (a highly isolated environment)

Unless I have got this wrong, until we know how to contain the trifecta, the glossy vision of fully autonomous agents doesn’t look like something we can safely build.

And it may be that we never can. The problem isn’t LLM immaturity or missing features – it’s structural. LLMs can’t reliably tell malicious instructions from benign ones. To them, instructions are just text – there’s no mechanism to separate a genuine request from an attack hidden in the context. And because attackers can always invent new phrasings, the exploit surface is endless.

And if so, I wonder how long it will take before the penny drops on this.

Edit: I originally described the third element of Simon’s trifecta as “external actions”. I’ve updated this to align with Simon Willison’s original article, and instead expanded on the external actions point (partly after checking with Simon).