Getting started with running, the easy way

I started running about a year ago. I had always hated the idea of it. I started anyway because I needed some exercise and running was the only thing I could practically fit in. No gym, no kit faff, no booking anything. Put shoes on, leave house.

For the first six months I did it badly. Every run was as fast as I could manage, chasing PBs on Strava. I felt sick at the end of every run and kept picking up niggling injuries. The times improved for a while, then stopped. That wall was what made me actually do some reading. This is basically everything I wish I had known when I started.

Kit

Get fitted for proper running shoes. Runners Need do a free fitting: they put you on a treadmill, look at your gait, and recommend shoes that suit how you actually run. A decent pair is around £80-90. This is the single most important thing you can do to avoid injuries before you have even started. Skip the carbon-plated stuff; it is not for you yet, probably not ever.

Get a watch that tracks heart rate. You will need it for the next bit. A second-hand Garmin Forerunner 55 on eBay is about £80 and does everything you need. If you already have a smartwatch that tracks heart rate, that is fine.

That’s all you need (assuming you have shorts and a t-shirt).

Run slowly. Much more slowly than you think.

ALL of your running should be in “zone 2”, which means a heart rate low enough that you could hold a conversation. It feels absurdly slow at first. People will overtake you walking briskly. You will feel like you are not really exercising.

This is the point. Slow running is low impact, builds aerobic fitness, and does not break you. Once I started running almost entirely in zone 2, the injuries stopped, I got fitter rather than more battered, and I could run more often. Zone 2 is hard to judge by feel when you start out, which is why the watch matters: you will go too fast without realising.

Set up a “slow run” workout on your watch that alerts you when you go above zone 2, and stick to it. The watch will work out your zone 2 from your age, you do not need to calculate anything. Start at fifteen minutes and add time gradually. Try and get out a couple of times a week. There is no need to do anything more than this for at least the first six months.
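If you are curious what the watch is actually calculating, a common approximation (watches differ, and a lab-tested max heart rate varies a lot between individuals, so treat this as a rough sketch rather than your watch's exact algorithm) puts zone 2 at roughly 60-70% of an age-estimated maximum heart rate:

```python
# Rough sketch of the common age-based heart rate zone estimate.
# An approximation only - not any specific watch's algorithm.
def zone2_range(age: int) -> tuple[int, int]:
    max_hr = 220 - age  # widely used (if crude) estimate of max heart rate
    # Zone 2 is commonly defined as roughly 60-70% of max heart rate
    return round(0.60 * max_hr), round(0.70 * max_hr)

low, high = zone2_range(40)  # e.g. for a 40-year-old
print(f"Zone 2: roughly {low}-{high} bpm")  # → Zone 2: roughly 108-126 bpm
```

The watch does this (or something better, using your recorded data) for you, which is exactly why you do not need to calculate anything yourself.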

Technique

Good technique is important for avoiding injuries.

Hips forward, shoulders back, arms held up roughly like you are holding drumsticks. Lean forward slightly. Light, bouncy steps, landing on the ball of your foot or flat, not on your heel. Keep strides short. If you want to go faster, take more steps rather than longer ones. Try to land evenly on both sides.

Pay attention to your form. It will improve with practice. If your form starts falling apart, slow down or walk. Running with broken technique is how you get injured.

Strength

Strength training is important, but at least for now, you do not need a gym programme. Lunges are the best thing you can do for the time invested: they strengthen the muscles around the knee and hip, which is where most running injuries come from. They also fit naturally onto the end of a run. I do sixty lunges as a warm-down at the end of each run, three sets of ten on each side.

Why this works

The whole point is to keep running, get fitter, and enjoy it. Every recommendation here is really about not getting injured, because if you avoid injuries you run more often, and if you run more often you get fitter. You will naturally get faster over time without trying to. If you push for speed early you will get hurt and stop. It will take time though, there is no shortcut.

A year of going slowly has done more for my fitness than the six months of going hard ever did. I can now run 10k literally without breaking a sweat.

I am now, regrettably, a running bore.

Your SDLC is a power tool, not a compliance document

The Software Delivery Lifecycle (SDLC) document sitting in your governance folder is one of the most useful tools in the business. Most orgs never use it that way. They treat it as a compliance box-ticking exercise, and teams see it as a governance burden at best.

The SDLC is a value stream. Once you appreciate this it becomes a power tool. It defines how an organisation turns concept into cash, and because it defines it, it’s also how you change it. Treat it as a compliance artefact and you’ve wasted one of the most powerful levers you have.

My approach

I follow a pretty consistent approach with clients. Map the end to end value stream (concept to cash). Run a RACI exercise to bring clarity to accountabilities and responsibilities between roles. Do a value stream mapping exercise to identify pain points and areas for improvement.

Then convert all of it into a full SDLC. Importantly, most SDLCs only start at requirements or design, which means they start halfway through. Mine start at strategic planning, where opportunities get prioritised and decisions about what to build actually happen.

Collaboration, not silos

SDLCs are often accused of reinforcing silos. Done well they do the opposite, encoding where collaboration is expected rather than where handoffs occur. A recent example with a client: their SDLC now expects software engineers to be actively involved in solution design and requirements definition rather than work being prescribed to them.

Done this way, the SDLC stops being a document and starts being a mechanism, an actual working artefact rather than just a compliance burden, and a powerful lever for change.

What it takes to benefit from GenAI coding

GenAI coding tools are genuinely powerful. In the right hands, in the right environment, the stuff is remarkable.

Experienced engineers with good practices around them are doing things in hours that used to take weeks. Ideas get tested that previously stayed as hypotheses. Long-standing technical debt is getting cleared. Work that wasn’t worth the investment a year ago is now done in an afternoon.

Right environment means organisations that genuinely understand software engineering. An appreciation that building software is not a production line, but a learning process.

Right hands means experienced software engineers who take full end to end ownership. Product mindset. XP practices. Continuous delivery with all the automation, tests and guardrails that let you learn and iterate quickly without breaking things.

Most organisations don’t have that, which is why most of the industry isn’t getting much from these tools.

The organisations best placed to benefit from GenAI are the ones who invested in engineering foundations years ago. For everyone else, the shortcut you were hoping for doesn’t exist.

For CEOs and founders hoping to benefit, the answer isn’t as simple as handing out Claude licences (as Jason Gorman puts it, “just because you attach a code-generating firehose to your plumbing, that doesn’t mean you’ll get a power shower”). It’s investing in the engineering culture and practices. Unglamorous, slow work, but there’s no way around it.


Footnote: By experienced I don’t mean “senior” by the way. Most “senior” engineers I meet have never worked in a genuine XP or continuous delivery environment. They have years of experience, just not the experience that matters.

Experienced in this context means having built and shipped software in organisations that understand the craft. Fast feedback, small batches, tests as a design tool, code as a liability to be managed. That’s not about title or tenure. It’s about the environment you learned in.

I’ve worked with many “juniors” with e.g. 2-4 years experience who run rings around people with 10+. Because they learned in the right environment from the start.

Anthropic squeezed three ways

Anthropic’s Claude Code pricing fiasco is what it looks like when a company is squeezed at three ends. Anthropic quietly removed Claude Code from the $20 Pro plan, making it exclusive to the $100 and $200 Max tiers. Their Head of Growth framed it as a small test on 2% of new signups (which didn’t match what users were seeing). Within hours they reversed it.

What interests me is what the test reveals about the bind Anthropic is in. It was an attempt to fix unit economics: heavy users on flat-fee plans consume vastly more than the plans recover, and the Head of Growth, Amol Avasare, said as much on X – plans weren’t built for current usage patterns.

That’s one real pressure. But it’s not the only one. They’re squeezed three ways at once.

The first squeeze is unit economics. Someone running Claude Code all day on a $20 subscription costs far more to serve than they pay. Either prices go up or costs come down. However, raising prices risks making them uncompetitive against OpenAI and Google, who are already taking advantage of this moment.

The second squeeze is compute. Claude has been below 99% uptime for a quarter. They are clearly struggling with the huge increase in demand they’re experiencing. A year ago the product was mostly chat. Today a significant share of usage is coding agents running for hours. The shape of demand is changing faster than provisioning can keep up.

So why not do what Gmail and Bluesky did and gate new signups? Match supply to demand, protect the experience for existing users, generate some FOMO and desirability in the process, and buy time to sort the rest out.

That brings us to the third squeeze. Anthropic’s valuation, like that of every frontier AI lab, rests on growth trajectory rather than current profitability. However dressed up, limiting signups reads as a capacity wall, and from there it’s a short step to growth slowing and the IPO narrative wobbling.

The best approach for managing the compute squeeze is ruled out by the growth squeeze, which means infrastructure strain has to be absorbed through rate limits and outages instead, upsetting all your existing users in the process.

It also means heavy users keep arriving, which continues to make the unit economics worse, which is how you end up running silent pricing tests on Tuesday afternoons.

AI “Watershed Moment” or expensive pen tester? The AISI Mythos Data

The UK’s AI Security Institute has published the first independent evaluation of Claude Mythos’s cyber capabilities. The headline finding – first AI model to complete a full 32-step simulated network attack – is notable. But there’s a finding buried in the accompanying methodology paper that puts it in a rather different light. On current pricing and reliability, according to my maths, a human expert would do the same job cheaper, faster and more reliably.

What AISI found

On capture-the-flag tasks – common security challenges AISI have been using to test models since 2023 – Mythos sits broadly on the existing trend line. Real improvement, but incremental, and not unique to Mythos. The capability has been building across multiple labs for over a year.

The more significant result is with what AISI call “chained attacks” – where a model has to execute a long sequence of steps across a network to take it over, rather than exploit a single vulnerability in isolation. AISI measured this using their “The Last Ones” simulation: a 32-step corporate network attack spanning initial reconnaissance through to full network takeover, which they estimate a human expert would complete in 14 hours.

Mythos is the first model to complete all 32 steps end to end – though Opus 4.6, Anthropic’s previous model, wasn’t far behind in its best run.

The limitations & takeaways

The model was already inside the network, and the simulated environment had no active security monitoring and no defensive tools. Real networks aren’t like that – at least they shouldn’t be.

For most organisations the biggest threats remain phishing, weak passwords, and unpatched systems. AISI’s own advice in the article reflects this: focus on the basics – patch regularly, enforce access controls, enable logging. More importantly, the most common and successful attacks continue to target humans rather than rely on technical sophistication – as the Co-op, M&S and JLR attacks last year demonstrated.

The trajectory is real and worth taking seriously – but AISI’s findings are more measured than Anthropic’s “watershed moment” framing, and the most important things you can do about it are the same things you should have been doing anyway.

There’s a finding buried in the methodology

AISI published an accompanying academic paper detailing the evaluation methodology and results for models prior to Mythos – including detailed cost and timing data. This is where things get interesting.

According to that paper, the best Opus 4.6 run at 100M tokens cost approximately $80 and took around 10 hours – completing 22 of 32 steps, equivalent to roughly 6 of those 14 human hours. Slower, and less than halfway through in human time equivalent.

Mythos is priced at 5x Opus 4.6 per token. Its best run completed 32 steps versus Opus 4.6’s 22 – but crucially the additional steps fall in the later, harder milestones which are significantly more time and token-intensive. Accounting for both the price differential and likely higher token usage on those harder steps, a rough extrapolation puts a Mythos run at approximately $880.

The variance problem

The paper shows all models have very high variance across runs. Opus 4.6’s best run reached 22 steps, its worst only 11, with an average of 15.6. And the AISI article shows Mythos only completed all 32 steps in 3 of its 10 attempts – a 70% failure rate on full completion.

To expect one successful outcome you’d need 3-4 runs on average – and each run is likely comparable in time to the Opus 4.6 runs.

That’s approximately $2,900-3,500 per successful outcome.

A human expert completing the same range: 14 hours, once, reliably. At $125-190 per hour (UK rates) that’s $1,750-2,660.

So at least today, according to AISI’s data, and assuming my maths is roughly right, an experienced human expert would be cheaper, more reliable and at least as quick as the most capable AI model currently available.
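For transparency, here is the back-of-envelope maths as a short Python sketch. The completion and pricing figures are the ones quoted from the AISI paper and article above; the ~$880 per-run cost for Mythos is my own rough extrapolation, not a published number:

```python
# Figures quoted from the AISI methodology paper / article (see above).
mythos_run_cost = 880  # USD per run - my rough extrapolation, not published
full_completions = 3   # Mythos completed all 32 steps in 3...
attempts = 10          # ...of its 10 attempts

# Expected number of runs to get one full completion, and its cost.
success_rate = full_completions / attempts       # 0.3
expected_runs = 1 / success_rate                 # ~3.3 runs on average
expected_cost = expected_runs * mythos_run_cost  # ~$2,933

# Human expert benchmark: 14 hours, once, at $125-190/hour (UK rates).
human_low, human_high = 14 * 125, 14 * 190

print(f"Expected Mythos cost per success: ${expected_cost:,.0f}")
print(f"Human expert cost: ${human_low:,}-${human_high:,}")
```

The expected-value figure lands at the bottom of the $2,900-3,500 range quoted above; the top end of that range assumes 4 runs rather than the expected ~3.3.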

METR’s developer productivity research: 2026 update

You may be seeing posts claiming METR’s widely-cited 2025 study has been followed up with new research showing an 18% productivity boost. That’s not what the article says.

METR: We are Changing our Developer Productivity Experiment Design

In 2025, METR found experienced open-source developers using GenAI were 19% slower – and notably, developers themselves thought they were being sped up. They started a new experiment to track how things were changing – but couldn’t complete it. They say the data was too compromised to produce reliable results.

The interesting part is why the study broke down. Developers are now so reliant on AI that they refuse to work without it, which made recruiting a control group impractical. And the way they work when using GenAI has changed too, which undermines simple measurements like time-on-task.

METR believe GenAI coding productivity is improving – but say they can no longer measure it reliably with this study design and are reworking their approach. Personally I don’t see how you can practically design an effective controlled experiment considering everyone uses it now.

Worth noting too that, either way, these kinds of studies are still a narrow window on software delivery – individual task completion by autonomous open-source contributors, not developers working in teams on production codebases with all the organisational complexity that entails.

Multiple studies now suggest AI is genuinely increasing coding velocity – including CircleCI’s recent State of Software Delivery report. But the same report points to a more troubling pattern at the system level: less code reaching production and increasing instability.

My take: teams with strong engineering practices – genuine continuous delivery, high code quality, solid test coverage – appear to be realising real benefits from GenAI, and there’s data to support that. The problem is those teams represent a small fraction of the industry. For everyone else, higher coding velocity is likely resulting in a negative impact downstream.

More code, less delivery – but does the CircleCI 2026 report really show 1 in 20 teams are benefiting?

CircleCI’s 2026 State of Software Delivery report has two findings that are already travelling: AI is meaningfully boosting software delivery, but only 1 in 20 teams are capturing that benefit. Both claims are more uncertain than the report suggests, for different reasons.

What the report is measuring

The report’s primary metric is “throughput” – the number of times a CI pipeline runs per day. A CI pipeline is the automated process teams use to build, test and progress code toward production. It is not production deployments, it is not features shipped. The report is using pipeline execution data to infer things about software delivery. That’s not unreasonable – it’s real data – but it’s worth understanding what’s actually being measured before drawing conclusions.

The headline numbers

The report measures throughput on both feature branches and main branches and aggregates both into its headline figures. Throughput as a metric on feature branches is effectively meaningless. Throughput should be an end-to-end metric – feature branches aren’t end-to-end, they get merged to main. The only meaningful “throughput” measure is against the main branch. What the feature branch data actually shows is a lot more code being written, but not much more reaching production.

  • Average teams are up 4% on the aggregated figure, but main branch throughput is down 7%
  • The top 10% of teams show aggregated throughput up nearly 50%, main branch essentially flat
  • For 95% of teams, AI is generating more work in progress that isn’t shipping

The success rate of main branch builds compounds this further. It has fallen to 70.8%, its lowest in over five years – nearly 30% of attempts to merge code for production are now failing.

The 1 in 20 claim

The report identifies the top 5% of teams as the only group seeing meaningful main branch throughput growth – 26% – and uses this to argue that some teams have cracked the AI delivery problem.

But the summary data for that group is odd. Their average CI pipeline duration is 6 seconds. For a pipeline doing anything meaningful – compiling, running tests, scanning – it’s hard to think of a single step that legitimately completes in 6 seconds. Perhaps it is an error in the report. There’s also data that may be skewing the findings more broadly: one team apparently running 130,000 CircleCI workflows a day would have an outsized effect on any aggregate figures.

What to take from it

The integration bottleneck finding is credible. If you’re generating code faster than your team can review and integrate it (safely), that’s a genuine problem this data is consistent with.

The “1 in 20 teams have cracked it” conclusion is less solid than it appears. That’s not to say no teams are benefiting – I believe some are – but the data for the teams making that case doesn’t add up clearly enough to draw confident lessons from.

Evidence that GenAI is widening the software engineering skills gap

According to the FT, demand for software engineers is rising again, and in relative terms is outperforming the wider jobs market. That’s the headline most people will take away.

But the more important detail is that the growth is concentrated in more experienced roles, while entry-level hiring remains weak.

That broadly lines up with a concern I wrote about recently for Ada National College for Digital Skills: GenAI is amplifying the already existing skills gap in software engineering.

Genuinely strong, experienced engineers are already a relatively small pool. If the FT data is right, demand is increasingly being concentrated into that already constrained part of the market.

That means a supply squeeze. Salary inflation, harder hiring, slower execution, and more organisations unable to deliver on the strategy and goals they have set.

Further, as we saw in the pandemic hiring boom, shortage pressure creates title inflation and level distortion, such as mid-level people taking senior or lead roles at new companies. The result is organisations paying more for talent that is, on average, less experienced than the role implies, while carrying more execution risk and less capability depth than the org chart suggests.

If companies keep optimising away that layer, it is not just bad for the industry overall, it is economically short-sighted for the individual organisation. It increases future dependence on an already constrained and expensive part of the market, while missing the real value of junior engineers, not just what they contribute now, but the capability they become.

Parkinson’s Law is really about organisational bloat

Parkinson’s Law is mainly interpreted as “people fill the time available”. That line was only ever a hook. The original paper is really about how organisations create extra work and headcount for themselves, regardless of whether there is more genuinely useful work to do.

As organisations grow, work often grows around the work itself – layers, handoffs, approvals, reporting, internal dependency. More work about work.

That’s why adding more people, more process, more tooling (more AI?) so often fails to produce the productivity gains people expect.

Adding people can be beneficial to a point, but often far less than people might expect – and beyond a point, things usually get slower and more expensive, not faster. Doug Putnam’s analysis of hundreds of software projects found small teams were generally best, with 3-5 often the economic sweet spot, and larger teams quickly suffering diseconomies of scale.

The countermeasures to keep things as small, clear and contained as possible:

  • clear direction and priorities
  • clear accountability, ownership and decision boundaries
  • ruthless prioritisation
  • less work in flight – a culture of finishing

This is why I keep coming back to the same principle:

Fewer, better people doing less, better will usually get more done.

Not because small is always magically better. But because complexity compounds, and every extra person, process, dependency and priority adds drag.

Footnote: Probably the grossest misinterpretation of Parkinson’s Law is the idea that giving individuals less time or tighter deadlines will somehow make work happen faster. In practice, that often just compresses time without removing the underlying drag.

GenAI is amplifying the skills gap in software engineering

This is a cross post of an article I wrote for Ada, the National College for Digital Skills

All the available evidence suggests that GenAI-assisted coding is most powerful in the hands of highly experienced software engineers, while having neutral or even negative effects for less experienced ones.

It’s easy to see how this may be interpreted inside organisations. If experienced engineers can be made significantly more productive with GenAI, then it can appear rational to rely more heavily on that group. Smaller teams of senior engineers, supported by GenAI, with fewer junior or entry-level roles, can look like an attractive opportunity.

However, genuinely good, experienced software engineers are already scarce across the industry. It is difficult to put a precise figure on this, but studies on the impact of AI in software engineering suggest that only a minority of engineers and teams currently have the skills and experience needed to realise sustained benefits from GenAI-assisted software development, likely somewhere in the region of 10–30%.

That raises an obvious question about where the next generation of experienced engineers will come from.

The risk is not only that organisations hire fewer junior engineers, but that even when they do, the conditions for learning are compromised. A recent study by Anthropic, one of the organisations at the frontier of GenAI and the creators of the Claude models and tools such as Claude Code, found that developers using AI assistance completed coding tasks slightly faster but demonstrated significantly weaker understanding afterwards. When the tool was allowed to do too much of the thinking, learning suffered.

More generally, as organisations offload more work to these tools, team dynamics begin to shift. Fewer questions are asked. Explanation gives way to acceptance. Output rises, but shared understanding does not.

This all happens quietly. Everything looks efficient, right up until it’s not.

Despite repeated waves of tooling, the core skills that define good software engineering have remained remarkably stable. Effective problem-solving, system-level thinking, feedback, shared understanding, automated testing and iterative change have been recognised as good practice for decades.

Learning is not just an individual concern. Software development is a learning activity at every level. Teams learn about users, systems, risks, and constraints through the work itself.

What has also remained true, despite this being well understood, is that only a minority of the industry consistently applies these practices. The evidence increasingly suggests they are becoming even more relevant in the age of GenAI.

GenAI accelerates this dynamic. Without deliberate effort, it can speed up delivery while quietly weakening an organisation’s ability to create and sustain expertise. When organisations optimise purely for short-term efficiency, learning is often the first thing to erode. When learning slows, capability follows.

The organisations that will thrive in the GenAI era will not be the ones that simply adopt the tools. They will be the ones that treat learning as core to how they operate.

That includes investing in early-career development, creating environments where experience is accumulated rather than bypassed, and recognising that effective software development has always depended on people who can exercise judgement, reason about systems, and learn continuously, not just produce output.