June | 2024 | Rob Bowley

I recently wrote that GenAI development may be reaching a plateau, and about what that means for using these models in business-critical systems given their current limitations. But just how accurate do they need to be?

🛣 Road Transportation in the UK shows a 99.99996% safety rate [source], with only 0.4 casualties per million miles travelled in 2022. 🛩 Aviation globally is similar, with one accident for every 1.26 million flights in 2023 [source].

🏥 Healthcare, not so great. According to the WHO, 1 in 10 patients is harmed in healthcare worldwide, with 1 in 20 suffering preventable harm [source]. For the sake of the argument, let’s say accuracy here would need to exceed 95% (it’s a lot more complicated than this of course).
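As a rough sanity check, the percentages above can be reproduced with a few lines of Python. This is only a sketch: the helper name `safety_rate_pct` is made up for illustration, and it assumes at most one event per unit of exposure. Note that the three figures use different units (per mile, per flight, per patient), so they are not directly comparable.

```python
# Convert "N harmful events per M units of exposure" into the percentage of
# exposure units that pass without an event. Figures are those quoted above.

def safety_rate_pct(events: float, exposure: float) -> float:
    """Percentage of exposure units with no event (hypothetical helper)."""
    return (1.0 - events / exposure) * 100.0

road = safety_rate_pct(0.4, 1_000_000)   # casualties per mile travelled, UK 2022
aviation = safety_rate_pct(1, 1_260_000) # accidents per flight, global 2023
healthcare = safety_rate_pct(1, 20)      # patients suffering preventable harm

print(f"Road:       {road:.5f}% per mile")
print(f"Aviation:   {aviation:.5f}% per flight")
print(f"Healthcare: {healthcare:.1f}% per patient")
```

The road figure comes out at 99.99996% and the healthcare figure at 95.0%, matching the numbers used in this post.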

Most industries have stringent regulatory safety standards they need to comply with, and the consequences of errors can come with huge financial implications and often criminal punishment.

How accurate are GenAI systems currently?

Studies I found show GPT model accuracy varies widely, from ~50% to ~90% (the higher end generally for simpler tasks) [source].

For healthcare, while LLMs like ChatGPT4 are good at interpreting medical notes, their accuracy drops in complex diagnosis – 93% in identifying common diseases but only 53.3% in identifying the most likely diagnosis, far behind physicians at 98.3% [source]. They were also 18% less accurate than radiologists at diagnosis in musculoskeletal radiology [source].

As anyone who’s worked on system reliability will tell you, it’s the last mile that’s the hardest. Improvements often face diminishing returns, especially as you approach higher levels of reliability.

This is why, unless there’s another exponential leap, we could still be a long way off from these models being reliable enough to be used in business-critical systems (of course this doesn’t mean there isn’t the potential for lots of other valuable uses for GenAI).
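To put a rough number on the “last mile” point, one common way to think about reliability is in “nines”: each extra nine means cutting the error rate by a factor of ten. The sketch below uses the figures quoted in this post. The function names are my own, and comparing model accuracy per task with road safety per mile is purely illustrative, since the units differ.

```python
import math

def nines(accuracy: float) -> float:
    """Number of 'nines' of reliability, e.g. 0.99 -> 2, 0.999 -> 3."""
    return -math.log10(1.0 - accuracy)

def error_reduction_factor(current: float, target: float) -> float:
    """How many times smaller the error rate must become to reach the target."""
    return (1.0 - current) / (1.0 - target)

# (label, current accuracy, target accuracy), using figures quoted in the post
scenarios = [
    ("GPT upper end -> UK road safety rate", 0.90, 0.9999996),
    ("LLM most-likely diagnosis -> physicians", 0.533, 0.983),
]

for label, current, target in scenarios:
    factor = error_reduction_factor(current, target)
    print(f"{label}: {nines(current):.1f} -> {nines(target):.1f} nines, "
          f"errors must fall ~{factor:,.0f}x")
```

On these assumptions, going from ~90% to the road safety figure means reducing errors by a factor of roughly 250,000, while matching physicians on the most likely diagnosis means a reduction of roughly 27 times.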

I’m regularly asked for my advice on 𝗼𝘂𝘁𝘀𝗼𝘂𝗿𝗰𝗶𝗻𝗴 𝗮𝗻𝗱 𝗼𝗳𝗳𝘀𝗵𝗼𝗿𝗶𝗻𝗴 𝗲𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴, for both start-ups and established orgs, so here you go👇

Early stage start-ups

For 𝗲𝗮𝗿𝗹𝘆-𝘀𝘁𝗮𝗴𝗲 𝘀𝘁𝗮𝗿𝘁-𝘂𝗽𝘀, unless you’re well funded, I typically recommend outsourcing, as hiring permanent engineers at this stage is a significant cost and commitment. However, a big caveat here: I rarely come across founders who’ve had a good experience. If you’re not an experienced technical founder, I’d advise getting a fractional CTO or similar to support you. While it’s an additional expense, it’ll save you money in the long run (a lot of my talk “Navigating the Tech Galaxy for Early Stage Start-ups” covers how to avoid common pitfalls).

Established organisations

The following advice is for everyone else, from later stage start-ups/scale-ups to large enterprise organisations.

Account for management overhead

For 𝗲𝘀𝘁𝗮𝗯𝗹𝗶𝘀𝗵𝗲𝗱 𝗼𝗿𝗴𝗮𝗻𝗶𝘀𝗮𝘁𝗶𝗼𝗻𝘀 considering outsourcing – either for perceived cost savings or due to a lack of internal capability – the management overhead is often underestimated. To work effectively, an outsourced team requires more oversight than an internal one, especially if it’s working in a different timezone. Organisations often pendulum swing from “expensive” internal capability to outsourced/offshore, only to find things take longer, objectives aren’t met and cost savings aren’t realised, and then swing back the other way.

Outsourcing projects

If you’re thinking about 𝗼𝘂𝘁𝘀𝗼𝘂𝗿𝗰𝗶𝗻𝗴 𝗮 𝗽𝗿𝗼𝗷𝗲𝗰𝘁, make sure to factor in a ramp-up period for new engineers to get familiar with the code and architecture (typically around a month). Even for what might be considered “stand-alone” projects, expect a not insignificant impact on the existing team, who will be needed to support the work (and that’s before considering whether your architecture effectively supports another team working in the codebase). For these reasons I advise a minimum 3-month engagement, with the most benefit coming over the medium to long term as part of an ongoing relationship.

Insourcing

My best experience has been with “𝗶𝗻𝘀𝗼𝘂𝗿𝗰𝗶𝗻𝗴” or 𝘁𝗲𝗮𝗺 𝗮𝘂𝗴𝗺𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻 – bolstering internal teams with people from external partners. This means lower management overhead, lets you flex capacity based on demand, and keeps knowledge internal, avoiding dependency on a partner. It won’t work if the supplier is working in a significantly different timezone.

True partner relationship

In all cases, it works best when it’s a 𝘁𝗿𝘂𝗲 𝗽𝗮𝗿𝘁𝗻𝗲𝗿 𝗿𝗲𝗹𝗮𝘁𝗶𝗼𝗻𝘀𝗵𝗶𝗽, not a customer/supplier one. It should feel like you’re all part of the same team. Your contracts and commercial relationships can incentivise this – ensure quality and operational requirements are shared responsibilities, and define mutual obligations to enable effective collaboration.

In-house teams

Overall, my best experience – both in terms of cost and outcomes – has been with 𝗳𝘂𝗹𝗹𝘆 𝗶𝗻-𝗵𝗼𝘂𝘀𝗲 𝘁𝗲𝗮𝗺𝘀, using independent contractors sparingly to flex where needed or fill short-term capability gaps. This approach works best with a broadly even and predictable pattern of demand.

Important footnote: UK R&D Tax Relief Changes

If you’re using or thinking of using nearshore/offshore engineering partners and intending to claim tax credits under the UK R&D Tax Relief scheme, note that since April 2024 overseas expenditure is no longer eligible under the scheme, except in very limited circumstances.


Adventures In Software Development


How accurate do GenAI models need to be to be used in business critical systems?

Practical advice on outsourcing & offshoring engineering