Monthly Archives: June 2024

How accurate do GenAI models need to be to be used in business-critical systems?

I recently wrote that we may be reaching a plateau in GenAI development, and about what the current limitations of these models mean for their use in business-critical systems. But just how accurate do they need to be?

🛣 Road transportation in the UK shows a 99.99996% safety rate [source], with only 0.4 casualties per million miles travelled in 2022. 🛩 Aviation globally is similar, with one accident for every 1.26 million flights in 2023 [source].
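For anyone who wants to check the arithmetic, here is a minimal sketch of how those rates convert into percentages. The function name is my own, and the two results are not strictly comparable (one is per mile, the other per flight), so treat it as a rough illustration only.

```python
def success_rate_percent(events: float, exposures: float) -> float:
    """Percentage of exposures (miles, flights) that pass without an event."""
    return (1 - events / exposures) * 100


# Figures quoted above: 0.4 casualties per million miles (UK roads, 2022)
# and one accident per 1.26 million flights (global aviation, 2023).
road = success_rate_percent(0.4, 1_000_000)
aviation = success_rate_percent(1, 1_260_000)

print(f"Road: {road:.5f}%")          # Road: 99.99996%
print(f"Aviation: {aviation:.5f}%")  # Aviation: 99.99992%
```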

๐Ÿฅ Healthcare, not so great. According to the WHO, 1 in 10 patients are harmed in healthcare worldwide globally, with 1 in 20 preventable [source]. For the sake of the argument, let’s say accuracy here would need to exceed 95% (itโ€™s a lot more complicated than this of course).

Most industries have stringent regulatory safety standards they need to comply with, and the consequences of errors can come with huge financial implications and often criminal punishment.

How accurate are GenAI systems currently?

Studies I found show GPT model accuracy varies widely, from ~50% to ~90% (the higher end generally for simpler tasks) [source].

For healthcare, while LLMs like ChatGPT4 are good at interpreting medical notes, their accuracy drops in complex diagnosis – 93% in identifying common diseases but only 53.3% in identifying the most likely diagnosis, far behind physicians at 98.3% [source]. They were also 18% less accurate in diagnosis than radiologists in musculoskeletal radiology [source].

As anyone who’s worked on system reliability will tell you, it’s the last mile that’s the hardest. Improvements often face diminishing returns, especially as you approach higher levels of reliability.
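To put rough numbers on that last mile, here is a minimal sketch using figures quoted earlier in this post: ~90% as the upper end of reported GPT accuracy, 95% as my illustrative healthcare threshold, and 99.99996% as the UK road safety rate. The helper names are my own, and comparing task accuracy with a safety rate is a simplification, so read it as an order-of-magnitude illustration.

```python
import math


def error_rate(accuracy_percent: float) -> float:
    """Fraction of cases that go wrong at a given accuracy."""
    return 1 - accuracy_percent / 100


def nines(accuracy_percent: float) -> float:
    """Reliability expressed as a number of 'nines' (90% is 1, 99% is 2)."""
    return -math.log10(error_rate(accuracy_percent))


def error_reduction_factor(current: float, target: float) -> float:
    """How many times fewer errors the target accuracy allows."""
    return error_rate(current) / error_rate(target)


to_healthcare = error_reduction_factor(90, 95)
to_road = error_reduction_factor(90, 99.99996)

print(f"90% -> 95%: {to_healthcare:,.0f}x fewer errors")    # 2x
print(f"90% -> 99.99996%: {to_road:,.0f}x fewer errors")    # 250,000x
print(f"Nines at 99.99996%: {nines(99.99996):.1f}")         # 6.4
```

Going from 90% to 95% means halving the errors. Matching the road safety rate means cutting them by a factor of roughly 250,000.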

This is why, unless there’s another exponential leap, we could still be a long way off from these models being reliable enough to be used in business-critical systems (of course this doesn’t mean there isn’t the potential for lots of other valuable uses for GenAI).

Practical advice on outsourcing & offshoring engineering

I’m regularly asked for my advice on outsourcing and offshoring engineering, for both start-ups and established orgs, so here you go 👇

Early stage start-ups

For ๐—ฒ๐—ฎ๐—ฟ๐—น๐˜†-๐˜€๐˜๐—ฎ๐—ด๐—ฒ ๐˜€๐˜๐—ฎ๐—ฟ๐˜-๐˜‚๐—ฝ๐˜€ unless you’re well funded, I typically recommend outsourcing, as hiring permanent engineers at this stage is a significant cost and commitment. However, big caveats here: I rarely come across founders whoโ€™ve had a good experience. If you’re not an experienced technical founder, Iโ€™d advise getting a fractional CTO or similar to support you. While an additional expense, it’ll save you money in the long run (a lot of my talk โ€œNavigating the Tech Galaxy for Early Stage Start-upsโ€ covers how to avoid common pitfalls).

Established organisations

The following advice is for everyone else, from later-stage start-ups and scale-ups to large enterprise organisations.

Account for management overhead

For ๐—ฒ๐˜€๐˜๐—ฎ๐—ฏ๐—น๐—ถ๐˜€๐—ต๐—ฒ๐—ฑ ๐—ผ๐—ฟ๐—ด๐—ฎ๐—ป๐—ถ๐˜€๐—ฎ๐˜๐—ถ๐—ผ๐—ป๐˜€ considering outsourcing – either for perceived cost savings or due to a lack of internal capability – the management overhead is often underestimated. To work effectively, it requires more oversight than an internal team, especially if theyโ€™re working in a different timezone. Organisations often pendulum swing from “expensive” internal capability to outsourced/offshore, only to find things take longer, objectives arenโ€™t met, cost savings arenโ€™t realised, and then swing back the other way.

Outsourcing projects

If you’re thinking about outsourcing a project, make sure to factor in a ramp-up period for new engineers to get familiar with the code and architecture (typically around a month). Even for what might be considered “stand-alone” projects, there is a not insignificant impact on the existing team, who will be required to support them (notwithstanding whether your architecture effectively supports another team working in the codebase). For these reasons I advise a minimum three-month engagement, with the most benefit coming over the medium to long term through an ongoing relationship.

Insourcing

My best experience with external partners has been with “insourcing” or team augmentation – bolstering internal teams with people from external partners. This means lower management overhead, lets you flex capacity based on demand, and retains knowledge internally, avoiding dependency on a partner. It won’t work if the supplier is working in a significantly different timezone.

True partner relationship

In all cases, it works best when it’s a true partner relationship, not a customer/supplier one. It should feel like you’re all part of the same team. Your contracts and commercial relationships can incentivise this – ensure quality and operational requirements are shared responsibilities, and define mutual obligations to enable effective collaboration.

In-house teams

Overall, my best experience – both in terms of cost and outcomes – has been with fully in-house teams, using independent contractors sparingly to flex where needed or fill short-term capability gaps. This approach works best with a broadly even and predictable pattern of demand.

Important footnote: UK R&D Tax Relief Changes

If you’re using or thinking of using nearshore/offshore engineering partners and intending to claim tax credits under the UK R&D Tax Relief scheme, note that since April 2024 overseas expenditure is no longer eligible under the scheme, except in very limited circumstances.