I recently wrote that GenAI development may be reaching a plateau, and about what the current limitations of these models imply for their use in business-critical systems. But just how accurate do they need to be?
🛣 Road transportation in the UK shows a 99.99996% safety rate [source], with only 0.4 casualties per million miles travelled in 2022. 🛩 Aviation globally is similar, with one accident for every 1.26 million flights in 2023 [source].
🏥 Healthcare, not so great. According to the WHO, 1 in 10 patients is harmed in healthcare worldwide, and for 1 in 20 patients that harm is preventable [source]. For the sake of argument, let’s say accuracy here would need to exceed 95% (it’s a lot more complicated than this, of course).
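The percentages above are simple arithmetic on the quoted rates. Here is a minimal sketch in Python that uses only the figures cited in this post. The function name is purely illustrative, and it assumes a "safety rate" is just one minus the incident rate per unit of exposure. The units differ (miles, flights, patients), so the results are only loosely comparable.

```python
def safety_rate_percent(incidents: float, exposures: float) -> float:
    """Percentage of exposures (miles, flights, patients) without an incident."""
    return 100.0 * (1.0 - incidents / exposures)


# Figures quoted in the post; each uses a different unit of exposure.
road = safety_rate_percent(0.4, 1_000_000)    # casualties per mile travelled (UK, 2022)
aviation = safety_rate_percent(1, 1_260_000)  # accidents per flight (global, 2023)
healthcare = safety_rate_percent(1, 20)       # preventable harm per patient (WHO)

print(f"Road:       {road:.5f}%")
print(f"Aviation:   {aviation:.5f}%")
print(f"Healthcare: {healthcare:.0f}%")
```

Road and aviation both land above 99.9999%, while the healthcare figure works out to the 95% bar used above.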
Most industries have stringent regulatory safety standards they need to comply with, and the consequences of errors can come with huge financial implications and often criminal punishment.
How accurate are GenAI systems currently?
Studies I found show GPT model accuracy varies widely, from ~50% to ~90% (the higher end generally for simpler tasks) [source].
For healthcare, while LLMs like ChatGPT4 are good at interpreting medical notes, their accuracy drops in complex diagnosis – 93% in identifying common diseases but only 53.3% in identifying the most likely diagnosis, far behind physicians at 98.3% [source]. They were also 18% less accurate in diagnosis than radiologists in musculoskeletal radiology [source].
As anyone who’s worked on system reliability will tell you, it’s the last mile that’s the hardest. Improvements often face diminishing returns, especially as you approach higher levels of reliability.
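One common way to make the diminishing-returns point concrete is to count "nines" of reliability: each extra nine means cutting the error rate by a factor of ten. Here is a rough sketch using the figures quoted in this post. It makes the big simplifying assumption that model task accuracy and a safety rate can be put on one scale, and the function names are purely illustrative.

```python
import math


def nines(success_percent: float) -> float:
    """Number of 'nines' of reliability, e.g. 99.9% is about 3 nines."""
    error_rate = 1.0 - success_percent / 100.0
    return -math.log10(error_rate)


def error_reduction_factor(current_percent: float, target_percent: float) -> float:
    """How many times smaller the error rate must become to reach the target."""
    current_error = 100.0 - current_percent
    target_error = 100.0 - target_percent
    return current_error / target_error


levels = [
    ("GenAI, upper end of studies", 90.0),
    ("Healthcare bar used above", 95.0),
    ("UK road transport", 99.99996),
]
for label, percent in levels:
    print(f"{label}: {nines(percent):.1f} nines")

print(f"90% -> 95%:       error rate must shrink {error_reduction_factor(90.0, 95.0):,.0f}x")
print(f"90% -> 99.99996%: error rate must shrink {error_reduction_factor(90.0, 99.99996):,.0f}x")
```

On these numbers, going from 90% to 95% means halving the error rate, while matching the road transport figure means cutting it by a factor of roughly 250,000.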
This is why, unless there’s another exponential leap, we could still be a long way off from these models being reliable enough to be used in business-critical systems (of course this doesn’t mean there isn’t the potential for lots of other valuable uses for GenAI).
Monthly Archives: June 2024
Practical advice on outsourcing & offshoring engineering
I’m regularly asked for my advice on outsourcing and offshoring engineering, for both start-ups and established orgs, so here you go:
Early stage start-ups
For early-stage start-ups, unless you’re well funded, I typically recommend outsourcing, as hiring permanent engineers at this stage is a significant cost and commitment. However, big caveats here: I rarely come across founders who’ve had a good experience. If you’re not an experienced technical founder, I’d advise getting a fractional CTO or similar to support you. While an additional expense, it’ll save you money in the long run (a lot of my talk “Navigating the Tech Galaxy for Early Stage Start-ups” covers how to avoid common pitfalls).
Established organisations
The following advice is for everyone else, from later-stage start-ups/scale-ups to large enterprise organisations.
Account for management overhead
For established organisations considering outsourcing – either for perceived cost savings or due to a lack of internal capability – the management overhead is often underestimated. To work effectively, an outsourced team requires more oversight than an internal one, especially if they’re working in a different timezone. Organisations often pendulum swing from “expensive” internal capability to outsourced/offshore, only to find things take longer, objectives aren’t met and cost savings aren’t realised, and then swing back the other way.
Outsourcing projects
If you’re thinking about outsourcing a project, make sure to factor in a ramp-up period for new engineers to get familiar with the code and architecture (typically around a month). Even for what might be considered “stand-alone” projects, expect a not insignificant impact on the existing team, who will be required to support them (quite apart from whether your architecture effectively supports another team working in the codebase). For these reasons I advise a minimum three-month engagement. The most benefit comes over the medium to long term, with an ongoing relationship.
Insourcing
My best experience has been with “insourcing” or team augmentation – bolstering internal teams with people from external partners. This means lower management overhead, lets you flex capacity based on demand, and retains knowledge internally, avoiding dependency on a partner. It won’t work if the supplier is working in a significantly different timezone.
True partner relationship
In all cases, it works best when it’s a true partner relationship, not a customer/supplier one. It should feel like you’re all part of the same team. Your contracts and commercial relationships can incentivise this – ensure quality and operational requirements are shared responsibilities, and define mutual obligations to enable effective collaboration.
In-house teams
Overall, my best experience – both in terms of cost and outcomes – has been with fully in-house teams, using independent contractors sparingly to flex where needed or fill short-term capability gaps. This approach works best with a broadly even and predictable pattern of demand.
Important footnote: UK R&D Tax Relief Changes
If you’re using or thinking of using nearshore/offshore engineering partners and intending to claim tax credits under the UK R&D Tax Relief scheme, note that since April 2024 overseas expenditure is no longer eligible under the scheme, except for very limited circumstances.