The human bottleneck and accountability: AI's two critical hurdles in production
Klarna's reversal on AI demonstrates the pitfalls of overhyped technology. Human oversight and accountability remain crucial, even if they hinder AI efficiency.
Klarna, the Sweden-based payments company, recently reversed its decision to replace customer service agents with AI. It is reportedly rehiring humans after its AI initiative failed to deliver on its promise.
Klarna’s case is emblematic of the current state of AI. There is a lot of hype surrounding the capabilities of the latest generation of large language models (LLMs). On the surface, they can perform tasks that would otherwise require human experts, who often command very high salaries. They can do so at a fraction of the cost and many times the speed. Sometimes, as in the case of Klarna, they give the impression that they can replace those humans.
However, once you move beyond fancy demos and deploy these systems to production, they start making unexpected errors, and things can fall apart. And the problem comes down to two key issues: the human bottleneck and the accountability problem.
The human bottleneck
The human bottleneck refers to the cost of having a human expert review, correct, and approve the response of an AI.
For benign use cases, where tolerance for error is high, handing over the task completely to AI can result in higher output at the cost of acceptably inferior quality.
For example, consider a platform that takes orders for content and passes them on to freelance writers. With the help of LLMs, the platform can automate content creation instead. This allows the company to cut the costs of content production and pass those savings on to customers. At the same time, it can handle many more requests and expand its reach. However, the quality of the content will not be as high as that created by a professional writer. This is a tradeoff that customers accept because they are getting better prices and remain the final arbiters of the content. Maybe they can order five versions of the content (still at a fraction of the original price) and choose the best one.
On the other hand, if the quality of the content falls below a certain threshold, the cost savings will not be worth the inferior experience. This seems to be the pitfall that Klarna fell into by going too hard on AI. They now need to find the right balance between AI and human oversight. Depending on how they implement the combination of manual and automated ticket handling, human review will become the bottleneck that determines how much they can accomplish with AI. As AI gets better, that bottleneck will become less prominent or eventually go away completely.
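To make the tradeoff concrete, here is a minimal sketch of how such a hybrid setup could route tickets. The draft_reply stub, the confidence score, and the 0.85 threshold are illustrative assumptions, not Klarna's actual pipeline.

```python
import queue

CONFIDENCE_THRESHOLD = 0.85  # assumed quality bar; tune per use case

# Tickets the AI is not confident about wait here for a human agent.
human_review_queue: "queue.Queue[tuple[str, str]]" = queue.Queue()

def draft_reply(ticket: str) -> tuple[str, float]:
    """Placeholder for an LLM call that returns a draft answer and a
    confidence score (self-reported or from a separate classifier)."""
    return f"Auto-reply to: {ticket}", 0.9  # stubbed values for illustration

def handle_ticket(ticket: str) -> str | None:
    """Send the AI draft directly when confidence clears the bar,
    otherwise park it for a human agent to review and approve."""
    draft, confidence = draft_reply(ticket)
    if confidence >= CONFIDENCE_THRESHOLD:
        return draft                          # fully automated path
    human_review_queue.put((ticket, draft))   # human review becomes the bottleneck
    return None                               # reply goes out only after approval
```

The key design choice is the threshold: raise it and more tickets queue up for humans (better quality, lower throughput); lower it and the AI handles more on its own (higher throughput, more errors reaching customers).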
The accountability problem
For more critical applications, the rules of accountability need to be redefined. A perfect example of this is self-driving cars. Current autonomous vehicles drive more safely than most humans. But with thousands or millions of AVs on the road worldwide, accidents will eventually happen. Who will be held liable when the AV is at fault?
Currently, all providers of AV technology are solving this problem with human oversight. Tesla makes it clear that the driver is responsible and must stay engaged when FSD is active. Waymo, which operates a self-driving ride-hailing service, started by having human backup drivers sitting in the driver’s seat. Eventually, as the technology got better and the regulations evolved, it transitioned to remote operators who can step in when the AV gets stuck in a situation it can’t handle. We don’t know how many cars each remote operator oversees, but it is likely more than one, which means Waymo has been able to scale human oversight to some degree. As the technology continues to evolve, incidents might become so rare that the remote operators can be removed altogether.
So you either have to find a way to scale human oversight, or the AI must be good enough that you can take full responsibility for the occasional errors your automated system makes. This is no different from, say, elevators, which were once operated by humans but are now fully automated.
Now project the same rules to another use case, such as writing code. Currently, AI coding tools have disclaimers that absolve the company of any harm done by the code the model generates, putting the onus on the user. If you’re vibe coding a personal tool or a prototype for a product that will not be deployed into production, your error tolerance might be high, and you might give full control over to the AI tool that is writing the code.
But if you’re using AI to write enterprise software, the tolerance for error drops. You have to make sure that the code is secure, robust, compliant with industry regulations, and compatible with your existing codebase. Here, the human bottleneck kicks in. You must have an experienced human developer review and approve the code and assume responsibility for its performance.
The AI might be able to write millions of lines of code per hour, but it will be bottlenecked by the human developer who has to review the code. You will still be able to do more with your current developers (which is partly why big tech companies are laying off software engineers), but the real productivity gain will be much lower than the model’s ability to churn out code.
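A back-of-the-envelope calculation makes the point. All numbers below are made up for illustration; the shape of the result is what matters.

```python
# Effective throughput is capped by review capacity, not by model speed.
ai_output_loc_per_day = 50_000        # assumed model output (lines of code)
review_loc_per_dev_per_day = 400      # assumed careful-review capacity per developer
reviewers = 5

review_capacity = review_loc_per_dev_per_day * reviewers
shipped = min(ai_output_loc_per_day, review_capacity)

print(f"Review capacity: {review_capacity} LOC/day")   # 2000
print(f"Shipped (bottlenecked): {shipped} LOC/day")    # 2000, not 50000
```

In this toy scenario the team ships 2,000 reviewed lines a day regardless of whether the model can produce 50,000 or 5 million.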
As AI’s coding abilities improve and the proper guardrails and controls are implemented (e.g., automated unit tests), the amount of human effort required to review the code will shrink, increasing productivity. Eventually, we might reach a point where the tool provider assumes responsibility for all aspects of the code, giving app builders the full power of automation. We’re not there yet.
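As a sketch of what such a guardrail could look like, the snippet below assumes a Python project with an existing pytest suite; the gating policy itself is hypothetical, not a feature of any particular coding tool.

```python
import subprocess

def gate_ai_change() -> str:
    """Guardrail sketch: an AI-generated change only reaches the human
    reviewer if the project's automated test suite passes, so reviewers
    spend their time on design, security, and compliance rather than
    obvious breakage."""
    # Run the existing test suite against the working tree that contains
    # the generated change (pytest is assumed; any test runner works).
    result = subprocess.run(["pytest", "--quiet"], capture_output=True, text=True)
    if result.returncode != 0:
        # Send failures back to the code-generation loop instead of a person.
        return "rejected: tests failed, regenerate and retry"
    return "queued for human review: tests passed, a human still approves"
```

The tests don't remove the human from the loop; they just make sure the human's limited review time is spent on the judgments that only a human can currently make.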
Final thoughts
So for any use case that you want to automate with AI, think about the human bottleneck and accountability:
What is your error tolerance? Can you trade off quality for speed, or can you meet the required quality with AI tools?
Who is responsible for the errors that AI makes? Will you assume responsibility for them, or will you pass it on to the user?
Do you need human oversight to meet your error tolerance level? If yes, is there any way to scale human oversight by improving the model or redefining the way the application is used?
The road to automation is not smooth. We will make mistakes, backtrack, and figure things out as the industry continues to evolve.
I think there is still a long way to go to find the sweet spot between automation and human potential, making the most of AI while keeping people in the loop. Self-driving cars are a great example of this challenge.