Lack of verification could act as a bottleneck for AI’s economic impact

Over the last few weeks, an increasing number of people have woken up to the sheer capabilities of today's coding models. Will AI soon be able to do the whole job of a coder? And, the obvious follow-up question: what about the rest of knowledge work?

What are the economic limits of today's LLMs? At the moment, they are mostly confined to knowledge work - but within that, what kinds of tasks are they likely to remain unreliable at without a fundamental advance in algorithms?

There are lots of different theories about which capabilities today's models are missing. Some of these gaps, most people accept, we can paper over with enough specialised post-training, fine-tuning and dedicated scaffolds that give guidance on exactly how we want each task done. But will this be enough to cover all tasks?

One way to think about this is to focus on the difference between how hard a task is to perform and how hard it is to verify. If a task is cheap or easy to verify, then LLMs not being perfectly reliable matters much less - we can add a verification stage to the end of any workflow and, when we see a problem, either ask the model to try again or escalate the hardest cases to a human.
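As a minimal sketch of what that workflow looks like - where generate, verify and escalate_to_human are hypothetical stand-ins, not any particular API:

```python
import random

def generate(task: str) -> str:
    """Stand-in for a model call (hypothetical)."""
    return f"draft answer for: {task}"

def verify(output: str) -> bool:
    """Stand-in for a cheap automated check, e.g. running a test suite."""
    return random.random() < 0.9  # assume the check catches most failures

def escalate_to_human(task: str) -> str:
    """The hardest cases go to a person."""
    return f"[escalated to human] {task}"

def complete_task(task: str, max_retries: int = 3) -> str:
    """Generate -> verify -> retry loop; escalate when retries run out."""
    for _ in range(max_retries):
        output = generate(task)
        if verify(output):
            return output
    return escalate_to_human(task)

print(complete_task("summarise Q3 sales figures"))
```

The key point is that the loop only works when the verify step is meaningfully cheaper than doing the task yourself.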

By contrast, when verification is expensive, automating production doesn't help nearly as much. Either it costs too much or takes too long to check everything - losing the supposed cost advantages of automation - or, in some cases, checking is simply impossible, and we never know whether the work is right. That isn't practical for many economically critical tasks.
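To make the trade-off concrete, here is one illustrative way to compare the costs - every number below is invented for the example: automation only pays off if model calls plus verification, plus the human handling of whatever fails, stay below the cost of doing the task manually.

```python
# Illustrative cost comparison (all figures hypothetical).
c_model  = 0.50   # cost of a model attempt per task
c_verify = 2.00   # cost of checking one output
c_manual = 20.00  # cost of a human doing the task outright
p_fail   = 0.10   # fraction of outputs that fail and escalate to a human

cost_automated = c_model + c_verify + p_fail * c_manual
print(f"automated: ${cost_automated:.2f} vs manual: ${c_manual:.2f}")
# Cheap verification: $4.50 vs $20.00 -> automation wins comfortably.

# If checking the work is nearly as expensive as doing it...
c_verify = 18.00
cost_automated = c_model + c_verify + p_fail * c_manual
print(f"automated: ${cost_automated:.2f} vs manual: ${c_manual:.2f}")
# Expensive verification: $20.50 vs $20.00 -> the advantage disappears.
```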

To understand this better, we produced a preliminary new model based on US data, comparing how easy we estimate tasks in the economy will be for AI to automate with how difficult they are to verify. While the two measures are related, we found that they diverge in some interesting ways.
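The model itself isn't spelled out here, but as a minimal sketch of the kind of comparison involved - assuming each task gets one score for automatability and another for verification difficulty, with all the scores below made up - you could ask how far the two rankings agree:

```python
from scipy.stats import spearmanr

# Hypothetical per-task scores on the two axes (not the actual data).
tasks = {
    "draft routine emails":   {"automatable": 0.9, "hard_to_verify": 0.2},
    "reconcile invoices":     {"automatable": 0.8, "hard_to_verify": 0.3},
    "triage support tickets": {"automatable": 0.7, "hard_to_verify": 0.4},
    "write legal opinions":   {"automatable": 0.6, "hard_to_verify": 0.9},
    "set research strategy":  {"automatable": 0.3, "hard_to_verify": 0.9},
}

auto = [v["automatable"] for v in tasks.values()]
hard = [v["hard_to_verify"] for v in tasks.values()]

rho, _ = spearmanr(auto, hard)
print(f"rank correlation between the two measures: {rho:.2f}")
# A correlation well short of 1 means "easy to automate" and
# "easy to verify" pick out different sets of tasks.
```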

If anything, this model reinforces the idea that it is office and administrative roles that are most likely to be affected by the first wave of AI automation.

Looking at knowledge work in particular, we find that around two-thirds of knowledge worker tasks (68%) are likely to be relatively hard to verify.

If verification does turn out to be a real bottleneck for reliable AI, we find that this could significantly reduce AI's potential impact - by over half in our median scenario.
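As a purely illustrative back-of-the-envelope version of that calculation - the 68% comes from above, but the realised-gain figures are invented, not the model's actual parameters:

```python
# Back-of-the-envelope illustration (parameters hypothetical).
share_easy_to_verify = 0.32   # tasks where cheap checks keep AI reliable
share_hard_to_verify = 0.68   # tasks where verification is the bottleneck
realised_gain_easy   = 1.00   # easy-to-verify tasks capture full potential
realised_gain_hard   = 0.25   # hard-to-verify tasks capture far less

impact = (share_easy_to_verify * realised_gain_easy
          + share_hard_to_verify * realised_gain_hard)
print(f"realised impact vs potential: {impact:.0%}")  # -> 49%, under half
```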

Verification might turn out to be just one more perceived weakness - following common sense, agency, reasoning and mathematics - that scaling compute ends up solving on its own. But if it doesn't, this could have a significant impact on AI's medium-term prospects.
