AI is the ultimate leaky abstraction 🪣

Artificial Intelligence has taken the world by storm. Especially generative AI built on Large Language Models (LLMs), with ChatGPT as the frontrunner. It is not hard to see why. Like many others, I was blown away when I first started using them and realized what they could do. Like many others, I was then disappointed when I realized their limitations.

Because it is when things go bad that you really need to start thinking. How well do you actually understand these models?

Leaky abstractions #

More than 20 years ago, Joel Spolsky (one of the OG software development bloggers) coined the term “Law of Leaky Abstractions”.

All non-trivial abstractions, to some degree, are leaky.

It means that for any interesting abstraction, there are some conditions under which it fails. And when an abstraction fails, you see through to the underlying level. That is why you sooner or later still need to understand how things work “under the hood”.

For example, this is why you may need to understand manual memory management even when working in a programming language with garbage collection. Or that you may need to understand how a database works internally when writing SQL queries. But understanding what is under the hood with AI may be easier said than done!

Under the hood of AI #

Let’s take a look at OpenAI’s GPT-4, the current king of generative AI. Analysis by The Decoder estimates that GPT-4 has 1.8 trillion parameters and has been trained on 13 trillion tokens. So what does this mean?

The number of parameters expresses the number of “knobs” controlling the behavior of the model. The structure of LLMs is quite complex, and it is borderline impossible to predict the effect of changing a single parameter. 1.8 trillion parameters is enough to give each living person around 220 knobs to play with. This means that how the full model actually works is incomprehensible to a human being. We can only look at larger-scale properties or at a tiny slice at a time.1

It is equally impossible to review the data the model has been trained on. Partly because it is a closely guarded secret, but also because of its sheer size. As a comparison, the entire Linux kernel repository contains some 30 million lines, already more than you could reasonably hope to understand in full. Assuming 20 tokens per line, the complete GPT-4 training data set is roughly 20,000 times larger! This is particularly worrying, as any part of that training data can in principle affect the output the model gives you.
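The back-of-envelope numbers above are easy to sanity-check. All inputs here are the estimates from the text (the 20-tokens-per-line figure is the article's own assumption, and 8 billion is a rough world population):

```python
# Rough scale comparison: GPT-4 (per The Decoder's estimates) vs. the Linux kernel.

PARAMETERS = 1.8e12       # estimated GPT-4 parameters
TRAINING_TOKENS = 13e12   # estimated GPT-4 training tokens
WORLD_POPULATION = 8e9    # roughly the number of people alive
KERNEL_LINES = 30e6       # lines in the Linux kernel repository
TOKENS_PER_LINE = 20      # assumption from the text

knobs_per_person = PARAMETERS / WORLD_POPULATION
kernel_tokens = KERNEL_LINES * TOKENS_PER_LINE
training_vs_kernel = TRAINING_TOKENS / kernel_tokens

print(f"~{knobs_per_person:.0f} knobs per living person")
print(f"training data is ~{training_vs_kernel:,.0f}x the kernel source")
# ~225 knobs per person, and a training set over 20,000x the kernel.
```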

The times they are a-changin' #

To put it bluntly, it is impossible to fully understand how a state-of-the-art AI model actually works. What then can we do when the AI abstraction leaks? What can we do when the AI does not do what we want?

As developers, we are used to computers behaving rationally and correctly. There are exceptions, but in most cases things make sense in the end.2 With AI, the traditional modus operandi of “troubleshoot the problem until you understand the root cause, then fix it” no longer works. You simply cannot always pinpoint the root cause. In some cases maybe, but in other cases you will be left with an unsatisfying “make adjustment, see if it helps” loop without ever knowing if you have fully resolved the situation.
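One way to picture the difference: deterministic code can be pinned down with a single assertion, while an LLM-backed function can only be measured statistically. A minimal sketch, where `flaky_llm_answer` is a hypothetical stand-in simulating a model that is usually, but not always, right:

```python
import random

def parse_price(text: str) -> float:
    """Deterministic code: same input, same output. One test settles it."""
    return float(text.strip().lstrip("$"))

def flaky_llm_answer(text: str, rng: random.Random) -> float:
    """Hypothetical stand-in for an LLM call: right ~90% of the time,
    occasionally "hallucinating" a wrong answer (simulated here)."""
    correct = parse_price(text)
    return correct if rng.random() < 0.9 else correct * 10

# Deterministic code: a single assertion proves correctness for this input.
assert parse_price("$4.20") == 4.2

# LLM-backed code: the best you can do is measure a pass rate over many
# trials, and loop on "make adjustment, see if it helps" when it drops.
rng = random.Random(42)
trials = 1000
passes = sum(flaky_llm_answer("$4.20", rng) == 4.2 for _ in range(trials))
print(f"pass rate: {passes / trials:.1%}")
```

The point of the sketch is the asymmetry: the first assertion either holds forever or reveals a fixable bug, while the pass rate can drift without ever exposing a root cause.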

Not only are their mechanics impossible to fully understand, but the data they encode may hold surprises. As the models are trained on human content, they capture the dark sides of human communication as well as the good. They exhibit bias and other cognitive distortions. They can change their behavior without explanation. They “hallucinate” and make things up. This means you are in for a whole new level of surprises, some of which may well take on ethical or regulatory dimensions.

Use with caution #

I will not claim that I know exactly when and how to use AI, but I have a suspicion that it is useful less often than one might think. It will be important to carefully consider for which problems AI is a suitable solution, and where traditional options are better. Steve Sewell captures this well in his article on how to use AI.

Use AI for as little as possible. At the end of the day, “normal” code is the fastest, most reliable, most deterministic, most easy to debug, easy to fix, easy to manage, and easy to test code you will ever have. But the magic will come from the small but critical areas you use AI models for.

There are definitely use cases where AI gives fantastic results. But it is a complicated tool, so make sure the problem you are trying to solve is worth it. And don’t forget to ask yourself, what will you do when the AI abstraction inevitably leaks?


  1. It is worth noting that this is very much true for human intelligence too. ↩︎

  2. Even with cars that dislike vanilla ice cream or when you cannot send email more than 500 miles. ↩︎