AI Alignment
The term AI Alignment, like many new concepts associated with AI, is very broad and fuzzy. It means something like the techniques and processes used to ensure that an LLM is aligned with human interests. It is usually discussed within the context of AI Safety, which can mean anything from “content moderating LLM outputs” to “let’s not develop an AGI that takes over the planet and enslaves us.”
There are lots of important philosophical questions that arise when we think about this effort. Human history is a series of pluralistic and conflicting “interests”, so when we try to align AI with human interests, the obvious question is whose interests?
Even when we look at the good-faith attempts by some of the major AI companies to hire researchers with backgrounds in ethics and moral philosophy to help fine-tune LLMs, we can’t say that moral philosophy itself is in any sense settled. Should we work to perfect the general character of the LLM (virtue ethics), embed universal duties of action (deontology) or ensure that it always maximises outcomes to benefit the most and harm the fewest people (consequentialism/utilitarianism)?
However, this is not what I wanted to write about today. Instead, I wanted to try to think about alignment in a different sense. I want to consider the ways that LLMs are, and are not, aligned with the underlying computational processes they are manipulating. The problem may be the inverse of how it is usually represented: LLMs may be overly aligned with human interests and modes of action. They are transforming what it means to be a computer “engineer”, but has something been lost along the way?
A detour through abstraction
One of the most interesting things about LLMs is that they mark a revolution in human-computer interaction.
When we look at this history, we see a progression in degrees of abstraction, resulting in more complex and emergent types of applications.
In the beginning was the punch card and batch processing. Then, we got consoles, where we could type programmatic commands and get instant feedback. Next was the age of the “consumer” computer-user, the rise of the graphical user-interface (GUI), where the underlying commands were wrapped in fancy icons, buttons and sliders.
All the while, human-computer interaction became more user-friendly, more natural. UX design became its own art form (building a narrative/story from these technical operations).
As computer use expanded and became more accessible (more “human”), the types of things computers were made to do also expanded.
“Abstraction”, which in its etymology means something like “tearing away”, is classically thought of as a method of simplifying and clarifying. The opposite of abstract is “concrete”. The concrete is messy, dynamic, mixed-together; the abstract is static, formalised, separated out.
For example, in nature, chemical compounds are the complex, interwoven structures that combine and interact to make up matter in miraculous ways, but on the chart in a classroom they are drawn out as flat diagrams that make them easier to understand and reason about. A picture of nature as it really is, were it even possible, would not be a very good tool for actually understanding what is going on. To do so, we need to make decisions and “cuts”, to say this protein has this name, and another has that, and to theorise about their interactions and so on.
It is the same with computers. If everyone who used computers needed an understanding of Boolean algebra or electronics, we would have made far less progress. Instead, the abstractions made through different interfaces allowed for a broader range of participants and users, which in turn led to more applications being reasoned about and developed. However, not all abstractions are equal.
In an ideal sense, these abstractions or representations of nature are made according to the “scientific method”. The scientific method presupposes a distance between the scientist (subject) and the phenomena they are trying to study (object). This distance allows for neutrality and clarity.
In reality, however, science is a practice and, as with all practices, it is shaped by many forces: material conditions of laboratories, funding schemes, institutional politics, etc. As the 20th century progressed, the demand for applied science increased more and more, which is yet another pressure on the scientific method. Now, when making abstractions, we also have to think about how they will allow us to see/reason about the world not so much in a way that reveals its truth, but in a way that reveals a truth that allows us to better manipulate or utilise it as a resource.
To use Heidegger’s example, a Hölderlin poem about the Rhine is a way of viewing reality that draws out various truths and understandings about the river, whereas an engineer who is measuring the Rhine in order to build a dam is basing their “abstractions” (calculations of depths, forces, etc.) around the aim of better harnessing the power of the river for other ends. Both are dealing with the reality of the river in a way that tries to pull out meaning from the concrete context, but only one (the engineer) is pulling out that meaning with a view to serving other, technological purposes.
In other words, abstractions are also ways of establishing control and shaping the world.
Good abstractions allow for great leaps in innovation. It is somewhat of a paradox: the further away you are from something, the more you are able to get from it. That is why, as human-computer interaction became more “abstract”, that is, as we moved further away from the actual machinery that was manipulating the bits and bytes, we gained greater control and expanded the practical possibilities of computers.
Think of the practical impact a JavaScript developer (or an army of JavaScript developers) has on the world, versus someone who learns how to write in assembly for a particular chip architecture. The abstraction afforded by a higher-level language like JavaScript frees developers from the messy complexities of computational architectures and allows them to focus on the logic of the application.
Of all the truly “scientific” ways that computers could be used (and are to some extent being used), it is significant that the most powerful companies in the world were built through things like social media and online shopping, things which were well beyond the imagination of someone changing a vacuum tube in the 40s or a punch-card operator in the 50s, people who knew the internals, the “truth” of computation, better than any of these CEOs. The point here is that “truth” doesn’t matter; it’s what you do with it that does.
The new human-computer interface
Let us go back to the initial question about alignment. In a sense, what we have seen over the past 80 years or so is an increase in the misalignment between humans and machines, and a corresponding increase in the human power to wield these machines for ever greater purposes. Layers and layers of good abstractions led to this situation.[1] So, how do LLMs fit into this picture?
LLMs mark the highest form of computer-human interaction we have seen to date. They get rid of the “graphical user interface” and replace it with the interface that is most fundamental to humans - natural language.
At the same time, a more radical break has happened. Even though the GUI is very far removed from the underlying computation, there is a degree of “traceability” between the button you click and the sequence of bytes that are sent to a CPU or over a wire. Yes, there are lots of indeterminate and ill-defined behaviours in a lot of software design (which become easier to introduce due to the abstraction), but these can generally be reasoned about and understood with enough analysis and attention.
With an LLM, hooked up to an agent, you write your intention using natural language (a very “inefficient” mode of input, from a computational/informational perspective, owing to its ambiguity and noisiness) and the LLM “interprets” it, translates it into code or other functions it has access to, and makes something happen.
Most of the time, your exact intention is translated into something that you wanted, and there is indeed something magical about this (though, I’m sure the early GUI web browsers also seemed magical).
Some of the time, something different happens. Something goes wrong. Unlike the GUI behaviour, it is impossible to reason about why it went wrong, beyond perhaps some understanding of the statistical models behind the action. In these cases, the solution to the problem isn’t to study the underlying system more and develop a better engineering solution; instead, we have to build in more layers of redundancy (“humans-in-the-loop”, further LLMs to verify, etc.).
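To put rough numbers on this, here is a minimal sketch in Python. It is a toy model, not a description of any real agent: it assumes that each step an agent takes succeeds independently with a fixed probability, and that each checking layer (a human or a second LLM) independently catches a fixed fraction of failures. The function names and the figures are made up for illustration.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Chance that n_steps independent steps all succeed (toy assumption)."""
    return p_step ** n_steps


def undetected_failure(p_step: float, p_catch: float, layers: int) -> float:
    """Chance that a single step fails AND slips past every checking layer.

    Assumes each layer independently catches a failure with probability p_catch.
    """
    return (1 - p_step) * (1 - p_catch) ** layers


if __name__ == "__main__":
    # Illustrative numbers only: 99% reliable steps, checkers that catch 90% of failures.
    print(f"50 steps at 99% each: {chain_success(0.99, 50):.3f}")
    for layers in range(4):
        print(f"{layers} checking layers: {undetected_failure(0.99, 0.9, layers):.6f}")
```

In this toy model, each extra layer shrinks the share of failures that get through, but never to zero, and no layer says anything about why the failure happened in the first place.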
So, we have increased the level of abstraction to the extent that we are no longer even trying to preserve a semblance that this is a computer. With a video game or user-interface, the person usually has to learn how to use it to some extent. It is not the same as learning how to programme with punch cards, but you are learning to interact with a machine according to certain constraints of that machine. If the game is stuttering, you may have to go into the menu and change async rendering, etc. You might not fully understand the technical implications of this (though I am sure a lot of gamers do), but you at least understand that it is some kind of computational/engineering phenomenon. With an LLM, you talk to it as you would talk to a friend.
It is like the diagrams of the molecules on the classroom wall have been replaced by an agent which you can simply ask to “make me a cure for the disease x”. Even though the diagrams were abstractions of the real processes, you still had a sense of something “non-human”, something of the other reality that the molecules depict. The LLM, on the other hand, is all too human.
In the age of LLMs, we should not only think of “AI Alignment”, but also “human misalignment” with computers. Using the supreme interface of natural language, we have abstracted from the underlying processes to such an extent that computer users of the future may no longer understand the full implications of what they are doing.
As mentioned above, greater abstraction can also lead to greater control. It remains to be seen whether LLMs are “good” abstractions from an engineering perspective, but if they are, then the new possibilities that could be unlocked are both fascinating and terrifying. If they aren’t good abstractions, then we are bound to get an ever-increasing number of programs and actions that fail occasionally due to statistical probabilities.
Yes, from a moral perspective we need to think more about how best to “align” LLMs to certain goals and values, but from an engineering perspective, perhaps we should think more about how over-aligned LLMs are with human language, and consider whether human language really is a good abstraction for human-computer interaction.
-
[1] “Good” abstractions in the sense that they have been very effective from an engineering/applied perspective.