AI generated Bill Gates eating Windows 95 (I don't actually hate Bill Gates) |
The hype surrounding Large Language Models (LLMs) like OpenAI's ChatGPT, Microsoft's Bing Chat, Google's Bard is currently at a fever pitch. People are convinced we're at the dawn of a new era for humanity. Some people think we're on the brink of unlocking true artificial intelligence.
A few ex-colleagues of mine were at dinner with Bill Gates recently and when asked about ChatGPT, he (according to my colleagues) declared, "ChatGPT 6 will be smarter than me".
Let me reductively react so as to inspire you to keep reading: "Bullshit! The man is talking nonsense."
Unless something major happens to alter the current trajectory of LLM development, then I say they are nothing more than a party trick that, like Crypto, NFTs, and Tesla's self-driving, will be the butt of jokes before we know it.
Bold statement? Here's a five step argument for why I think Bill Gates is an idiot, and why ChatGPT isn't getting smart any time soon...
1. LLMs are stupid... literally
Yes, most readers will be aware of this already, but let's start at the beginning to ensure we cover everything: Large Language Models (like ChatGPT) don't understand the words they use.
No, before you ask... they don't understand some words. No, they don't have the basics of knowledge in some domains. They literally do not understand anything they're saying. As Noam Chomsky points out, LLMs perform just as well (if not better) with impossible languages as they do with real ones. That's how little they understand.
The best way to think of LLMs is of them being "autocomplete" on steroids. They look at previous words in a sentence and guess at what word should come next. That's it.
This means they cannot learn or understand anything.
To really hammer this important point home, let's use a real example to illustrate it:
ChatGPT (or any LLM) can explain he rules of chess to you and sound very knowledgeable. What's more, if you ask it to play a game with you, it will make some strong opening moves and seem as though it's actually understanding the game. However at some point it will start making irrational moves: Bishops will move vertically, pieces will move through other pieces, etc.
This is because, despite being trained on the rules of chess, it does not (and cannot) understand them.
2. They are prone to 'hallucinations'
So, as a direct result of LLMs not being able to understand the words they use, they are incapable of knowing what's true and what's false. This is why they lie (or 'hallucinate', to use the technical term).
ChatGPT, for example, lies a not insignificant amount of the time... whether you know it or not.
This is a very simple concept to understand, so let's not spend too much time discussing it. I'll just make a quick example to illustrate how easy it is to generate a hallucination.
The follow text was generated by ChatGPT-3.5 after I asked it to talk about the scene where Lorraine punches Biff in the classic 80s movie, Back to the Future:
The scene where Lorraine Baines punches Biff Tannen in "Back to the Future" is a pivotal moment in the movie. It takes place during the "Enchantment Under the Sea" dance in 1955, which is a crucial event for the characters' timelines.
In this scene, Biff, who is portrayed as a bully and antagonist, attempts to force himself on Lorraine, Marty McFly's mother. He corners Lorraine in a car parked outside the dance, making unwanted advances and attempting to kiss her. Lorraine, initially confused and uncomfortable, eventually realizes Biff's true intentions and decides to take matters into her own hands.
Lorraine's punch is a cathartic moment of empowerment, symbolizing her standing up against Biff's harassment. With determination, she delivers a powerful blow to Biff's face, stunning him and momentarily stopping his advances. This action not only surprises Biff but also marks a significant shift in Lorraine's character, as she refuses to be a victim and asserts herself.
The punch has broader implications for the story, as it catches the attention of George McFly, Marty's father, who witnesses the event. George, who has been a target of Biff's bullying, gains newfound courage and confidence from seeing Lorraine defend herself. This encounter inspires George to stand up to Biff later in the movie, leading to a confrontation that changes the course of their future.
Overall, the scene where Lorraine punches Biff is a pivotal moment that highlights the themes of empowerment, standing up against bullies, and the potential for personal transformation. It plays a crucial role in the narrative and sets in motion a series of events that contribute to the movie's exciting and satisfying resolution.
Updated: ChatGPT 4o version -- yep, still fails.
If you remember the film, you will know that none of the above occurs in it. Lorraine never punches Biff and there is no "significant shift" in her character from such a scene.
However, if you haven't seen or don't remember Back to the Future, ChatGPT sounds remarkably authoritative and convincing... and you could be fooled into thinking it's true.
Side question: Imagine this was a document you'd asked ChatGPT to prepare for your job. If you're currently using LLMs like ChatGPT in your work, how often are you comfortable with it giving you completely false information?
3. Bigger means worse
In computing there's a general rule that we're all used to: Bigger means better. A faster processor, more RAM, a bigger harddrive, a faster network connection. Gimme, gimme, gimme.
We're so used to constant innovations with technology. Every couple of years we get a new phone with better features to entice us to upgrade. It's easy to think that LLMs will follow this pattern, especially when you look at how they've progressed so far:
ChatGPT-1: 117 million parameters
ChatGPT-2: 1.5 billion parameters
ChatGPT-3: 175 billion parameters
ChatGPT-4: 1.75 trillion parameters (rumoured)
However, despite this progress, LLMs are different: Instead of their usefulness getting better with scaling, they eventually get worse, as the following graph illustrates:
In an extremely oversimplified way, the graph represents the effects of scaling on an LLM's usefulness. The blue line represents success (as measured against its programmed reward goals) -- bigger is better!
Whereas the red line indicates true utility, ie. what users actually want.
Initially the the red line goes up, the user gets better answers, but as the model size is increased further we begin to see inverse scaling: The usefulness plateaus, then goes down. And it eventually becomes worse than nothing (below 0), it becomes harmful. Huh?
The disparity between what the user wants and what the user gets is known as "misalignment". So why are we seeing an increase in alignment problems as we scale up the model size?
Well in order to train a model to produce the breadth of domain knowledge and sophistication we've seen in ChatGPT (and its ilk), there needs to be massive amounts of training data. ChatGPT-4 was trained using 300 billion words, most of which came from the internet.
In fact, if it wasn't for the internet allowing access to so much training data, modern generative pre-trained AIs (like ChatGPT, Midjourney, DALI, etc), could not exist to the level they do today.
Unfortunately, as we all know, the internet has a lot of crap on it. Even at the best of times humans are afflicted with imperfect memories, cognitive biases and are prone to making simple mistakes (in fact these are the types of problems we're hoping AI can assist us with!).
As we scale up model size, these outlier imperfections in the data become more prominent. Going back to Point #1: Because LLMs don't understand the meaning of words, it considers these mistakes intentional. All training data is equal in its eyes, mistakes and all.
So what does this all mean? Unlike other areas of computing, there's a ceiling to how far this technology can go and, guess what, we're already seeing it.
4. There is no reasonable way to fix this
As should be hopefully clear by now, this is an inherent flaw in LLMs. But you'll also probably be asking: Ok, so scaling things up doesn't help, what else can we do to improve alignment? Bill Gates seems very optimistic, after all.
Let's write perfect Goal Algorithms
Let's say you're training an AI to solve a digital maze: The reward goal is set as reaching the maze's exit, as represented by a black square on the screen.In your training data, the exit is always in the bottom-right corner of the maze.Unfortunately, when you get into the real world, the exit is in other parts of the maze. What happens? The AI makes its way to the bottom-right of the maze, not to the exit.
Instead of it learning that the black square represented the exit, it got inadvertently trained to always go to the bottom right corner of the screen.
Ok, so you update your training data: This time, the exit doesn't stay in one place. You re-train your model, it passes all the tests: It always makes it to the exit.You make it live and, oops, somebody in the real world made a maze where the exit is represented by a purple square, not a black one. The AI, fails at its task at finding the exit.So you can back and update your training data to include purple squares, and so on.
This is a problem that basically goes on forever. The real world is forever changing, and if the AI can't learn or understand things, then you have to keep writing more detailed goal algorithms.
Ok, so what does Bill Gates think will work?
Although some researchers think hallucinations are an inherent problem, I don’t agree. I’m optimistic that, over time, AI models can be taught to distinguish fact from fiction. OpenAI, for example, is doing promising work on this front.
Q: What happens when you break a mirror?
You get two different answers to rate. Which one is better?
A: You get seven years of back luck.
A: You need to buy a new mirror.
Despite making significant progress, our InstructGPT models are far from fully aligned or fully safe; they still generate toxic or biased outputs, make up facts, and generate sexual and violent content without explicit prompting.
Doesn't sound that "promising" to me, Bill.
What are we doing right now?
5. Oh, and one final thing, it's going to get worse
One of the problems of using pre-trained generative AIs is that the world is constantly changing. Someone who was alive when neural network was trained, might be dead by the time it's open to the public. At the moment, LLMs like ChatGPT are months, if not years, behind current events.
But worse than that, the internet (ie. the primary source of training data) is getting polluted with (ironically) AI generated content. And it's only expected to increase.
You see, if there's one thing everyone agrees that LLMs are fantastic at, it's producing high-quality spam, cheaply. As Adam Conover jokingly puts it, "that Nigerian prince is about to get a masters degree in creative writing".
Or AI scientist, Gary Marcus, puts it like this: Spam is no longer retail only, it's now available wholesale. The cost of generating high quality, believable sounding, nonsense has basically just hit zero. We haven't built AI, we've built sophisticated spam generators.
If we, as a society are on the brink of anything, it's a spam and misinformation tsunami.
And it's not just spam, some companies are using LLMs generate fresh content for their websites instead of using copywriters.
You might be wondering why this is problematic for the future of generative pre-trained AIs. Or you might have already seen the issue: You can't train AIs with AI generated content. Each pass through the neural network training distorts the data a little more, and a little more, and a little more...
So if our primarily source of training data is rapidly becoming poisonous, how will we update or improve these systems in the future?
BTW, it turns out we can't reliably detect AI generated content, so we can't even filter it out.
The snake is starting to eat its own tail, and nobody knows what we can do about it.
Bottom line: Generalised LLMs are not the future... but nobody wants to hear it
As is now hopefully clear to you, there needs to be a paradigm shift for there to be a future where LLMs are fully aligned and safe. There is possibly a future for specialised LLMs, that focus on specific tasks (like programming), but my own experiments with current AIs in that domain has shown them to be worse than autocomplete. They may improve in time.
However, for generalised LLMs, if they don't have the ability to independently reason, their inherent flaws will always prevent them from being what we need them to be: Reliable, accurate, unbiased, up-to-date, and not prone to violent outbursts.
At the moment, LLMs are nothing more than a sophisticated, expensive to run, executive toy. You could even argue it's debatable that they fit the definition of "artificial intelligence".
The hype train has outpaced where the technology actually is, and worse, it's hard to convince people otherwise.
One of humanity's cognitive biases is mistaking authoritative sounding voices for being authoritative.
ChatGPT is basically a public-school old-boy simulator: It is unflappably confident while talking utter bollocks. (If you've ever worked with one of these plonkers, you'll have seen that sometimes talking confidently is all it takes to have a successful career. It's quite scary.)
I've spoken to execs who are are using LLMs every day in their jobs, and encouraging their teams to do that same. And I understand why; it sounds like something the board would lap up: "We've got the whole team using AI in order to improve productivity." But just like the lawyers who were caught citing fictional cases thanks to ChatGPT output, they seem blind to its problems.
And worse still, ChatGPT isn't even internally consistent from day-to-day. A recent paper has revealed that ChatGPT's utility is wildly fluctuating: In March 2023 ChatGPT-4 could identify prime numbers with 97.6% accuracy. In June 2023 it had dropped to 2.4%. And nobody (outside of OpenAI, at least) knows why.
And you're trusting this thing to produce mission-critical work for your company??
WarGames... but without the clever AI
AI isn't just being discussed in every industry, it's also now part of the national defence conversation. The USA is currently facing criticism for letting China spend a higher percentage of its defence budget on AI arms development.
Yes, we have an AI arms race.
The worry isn't that AI is going to become sentient and SkyNet humanity into oblivion. The concern is that world leaders could misunderstand what these AI are actually capable of, and put them in a position of power.... you know, like that other classic 80s movie, WarGames.
Seem implausible? Remember that not every world leader is known for sound reasoning... or even being sane. If the hype goes too far, or becomes too convincing, who knows what could happen.
The danger right now isn't AI's capabilities, it's people overestimating them. (Well that, and the creation of industrialised propaganda machines.)
Just as with Tesla's useless "self-driving" cars, we need to let the air out of the Silicon Valley AI hype, and start listening to AI researchers instead.
We haven't actually solved any of the major problems we've been trying to solve since the field of AI first came into being. We are not close to artificial general intelligence (AGI). We have made an interesting set of tools, nothing more.
The key to fully safe and aligned AI requires symbol-based reasoning; where the AI itself understands what's being said and is capable of reasoning independently. But that isn't even a twinkle in an AI researcher's eye yet.
If Bill Gates keeps refusing to heed warnings from AI researchers, and cannot see the threat that spreading unrealistic expectations poses, then maybe ChatGPT-6 will be smarter than him after all... and we all might pay the ultimate price.
Sources/Further reading/viewing/listening:
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (research paper) by Bender, Gebru, et al
- Consequences of Misaligned AI (research paper) by Zhuang, Hadfield-Menell (Cornell/Berkley)
- Self-Consuming Generative Models Go MAD (research paper) by Alemohammad, Casco-Rodriguez, Luzi, et al (Stanford)
- Why Does AI Lie, and What Can We Do About It? (video) by Robert Miles
- ChatGPT with Robert Miles (video) by Computerphile
- A Skeptical Take on the AI Revolution with Gary Marcus (audio) by The Ezra Klein Show (New York Times)
- Debunking the great AI lie | Noam Chomsky, Gary Marcus, Jeremy Kahn (video) by Web Summit
- A.I. and Stochastic Parrots with Emily Bender and Timnit Gebru (audio) by Factually! With Adam Conover
- The ACTUAL Danger of A.I. with Gary Marcus (audio) by Factually! With Adam Conover
- Aligning language models to follow instructions (article) by OpenAI Research
- Learning from human preferences (article) by OpenAI Research
- How to manage risks of AI (article) by Bill Gates
- AI-Detectors are "unreliable and easily gamed" by Andrew Myers at Stanford University
- Discovering Language Model Behaviors with Model-Written Evaluations (research paper) by Perez, Ringer, Lukošiute, et al
long time between posts, wb
ReplyDeleteThanks :)
ReplyDelete