I received an interesting email from Steve Gibson this morning talking about "AI."
Steve’s End of the Year AI Update
December 30th, 2024
When I first set about writing this email, my plan was to share what I had learned during the first half of our 3-week hiatus from the podcast. But it quickly grew long (even longer than this) because I've learned quite a lot about what's going on with AI. Since I suspect no one wants to read a podcast-length email that I would largely need to repeat for the podcast anyway, I'm going to distill this into a historical narrative summarizing a few key points. Then I'm going to point everyone to a 22-minute YouTube video that should raise everyone's eyebrows.
So here it is:
Everything that's going on is about neural networks. This has become so obvious to those in the business that they no longer talk about it. It would be like making a point of saying that today's computers run on electricity. (Duh!)
AI computation can be divided into “pre-training” and “test-time” (also called “inference-time”). Pre-training is the monumental task of putting information into a massive and initially untrained neural network. Information is “put into” the network by comparing the network's output against the correct output, then tweaking the network's neural parameters to move the network's latest output closer to the correct output. A modern neural network might have 185 billion parameters interlinking its neurons, each of which requires tweaking. This is done over and over and over (many millions of times) across a massive body of “knowledge” to gradually train the network to generate the proper output for any input.
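For anyone who'd like to see that compare-and-tweak loop in concrete form, here's a toy Python sketch of my own (a single linear “neuron” trained by gradient descent, nothing from any real model): produce an output, measure how far it is from the correct output, nudge every parameter to shrink that error, and repeat.

```python
# Toy illustration of the training loop: compare the network's output to the correct
# output, then tweak every parameter to reduce the error. Real models repeat this
# across billions of parameters and trillions of tokens; this is one tiny "neuron."
import numpy as np

rng = np.random.default_rng(0)
inputs  = rng.normal(size=(100, 3))            # 100 training examples, 3 features each
targets = inputs @ np.array([2.0, -1.0, 0.5])  # the "correct outputs" we want it to learn

weights = rng.normal(size=3)                   # the network's parameters, initially untrained
lr = 0.01                                      # how hard each tweak pushes

for step in range(2000):
    predictions = inputs @ weights             # the network's current answers
    error = predictions - targets              # how far off they are
    gradient = inputs.T @ error / len(inputs)  # which way to nudge each parameter
    weights -= lr * gradient                   # move the output a bit toward the correct output

print(weights)   # converges toward [2.0, -1.0, 0.5]
```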
Counterintuitive though it may be, the result of this training is a neural network that actually contains the knowledge that was used to train it; it is a true knowledge representation. If that's difficult to swallow, consider human DNA as an analogy. DNA contains all of the knowledge that's required to build a person. The fact that DNA is not itself intelligent or sentient doesn't mean that it's not jam-packed with knowledge.
The implementation of neural networks is surprisingly simple, requiring only a lot of simple multiplication and addition with massive parallelism. This is exactly what GPUs were designed to do. They were originally designed to perform the many simple 3D calculations needed for modern gaming, then they were employed to solve hash problems to mine cryptocurrency. But they now lie at the heart of all neural network AI.
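To drive home just how simple that math is, here's another small sketch of my own: one layer of a neural network is nothing more than a pile of multiplies and adds (plus a trivial “squash”), and a GPU's whole job is to do enormous numbers of these in parallel.

```python
# One layer of a neural network: every output is just a weighted sum of the inputs
# (multiply and add), optionally passed through a simple nonlinearity. GPUs exist to
# do exactly this kind of arithmetic massively in parallel.
import numpy as np

def layer(x, weights, bias):
    return np.maximum(0.0, x @ weights + bias)   # multiply, add, then a ReLU "squash"

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))       # a batch of 4 inputs, 8 features each
w = rng.normal(size=(8, 16))      # 8 x 16 = 128 weights for this single layer
b = np.zeros(16)

print(layer(x, w, b).shape)       # (4, 16): nothing but sums of products
```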
Even when powered by massive arrays of the fastest GPUs rented from cloud providers, this “pre-training” approach was becoming prohibitively expensive and time consuming. But seven years ago, in 2017, a team of eight Google AI researchers published a groundbreaking paper titled “Attention Is All You Need.” The title was inspired by the famous Beatles song “All You Need Is Love,” and the paper introduced the technology they named “Transformers” (because one of the researchers liked the sound of the word). The best way to think of “Transformer” technology is that it allows massive neural networks to be trained much more efficiently, “in parallel,” and it also introduced the idea that not all of the training tokens needed to be considered equally, because they were not all equally important; more “Attention” could be given to some than to others. This breakthrough resulted in a massive overall improvement in training speed which, in turn, made it practical to create and train vastly larger networks in reasonable time. LLMs – Large Language Models – were born.
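For the curious, the “Attention” at the heart of the Transformer boils down to just a few lines of arithmetic. Here's a bare-bones sketch of my own of the paper's scaled dot-product attention, with random numbers standing in for real token embeddings: every token scores its relevance against every other token, and those scores decide how much of each token flows into the output.

```python
# Scaled dot-product attention, stripped to its core. Each token's "query" is compared
# against every token's "key"; the softmaxed scores say how much attention to pay to
# each token when blending the "values" into the output.
import numpy as np

def attention(queries, keys, values):
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])           # relevance of every token to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)                # softmax: each row sums to 1
    return weights @ values                                       # a weighted blend of the values

rng = np.random.default_rng(3)
q, k, v = (rng.normal(size=(5, 16)) for _ in range(3))            # 5 tokens, 16-dimensional embeddings
print(attention(q, k, v).shape)                                   # (5, 16): one attended output per token
```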
The “GPT” of ChatGPT stands for Generative Pre-trained Transformer.
But over time, once again, researchers began running into new limitations. They wanted even bigger networks because bigger networks provided more accurate results. But the bigger the network, the slower, more time consuming – and thus more costly – its training became. It would have been theoretically possible to keep pushing that upward, but a better solution was discovered: post-training computation.
Traditional training of massive LLMs was very expensive. The breakthrough “Transformer” tech that made LLM-scale neural networks feasible for the first time was now being taken for granted. But at least the training was a one-time investment; after that, a query of the network could be answered almost instantly and, therefore, for almost no money. The trouble was that even with the largest practical networks the results could be unreliable – what we now call hallucinations. Aside from being annoying, a neural network that hallucinates and just “makes stuff up” could never be relied upon to build “chains of inference,” where its outputs are used to explore the consequences of new inputs when seeking solutions to problems. Being able to do that would begin to look a lot like thinking.
But a few years ago researchers began to better appreciate what could be done if a neural network's answer was not needed immediately. They began exploring what could be accomplished post-training if, when making a query, some time and computation – and thus money – could be spent working with the pre-trained network. By making a great many queries of the network and comparing the multiple results, the overall reliability could be improved so much that it became possible to build reliable inference chains, enabling true problem solving using the knowledge already stored in the network. This is often referred to as Chain of Thought (CoT) reasoning. Better still, the pre-trained model can also be used to correct its own errors.
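To picture that “ask many times and compare” idea, here's a generic self-consistency sketch. To be clear, this is my own illustration rather than OpenAI's actual o1 machinery, and query_model below is a made-up stand-in for whatever model API you would really be calling.

```python
# Self-consistency sketch: sample the same question many times (the temperature makes
# each answer differ), then keep the answer the samples agree on most. Spending this
# extra test-time computation is what buys the added reliability.
import random
from collections import Counter

def query_model(question: str) -> str:
    # Made-up stand-in for a real LLM call: usually right, occasionally "hallucinates."
    return "4" if random.random() < 0.7 else random.choice(["3", "5", "22"])

def reliable_answer(question: str, samples: int = 25) -> str:
    answers = [query_model(question) for _ in range(samples)]   # many independent tries
    best, _votes = Counter(answers).most_common(1)[0]           # majority vote across the samples
    return best

print(reliable_answer("What is 2 + 2?"))   # almost always "4", far more reliable than a single query
```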
I should note that the reason asking the same question multiple times produces multiple different answers is that researchers long ago discovered that introducing just a bit of randomness – which is called “the temperature” – into a neural network's output resulted in superior performance. (And, yes... if this all sounds suspiciously like voodoo, you're not wrong – but it works anyway.)
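If you'd like to see what that “temperature” knob actually does, here's one last tiny sketch of my own: the model's raw output scores (the logits) are divided by the temperature before being converted into probabilities, so a low temperature makes the choice nearly deterministic while a higher one lets less-likely tokens through.

```python
# Temperature in one picture: divide the raw scores (logits) by T before the softmax.
# T near 0 makes the pick nearly deterministic; higher T flattens the distribution,
# which is why the same question can yield a different answer every time.
import numpy as np

def sample(logits, temperature, rng):
    scaled = np.asarray(logits) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()                             # softmax over the rescaled scores
    return int(rng.choice(len(probs), p=probs))      # pick a token according to those odds

rng = np.random.default_rng(4)
logits = [2.0, 1.0, 0.2]                             # the model slightly prefers token 0
print([sample(logits, 0.1, rng) for _ in range(10)])   # low temperature: almost always token 0
print([sample(logits, 1.5, rng) for _ in range(10)])   # high temperature: noticeably more variety
```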
OpenAI's o1 model is the first of these more expensive inference-chain AIs to be made widely available. It offers a truly astonishing improvement over the previous GPT-4o models. Since o1 is expensive for OpenAI to offer on a per-query basis, subscribers are limited to 7 full queries per day. But the o1-mini model, which is better than 4o though not as capable as the full o1, can be used without limit.
Here's the big news: OpenAI just revealed that they have an o3 model that blows away their o1 model. It's not yet available, but it's coming. What IS available are the results of its benchmarks and that's why I believe you need to make time to watch this YouTube video:
Is it AGI? OpenAI is saying not yet, but that they're closing in on it – and all of the evidence suggests that they are. The independent benchmarks and other test results cited in the video above are quite compelling.
AGI only means that an AI can outperform a knowledgeable person across a wide range of cognitive problem-solving tasks. Computers can already beat the best chess, Go and poker players. I think it's very clear that today's AIs are not far from being superior to humans at general problem solving. That doesn't make them a Frankenstein to be feared; it only makes AI a new and exceedingly useful tool.
Many years ago I grabbed the domain clevermonkies.com because I thought it was fun. It occurs to me that it takes very clever monkies, indeed, to create something even more clever than themselves. All the evidence I've seen indicates that we're on the cusp of doing just that. (Check out that video and other videos about OpenAI's o3 model.)
See you in 2025!!