Tuesday, June 25, 2024

Move 37

I was recently asked, while working on some autonomous agents, whether in the course of their interactions they would come up with new ideas. I immediately thought: well, Move 37 (I'll get to what I mean by that shortly). I said I think these agents will be creative to the extent allowed by how I have set them up and the nature of the "back stories" I had given them. But the question brought up ideas I have thought about a lot, namely the broader questions: can an AI, and specifically an LLM-based model alone, create novel insight? Can AIs do research? Can AIs do scientific discovery? How creative can AIs be? Can AIs improve themselves based on novel insights about themselves and the objectives they need to pursue?

Let's take it further. Can AIs start the equivalent of the Scientific Revolution of the 16th and 17th centuries? Can AIs create a paradigm shift akin to what Thomas Kuhn described in his classic book The Structure of Scientific Revolutions? If you were to give a generative AI model training data consisting only of pre-Renaissance art, would it be able to make the jump to linear perspective and create paintings with volume, with buildings and landscapes that recede with distance? The first painting below is "The Madonna and Child in Majesty Surrounded by Angels" (1280) by Cimabue. The second and third paintings are examples of the perspective associated with the Renaissance: "The Tribute Money" (1426-1427) by Masaccio and "The School of Athens" (1510-1511) by Raphael. Could an AI paint like Raphael if it had only been trained on data from before the Renaissance?



So what is Move 37? 

After that conversation about autonomous agents and creativity, I started thinking about the debate over whether LLMs can do novel scientific discovery. Move 37 is the first time I had ever seen a machine do something truly creative and emergent. So again, what is Move 37? Move 37 refers to a move in game 2 of the 2016 match between Google's AI Go program, AlphaGo, and Lee Sedol of South Korea, a 9-dan player and one of the best, if not the best, players in the world at the time.

There is a great documentary on YouTube about this match, and if you go to minute 49:00 you can see this move and the complete shock it causes. It is a move that "no human would make," one that at first looks like a mistake to all of the commentators. Even AlphaGo estimated that there was only a 1 in 10,000 probability that a human would play it.

Humans would not make this move based on their intuition of what is and is not a good move in this position. AlphaGo did not come up with this move because it was a strategy directly programmed by the Google team, but because, through reinforcement learning and self-play, it was able to see this move as the best way to meet its objective of winning. Interestingly, as pointed out in the movie, AlphaGo is not trying to maximize its lead at specific moments; instead, it looks for its best chance of winning, even if that win is by a small margin. So sometimes what looks like a passive move is actually the best move for the long-term goal.

This scenario played out again in 2017 in chess with Google's AlphaZero, a more generalized successor to AlphaGo designed to learn different types of games, including chess. A match was set up against the strongest computer chess program at the time, Stockfish. AlphaZero made many surprising moves. It would often play moves that initially looked passive, eschewing what was thought to be the best move, and over time these moves led to a crushing, vise-like position. One commentator reviewing the games afterwards said that its unexpected moves were exactly when its opponent should be most afraid. And just as in Go, AlphaZero would give up a temporary material advantage for its long-term goal, which was simply to win. Sacrificing material for an attack or a positional advantage is part of chess, but AlphaZero took that to a different level. Like AlphaGo, AlphaZero would decline an immediate advantage in position or material and take a smaller advantage if that smaller advantage gave it a better chance to win. It had a long-horizon view.

AlphaGo accomplished this feat by combining three main ideas: reinforcement learning, Monte Carlo Tree Search, and deep neural networks. Its successor, AlphaGo Zero, which learned purely from self-play, begins with a neural network with random weights that represents both the policy and the value function. During self-play, it uses Monte Carlo Tree Search, guided by the network, to choose each move. After each game concludes, it gathers the data from that play and minimizes a loss function that trains both the policy and the value function on this data. This process combines policy evaluation and policy improvement, effectively performing a step of policy iteration. Over time, AlphaGo Zero develops an exceptionally strong policy.
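To make "a loss function for both the policy and value function" concrete, here is a minimal sketch of the per-position loss published for AlphaGo Zero: squared error between the game outcome z and the value prediction v, plus cross-entropy between the MCTS search probabilities π and the network's move probabilities p, plus L2 regularization. The function name, the plain-list inputs, and the default coefficient `c` are assumptions for illustration, not DeepMind's code.

```python
import math

def alphago_zero_loss(z, v, pi, p, weights, c=1e-4):
    """Per-position AlphaGo Zero-style loss: (z - v)^2 - pi . log(p) + c * ||theta||^2.

    z:       game outcome from the player-to-move's perspective (+1 win, -1 loss)
    v:       value-head prediction in [-1, 1]
    pi:      MCTS visit-count distribution over moves (the search-improved policy)
    p:       policy-head probabilities over the same moves
    weights: flat list of network parameters for L2 regularization
    c:       L2 coefficient (assumed small default)
    """
    value_loss = (z - v) ** 2  # value head learns to predict the actual outcome
    # Cross-entropy pulls the raw policy toward the stronger search policy;
    # moves the search never chose (t == 0) contribute nothing.
    policy_loss = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0)
    l2_penalty = c * sum(w * w for w in weights)
    return value_loss + policy_loss + l2_penalty
```

The search's visit counts act as an improved policy target (policy improvement), while fitting v to the real outcome z is policy evaluation, which is why one training step resembles a step of policy iteration.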

Its self-play generated a vast amount of data without requiring human expertise; it essentially learned from its own games. This allowed it, in effect, to explore different strategies, exploit the most successful ones, and recognize their patterns. A great book for understanding this is Deep Learning and the Game of Go by Max Pumperla and Kevin Ferguson. I highly recommend this book not just for learning how you might program a two-player game like Go but, maybe more importantly, because it teaches deep neural networks, reinforcement learning, and Monte Carlo Tree Search from first principles, showing the math and the Python code needed. By first principles, I mean that it walks through the Python code and math for core pieces like backpropagation before relying on a deep learning library, so you really understand what's going on. And it is very accessible.

However, AlphaGo is an example of narrow AI: an AI trained for a specific task. Could a more generalized model like a large language model show broad creativity? I think by now most people would answer "yes" because they have seen it write poetry, long essays, and music, and generate images. But there are different levels of creativity: there is creativity "influenced" by other sources, and then there are truly original ideas. Truly creative ideas are novel and innovative; they are paradigm shifts, often in conflict with long-held beliefs. They upset established dogma and realign the patterns through which we see the world. The shift in art during the Renaissance is one example of such a shift.

An LLM creates derivative writing, music, and art; it is influenced by the data it has been trained on. There's nothing particularly wrong with that kind of creativity; it is fairly easy to identify influences in human-created popular music, too. The difference is that popular music is based directly on real human experiences, while LLM works are based on a compressed projection of human experiences onto language.

When this LLM creativity goes off the rails, it is derisively referred to as hallucination. And some will argue that these hallucinations are equivalent to the ability to imaginatively create novel ideas. But these are no Joan of Arc hallucinations born of inspiration. Instead, the randomness in an LLM's output is governed by a sampling hyperparameter called "temperature," which reshapes the probability distribution over the next token in the sequence. Temperature rescales the logits (raw predictions) by dividing them by the temperature before the softmax is applied and a token is sampled. Lowering the temperature sharpens the distribution, making the model more confident in its predictions and its outputs more deterministic and potentially repetitive. Raising the temperature flattens the distribution, making the model less confident and its text more diverse and "creative." This doesn't sound like Galileo-style inspired discovery.
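A minimal sketch of temperature sampling, assuming a toy vocabulary and hand-picked logits (the token list and logit values are hypothetical): divide the logits by the temperature, apply a numerically stable softmax, then sample.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(tokens, logits, temperature, rng):
    """Sample one token from the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical vocabulary and logits for the next position.
tokens = ["the", "a", "banana"]
logits = [2.0, 1.0, -1.0]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
print(sample_next_token(tokens, logits, 1.0, random.Random(0)))
```

Temperature only reshapes the distribution the model has already produced; as it approaches zero, sampling approaches always picking the single most likely token (greedy decoding).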

In the short story "The Library of Babel," Jorge Luis Borges describes a library of hexagonal rooms that seems to go on forever and, because of its vastness, contains every combination of letters and symbols. That means it contains every possible text that could ever be written, including all existing knowledge, all false information, and, by definition, all nonsense. Of course, LLMs are far better than what Borges imagined, or than the Infinite Monkey Theorem, which says that a monkey typing at random for an infinite amount of time will almost surely produce the works of Shakespeare. But given that an LLM is predictive and isn't producing random text, can we be satisfied with its creativity, even though it isn't creating ideas the way a human does, directly from experience? I'll argue here that it doesn't matter how an LLM creates an idea. What matters is the outcome: how good the idea is. If we accept that argument, then what is missing from current LLMs to make them better at creating ideas in, for example, scientific discovery?
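Borges gives enough detail to size his library: each book has 410 pages of 40 lines, each line has about 80 characters, and there are 25 orthographic symbols. Treating 80 characters per line as exact (an assumption, since Borges says "some eighty"), the number of distinct books works out as:

```latex
N_{\text{books}} = 25^{410 \times 40 \times 80} = 25^{1\,312\,000}
  = 10^{1\,312\,000 \cdot \log_{10} 25} \approx 10^{1\,834\,097.3} \approx 1.96 \times 10^{1\,834\,097}
```

Generating uniformly at random over a space that size is hopeless, which is the contrast with an LLM: its next-token predictions concentrate probability on a vanishingly small and mostly meaningful fraction of that space.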

I think there are three ongoing directions that should greatly expand the possibilities for novel creativity and scientific discovery: 1) Improve existing LLMs, 2) Incorporate ideas beyond transformers, and 3) Architectures of AI agents.

Improve Existing LLMs

It has become readily apparent that competitive pressure has been driving the pace of innovation in LLMs since the advent of GPT-3.5 and subsequent models. As LLMs improve, they become increasingly capable not only of generating original ideas but also of expanding upon them. The power of scaling these models is somewhat controversial, with debate over whether they have hit, or will soon hit, an asymptotic limit. However, the seminal paper "Scaling Laws for Neural Language Models" by Kaplan et al. (2020) has been quite prescient in this regard. In this paper, the authors investigated the scaling laws governing LLM performance as measured by cross-entropy loss (how well the model predicts the next token; lower is better). They found that the loss scales predictably as a power law with model size, dataset size, and the compute used for training, with some trends spanning more than seven orders of magnitude.
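The parameter-count law from Kaplan et al. has the form L(N) = (N_c / N)^α_N. The sketch below uses the approximate constants reported in that paper (α_N ≈ 0.076, N_c ≈ 8.8 × 10^13 non-embedding parameters); treat them as assumptions tied to that paper's data and tokenization, not as universal values.

```python
def loss_from_params(n_params: float, n_c: float = 8.8e13, alpha_n: float = 0.076) -> float:
    """Kaplan et al. (2020) power law L(N) = (N_c / N) ** alpha_N.

    n_params counts non-embedding parameters. The default constants are the paper's
    approximate fits (assumed here); the law applies when data and compute are not the bottleneck.
    """
    return (n_c / n_params) ** alpha_n

def loss_ratio_when_doubling(n_params: float) -> float:
    # For a pure power law this ratio is the same at every scale: 2 ** -alpha_n.
    return loss_from_params(2 * n_params) / loss_from_params(n_params)

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"N = {n:.0e}: loss ~ {loss_from_params(n):.3f}")
```

Because it is a power law, each doubling of N multiplies the loss by the same factor, 2^-0.076 ≈ 0.949, which is why these trends appear as straight lines on a log-log plot.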

The largest current state-of-the-art models are reported to use about 1-3 trillion parameters, but models will move to 10 trillion and 100 trillion parameters and beyond. Moreover, much of the discussion is now not just about the quantity of data but about its quality. Smaller models trained on higher-quality data have been shown to rival or outperform models trained on larger datasets. So the goal is to train not just on larger and larger datasets but on high-quality ones. To that end, AI companies have entered into contracts to get access to media companies' data. In recent months these deals have included Reddit, The Financial Times, Barron's, The Wall Street Journal, the New York Post, and The Atlantic, with many more to come. These deals are often written up as a way for AI companies to get access to data without facing accusations of using it without permission. While there may be some truth to that, the real value is access to more high-quality data.

It is often said that we are running out of data, or that we are hitting a data wall in training models and need synthetic data. This keeps getting repeated as if it were a truism that everyone should accept. It is just not true. In reality, data is being created every second: on social media and broadcast media, by IoT devices and communication tools, in internal business and government systems, and through everyday human interactions. What is really meant is that we have run out of easily downloaded and scraped public data. But there is an effectively unlimited amount of potential data for models to be trained on, because data is constantly being created. What is true is that data acquisition is becoming more costly. Synthetic data can help reduce that cost, but the financial incentives are also there to acquire the more costly data.

Beyond scaling and better data, much of the improvement in models has come not from replacing the transformer architecture but from "low-hanging fruit" adjustments and from continuous refinements of Reinforcement Learning from Human Feedback (RLHF).

With all of these improvements in existing LLMs, we will see these models creatively exceed their training data. An interesting paper by Zhang et al. (2024) that came out last week shows that generative models can "transcend" their training data. In this paper, a transformer model was trained only on chess game data, selected so that no game in it was played above a specific rating level. After training, the model was able to play at a level higher than any of the players in its training data; it transcended the training data. The requirements were that the training games represented a diverse set of positions and styles, and that the temperature was set low at inference time. The explanation for this transcendence is that the model was learning from what was essentially a mixture of experts at a certain level, and it surpassed those experts by combining the knowledge of weak learners whose mistakes did not overlap into a stronger model. This is similar to how ensemble ideas like bagging and boosting work in machine learning algorithms such as random forests and gradient boosting.
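A toy illustration of the "weak, non-overlapping learners" explanation, with entirely hypothetical experts and positions: five experts each blunder on a different fifth of the positions. A model that imitates their pooled move distribution and samples at temperature T reweights each move's vote share p to p^(1/T); as T drops, the majority (correct) move dominates. This is a simplified intuition pump, not the paper's experimental setup.

```python
from collections import Counter

NUM_POSITIONS = 10  # hypothetical test positions
NUM_EXPERTS = 5     # hypothetical weak experts

def expert_move(expert_id: int, position: int) -> str:
    """Hypothetical weak expert: plays the best move except on its own blind spots.

    Blind spots are assigned by position % NUM_EXPERTS, so no two experts share one,
    and each expert blunders in its own way.
    """
    if position % NUM_EXPERTS == expert_id:
        return f"blunder_by_expert_{expert_id}"
    return "best"

def expert_accuracy(expert_id: int) -> float:
    hits = sum(expert_move(expert_id, p) == "best" for p in range(NUM_POSITIONS))
    return hits / NUM_POSITIONS

def pooled_prob_of_best(position: int, temperature: float) -> float:
    """Chance that a model imitating the pooled experts plays "best" at this position.

    The pooled distribution is each move's vote share p; sampling at temperature T
    reweights p to p ** (1 / T) and renormalizes (equivalent to softmax(log p / T)).
    """
    votes = Counter(expert_move(e, position) for e in range(NUM_EXPERTS))
    weights = {move: (count / NUM_EXPERTS) ** (1.0 / temperature) for move, count in votes.items()}
    return weights.get("best", 0.0) / sum(weights.values())

def pooled_accuracy(temperature: float) -> float:
    total = sum(pooled_prob_of_best(p, temperature) for p in range(NUM_POSITIONS))
    return total / NUM_POSITIONS
```

Each expert scores 0.8, and sampling from the pooled distribution at T = 1 also averages 0.8, but at low temperature the pooled model plays the best move almost everywhere. Diversity matters here: if all experts shared the same blind spots, the majority vote would inherit them.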

All of these types of improvements could lead to the discovery of novel and creative insights.

Incorporate Ideas Beyond Transformers

There are many ongoing efforts to overcome the limitations of autoregressive transformer-based LLMs, either by augmenting transformers or by replacing them. Limitations such as hallucinations or a lack of fidelity, attention cost that grows quadratically as the context lengthens, and a lack of long-term planning are particularly problematic when we expect an AI to creatively come up with novel ideas. I'll mention three such efforts to overcome these limitations:

1) Mamba

Mamba is part of an emerging class of models known as State Space Models (SSMs), which offer a promising alternative to transformers in LLMs. Unlike transformers, which use the attention mechanism, Mamba employs a state space model inspired by control theory to pass information between tokens, while retaining multilayer perceptron (MLP) projections for computation.

Mamba can process long sequences efficiently because it eliminates the quadratic bottleneck of the transformer attention mechanism, and it offers much higher inference speeds. It uses a selective state space model, which dynamically compresses and retains relevant information in a fixed-size state, leading to lower memory requirements.
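To show why the cost grows linearly with sequence length, here is a heavily simplified, single-channel sketch of a selective state-space scan in the spirit of Mamba: the step size Δ and the B and C terms depend on the current input (the "selective" part), A is discretized as exp(ΔA), and B uses the simplified Δ·B form. The scalar weights (`w_delta`, `b_delta`, `w_b`, `w_c`) are hypothetical stand-ins for learned projections; the real model is multi-channel and uses a hardware-aware parallel scan.

```python
import math

def softplus(z: float) -> float:
    # Smooth, always-positive function used to produce the step size delta.
    return z if z > 20.0 else math.log1p(math.exp(z))

def selective_scan(xs, a_diag, w_delta, b_delta, w_b, w_c):
    """Toy single-channel selective SSM scan (Mamba-inspired, heavily simplified).

    xs:        input sequence (one float per token)
    a_diag:    negative real diagonal of the continuous-time state matrix A (state size N)
    w_delta, b_delta: hypothetical scalars that make the step size depend on the input
    w_b, w_c:  hypothetical per-state weights that make B_t and C_t depend on the input
    """
    h = [0.0] * len(a_diag)  # fixed-size state: does not grow with sequence length
    ys = []
    for x in xs:  # a single left-to-right pass over the sequence
        delta = softplus(w_delta * x + b_delta)  # input-dependent step size ("selection")
        b_t = [wb * x for wb in w_b]             # input-dependent B_t
        c_t = [wc * x for wc in w_c]             # input-dependent C_t
        for i, a in enumerate(a_diag):
            a_bar = math.exp(delta * a)          # zero-order-hold discretization of A
            h[i] = a_bar * h[i] + delta * b_t[i] * x  # simplified discretization: B_bar = delta * B
        ys.append(sum(c * hi for c, hi in zip(c_t, h)))  # readout y_t = C_t . h_t
    return ys
```

The loop touches each token once and carries only an N-element state, so time is O(L·N) and the state's memory does not grow with the sequence length L, in contrast to attention's L × L score matrix.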

Having a model that is faster and handles memory better while it also doesn't lose fidelity in longer contexts is absolutely necessary for creating innovation.

2) Q* or LLM Combined with Monte Carlo Tree Refinement

In order for an AI to come up with novel ideas, it needs to do better planning than is currently possible with LLMs. It can appear that an LLM can plan, because when given a problem it can describe steps to solve that problem. And if you use autonomous agents or chain of thought, you can get it to build on some of that simulated planning. But this is not the type of predictive planning that humans can do.

What a current LLM can do is simulate what a human does when making intuitive decisions. But an AI needs the ability to do long-horizon prediction. It is the difference between first-order thinking, which an LLM can do, and second-order thinking, which it cannot yet do.

First order thinking involves making decisions based on immediate and straightforward cause-and-effect relationships, focusing on short-term outcomes without considering deeper implications or longer-term effects. It is a direct, linear approach that is often reactive and simplistic. In contrast, second order thinking involves a more complex analysis of potential consequences, considering both immediate and indirect impacts over the long term. This type of thinking is proactive and strategic, anticipating the broader implications and potential ripple effects of decisions to ensure sustainable outcomes. 
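A toy illustration of the difference, using a hypothetical two-move game tree (all numbers invented): a first-order chooser looks only at the immediate payoff, while a second-order chooser also accounts for the opponent's strongest reply. It mirrors the "passive" moves AlphaGo and AlphaZero preferred.

```python
# Hypothetical two-move game tree. "immediate" is our payoff for the move itself;
# "replies" are the net outcomes for us after each possible opponent reply.
TREE = {
    "grab_material": {"immediate": 3, "replies": [-5, 1]},
    "quiet_move": {"immediate": 0, "replies": [4, 6]},
}

def first_order_choice(tree):
    # Judge each move only by its immediate payoff.
    return max(tree, key=lambda move: tree[move]["immediate"])

def second_order_choice(tree):
    # Also assume the opponent picks the reply that is worst for us (depth-2 minimax).
    return max(tree, key=lambda move: tree[move]["immediate"] + min(tree[move]["replies"]))

print(first_order_choice(TREE), second_order_choice(TREE))
```

Grabbing material looks best one step ahead (3 versus 0), but once the opponent's best reply is counted it nets -2 while the quiet move nets 4.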

Back in November 2023, at the time of the shakeup at OpenAI, there were rumors of a Q* implementation as part of new AGI capabilities in a model they were testing. The reports were almost certainly overblown, part of the hysteria occurring at OpenAI at the time. But the idea of letting an AI do long-term modeling and more predictive planning, which underlies approaches like Q* or Monte Carlo Tree Search, has been around for a long time.
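OpenAI has not publicly described what its Q* is. The name is commonly read as a nod to Q*, the optimal action-value function in reinforcement learning, so, as background only: Q* satisfies the Bellman optimality equation, and tabular Q-learning estimates it by repeatedly nudging Q toward a one-step target.

```latex
Q^*(s, a) = \mathbb{E}\left[\, r_{t+1} + \gamma \max_{a'} Q^*(s_{t+1}, a') \;\middle|\; s_t = s,\ a_t = a \,\right]

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[\, r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \,\right]
```

Here γ is the discount factor and α the learning rate. The max over future actions is what lets the value of a move reflect its long-term consequences rather than only its immediate reward.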

OpenAI's hiring of Noam Brown last year is significant, as Brown has been a leader in developing algorithms that do exactly this kind of multi-step prediction, enabling real planning and self-improvement.

A couple of weeks ago, an interesting paper came out called "Accessing GPT-4 Level Mathematical Olympiad Solutions via Monte Carlo Tree Self-Refine with Llama-3 8B: A Technical Report." It shows a very small model rivaling a state-of-the-art model in a "Q*"-like way, using a Monte Carlo approach that incorporates human-like trial and error.

In the paper, they construct a search tree using iterative processes of selection, self-refine, self-evaluation, and backpropagation, employing an enhanced Upper Confidence Bound (UCB) formula. The algorithm significantly improves success rates on mathematical Olympiad-level problems, advancing the application of LLMs in complex reasoning tasks. 
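A minimal sketch of the four-phase loop described above (selection, self-refine, self-evaluation, backpropagation), assuming a toy numeric "answer" in place of a written solution. `refine` and `evaluate` are hypothetical stand-ins for the LLM's critique-and-rewrite and self-scoring prompts, and selection uses the standard UCT formula rather than the paper's enhanced UCB variant.

```python
import math
import random

class Node:
    def __init__(self, answer, parent=None):
        self.answer = answer
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

def uct(node, c=1.4):
    # Standard UCT: average reward plus an exploration bonus for rarely visited nodes.
    if node.visits == 0:
        return float("inf")  # guard: an unvisited child would be tried first
    return node.total_reward / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def evaluate(answer):
    # Hypothetical self-evaluation: reward in (0, 1], highest when the answer is 3.7.
    return 1.0 / (1.0 + (answer - 3.7) ** 2)

def refine(answer, rng):
    # Hypothetical self-refine: an LLM would critique and rewrite; we propose a nearby answer.
    return answer + rng.uniform(-1.0, 1.0)

def mcts_self_refine(initial_answer, iterations=200, children_per_node=3, seed=0):
    rng = random.Random(seed)
    root = Node(initial_answer)
    root.visits, root.total_reward = 1, evaluate(initial_answer)
    best = initial_answer
    for _ in range(iterations):
        node = root  # 1) Selection: descend through fully expanded nodes by UCT.
        while len(node.children) == children_per_node:
            node = max(node.children, key=uct)
        child = Node(refine(node.answer, rng), parent=node)  # 2) Self-refine (expansion).
        node.children.append(child)
        reward = evaluate(child.answer)  # 3) Self-evaluation.
        if reward > evaluate(best):
            best = child.answer
        while child is not None:  # 4) Backpropagation up to the root.
            child.visits += 1
            child.total_reward += reward
            child = child.parent
    return best

print(mcts_self_refine(0.0))
```

Swapping `evaluate` for a model's self-assigned score and `refine` for a critique-then-rewrite prompt gives the general shape of the approach: the tree lets promising refinements be revisited, while UCT still occasionally explores less-visited branches.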

This integration of MCTS with LLMs as demonstrated by the MCTSr algorithm offers a robust approach to second-order thinking and planning. By systematically refining and evaluating solutions, this method provides deeper insights into the long-term consequences of decisions, making it a valuable tool for complex reasoning and strategic planning across various fields.

This is the type of deep planning approach that could be used to solve complicated problems, and if necessary, solve them with potentially unique solutions.

3) Energy Based Models

There has been much work on what are called "energy-based models." The main motivation behind this type of model is that other approaches arguably fall short of human-like autonomous intelligence. Current systems are limited by their reliance on supervised learning and reinforcement learning, which require extensive labeled data and many trials. They also learn inefficiently from limited data and are unable to generalize across different tasks.

Although many researchers have worked on energy-based models, some of the most promising work has been done by renowned AI researcher Yann LeCun, the Chief AI Scientist at Meta. In the influential paper he co-authored, "Introduction to Latent Variable Energy-Based Models: A Path Towards Autonomous Machine Intelligence," the authors describe a framework in which the dependencies between variables are represented through an energy function. Unlike probabilistic models, Energy-Based Models (EBMs) do not require normalization and can handle multimodal distributions. Training an EBM involves shaping an energy landscape so that compatible data points have low energy while incompatible ones have high energy. These models introduce latent variables to capture information not readily available from the observed variables, allowing the system to handle uncertainty and structured prediction tasks.
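A toy latent-variable energy function, assuming a hand-written linear predictor and a two-valued latent (both hypothetical; this is not the paper's model). It shows how minimizing over a latent lets one input have several compatible outputs, and how inference is a search for low energy rather than a normalized probability calculation.

```python
LATENTS = (-1.0, 1.0)  # hypothetical latent values: two valid "modes" of y for each x

def energy_xyz(x: float, y: float, z: float, w: float = 2.0) -> float:
    # Hypothetical energy: low when y is close to the latent-shifted prediction w * x + z.
    return (y - (w * x + z)) ** 2

def energy_xy(x: float, y: float) -> float:
    # Eliminate the latent by minimizing over it (one common choice in latent-variable EBMs).
    return min(energy_xyz(x, y, z) for z in LATENTS)

def infer_y(x: float, candidates):
    # Inference is a search for low energy; no normalization over all possible y is computed.
    return min(candidates, key=lambda y: energy_xy(x, y))

print([energy_xy(1.0, y) for y in (0.0, 1.0, 2.0, 3.0)])
```

For x = 1.0, both y = 1.0 and y = 3.0 have zero energy, so the model represents two equally compatible answers without ever computing a normalized probability distribution.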

The paper discusses contrastive methods (which scale poorly with data dimension) and regularized methods (which limit the volume of low-energy space and are more scalable). Furthermore, it discusses a "Joint Embedding Predictive Architecture" (JEPA). This model combines the advantages of EBMs and latent variable models, using two parallel encoders to learn representations of the input data. A predictor then uses the representation of one input to predict the representation of the other, for example a future state, and the energy measures how far that prediction is from the actual representation. JEPAs can be stacked in a hierarchical fashion, enabling hierarchical planning.
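A toy sketch of the JEPA idea, with hypothetical hand-written linear "encoders" and predictor (in practice all three are trained networks). What distinguishes it from an energy defined on raw outputs is that the energy is computed in representation space: the distance between the predicted and the actual embedding of y, minimized over a small set of latent values z.

```python
def encode_x(x):
    # Hypothetical context encoder (a trained network in practice).
    return [x[0] + x[1], x[0] - x[1]]

def encode_y(y):
    # Hypothetical target encoder (a trained network in practice).
    return [y[0] + y[1], y[0] - y[1]]

def predict(sx, z):
    # Hypothetical predictor: maps the context embedding, plus latent z, to a predicted target embedding.
    return [sx[0] + z, sx[1]]

def jepa_energy(x, y, latents=(-1.0, 0.0, 1.0)):
    """Energy = squared distance in embedding space, minimized over the latent z."""
    sx, sy = encode_x(x), encode_y(y)
    return min(sum((p - t) ** 2 for p, t in zip(predict(sx, z), sy)) for z in latents)

print(jepa_energy([1.0, 2.0], [1.5, 2.5]), jepa_energy([1.0, 2.0], [5.0, 5.0]))
```

Because the comparison happens between embeddings, the target encoder is free to discard unpredictable detail in y. In real JEPA training, regularization is also needed so the encoders do not collapse to a constant representation that makes every energy zero.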

This structured approach to learning and prediction allows for both fine-grained and abstract representations of data, which is crucial for achieving the complex, adaptive behaviors required for autonomous intelligence, and for future AI systems that can learn autonomously, reason, and plan in a manner similar to human intelligence.

Architectures of Agents

The final direction that can enable a greater level of novel research and creativity is agentic frameworks. There are currently many agent frameworks, such as AutoGen, CrewAI, and LangGraph, to name a few, offering different levels of autonomy. Agents can be created that collaborate and work autonomously. They can be given tools, such as access to various APIs, and they can write code, all in pursuit of an objective.

With agents collaborating and discussing possible ideas and improvements to those ideas, a greater level of creativity can be achieved. Autonomous agents are a very promising field, and one that will only get better as models improve.

A variation on agent collaboration comes from a paper that just came out, "Mixture-of-Agents Enhances Large Language Model Capabilities" by Wang et al. (2024). This methodology is designed to harness the strengths of multiple LLMs for improved performance on natural language understanding and generation tasks. Inspired by the Mixture-of-Experts (MoE) technique, the MoA framework constructs a layered architecture in which each layer consists of multiple LLMs acting as agents. Each agent generates its response using the outputs of the previous layer's agents as auxiliary input, so responses are refined layer by layer into a high-quality final output. This approach achieves state-of-the-art performance on benchmarks such as AlpacaEval 2.0. Two main criteria guide the selection of LLMs for each layer: performance metrics and diversity considerations.
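A structural sketch of the layered Mixture-of-Agents flow, assuming each agent is just a Python function of (prompt, previous-layer responses). In practice each would wrap a call to a different LLM; the toy agents here are hypothetical placeholders that only report how many reference responses they received.

```python
from typing import Callable, List

# An agent maps (user prompt, responses from the previous layer) to a new response.
Agent = Callable[[str, List[str]], str]

def mixture_of_agents(prompt: str, layers: List[List[Agent]], aggregator: Agent) -> str:
    previous: List[str] = []  # the first layer sees no auxiliary responses
    for layer in layers:
        # Every agent in a layer receives all responses from the layer before it.
        previous = [agent(prompt, previous) for agent in layer]
    # A final aggregator synthesizes the last layer's responses into one answer.
    return aggregator(prompt, previous)

def make_toy_agent(name: str) -> Agent:
    # Hypothetical stand-in for an LLM call: reports its name and how many references it saw.
    def agent(prompt: str, references: List[str]) -> str:
        return f"{name} answered {prompt!r} using {len(references)} references"
    return agent

# Usage: two layers of three agents each, then an aggregator.
layers = [[make_toy_agent(n) for n in ("A", "B", "C")] for _ in range(2)]
result = mixture_of_agents("Why is the sky blue?", layers, make_toy_agent("aggregator"))
print(result)
```

The only coupling between layers is the list of previous responses, which is why the approach can mix different models without fine-tuning any of them.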

Interestingly, the paper describes a phenomenon it calls "collaborativeness": LLMs generate better responses when given outputs from other models, even when those auxiliary responses are of lower quality. The approach delivered significant performance gains and can be applied to various LLMs without fine-tuning.

It's a promising approach that is similar to ideas like chain of thought, but much more flexible.

Final Thoughts

In this discussion, I explore how advancements in AI technology can lead to groundbreaking insights. The progress in large language models (LLMs), with their increasing scale and data quality, demonstrates the potential for AI to generate original ideas and surpass its training data. Integrating techniques beyond transformers, such as state space models and energy-based models, offers promising solutions to current limitations like hallucinations and lack of long-term planning. Additionally, incorporating sophisticated planning approaches, like Monte Carlo Tree Search, allows AI to engage in more complex, second-order thinking, essential for true innovation.

Furthermore, the development of agentic frameworks enables autonomous AI agents to collaborate and refine ideas collectively, enhancing their creative capabilities. The Mixture-of-Agents approach exemplifies this, where multiple AI models work together iteratively to produce high-quality outputs, showing that collaboration among AI agents can lead to superior performance.

As these technologies evolve, AI’s role transitions from a supportive tool to a potential partner in discovery, capable of contributing novel insights and driving paradigm shifts. I believe it is possible for AI to achieve true creativity and novel scientific discovery. And when AI achieves superior creativity, artificial superintelligence (ASI) becomes possible, because AI can then create ideas to improve itself.

The future of AI in scientific and artistic endeavors is bright, with the potential to redefine our understanding of creativity and innovation.

So ... Move 37

