Tuesday, June 25, 2024

Move 37

I was asked recently, while I was working on some autonomous agents, whether in the course of their interactions they would come up with new ideas. I immediately thought: well - Move 37 (I'll get to what I mean by that shortly). And I said I think these agents will be creative only to the extent allowed by how I have them set up and the "back stories" I had given them. But that brought up broader questions I have thought a lot about: can an AI, and specifically an LLM-based model alone, create novel insight? Can AIs do research? Can AIs do scientific discovery? How creative can AIs be? Can AIs improve themselves based on novel insights about themselves and the objectives they need to pursue?

Let's take it further. Can AIs start the equivalent of the Scientific Revolution that occurred in the 16th and 17th centuries? Can AIs create a paradigm shift akin to what Thomas Kuhn described in his classic book The Structure of Scientific Revolutions? If you were to give a generative AI model training data that consisted only of pre-Renaissance art, would it be able to make the jump to linear perspective and create paintings with volume, and buildings and landscapes that recede with distance? The first painting below is "The Madonna and Child in Majesty Surrounded by Angels" (1280) by Cimabue. The second and third paintings are examples of the perspective associated with the Renaissance: "The Tribute Money" (1426-1427) by Masaccio and "The School of Athens" (1510-1511) by Raphael. Could an AI paint like Raphael if it had only been trained on data from before the Renaissance?



So what is Move 37? 

After the conversation I had about autonomous agents and creativity, I started thinking about the debate over whether LLMs can do novel scientific discovery. And Move 37 is the first time I had ever seen a machine do something truly creative and emergent. So again, what is Move 37? Move 37 refers to a move in game 2 of the 2016 match between Google's AI Go program, AlphaGo, and Lee Sedol of South Korea, a 9 dan player and one of the best, if not the best, players in the world at the time.

There is a great documentary on YouTube about this match, and if you go to minute 49:00, you can see this move and the complete shock it causes. A move that "no human would make" and that at first looks like a mistake to all of the commentators. A move that even AlphaGo estimated had only a 1 in 10,000 probability of being played by a human.

Humans would not make this move based on their intuition of what was and was not a good move in this position. AlphaGo came up with this move not because it was a strategy directly programmed by the Google team, but because, through reinforcement learning and self-play, it was able to see this move as the best move to meet its objective of winning. Interestingly, as pointed out in the documentary, AlphaGo is not trying to maximize its lead at specific moments, but instead is looking for its best chance of winning, even if that win is by a small margin. So sometimes what looks like a passive move is actually the best move for the long term goal.

This scenario played out again in 2017 with Google's program AlphaZero in chess, a variant of AlphaGo generalized across different types of games, including chess. A match was set up against the strongest computer chess program at the time, Stockfish. AlphaZero made many surprising moves. Often it would play moves that initially looked passive, eschewing what was thought to be the best move, but these moves would lead to a crushing, vise-like position over time. One commentator looking at the games afterwards said that these unexpected moves were exactly when, as its opponent, one should be the most afraid. And just like in Go, AlphaZero would give up temporary material advantage for its long term goal, which was simply to win. Sacrificing material for an attack or a positional advantage is part of chess, but AlphaZero took that to a different level. Like AlphaGo, AlphaZero would decline an immediate advantage in position or material and take a smaller advantage if that smaller advantage had a greater chance of winning. It had a long horizon view.

AlphaGo accomplished this feat through three main ideas: reinforcement learning, Monte Carlo Tree Search, and deep neural networks. AlphaGo Zero begins with a neural network with random weights to represent both the policy and the value function. During self-play, it uses Monte Carlo Tree Search to simulate each move. After the rollout concludes, it gathers data from the play and minimizes a loss function that incorporates this data for both the policy and the value function. This process involves policy evaluation and policy improvement, effectively performing a step of policy iteration. Over time, AlphaGo develops an exceptionally strong policy.

Its self-play generated a vast amount of data without requiring human expertise. It essentially learned from its own games. Doing this allowed it, in effect, to explore different strategies, exploit the most successful ones, and recognize these patterns. A great book for understanding this is Deep Learning and the Game of Go by Max Pumperla and Kevin Ferguson. I highly recommend this book not just for learning how you can program a two player game like Go, but maybe more importantly because it teaches the ideas of deep learning neural networks, reinforcement learning, and Monte Carlo Tree Search from first principles, showing the math and the needed Python code. And by first principles, I mean that it shows the Python code and math for everything, like backpropagation, without abstracting to PyTorch, so you really understand what's going on. And it is very accessible.
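In that same first-principles spirit, here is a tiny numpy sketch of the combined training target described above: squared error on the value head plus cross-entropy between the network policy and the MCTS visit-count distribution. This is my own simplification with made-up numbers, not code from the book or from DeepMind.

```python
import numpy as np

def alphazero_loss(value_pred, outcome, policy_logits, mcts_policy):
    """Combined AlphaGo Zero-style loss for one position (toy version).

    value_pred    : scalar network value estimate in [-1, 1]
    outcome       : actual game result z in {-1, 0, 1}
    policy_logits : raw network scores for each legal move
    mcts_policy   : visit-count distribution pi from the tree search
    """
    # Value head: squared error between prediction and final game outcome.
    value_loss = (outcome - value_pred) ** 2

    # Policy head: cross-entropy between MCTS visit counts and the network policy.
    probs = np.exp(policy_logits - policy_logits.max())
    probs /= probs.sum()
    policy_loss = -np.sum(mcts_policy * np.log(probs + 1e-12))

    return value_loss + policy_loss

# Toy example: the search strongly preferred move 2, the network does not (yet).
logits = np.array([0.1, 0.2, 0.05, 0.0])
pi = np.array([0.05, 0.10, 0.80, 0.05])
print(alphazero_loss(value_pred=0.3, outcome=1.0, policy_logits=logits, mcts_policy=pi))
```

Minimizing this over many self-play positions is the "policy evaluation plus policy improvement" step mentioned above (the real system also adds an L2 regularization term).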

However, AlphaGo is an example of narrow AI - an AI trained for a specific task. But could a more generalized model like a large language model show broad creativity? I think by now most people would answer "yes" because they have seen it write poetry, long essays, and music, and generate images. But there are different levels of creativity - there is creativity "influenced" by other sources and then there are truly original ideas. Truly creative ideas are novel and innovative - they are paradigm shifts, often in conflict with long held beliefs. They upset established dogma and re-align the patterns with which we see the world. The shift in art during the Renaissance is an example of one such shift.

When an LLM creates writing, music, or art, it is derivative - it's being influenced by the data it's been trained on. There's nothing particularly wrong with that kind of creativity - it is fairly easy to identify influences in human created popular music. The difference is that popular music is based directly on real human experiences, while LLM works are based on a compressed projection of human experiences onto language.

When this LLM creativity goes off the rails, it is derisively referred to as hallucination. And some will argue that these hallucinations are equivalent to the ability to imaginatively create novel ideas. But these hallucinations are no Joan of Arc type of hallucinations born of inspiration. Instead, they come down to the GPT architecture's "temperature" setting, a hyperparameter that controls the randomness of generated text by influencing the probability distribution over the next words in the sequence. Temperature adjusts the softmax function applied to the logits (raw predictions) before sampling. Lowering the temperature makes the model more confident in its predictions, leading to more deterministic and potentially repetitive outputs. Increasing the temperature makes the model less confident, leading to more diverse and creative text. This doesn't sound like Galileo type inspired discovery.
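To make that knob concrete, here is a toy sketch of temperature-scaled sampling (my own illustration with made-up logits, not any particular model's implementation):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=np.random.default_rng(0)):
    """Sample a token index from raw logits after temperature scaling."""
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())      # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs

logits = [2.0, 1.0, 0.2, -1.0]                 # made-up scores for four candidate tokens
for t in (0.2, 1.0, 2.0):
    _, probs = sample_next_token(logits, temperature=t)
    print(t, np.round(probs, 3))               # low T sharpens, high T flattens the distribution
```

All a higher temperature does is flatten that distribution so less likely tokens get picked more often - which is where both the "creativity" and the hallucinations come from.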

In the short story "The Library of Babel," Jorge Luis Borges describes a library of hexagonal rooms that go on forever and, because of this vastness, contain every combination of letters and symbols - which means they contain every possible text that could ever be written, including all existing knowledge, all false information, and by definition all nonsensical information. Of course LLMs are better than what Borges imagined or the Infinite Monkey Theorem - that a monkey given an infinite amount of time will type out the works of Shakespeare. But given that an LLM is predictive and isn't producing random text, can we be satisfied with its creativity? With the fact that it's not creating ideas, as a human does, directly from experiences? I'll argue here that it doesn't matter how an LLM creates an idea. What matters is the outcome - how good the idea is. If we accept that argument, then what is missing from current LLMs to make them better at creating ideas in, for example, scientific discovery?

I think there are three ongoing directions that should greatly expand the possibilities for novel creativity and scientific discovery: 1) Improve existing LLMs 2) Incorporate ideas beyond transformers 3) Architectures of AI agents.

Improve Existing LLMs

It has become readily apparent to everyone that competitive pressures have been driving the pace of innovation in LLMs since the advent of GPT-3.5 and subsequent models. As LLMs improve, they become increasingly capable not only of generating original ideas but also of expanding upon them. The power of scaling these models is somewhat controversial, with debates on whether these models have hit or will soon hit an asymptotic limit. However, the seminal paper "Scaling Laws for Neural Language Models" by Kaplan et al. (2020) has been quite prescient in this regard. In this paper, the authors investigated the scaling laws governing LLM performance, particularly with respect to cross-entropy loss (a measurement of how well the model performs). They found that the loss scales predictably as a power law with model size, dataset size, and the compute used for training, with trends spanning several orders of magnitude.
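The functional form is simple enough to sketch. Below is a hedged illustration of fitting a power law L(N) = (Nc/N)^alpha in log-log space; the data points and the resulting constants are placeholders I made up, not the values reported by Kaplan et al.

```python
import numpy as np

# Hypothetical (model size, loss) pairs -- illustrative only.
N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
L = np.array([5.2, 4.1, 3.3, 2.6, 2.1])

# A power law L = (Nc / N)**alpha is linear in log-log space:
#   log L = alpha * log Nc - alpha * log N
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
alpha = -slope
Nc = np.exp(intercept / alpha)

print(f"alpha ~ {alpha:.3f}, Nc ~ {Nc:.2e}")
print("extrapolated loss at 1e12 params:", (Nc / 1e12) ** alpha)
```

The prescient part of the paper is that, so far, real models have kept landing close to lines like this rather than hitting a wall.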

The current state of the art models use about 1-3 trillion parameters, but models will move to 10 trillion and 100 trillion parameters and beyond. Moreover, much discussion now centers not just on quantity of data but on quality of data. Smaller models have been created that can rival or outperform models trained on more data simply by being trained on higher quality data. So the goal is not just to train on larger and larger data sets, but on high quality data sets. To that end, companies have entered into contracts to get access to media company data. In recent months these deals have included Reddit, The Financial Times, Barron's, The Wall Street Journal, the New York Post, and The Atlantic, with many more to come. This is often written up as a way for these AI companies to get access to data without facing accusations of using data without permission. While there may be some truth to that, the real value is getting access to more high quality data.

Something that is often stated is that we are running out of data, that we are hitting a data wall in training models and need synthetic data. This keeps getting repeated like it's a truism everyone should accept. It is just not true. In reality, data is constantly being created every second: social media, broadcast media, IoT devices, communication tools, internal business data, government data, and everyday human interactions are all creating data. What is really meant is that we ran out of easily downloaded and scraped public data. But there is a vast amount of potential data out there for models to be trained on - data is constantly being created. What is true is that data acquisition is becoming more costly. Synthetic data can help reduce that cost, but the financial incentives are also there to go after the more costly data.

But even with improvements due to scaling and better data, much of the improvement in models has come not from replacing the transformer architecture but from "low hanging fruit" adjustments and continuous refinements of Reinforcement Learning from Human Feedback (RLHF).

With all of these improvements in existing LLMs, we will see these models exceed their training data in creativity. An interesting paper that came out last week by Zhang et al. (2024) shows that LLMs can "transcend" their training data. In the paper, an LLM was trained only on chess data, and the data was chosen such that the individual games were no higher than a specific rating level. After training, the model was able to play at a level higher than any of the training data - it transcended the training data. The requirements were that the individual chess games it trained on represented a diverse set of positions and styles, and that at inference time the temperature was set low. The explanation for the transcendence is that the model was drawing on a data set that was essentially a mixture of experts at a certain level, and it surpassed those experts by putting together the knowledge of weak, non-overlapping learners to create a stronger model. This is similar to how bagging and boosting work in machine learning algorithms like random forests and gradient boosting.
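A toy way to see the ensembling effect (my own illustration, not the paper's experiment): several weak "experts" that each give the right move only a small edge, averaged together and then read off greedily (the low-temperature step), pick the right move far more often than any single expert does.

```python
import numpy as np

rng = np.random.default_rng(42)
best_move, n_moves, n_experts = 0, 5, 7

def weak_expert():
    """A noisy policy that gives the best move only a small edge."""
    p = rng.dirichlet(np.ones(n_moves))
    p[best_move] += 0.15
    return p / p.sum()

hits_single, hits_mixture = 0, 0
trials = 2000
for _ in range(trials):
    experts = [weak_expert() for _ in range(n_experts)]
    hits_single += np.argmax(experts[0]) == best_move   # one weak expert alone
    mixture = np.mean(experts, axis=0)                   # average of the weak experts
    hits_mixture += np.argmax(mixture) == best_move      # low temperature ~= take the argmax

print("single weak expert accuracy :", hits_single / trials)
print("mixture (low temperature)   :", hits_mixture / trials)
```

The individual experts' noise cancels out while their shared small edge adds up, which is the same intuition behind the chess result.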

All of these types of improvements could lead to the discovery of novel and creative insights.

Incorporate Ideas Beyond Transformers

There are many ongoing efforts to augment transformer-based autoregressive LLMs or to address their limitations. Limitations such as hallucinations and a lack of fidelity, prediction time that grows quadratically as context lengthens, and a lack of long term planning are particularly problematic when it comes to expecting an AI to creatively come up with novel ideas. I'll mention three such efforts to overcome these limitations:

1) Mamba

Mamba is part of an emerging class of models known as State Space Models (SSMs), offering a promising alternative to transformers in large language models (LLMs). Unlike transformers, which utilize the attention mechanism, Mamba employs a control theory-inspired state space model to facilitate communication between tokens while retaining multilayer perceptron (MLP) projections for computation. 

Mamba is capable of processing long sequences efficiently by eliminating the quadratic bottleneck present in the transformer attention mechanism, and it offers much higher inference speeds. It uses a selective state space model, which dynamically compresses and retains relevant information, leading to lower memory requirements.
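To give a flavor of the state space idea, here is a heavily simplified linear SSM sketch (not Mamba's actual selective mechanism, and the dimensions are made up): each token updates a fixed-size hidden state recurrently, so per-token cost and memory stay constant no matter how long the sequence gets.

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, d_in = 8, 4                       # tiny, made-up dimensions

# Fixed (non-selective) SSM parameters. Mamba makes B, C, and the step size
# functions of the input, which is the "selective" part this sketch omits.
A = 0.9 * np.eye(d_state)                  # state decay
B = rng.normal(size=(d_state, d_in))       # input -> state
C = rng.normal(size=(d_in, d_state))       # state -> output

def ssm_scan(x):
    """Process a sequence token by token with a constant-size state per step."""
    h = np.zeros(d_state)
    ys = []
    for x_t in x:                          # linear in sequence length, no quadratic attention
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

seq = rng.normal(size=(1000, d_in))        # a long sequence is no problem
print(ssm_scan(seq).shape)                 # (1000, 4)
```

Compare that linear scan with attention, where every new token has to look back at every previous token.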

Having a model that is faster and handles memory better, while also not losing fidelity over longer contexts, is absolutely necessary for creating innovation.

2) Q* or LLM Combined with Monte Carlo Tree Refinement

In order for an AI to come up with novel ideas, it needs to do better planning than what is currently possible in LLMs. It can appear that an LLM can plan, because if given a problem it can describe steps to solve that problem. And if you use autonomous agents or chain of thought, you can get it to build on some of that simulated planning. But this is not the type of predictive planning that humans can do.

What a current LLM can do is simulate what a human does when making intuitive decisions. But an AI needs the ability to do long horizon prediction. It is the difference between first order thinking, which an LLM can do, and second order thinking, which it cannot.

First order thinking involves making decisions based on immediate and straightforward cause-and-effect relationships, focusing on short-term outcomes without considering deeper implications or longer-term effects. It is a direct, linear approach that is often reactive and simplistic. In contrast, second order thinking involves a more complex analysis of potential consequences, considering both immediate and indirect impacts over the long term. This type of thinking is proactive and strategic, anticipating the broader implications and potential ripple effects of decisions to ensure sustainable outcomes. 

Back in November, at the time of the shakeup at OpenAI, there were rumors of a Q* implementation as part of new AGI capabilities in a model they were testing. The reports were almost certainly overblown as part of the hysteria occurring at OpenAI at the time. But the underlying idea - giving an AI a way to do long term modeling through something like Q* or Monte Carlo Tree Search to do more predictive planning - has been around for a long time.

The hiring of Noam Brown by OpenAI last year is significant, as Noam Brown has been a leader in developing algorithms that do exactly this kind of multi-step prediction - algorithms that can do real planning and self improvement.

A couple of weeks ago an interesting paper came out called "Accessing GPT-4 Level Mathematical Olympiad Solutions via Monte Carlo Tree Self-Refine with Llama-3 8B: A Technical Report." This paper shows a very small model outperforming a state of the art model in a "Q*" type of way, using a Monte Carlo approach that incorporates human-like trial and error.

In the paper, they construct a search tree using iterative processes of selection, self-refine, self-evaluation, and backpropagation, employing an enhanced Upper Confidence Bound (UCB) formula. The algorithm significantly improves success rates on mathematical Olympiad-level problems, advancing the application of LLMs in complex reasoning tasks. 
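Here is a minimal sketch of that search skeleton, with stand-in functions where the real system would make LLM calls (the refine and evaluate functions below are hypothetical placeholders, not the paper's prompts): candidate answers are selected by a UCB score, refined, re-scored, and the results backed up into the node statistics.

```python
import math, random

random.seed(0)

def refine(answer):                # placeholder for an LLM "self-refine" call
    return answer + random.uniform(-0.5, 1.0)

def evaluate(answer):              # placeholder for an LLM "self-evaluate" score in [0, 10]
    return max(0.0, min(10.0, answer))

class Node:
    def __init__(self, answer):
        self.answer, self.total, self.visits = answer, 0.0, 0

def ucb(node, total_visits, c=1.4):
    """Upper Confidence Bound: balance exploiting good nodes and exploring rarely visited ones."""
    if node.visits == 0:
        return float("inf")
    return node.total / node.visits + c * math.sqrt(math.log(total_visits) / node.visits)

nodes = [Node(random.uniform(0, 5)) for _ in range(3)]   # a few rough initial answers
for step in range(1, 50):
    node = max(nodes, key=lambda n: ucb(n, step))        # selection via UCB
    new_answer = refine(node.answer)                     # self-refine
    reward = evaluate(new_answer)                        # self-evaluation
    node.answer = max(node.answer, new_answer)           # keep the better refinement
    node.total += reward                                 # backpropagation of the score
    node.visits += 1

best = max(nodes, key=lambda n: n.answer)
print(f"best answer score after search: {best.answer:.2f}")
```

The point is the loop structure: the model spends its compute deliberately, revisiting and improving the most promising lines instead of committing to its first intuition.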

This integration of MCTS with LLMs as demonstrated by the MCTSr algorithm offers a robust approach to second-order thinking and planning. By systematically refining and evaluating solutions, this method provides deeper insights into the long-term consequences of decisions, making it a valuable tool for complex reasoning and strategic planning across various fields.

This is the type of deep planning approach that could be used to solve complicated problems, and if necessary, solve them with potentially unique solutions.

3) Energy Based Models

There has been much work on what are called "energy based models." The main motivation behind this type of model is that other approaches arguably fall short of achieving human-like autonomous intelligence. Current systems are limited by their reliance on supervised learning and reinforcement learning, which require extensive labeled data and trials. There is also an inefficiency in learning from limited data and an inability to generalize across different tasks.

Although many researchers have worked on energy based models, some of the most promising work has been done by renowned AI researcher Yann LeCun, the Chief AI Scientist at Meta. In the influential paper he co-authored, "Introduction to Latent Variable Energy-Based Models: A Path Towards Autonomous Machine Intelligence", the dependencies between variables are represented through an energy function. Unlike probabilistic models, Energy Based Models (EBMs) do not require normalization and can handle multimodal distributions. Training EBMs involves creating an energy landscape such that compatible data points have low energy while incompatible ones have high energy. These models introduce latent variables to capture information not readily available from the observed variables, allowing the system to handle uncertainty and structured prediction tasks.

The paper discusses contrastive methods (which scale poorly with data dimension) and regularized methods (which limit the volume of low-energy space and are more scalable). Furthermore, it discusses the Joint Embedding Predictive Architecture (JEPA). This model combines the advantages of EBMs and latent variable models, using two parallel encoders to learn representations of the input data. These representations are then compared and used to predict future states. JEPAs can be stacked in a hierarchical fashion, enabling hierarchical planning.
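As a cartoon of the energy idea (my own toy sketch, not LeCun's architecture): encode x and y into a shared space, define the energy as the distance between the two representations, and training would then shape that landscape so compatible pairs land low and incompatible ones land high.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 5))            # toy shared "encoder"; a real JEPA learns two encoders

def energy(x, y):
    """Lower energy = more compatible pair; note there is no normalization into probabilities."""
    return float(np.sum((W @ x - W @ y) ** 2))

x = rng.normal(size=5)
y_compatible = x + 0.05 * rng.normal(size=5)   # a y that "goes with" x
y_incompatible = rng.normal(size=5)            # an unrelated y

print("compatible  :", round(energy(x, y_compatible), 3))
print("incompatible:", round(energy(x, y_incompatible), 3))
```

The contrastive versus regularized distinction in the paper is about how you keep that landscape from collapsing, i.e. how you make sure not everything ends up with low energy.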

This structured approach to learning and prediction allows for both fine-grained and abstract representations of data, which is crucial for achieving the complex, adaptive behaviors required for autonomous intelligence - for future AI systems that can learn autonomously, reason, and plan in a manner similar to human intelligence.

Architectures of Agents

The final direction that can enable a greater level of novel research and creativity is agentic frameworks. Currently there are many agent frameworks, such as AutoGen, CrewAI, and LangGraph to name a few, that offer different levels of autonomy. Agents can be created that collaborate and work autonomously. They can be given tools, like access to various APIs, and can write code, all in pursuit of an objective.

With agents collaborating, discussing possible ideas, and improving on them, a greater level of creativity can be achieved. Autonomous agents are a very promising field, which is only going to get better as models improve.

A variation on agent collaboration is described in a paper that just came out called "Mixture-of-Agents Enhances Large Language Model Capabilities" by Wang et al. (2024). The Mixture-of-Agents (MoA) methodology is designed to harness the strengths of multiple large language models (LLMs) for improved performance in natural language understanding and generation tasks. The MoA framework constructs a layered architecture where each layer consists of multiple LLM agents, which iteratively refine responses by using outputs from the previous layer. This approach achieves state-of-the-art performance on benchmarks. The MoA framework is inspired by the Mixture-of-Experts (MoE) technique. It consists of multiple layers, each with several LLMs acting as agents. Each agent generates responses based on inputs from the previous layer's agents. The process iteratively refines responses until a high-quality output is achieved. Two main criteria guide the selection of LLMs for each layer: performance metrics and diversity considerations.

Interestingly, a phenomenon the authors call "collaborativeness" emerges, where LLMs generate better responses when given outputs from other models. This improvement occurs even if the auxiliary responses are of lower quality. Through this approach, significant performance gains were observed, and the method can be applied to various LLMs without requiring fine-tuning.
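A rough sketch of the layered pattern looks like this (the `ask` function below is a stand-in where a real MoA setup would call different LLMs; it is not the paper's code): each layer's agents see the previous layer's answers and produce refined ones, and a final aggregator synthesizes the last layer's outputs.

```python
def ask(agent_name, question, prior_answers=None):
    """Stand-in for an LLM call; a real MoA layer would query a different model here."""
    n_prior = len(prior_answers or [])
    return f"{agent_name}: answer to '{question}' (refined from {n_prior} prior answers)"

def mixture_of_agents(question, agents_per_layer=(3, 3)):
    answers = []
    for layer, n_agents in enumerate(agents_per_layer):
        # Each agent in this layer sees all answers produced by the previous layer.
        answers = [
            ask(f"layer{layer}-agent{i}", question, prior_answers=answers)
            for i in range(n_agents)
        ]
    # A final aggregator synthesizes the last layer's answers into one response.
    return ask("aggregator", question, prior_answers=answers)

print(mixture_of_agents("What limits LLM creativity?"))
```

Swap the stand-in for real model calls and you get the layered refine-and-aggregate structure the paper describes.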

It's a promising approach that is similar to ideas like chain of thought, but much more flexible.

Final Thoughts

In this discussion, I explore how advancements in AI technology can lead to groundbreaking insights. The progress in large language models (LLMs), with their increasing scale and data quality, demonstrates the potential for AI to generate original ideas and surpass its training data. Integrating techniques beyond transformers, such as state space models and energy-based models, offers promising solutions to current limitations like hallucinations and lack of long-term planning. Additionally, incorporating sophisticated planning approaches, like Monte Carlo Tree Search, allows AI to engage in more complex, second-order thinking, essential for true innovation.

Furthermore, the development of agentic frameworks enables autonomous AI agents to collaborate and refine ideas collectively, enhancing their creative capabilities. The Mixture-of-Agents approach exemplifies this, where multiple AI models work together iteratively to produce high-quality outputs, showing that collaboration among AI agents can lead to superior performance.

As these technologies evolve, AI's role transitions from a supportive tool to a potential partner in discovery, capable of contributing novel insights and driving paradigm shifts. I believe true creativity and novel scientific discovery by AI are possible. And when AI can achieve superior creativity, that is when artificial super intelligence (ASI) becomes possible, because then AI can create the ideas to improve itself.

The future of AI in scientific and artistic endeavors is bright, with the potential to redefine our understanding of creativity and innovation.

So ... Move 37


Sunday, June 23, 2024

Thoughts on the Article: "My Last Five Years of Work"


An interesting article came out last month titled "My Last Five Years of Work" by Avital Balwit, who is Chief of Staff to the CEO at Anthropic. The article is a good companion piece to the recent paper "Situational Awareness" by Leopold Aschenbrenner, as they both concern the rapid change that AI will bring to society in a few short years - and how most people will not be prepared for it.

The main point of her article is that advancements in AI are rapidly progressing towards a point where they may render many traditional forms of employment obsolete in just a few years. These systems will be capable of performing tasks previously reserved for humans, particularly in knowledge-based fields. And we as a society are going to have to grapple with the potential psychological and social impacts of widespread unemployment. Can people find happiness and meaning in a future where work is no longer central to their lives? Will AI cause an existential crisis for people?

To further explore the implications of this shift, she writes that both too little and too much discretionary time can negatively impact a person's well-being, with moderate amounts being ideal, and cites a paper by Sharif et al. (2021) that concludes that more discretionary time can be beneficial, or at least not harmful, if it is spent on social or productive leisure activities. Of course this would vary widely from individual to individual and would depend on how much they enjoyed traditional employment - and on how a person uses free time, rather than just the amount of free time they have. She gives examples of activities like exercising, spending time with family, and socializing that can lead to positive well-being.

However, the assumption being made here is that AI will not create new types of jobs. But every major technological change in history has created new jobs. The nature of what types of jobs are valued changes. Many of the jobs created over the long run are simply not imaginable at the time of the innovation. Also, I believe AI will create some degree of new wealth and, if AI's promises are fulfilled, a more equitable distribution of wealth, which will result in an overall greater demand for goods and services - and some of those goods and services will not be AI oriented.

If I were to speculate on what types of jobs would not be in danger in the first wave of a major AI economic shift, it would be physical jobs where a human touch is valued, and she makes a similar point in her article. So even though an AI based robot could be created to replace many nursing tasks, and an AI could have an incredibly empathetic demeanor, nursing, and the medical field in general, is an occupation where many people value a human connection.

I also think there will be an increased value placed on human created work, even when an AI could do something as well as or better than a human. For example, Garry Kasparov, world chess champion, lost to Deep Blue in 1997. Since then chess computers have continued to get better, and no one thinks that a grand master today is able to consistently beat the best computer. However, that hasn't ended the popularity of chess. In fact, chess is probably at its highest popularity now, with sites like lichess and chess.com, as people want to play other people at their own skill level. And the greatest grand masters' games, such as those of Magnus Carlsen, are followed as closely as any great grand master's in the past.

Another example I believe is creative work. AI created fiction, poetry, music, and soon film has caused much consternation among creatives, but I believe that once the market becomes saturated with AI created work, humans will create a demand for work that can be verified as human created. Even though an AI can create art or fiction as well as or better than a human, it was not created from specific experiences and emotions that happened to real humans. I believe works that have autobiographical content will be especially highly valued - because those experiences cannot be artificially created.

Beyond these examples where people will have access to an AI generated product or service but instead opt, at least in some cases, for the more human experience, I think humans will create interactions that are particularly human in ways that might not be apparent yet.

Another issue I have is the timeline. I don't disagree with the prospect, which both Avital and Leopold raise, that AGI is very close. But I do disagree with the vision of a rapid transition to a workless future driven by AI advancements, in that it may not fully account for the intricate realities of implementing new technologies across diverse industries.

Stuff just doesn't happen fast across an economy with complex interdependencies. Large company bureaucracies, supply chains, government red tape, and the costs of manufacturing re-tooling can make adoption slower than expected even when there is an obvious advantage to a new technology.

Historically, the adoption of breakthrough technologies has been a slow and uneven process due to existing complexities and interdependencies within the economy. For instance, the Industrial Revolution saw the gradual spread of steam power and mechanized manufacturing over several decades, hindered by high costs and the need for worker retraining. Similarly, the transition from steam to electric power in factories took many years, with widespread use only achieved in the 1920s and 1930s. The adoption of computer technology, which began in the 1950s, only became widespread in offices by the 1980s and 1990s due to integration challenges and the need for new workflows. The internet, commercially available in the 1990s, took nearly two decades to become a ubiquitous business tool as supporting infrastructure and services developed slowly.

Even with the contemporary example of AI, despite significant advancements, its adoption so far remains uneven across sectors, constrained by integration challenges and ethical concerns. We would have to believe that the attainment of AGI, leading to the elimination of most employment in a few short years, would be able to sidestep these historical adoption challenges - which seems unlikely.

Okay, so I do think the article is very good and thought provoking and I do think we are in a period of accelerating technological change - which is very exciting. But I don't think that we will all be forced to take up surfing because there will be no work left to do.

PyCon 2024

 

The main days of PyCon 2024 were held in Pittsburgh over the weekend of May 17-19. This year I stayed at the hotel connected to the conference center, which makes everything much more convenient. The hotel was connected to the David L. Lawrence Convention Center by a walkway over Penn Street. I will admit, though, that after getting to the hotel I did get lost trying to find this walkway. But this is pretty much par for the course, as it takes about a day to get used to navigating the hotel and where everything is in the convention center.

Day 1, Friday, May 17th

On Friday, the first talk I attended was by Jeff Glass, who introduced interactive documentation with PyScript. Imagine having a REPL right inside your documentation! Jeff demonstrated how developers can integrate interactive Python sessions into any webpage using PyScript. This not only enhances user engagement but also makes learning more dynamic and practical. It was fascinating to see how tools like Sphinx, MkDocs, and readthedocs can be seamlessly integrated, making the examples in your documentation come alive. The concept of a fully serverless interactive learning session is something that I found particularly interesting. 

Next up was Swarna Swapna Ilamathy’s session on tackling the issue of dead code. She shed light on the importance of identifying and removing unused code to maintain project health and agility. The session was packed with practical solutions, focusing on Python libraries like Vulture and Deadcode. Swarna emphasized the significance of integrating these tools into CI/CD pipelines for streamlined dead code removal. 

By the time of the next talk, I had started to become comfortable finding my way around the conference center. The next talk was about Ruff, a Python linter and code formatter written in Rust. I have attended talks about Ruff before and was impressed with it. It promises to replace dozens of static analysis tools while being dramatically faster. The speaker, Charlie Marsh, took us through the internals of Ruff, highlighting specific optimizations and design decisions that contribute to its remarkable performance. Learning how Ruff powers static analysis for major Python projects like NumPy and Pandas was inspiring. Even though Ruff is written in Rust, Charlie made the session accessible to everyone, focusing on the broader aspects of building performant developer tools.

By this time, I was getting pretty hungry and headed down to the convention hall for lunch. They had a lot of great options. After I finished eating, I wandered around the hall listening to the various exhibitors. The big names were there: Snowflake, Microsoft, MongoDB, Nvidia, Google, JetBrains, and many more. Most were doing scheduled quick demos, like Microsoft with GitHub Copilot and Snowflake with Streamlit.

After lunch, the first session I went to was Sydney Runkle's talk on Pydantic. As a software engineer at Pydantic, she was well versed in the optimizations available with Pydantic V2, which now leverages Rust for its core validation logic. The performance improvements seemed substantial, with speedups ranging from 5 to 50 times compared to V1. Sydney shared a range of tips, from one-line fixes to larger design modifications, to get the most out of Pydantic. I thought her focus on tagged unions for efficient validation of union types was particularly useful.
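For anyone who hasn't used them, this is roughly what a tagged (discriminated) union looks like in Pydantic V2; the model and field names here are my own toy example, not from the talk. The discriminator field lets Pydantic jump straight to the right branch instead of trying each union member in turn.

```python
from typing import Literal, Union

from pydantic import BaseModel, Field

class CardPayment(BaseModel):
    method: Literal["card"]
    card_number: str

class BankPayment(BaseModel):
    method: Literal["bank"]
    iban: str

class Order(BaseModel):
    # "method" acts as the tag that selects which union member to validate against.
    payment: Union[CardPayment, BankPayment] = Field(discriminator="method")

order = Order.model_validate({"payment": {"method": "bank", "iban": "DE89370400440532013000"}})
print(type(order.payment).__name__)   # BankPayment
```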

Next up was a talk on game development by Esther Alter. This is a topic I'm not well versed in but it was fun and informative. She walked us through the process of developing a procedurally-generated monster collecting game, which she successfully released on Steam. Unlike typical game dev talks that focus on minimal examples, Esther covered a wide array of topics including fast development, robust testing, unique graphical effects, and engaging gameplay. Her insights into why Python is an excellent choice for game development were eye-opening, making me consider Python for future game projects.

The next talk I went to was Valerio Maggio's, which expanded on Jeff Glass's earlier session, exploring the use of PyScript for interactive documentation. Valerio emphasized the importance of good documentation in open-source projects and demonstrated how PyScript can enhance the interactivity of software docs. With features like multiple interpreters, a Pythonic API for the DOM, and non-blocking execution threads, PyScript offers a robust platform for creating engaging documentation. Seeing these features in action was particularly useful, showcasing how they can transform static documentation into a dynamic learning environment.

The final talk of the day was by Juan Altmayer Pizzorno, who introduced SlipCover, a new tool for Python code coverage with minimal overhead. Traditional tools like coverage.py can slow down tests significantly, but SlipCover reduces this overhead to just 5%. 

Day 2,  Saturday, May 18th

Day two started with a session by Junya Fukuda, who talked about the intricacies of using asyncio for event-driven programming. Junya shared insights on how Python’s asynchronous capabilities are harnessed to streamline decision-making processes in robotics. The talk covered various asynchronous libraries, including trio, trio-util, anyio, and asyncio. It was cool to see how these tools can operate efficiently within a single thread, making them ideal for resource-constrained environments. 

As a heavy FastAPI user, I was very interested in Mia Bajić’s talk on combining Django ORM with FastAPI. Mia shared her unique experience of using Django ORM within a FastAPI application, a setup not commonly documented. She highlighted the practical implementation, benefits, and challenges of this hybrid approach. The session was aimed at developers of all levels and provided valuable insights into integrating asynchronous and synchronous frameworks. This talk made me think about some possibilities for leveraging the strengths of both Django and FastAPI in a single application.

Next up for me was a session by Sebastián Flores, who presented an engaging session on enhancing presentations with Python and data storytelling. He showcased various tools and libraries, such as Jupyter Notebook, RISE, Streamlit, and Quarto, alongside graphic libraries like matplotlib, seaborn, altair, PowerBI, Tableau, and Vizzu. This talk was great in that it encouraged people to think beyond traditional PowerPoint slides. I personally think PowerPoint is one of those necessary evils because it is what everyone expects, but data communication should be so much more than talking over a bunch of slides.

It was now time for lunch. I quickly grabbed something on the convention floor and headed outside to explore downtown a little bit. I was particularly searching for a bookstore and had seen on my phone that there was a Barnes & Noble within a few blocks of the convention center that I think is associated with Duquesne University. I was looking for something to read on the plane ride home. Unfortunately, there was really nothing in the store. The basement had textbooks and supplies for the school, and the top floor was just some shelves of discounted books. So I left empty handed. I'm sure there's a bookstore somewhere downtown if I had had more time to find it.

After lunch I decided to go to Jodie Burchell's session on large language models (LLMs), which addressed the critical issue of hallucinations in AI. I honestly almost skipped this and went to a talk about using LLMs as part of graph flow, but the talk had a lot of interesting information she had compiled about the statistics on types of hallucinations. Jodie explained the reasons behind these hallucinations and introduced tools like TruthfulQA, Hugging Face's datasets and transformers packages, and LangChain to measure and mitigate them. The talk also covered retrieval augmented generation (RAG) as a technique to reduce hallucinations and improve the reliability of LLMs.

I'm very familiar with the set of data tools in the next talk and am a big fan of both Dask and Polars. Patrick Hoefler’s talk compared the latest developments in Pandas and Dask with other big data tools like Spark, DuckDB, and Polars. Patrick highlighted the improvements in Dask, including a new shuffle algorithm, a logical query planning layer, and a more efficient data model thanks to Pandas 2.0. The session used TPC-H benchmarks to demonstrate the performance gains and robustness of the new Dask DataFrame. 

The next talk I went to was different. It was a fun mix of art and mathematics. Alastair Stanley demonstrated how the principles of paper folding can be used to perform complex calculations, from basic arithmetic to solving cubic equations and proving irrationality. Using a custom Python library to simulate fold sequences, Alastair showed the surprising power of origami in computational contexts. This session was very cool.

The day concluded with Naveed Mahmud's session on hybrid quantum-classical machine learning using Qiskit. I have had an interest in quantum computing, and Qiskit in particular, for years, so I was very interested to see Naveed's work. Naveed explained how to build quantum circuits and integrate them with classical models for tasks like text classification and sentiment analysis. I thought sentiment analysis was an interesting test case for quantum computing and machine learning.

Day 3, Sunday, May 19th

The first talk I attended on the third day was by Alla Barbalat, who tackled the critical issue of intellectual property (IP) in the context of AI-generated code. As AI becomes more prevalent in code generation, the question of ownership arises. Alla addressed the legal implications of using open-source code as training data and the potential infringement of IP rights by AI systems. This talk was particularly enlightening, as it demystified the complex legal landscape surrounding AI and IP law. 

Next up was Saksham Sharma’s session on optimizing Python performance with Cython. It was a deep dive into the practicalities of speeding up Python code. Saksham demonstrated how to write Python data analysis code in Cython, explore the generated code, and run microbenchmarks to understand performance bottlenecks. It was interesting to see how simple hints to Cython can significantly enhance performance. 

The final session I went to, Arun Suresh Kumar's talk on synchronous and asynchronous programming in Python, was highly informative. Arun explained the key differences between sync and async programming, explored tools like Uvicorn and uvloop, and provided benchmarks to compare their performance. The session also covered the evolution and importance of ASGI and WSGI servers in Python web development. Practical insights from real-world scenarios highlighted effective strategies for both sync and async programming.

At that point, it was time to head back home, but not before picking up my PyCon 2024 t-shirt. I don't think I have a purple t-shirt, but I really like this one.


Wednesday, June 19, 2024

Ilya Sutskever's New Company

Ilya Sutskever's new company was announced today - Safe Superintelligence Inc. Ilya, a cofounder of OpenAI, is arguably one of the biggest visionaries around scaling and AI. This one page site has to be up there as one of the best company launches ever: one page with no styling that looks like it came out of the '90s, with a mission to create "safe" artificial super intelligence.

Thursday, June 13, 2024

Thoughts on the Paper "Situational Awareness" by Leopold Aschenbrenner

So if you haven't read the paper Situational Awareness: The Decade Ahead by Leopold Aschenbrenner, you should. Everyone should. This is not to say you need to agree with every point he makes, but reading papers like this will make you more mentally prepared to think about the accelerating period of change that we are in. We are going through a technological shift of rapid change that people are finding difficult and are not planning for accordingly. Most humans are resistant to change, or if they do expect change, they expect gradual, linear change; they are not prepared for the explosion of change that is possible with the advent of AGI or artificial super intelligence (ASI). There is too much thinking that looks at the current state and assumes that upcoming changes will be gradual - that what we have will be improvements to "chatbots" and wrappers built around GPT, when the reality is that people need to start thinking about AI becoming "agentic", the impact of "drop in" technical workers, and every part of the landscape changing. So it's okay to disagree with parts of his reasoning or emphasis, but this paper serves the purpose of helping realign mindsets.

In the paper, he is predicting AGI by 2027 and ASI by 2028-2030. 

But before we get into his predictions, who is Leopold Aschenbrenner? He graduated from Columbia University at the age of 19 as valedictorian. He worked at OpenAI until being fired for "leaking" information - the back and forth allegations between them are debatable; both sides were probably at fault, or it could have been handled better. Regardless, since then he has been a prolific writer on the long term future of AI and has founded an investment company focused on AGI. Here is a very good recent interview.

As for the prediction of AGI by 2027: I think a prediction of around 2027, give or take a year, is very reasonable. Part of the issue with any prediction of AGI is that the definition of AGI has always been pretty fuzzy. I'll save my thoughts on the definitions of AGI for another post, but suffice it to say there is no definitive test for something being AGI. What I expect is that around 2025-2026, many people working with these models will find that they "feel" like AGI. Then there will be more improvements, to the point that around 2027 a majority of reasonable people will agree that AGI is here.

What I do disagree with is that almost immediately after AGI arrives, ASI models will start to show up. This is the hard takeoff scenario that many people talk about. It seems to me that several breakthroughs will be needed before ASI, which could possibly come within a year after AGI, but more likely much longer - maybe 10-20 years after AGI.

Regardless, Aschenbrenner predicts an "intelligence explosion" where AI systems rapidly evolve from human-level intelligence to vastly superhuman capabilities. He says this transition could happen extremely quickly, potentially within less than a year. If true, the implications would be staggering, suggesting a world where AI can automate AI research, compressing decades of progress into mere months.

This superintelligence could be "alien" in its architecture and training processes, potentially being incomprehensible to humans. This could lead to scenarios where humans are entirely reliant on these systems, unable to understand their operations or motivations.

He talks quite a bit about using orders of magnitude ("OOMs") to project progress, but my disagreement here is that technological progress of any kind never follows a smooth curve. There are often plateaus, and sometimes these plateaus can be quite long.

He also makes one of his more astonishing economic claims with the projection of trillion-dollar compute clusters by the end of the decade. This massive industrial mobilization would involve significant increases in U.S. electricity production and the construction of large-scale GPU datacenters, which raises the question of energy as a limiting factor to all of this - a point Mark Zuckerberg has recently made.

As part of OpenAI's "super alignment" team, he worked on ways of understanding how the models work internally and how they can be aligned to meet human objectives. He is of the firm belief that "super alignment" is possible. Much has been made of OpenAI's effective abandonment of their alignment team, with people like the brilliant Ilya Sutskever and Jan Leike leaving. I suspect that, yes, they left because of a perceived lack of commitment by OpenAI management to alignment and because they wanted more resources than they were given, but I think the realization is that something like "super alignment" is not possible. I base this on reading some of their work and listening to them talk about what they had done for "super alignment", and it doesn't seem like much progress. I know their group was formed with the mission that "super alignment" was a multi-year project, but the progress still doesn't seem to have kept pace with their expectations or with the pace of the models in general. I'm not saying that alignment isn't possible or that guardrails aren't needed, but some idea of "super alignment" may not be possible, and that was the source of some of the conflict at OpenAI. So I think he is still overly optimistic in this paper about alignment, and especially about achieving something like "super alignment."

The paper also underscores the importance of securing AI labs and their research from state-actor threats. He believes that the leading AI labs currently treat security as an afterthought, leaving critical AGI secrets vulnerable. The effort to secure these secrets against espionage and other threats will be immense. Along these lines, he explores the geopolitical ramifications of superintelligence. The economic and military advantages conferred by superintelligent systems could be decisive, raising concerns about an all-out race, or even conflict, with authoritarian regimes like China. The survival of democracy could hinge on maintaining technological preeminence and avoiding self-destruction.

He suggests that American AI labs are currently vulnerable, with inadequate security measures akin to "random startup security." The notion that Chinese intelligence could easily infiltrate these labs and exfiltrate critical AGI research is presented as a significant risk. Not to dismiss the threat of China, but he has been criticized for hyper-focusing on China when there is no shortage of governments who could be bad actors. Still, this is probably the best articulation I've read of the possible security threats.

He then continues this line of thought by saying that as the race to AGI intensifies, national security will become increasingly involved. By 2027/28, a government-led AGI project may emerge, as no startup can handle the complexities and risks associated with superintelligence. This scenario will likely involve unprecedented levels of mobilization and coordination within the US national security apparatus.

The paper foresees a scenario where the U.S. government takes over AGI projects, forming a "Manhattan Project" for AI by 2027/28. This would involve national security forces and intense governmental oversight to ensure the United States remains a leader in AGI development, securing AI research against espionage and potential misuse. However, I think this is really optimistic about how quickly government can move to do anything, let alone something as complicated and technical as AI. I do believe the idea of AGI will become a very political issue, and there will be "populist" politicians who will try to take advantage of it as a divisive issue, especially to capitalize on people's fear of change - fear that will be very rational when supported by employment upheavals. But the idea that government will move at the same pace as the accelerating speed of AI seems naive.

Whatever points in this document you may agree with or disagree with, it's to be commended for its amount of detail and how thought provoking it is in considering both the extraordinary potential and the significant risks. What is important is that it challenges us to consider the rapid pace of AI development and the profound changes it could bring to society, security, and global dynamics. While some of the claims may seem far-fetched, they underscore the importance of proactive measures and robust strategies to navigate the coming period of AI advancements.

Wednesday, June 12, 2024

GPT-4o and Google I/O 2024

It's been a few weeks since both the OpenAI announcements and Google I/O which has given us some time to step back and assess the big announcements at both these events.

Google introduced several new AI products and features, including an AI-powered search engine, AI helpers in Workspace apps (Gmail, Drive, Docs, etc.), and a future AI vision called Project Astra. Gemini, Google's AI model, was featured prominently with updates like Gemini Live, Gemini Nano, and Gemini 1.5 Pro, showcasing capabilities in voice interaction, image analysis, and on-device AI functionalities. 

They also announced an expanded context window of 2 million tokens, to be rolled out at some point. Right now the context window is 1 million tokens, which is much larger than either OpenAI's or Anthropic's largest context windows of 128,000 and 200,000 tokens respectively. With 2 million tokens, that would be about 64,000 lines of code, 1.4 million words, or a couple of hours of video. In a blog post in February, they talked about internally testing a 10 million token context window.

If these large context windows can keep track of all of that information without having to do retrieval augmented generation (RAG), then obviously they have major implications for everything from code development to video summarization. For example, a developer tasked with a large feature could provide a very large code base, design documents, and even mock-ups, and have the model create the code for that feature, with the developer acting more as a code reviewer. And to work with video, large context windows are a necessity.

However, this expanded context window is only available in the Gemini 1.5 Pro model, first in private preview and then generally available later in the year. And that's the main problem with most of Google's announcements: they won't be available until some time later in the year. With so many announcements happening, that's far too distant in the future to be touting most of your new AI features.

OpenAI's announcement mainly centered around their multi-modal capabilities and the speed of GPT-4o. After using it regularly since the announcement, it does seem to be slightly better than GPT-4 Turbo. Having it work across voice, audio, and text with the same API endpoint is very welcome, instead of having a different API model for vision. I have also been using the released Mac app, which is a nice convenience.

By this point, everyone has probably seen the voice and vision capabilities of GPT-4o, which were crazy. Because it's trained on sound, images, and text together, the multi-modal capabilities mean not just improved speed but also variation in sound within a voice context and an understanding of emotion and speaking speed. As usual with OpenAI, they don't say exactly how they accomplished this, but it clearly looks like a different architecture from GPT-3.5/4.0. To get a rough idea of how this is probably being done, we can look at other projects like Chameleon from Meta, which uses "early fusion" to train on different modalities.

However, this new voice capability wasn't released at the announcement; it's supposed to come this month. It's halfway through June and it's not here yet, but I'm very much looking forward to it when it is released. I currently have code based on the existing models that does voice->text->voice, which introduces some delay, and getting rid of that delay would be a big deal. Of course, it also opens up a lot of possibilities beyond just being faster.
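For reference, the current workaround looks roughly like the sketch below (assuming the OpenAI Python client; the model names and exact parameters are my assumptions for illustration, and the point is the three-step chain, which is where the extra latency comes from):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def voice_to_voice(input_wav: str, output_mp3: str) -> str:
    # 1) speech -> text
    with open(input_wav, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # 2) text -> text (the actual "thinking" step)
    chat = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": transcript.text}],
    )
    reply = chat.choices[0].message.content

    # 3) text -> speech
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
    speech.stream_to_file(output_mp3)   # three network round trips = noticeable delay
    return reply
```

A natively speech-to-speech GPT-4o endpoint would collapse those three round trips into one, which is why the latency improvement matters so much.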

With both Google's and OpenAI's big announcements, these are exciting times to be working in AI - and that's not to mention the great things happening at Anthropic and in open source. What will happen in the next six months? Who knows? But I expect Google to release what they previewed at I/O and OpenAI to release the voice features. Will OpenAI release a new model before the end of the year? The rumored 5.0, or maybe they call it 4.5? Or will they release incremental improvements to GPT-4o? No one knows for sure, but I believe OpenAI will release a new model by October or November that is a significant jump in capabilities.

Oh, and that's not to ignore the partnership between OpenAI and Apple - "Apple Intelligence" and everything that partnership means, which immediately caused Elon Musk to throw a tantrum. But that would be a topic for another post.

"Superhuman" Forecasting?

This just came out from the Center for AI Safety, called Superhuman Automated Forecasting. This is very exciting to me, because I've be...