Monday, September 9, 2024

"Superhuman" Forecasting?

The Center for AI Safety just released something called Superhuman Automated Forecasting. This is very exciting to me, because I've been saying this was possible for quite some time. It could probably be done even better, and by next year it will be much better, because I believe at some point the training cutoff date will be eliminated through ideas like Mixture of a Million Experts, so it won't have to rely on Web searches. And the reasoning capabilities will be much better in the models coming out by the end of this year or next year.

Combine this idea with autonomous agents, which I believe can do simulated market research ranging from traditional questionnaire research to focus groups, and the cost of market research plummets while also delivering real-time insights. Right now, scaling autonomous agents up to a large number is difficult without them losing sight of their goals, but that will soon change, and the number of agents will scale from the tens to the millions - all built from sampled detailed data.

Here is another paper on the topic that I need to dig in on called "Approaching Human-Level Forecasting with Language Models."

Oh, and CAIS is calling it "539", which is hilarious.

Friday, August 9, 2024

IRAK4: The Immune System’s Key Player and Its Growing Role in Cancer Therapy

In our work using artificial intelligence in small-molecule drug discovery, we look at potential protein targets for specific disease indications, and one of the more interesting targets we've been working on is IRAK4 (interleukin-1 receptor-associated kinase 4).

But before I get into some of the interesting characteristics of IRAK4, let's step back and talk about the general role of proteins in disease. Proteins are the workhorses of the cell, playing essential roles in just about every biological process you can think of. Whether they’re signaling, transporting, or catalyzing reactions, proteins are critical to keeping our bodies functioning properly. Because of their central role, they often become prime targets for new drugs, especially when things go wrong and contribute to diseases.

So, IRAK4 is one of these proteins that is a key player in our immune system, especially when it comes to signaling inside our cells. It’s really important when our body needs to respond to threats, working with Toll-like receptors (TLRs) and interleukin-1 receptors (IL-1R) to kickstart our immune response. Basically, when these receptors get activated, IRAK4 teams up with other proteins like IRAK1/IRAK2 and MyD88 to form this big complex called the myddosome. This whole setup then triggers signals that lead to the activation of pathways like NF-κB and MAPK, which are very important for inflammation, cell survival, and keeping our immune system on point.

Now, while IRAK4 is essential for keeping our immune system in check, things can go south if its signaling gets messed up. This has been linked to various cancers, especially blood-related ones like myelodysplastic syndrome (MDS) and acute myeloid leukemia (AML). In these cases, the IRAK4 pathway gets overactive, helping cancer cells survive and multiply. Recent studies have found that mutations in certain splicing factors (like U2AF1 and SF3B1) can lead to the production of a longer and more active version of IRAK4, called IRAK4-L, which really cranks up the NF-κB signaling.

Because of its role in cancer, scientists are now eyeing IRAK4 as a potential target for new treatments. Early studies have shown that blocking IRAK4 can have anti-cancer effects in different models. Plus, when you combine IRAK4 inhibitors with other treatments (like FLT3 inhibitors in AML or BTK inhibitors in certain lymphomas), they seem to work even better together. This has led to the search for small molecules that bind to IRAK4 and inhibit its abnormal activity.

Research by Dr. Daniel Starczynowski at Cincinnati Children's Hospital and many others is showing that IRAK4 inhibition, on its own or in combination with other drugs, has potential for treating blood cancers like non-Hodgkin lymphomas, AML, and high-risk MDS. Furthermore, if a drug or combination of drugs can also target the related kinases FLT3 and CLK, there is potentially an even greater benefit.

But it’s not just blood cancers that might benefit from IRAK4 inhibition. Researchers are also looking into its role in solid tumors, especially tough ones like pancreatic ductal adenocarcinoma (PDAC) and colorectal cancer. In these cases, IRAK4 activation has been linked to worse outcomes and resistance to treatment, which means blocking IRAK4 could be a new way to tackle these cancers.

IRAK4’s role in immune signaling and inflammation also means it could be useful beyond just cancer. It might even help in treating other inflammatory or autoimmune conditions such as rheumatoid arthritis. As research continues, we’re learning more about how IRAK4 works and its potential in new therapies, not just for cancer but for other diseases too.

We believe that artificial intelligence can be used in a variety of ways to help researchers in small-molecule drug discovery for protein targets like IRAK4. And given the recent research on the importance of IRAK4 in these diseases, and the seriousness of the diseases themselves, it is imperative to use all available tools to develop drug therapies as quickly and as safely as possible for patients.


Wednesday, July 24, 2024

Unleashing the Power of a Million Experts: A Breakthrough in Language Model Efficiency

Perhaps one of the most important papers for large language models (LLMs) has been released this month, titled "Mixture of a Million Experts" by Xu Owen He. I think this paper may be the most important paper in LLM advancement since the publication of "Attention is All You Need" by Vaswani et al. (2017). The idea in this paper is what I was referring to in my recent post called "Move 37" where I talked about the future improvements needed for LLMs.

The idea of a million experts is an extension or an improvement over current "Mixture of Experts" (MoE) architectures. MoE has emerged as a favored approach for expanding the capabilities of large language models (LLMs) while managing computational expenses. Rather than engaging the full model for each input, MoE systems direct data to compact, specialized "expert" components. This strategy allows LLMs to grow in parameter count without a corresponding surge in inference costs. Several prominent LLMs incorporate MoE architecture and it reportedly is being used in GPT-4.

Despite these advantages, existing MoE methods face constraints that limit their scalability to a relatively modest number of experts. These limitations have prompted researchers to explore more efficient ways to leverage larger expert pools.

Xu Owen He from Google DeepMind introduces an innovative approach that could dramatically improve the efficiency of these models while maintaining or even enhancing their performance. Interestingly, the "Attention" paper also came out of Google.

The historical problem of LLMs is that as these models grow larger, they become more capable but also more computationally expensive to train and run. This creates barriers to their widespread use and further development. The paper proposes a Parameter Efficient Expert Retrieval (PEER) architecture that addresses this challenge by enabling models to efficiently utilize over a million tiny "expert" neural networks, potentially unlocking new levels of performance without proportional increases in computational costs. 

PEER is a fine-grained mixture of experts: unlike traditional approaches that use a small number of large experts, it employs a vast number (over a million) of tiny experts, each consisting of just a single neuron. By activating only a small subset of experts for each input, PEER maintains a low computational cost while having access to a much larger total parameter count. It accomplishes this by introducing a novel product-key technique for expert retrieval, allowing the model to efficiently select relevant experts from this huge pool.
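To make the product-key idea more concrete, here is a minimal PyTorch sketch of what retrieval over n × n single-neuron experts might look like. This is my own simplified illustration of the technique described in the paper, not the authors' implementation; the dimensions, layer names, and gating details are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PEERSketch(nn.Module):
    """Toy sketch of Parameter Efficient Expert Retrieval (PEER).

    There are n * n implicit experts (n = 1024 would give ~1M), each a
    single neuron defined by one "down" vector and one "up" vector. A
    query is split in half and matched against two small sub-key tables,
    so we never score all n * n keys directly (the product-key trick).
    """
    def __init__(self, dim=256, n=256, topk=16):
        super().__init__()
        self.n, self.topk = n, topk
        self.query = nn.Linear(dim, dim)
        self.subkeys1 = nn.Parameter(torch.randn(n, dim // 2))
        self.subkeys2 = nn.Parameter(torch.randn(n, dim // 2))
        self.down = nn.Embedding(n * n, dim)   # expert input weights
        self.up = nn.Embedding(n * n, dim)     # expert output weights

    def forward(self, x):                      # x: (batch, dim)
        q1, q2 = self.query(x).chunk(2, dim=-1)
        s1, i1 = (q1 @ self.subkeys1.T).topk(self.topk, dim=-1)
        s2, i2 = (q2 @ self.subkeys2.T).topk(self.topk, dim=-1)
        # Compose topk*topk candidate experts from the two top-k lists.
        scores = (s1[:, :, None] + s2[:, None, :]).flatten(1)
        ids = (i1[:, :, None] * self.n + i2[:, None, :]).flatten(1)
        best, pos = scores.topk(self.topk, dim=-1)
        expert_ids = ids.gather(1, pos)
        gate = F.softmax(best, dim=-1)
        # Each retrieved single-neuron expert computes up_i * relu(down_i . x).
        down, up = self.down(expert_ids), self.up(expert_ids)   # (B, K, dim)
        h = F.relu(torch.einsum("bd,bkd->bk", x, down))
        return torch.einsum("bk,bk,bkd->bd", gate, h, up)

out = PEERSketch()(torch.randn(4, 256))
print(out.shape)   # torch.Size([4, 256])
```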

The implications of this architecture are far reaching:

  • Scaling Language Models: PEER could enable the development of much larger and more capable language models without proportional increases in computational requirements. This could accelerate progress in natural language processing and AI more broadly.
  • Democratization of AI: By improving efficiency, this approach could make advanced language models more accessible to researchers and organizations with limited computational resources.
  • Lifelong Learning: The authors suggest that this architecture could be particularly well-suited for lifelong learning scenarios, where new experts can be added over time to adapt to new data without forgetting old information. Imagine a model that has no knowledge cutoff. It is constantly learning and knowledgeable about what is going on in the world. This would open up new use cases for LLMs.
  • Energy Efficiency: More efficient models could lead to reduced energy consumption in AI applications, contributing to more sustainable AI development, and could also help lower inference costs.
  • Overcome model forgetting: With over a million tiny experts, PEER allows for a highly distributed representation of knowledge. Each expert can specialize in a particular aspect of the task or domain, reducing the likelihood of new information overwriting existing knowledge.

Of course this is still early-stage research. Further work will be needed to fully understand the implications and potential limitations of this approach across a wider range of tasks and model scales. But this paper represents a significant step forward in improving the efficiency of large language models. By enabling models to efficiently leverage vast numbers of specialized neural networks, it could pave the way for the next generation of more powerful and accessible AI systems. 

Wednesday, July 10, 2024

Bridge RNAs: The Next Frontier in Precision Genome Editing Beyond CRISPR

In a very interesting new paper that was just published in Nature, titled "Bridge RNAs direct programmable recombination of target and donor DNA" by Durrant et al. (2024), the authors introduce what looks like a groundbreaking discovery in genetic engineering: a new class of non-coding RNAs (ncRNAs) called bridge RNAs that enable programmable DNA recombination. This expands the capabilities of nucleic-acid-guided systems beyond existing technologies like CRISPR.

This study reveals that IS110 insertion sequences, which are minimal mobile genetic elements, express structured ncRNAs that specifically bind to their encoded recombinase. These bridge RNAs contain two internal loops that base-pair with target and donor DNA, facilitating sequence-specific recombination. This discovery is particularly significant because the target-binding and donor-binding loops of the bridge RNA can be independently reprogrammed, allowing for programmable DNA insertion, excision, and inversion. 

I think this discovery of bridge RNAs as programmable tools for DNA recombination could be the next big thing in genome editing, taking us beyond what RNA interference (RNAi) and CRISPR have done so far. Bridge RNAs allow insertion, removal, or flipping DNA sequences without breaking the DNA strands, which means fewer mistakes and a more stable genome.

What would the future of genetic tools built on this concept look like? Complex DNA changes could be designed with comparably high precision, combining bridge RNAs with other mechanisms to not only edit genes but also control their activity. For example, genes could be turned on or off, or even have their expression levels tweaked, giving a powerful way to study how genes work and develop new therapies.

Speculating even further, these new tools might also interact with more than just DNA. Think about targeting RNA transcripts to edit RNA sequences or modulate RNA splicing, or even interacting with proteins to change their activity or where they are in the cell. This would open up a whole new world of possibilities.

In medicine, this third generation of RNA-guided tools could lead to new treatments for genetic diseases. By making targeted and reversible changes to the genome and transcriptome, it could create more effective and personalized therapies with fewer side effects. There could also be an improvement in the safety and efficacy of advanced cell and gene therapies by controlling genomic rearrangements and gene expression more precisely.

Overall, these new RNA-guided tools could revolutionize genome engineering, offering new possibilities for research, biotechnology, and medicine. By building on the principles of RNAi, CRISPR, and bridge RNAs, it could be possible to manipulate biological systems with greater accuracy and flexibility, leading to innovative applications and groundbreaking advancements.

Tuesday, June 25, 2024

Move 37

I was asked recently, while I was working on some autonomous agents, whether in the course of their interactions they would come up with new ideas. I immediately thought: well - Move 37 (I'll get to what I mean by that shortly). And I said I think these agents will be creative to the extent allowed by how I have them set up and the nature of the "back stories" I had given them. But that brought up broader questions I have thought a lot about: can an AI, and specifically LLM-based models alone, create novel insight? Can AIs do research? Can AIs do scientific discovery? How creative can AIs be? Can AIs improve themselves based on novel insights about themselves and the objectives they need to pursue?

Let's take it further. Can AIs start the equivalent of the Scientific Revolution that occurred in the 16th and 17th centuries? Can AIs create a paradigm shift akin to what Thomas Kuhn described in his classic book The Structure of Scientific Revolutions? If you were to give a generative AI model training data that consisted only of pre-Renaissance art, would it be able to make the jump to linear perspective and create paintings with volume, and buildings and landscapes that recede with distance? The first painting below is "The Madonna and Child in Majesty Surrounded by Angels" (1280) by Cimabue. The second and third paintings are examples of paintings that have the perspective associated with the Renaissance: "The Tribute Money" (1426-1427) by Masaccio and "The School of Athens" (1510-1511) by Raphael. Could an AI paint like Raphael if it had only been trained on data from before the Renaissance?



So what is Move 37? 

After the conversation I had about autonomous agents and creativity, I started thinking about the debate over whether LLMs can do novel scientific discovery. And Move 37 is the first time I had ever seen a machine do something truly creative and emergent. So again, what is Move 37? Move 37 refers to a move in game 2 of the 2016 match between Google's AI Go program, AlphaGo, and Lee Sedol of South Korea, a 9-dan player and one of the best, if not the best, players in the world at the time.

There is a great documentary on YouTube about this match, and if you go to minute 49:00, you can see this move and the complete shock it causes. A move that "no human would make," one that at first looks like a mistake to all of the commentators. A move that AlphaGo itself estimated a human would play with only a 1 in 10,000 probability.

Humans would not make this move based on their intuition of what was and was not a good move in this position. AlphaGo came up with this move not because it was a strategy directly programmed by the Google team, but because, through reinforcement learning and self-play, it was able to see this move as the best move to meet its objective of winning. Interestingly, as pointed out in the movie, AlphaGo is not trying to maximize its lead at specific moments, but instead is looking for its best chance of winning, even if that win is by a small margin. So sometimes what looks like a passive move is actually the best move for the long-term goal.

This scenario played out again in 2017 with Google's program AlphaZero in chess, a variant of AlphaGo generalized to different types of games, including chess. A match was set up against the strongest computer chess program at the time, Stockfish. AlphaZero made many surprising moves. Often it would play moves that initially looked passive, eschewing what was thought to be the best move, but these moves would lead to a crushing, vise-like position over time. One commentator looking at the games afterwards said that these unexpected moves were exactly when, as its opponent, one should be most afraid. And just like in Go, AlphaZero would give up temporary material advantage for its long-term goal, which was simply to win. Sacrificing material for an attack or a positional advantage is part of chess, but AlphaZero took that to a different level. Like AlphaGo, AlphaZero would decline an immediate advantage in position or material and take a smaller advantage if that smaller advantage had a greater probability of winning. It had a long horizon view.

AlphaGo accomplished this feat through three main ideas: reinforcement learning, Monte Carlo Tree Search, and deep neural networks. AlphaGo Zero begins with a neural network with random weights to represent both the policy and the value function. During self-play, it uses Monte Carlo Tree Search to simulate each move. After the rollout concludes, it gathers data from the play and minimizes a loss function that incorporates this data for both the policy and value function. This process involves policy evaluation and policy improvement, effectively performing a step of policy iteration. Over time, AlphaGo Zero develops an exceptionally strong policy.
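As I recall from the AlphaGo Zero paper, the loss minimized at each training step combines the value error, the policy cross-entropy against the search probabilities, and a regularization term, roughly:

$$
l = (z - v)^2 \;-\; \boldsymbol{\pi}^{\top} \log \mathbf{p} \;+\; c \,\lVert \theta \rVert^2
$$

where $v$ is the network's value prediction, $z$ is the self-play game outcome, $\mathbf{p}$ is the network's move distribution, $\boldsymbol{\pi}$ is the visit-count distribution produced by the tree search, and $c$ weights the regularization.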

Its self-play generated a vast amount of data without requiring human expertise. It essentially learned from its own games. Doing this allowed it, in effect, to explore different strategies, exploit the most successful ones, and recognize these patterns. A great book for understanding this is Deep Learning and the Game of Go by Max Pumperla and Kevin Ferguson. I highly recommend this book not just for learning how you can program a two-player game like Go, but maybe more importantly because it teaches you the ideas of deep learning neural networks, reinforcement learning, and Monte Carlo Tree Search from first principles, showing the math and the needed Python code. And by first principles, I mean that it shows the Python code and math for everything, like backpropagation, without abstracting to PyTorch, so you really understand what's going on. And it is very accessible.

However, AlphaGo is an example of narrow AI - an AI trained for a specific task. But could a more generalized model like a large language model show broad creativity? I think by now most people would answer "yes" because they have seen it write poetry, long essays, and music, and generate images. But there are different levels of creativity - there is creativity "influenced" by other sources, and then there are truly original ideas. Truly creative ideas are novel and innovative - they are paradigm shifts, often in conflict with long-held beliefs. They upset established dogma and re-align the patterns with which we see the world. The shift in art during the Renaissance is an example of one such shift.

An LLM creates derivative writing, music, and art - it is being influenced by the data it's been trained on. There's nothing particularly wrong with that kind of creativity - it is fairly easy to identify influences in human-created popular music. The difference is that popular music is based directly on real human experiences, while LLM works are based on a compressed projection of human experiences onto language.

When this LLM creativity goes off the rails, it is derisively referred to as hallucination. And some will argue that these hallucinations are equivalent to the ability to imaginatively create novel ideas. But these hallucinations are no Joan of Arc type of hallucinations based on inspiration. Instead they come from the GPT architecture's "temperature" setting, a hyperparameter that controls the randomness of generated text by influencing the probability distribution over the next words in the sequence. Temperature adjusts the softmax function applied to the logits (raw predictions) before sampling. Lowering the temperature makes the model more confident in its predictions, leading to more deterministic and potentially repetitive outputs. Increasing the temperature makes the model less confident, leading to more diverse and creative text. This doesn't sound like Galileo-type inspired discovery.
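As a concrete illustration, here is a tiny NumPy sketch of temperature sampling; the logits are made-up numbers, not from any real model:

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=np.random.default_rng(0)):
    """Divide the logits by the temperature before the softmax.
    Low temperature sharpens the distribution (more deterministic);
    high temperature flattens it (more diverse)."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                        # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs), probs

logits = [2.0, 1.0, 0.2]                          # hypothetical next-token logits
for t in (0.2, 1.0, 2.0):
    _, probs = sample_with_temperature(logits, t)
    print(f"T={t}: {np.round(probs, 3)}")
```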

In the short story "The Library of Babel," Jorge Luis Borges describes a library of hexagonal rooms that go on forever and, because of this vastness, contain every combination of letters and symbols, which means they contain every possible text that could ever be written, including all existing knowledge, all false information, and by definition all nonsensical information. Of course LLMs are better than what Borges imagined, or than the Infinite Monkey Theorem - that a monkey given an infinite amount of time will create the works of Shakespeare. But given that an LLM is predictive and isn't producing random text, can we be satisfied with its creativity? With the fact that it's not creating ideas like a human does, directly from experiences? I'll argue here that it doesn't matter how an LLM creates an idea. What matters is the outcome - how good the idea is. If we accept that argument, then what is missing from current LLMs to make them better at creating ideas in, for example, scientific discovery?

I think there are three ideas that are ongoing that should greatly expand the possibilities for novel creativity and scientific discovery: 1) Improve existing LLMs 2) Incorporate ideas beyond transformers 3) Architectures of AI agents.

Improve Existing LLMs

It has become readily apparent to everyone that competitive pressures are driving the pace of innovation in LLMs since the advent of GPT-3.5 and subsequent models. As LLMs improve, they become increasingly capable not only of generating original ideas but also of expanding upon them. The power of scaling these models is somewhat controversial, with debates on whether these models have hit or will soon hit an asymptotic limit. However, the seminal paper "Scaling Laws for Neural Language Models" by Kaplan et al. (2020) has been quite prescient in this regard. In this paper, the authors investigated the scaling laws governing LLM performance, particularly concerning cross-entropy loss (a measurement of how well the model performs). They found that the loss scales predictably as a power law with model size, dataset size, and compute used for training, with trends spanning many orders of magnitude.
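The power-law form they fit is roughly the following (the exact fitted constants and exponents are in the paper; I'm only showing the general shape here):

$$
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
$$

where $N$ is the number of parameters, $D$ the dataset size, $C$ the training compute, and the $\alpha$ exponents are small, which is why each fixed drop in loss requires a multiplicative increase in scale.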

The current state-of-the-art models use about 1-3 trillion parameters, but models will move to 10 trillion and 100 trillion parameters and beyond. Moreover, much discussion now centers not just on quantity of data but on quality of data. Smaller models have been created that rival or outperform models trained on larger data sets simply by being trained on higher-quality data. So the goal is not just to train on larger and larger data sets, but on high-quality data sets. To that end, companies have entered into contracts to get access to media company data. In recent months these deals have included Reddit, The Financial Times, Barron's, Wall Street Journal, New York Post, and The Atlantic, with many more to come. This is often written up as a way for these AI companies to get access to data without facing accusations of potentially using data without permission. While there may be some truth to that, the real value is getting access to more high-quality data.

Something that is often stated is that we are running out of data, that we are hitting a data wall in training models and that we need synthetic data. This keeps getting repeated like it's a truism that everyone should accept. It is just not true. In reality, data is constantly being created every second: on social media and broadcast media, by IoT devices and communication tools, in internal business data and government data, and through everyday human interactions. What is really meant is that we ran out of easily downloaded and scraped public data. But there is a practically unlimited amount of potential data out there for models to be trained on - data is constantly being created. What is true is that data acquisition is becoming more costly. Synthetic data can help reduce that cost, but the financial incentives are also there to get the more costly data.

But even with improvements due to scaling and better data, much of the improvement in models has come not from replacing the transformer architecture but from "low hanging fruit" adjustments and continuous refinements of Reinforcement Learning from Human Feedback (RLHF).

With all of these improvements to existing LLMs, we will see these models exceed their training data with creativity. An interesting paper that came out last week by Zhang et al. (2024) shows that LLMs can "transcend" their training data. In this paper, an LLM was trained only on chess data, chosen such that no individual game was above a specific playing strength. After training, the model was able to play at a level higher than any of the training data - it transcended the training data. The requirements were that the individual chess games it trained on represented a diverse set of positions and styles, and that at inference time the temperature was set low. The explanation for the transcendence is that the model was drawing on a data set that was essentially a mixture of experts at a certain level, and it surpassed those experts by putting together the knowledge of weak, non-overlapping learners to create a stronger model. This is similar to how bagging and boosting work in machine learning algorithms like random forests and gradient boosting.
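A toy way to see the intuition: average several noisy "experts" over hypothetical move values and sample at low temperature, and the ensemble picks the best move more often than any single expert does. This is only an illustration of the argument, not the paper's experiment; all the numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
true_values = np.array([0.0, 0.2, 1.0, 0.1])    # hypothetical move values; move 2 is best

def weak_expert(noise=1.0):
    # Each "expert" sees a noisy estimate of the move values and plays
    # a softmax policy over them: individually mediocre.
    noisy = true_values + rng.normal(0, noise, size=true_values.shape)
    p = np.exp(noisy)
    return p / p.sum()

def best_move_rate(dist, temperature, trials=20_000):
    p = np.exp(np.log(dist) / temperature)      # temperature-scaled policy
    p /= p.sum()
    picks = rng.choice(len(p), size=trials, p=p)
    return (picks == 2).mean()                  # how often the best move is chosen

experts = [weak_expert() for _ in range(50)]
avg = np.mean(experts, axis=0)                  # the "mixture" the model effectively learns

print("single expert,    T=1.0:", best_move_rate(experts[0], 1.0))
print("averaged experts, T=1.0:", best_move_rate(avg, 1.0))
print("averaged experts, T=0.1:", best_move_rate(avg, 0.1))
```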

All of these types of improvements could lead to the discovery of novel and creative insights.

Incorporate Ideas Beyond Transformers

There are many ongoing efforts to augment or replace transformer-based autoregressive LLMs to address their limitations. Limitations such as hallucinations or a lack of fidelity, the quadratic growth of prediction cost as context lengthens, and a lack of long-term planning are particularly problematic when we expect an AI to creatively come up with novel ideas. I'll mention three such efforts to overcome these limitations:

1) Mamba

Mamba is part of an emerging class of models known as State Space Models (SSMs), offering a promising alternative to transformers in large language models (LLMs). Unlike transformers, which utilize the attention mechanism, Mamba employs a control theory-inspired state space model to facilitate communication between tokens while retaining multilayer perceptron (MLP) projections for computation. 

Mamba is capable of processing long sequence lengths efficiently by eliminating the quadratic bottleneck present in the transformer attention mechanism, and it offers much higher inference speeds. It uses a selective state space model, which dynamically compresses and retains relevant information, leading to lower memory requirements.
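For intuition, here is a toy, non-selective state-space recurrence in NumPy. It is not Mamba's selective scan or its hardware-aware implementation, just the linear recurrence such models build on, which costs O(sequence length) rather than attention's O(length squared):

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Discrete state-space recurrence:
        x_t = A @ x_{t-1} + B @ u_t
        y_t = C @ x_t
    One step per token, so cost grows linearly with sequence length."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        x = A @ x + B @ np.atleast_1d(u_t)
        ys.append(C @ x)
    return np.array(ys)

# Toy usage: scalar input sequence, 4-dimensional hidden state.
A = 0.9 * np.eye(4)
B = np.ones((4, 1))
C = np.ones((1, 4)) / 4
print(ssm_scan(A, B, C, np.sin(np.linspace(0, 3, 10))).shape)   # (10, 1)
```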

Having a model that is faster and handles memory better, while not losing fidelity over longer contexts, is absolutely necessary for creating innovation.

2) Q* or LLM Combined with Monte Carlo Tree Refinement

In order for an AI to come up with novel ideas, it needs to do better planning than what is currently possible in LLMs. It can appear that an LLM can do planning, because if given a problem it can describe steps to solve that problem. And if you use autonomous agents or chain of thought, you can get it to build on some of that simulated planning. But this is not the type of predictive planning that humans can do.

What a current LLM can do is to simulate what a human does when they make intuitive decisions. But an AI needs to have the ability to do long horizon prediction. It is the difference between an LLM doing first order thinking and not being able to do second order thinking.

First order thinking involves making decisions based on immediate and straightforward cause-and-effect relationships, focusing on short-term outcomes without considering deeper implications or longer-term effects. It is a direct, linear approach that is often reactive and simplistic. In contrast, second order thinking involves a more complex analysis of potential consequences, considering both immediate and indirect impacts over the long term. This type of thinking is proactive and strategic, anticipating the broader implications and potential ripple effects of decisions to ensure sustainable outcomes. 

Back in November, at the time of the shakeup at OpenAI, there were rumors of a Q* implementation as part of new AGI capabilities in a model they were testing. The reports of this were almost certainly overblown as part of the hysteria occurring at OpenAI at the time. But the underlying ideas behind Q* and Monte Carlo Tree Search, ways for an AI to do long-term modeling and more predictive planning, have been around for a long time.

The hiring of Noam Brown by OpenAI last year is significant, as Noam Brown has been a leader in developing algorithms that do exactly this kind of multi-step prediction, enabling real planning and self-improvement.

A couple of weeks ago an interesting paper came out called "Accessing GPT-4 Level Mathematical Olympiad Solutions via Monte Carlo Tree Self-Refine with Llama-3 8B: a Technical Report." This paper shows a very small model outperforming a state of the art model in a "Q*" type of way that has a Monte Carlo approach that incorporates human-like trial and error.

In the paper, they construct a search tree using iterative processes of selection, self-refine, self-evaluation, and backpropagation, employing an enhanced Upper Confidence Bound (UCB) formula. The algorithm significantly improves success rates on mathematical Olympiad-level problems, advancing the application of LLMs in complex reasoning tasks. 
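For reference, the classic UCT selection rule that this family of tree searches builds on looks like the following (the paper uses an enhanced variant, so treat this as the baseline form rather than their exact formula):

$$
UCT_i = Q_i + c \sqrt{\frac{\ln N}{n_i}}
$$

where $Q_i$ is the estimated value of child $i$, $n_i$ is its visit count, $N$ is the parent's visit count, and $c$ trades off exploration against exploitation.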

This integration of MCTS with LLMs as demonstrated by the MCTSr algorithm offers a robust approach to second-order thinking and planning. By systematically refining and evaluating solutions, this method provides deeper insights into the long-term consequences of decisions, making it a valuable tool for complex reasoning and strategic planning across various fields.

This is the type of deep planning approach that could be used to solve complicated problems, and if necessary, solve them with potentially unique solutions.

3) Energy Based Models

There has been much work on what are called "energy-based models." The main motivation behind this type of model is that other approaches arguably fall short of achieving human-like autonomous intelligence. Current systems are limited by their reliance on supervised learning and reinforcement learning, which require extensive labeled data and trials. There is also an inefficiency in learning from limited data and an inability to generalize across different tasks.

Although many researchers have worked on energy-based models, some of the most promising work has been done by renowned AI researcher Yann LeCun, the Chief AI Scientist at Meta. In the influential paper he co-authored titled "Introduction to Latent Variable Energy-Based Models: A Path Towards Autonomous Machine Intelligence", they describe a framework where the dependencies between variables are represented through an energy function. Unlike probabilistic models, Energy Based Models (EBMs) do not require normalization and can handle multimodal distributions. Training EBMs involves creating an energy landscape such that compatible data points have low energy while incompatible ones have high energy. These models introduce latent variables to capture information not readily available from the observed variables, allowing the system to handle uncertainty and structured prediction tasks.

The paper discusses contrastive methods (which scale poorly with data dimension) and regularized methods (which limit the volume of low-energy space and are more scalable). Furthermore, it discusses the Joint Embedding Predictive Architecture (JEPA). This model combines the advantages of EBMs and latent variable models, using two parallel encoders to learn representations of the input data. These representations are then compared and used to predict future states. JEPAs can be stacked in a hierarchical fashion, enabling hierarchical planning.
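To make the idea concrete, here is a minimal PyTorch sketch of a joint-embedding style energy: two encoders, a predictor conditioned on a latent variable, and an energy equal to the distance between the predicted and actual target embeddings. The dimensions and layer choices are my own assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class ToyJEPA(nn.Module):
    """Toy joint-embedding predictive architecture:
    enc_x encodes the context x, enc_y encodes the target y, and the
    predictor maps (embedding of x, latent z) toward the embedding of y.
    Low energy means the pair (x, y) is compatible."""
    def __init__(self, dim_in=32, dim_lat=16, dim_z=4):
        super().__init__()
        self.enc_x = nn.Sequential(nn.Linear(dim_in, dim_lat), nn.ReLU(), nn.Linear(dim_lat, dim_lat))
        self.enc_y = nn.Sequential(nn.Linear(dim_in, dim_lat), nn.ReLU(), nn.Linear(dim_lat, dim_lat))
        self.pred = nn.Linear(dim_lat + dim_z, dim_lat)

    def energy(self, x, y, z):
        sx, sy = self.enc_x(x), self.enc_y(y)
        sy_hat = self.pred(torch.cat([sx, z], dim=-1))
        return ((sy_hat - sy) ** 2).mean(dim=-1)

model = ToyJEPA()
x, y, z = torch.randn(8, 32), torch.randn(8, 32), torch.randn(8, 4)
print(model.energy(x, y, z).shape)   # torch.Size([8]) - one energy per (x, y) pair
```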

This structured approach to learning and prediction allows for both fine-grained and abstract representations of data, which is crucial for achieving the complex, adaptive behaviors required for autonomous intelligence, and for future AI systems that can learn autonomously, reason, and plan in a manner similar to human intelligence.

Architectures of Agents

The final direction that can enable a greater level of novel research and creativity is agentic frameworks. Currently there are many agent frameworks, such as Autogen, CrewAI, and LangGraph to name a few, offering different levels of autonomy. Agents can be created that collaborate and work autonomously. They can be given tools, like access to various APIs and the ability to write code, all in pursuit of an objective.

With agents collaborating and discussing possible ideas and improvements to those ideas, a greater level of creativity can be achieved. Autonomous agents are a very promising field, which is only going to get better as models improve.

A variation on agent collaboration appears in a paper that just came out called "Mixture-of-Agents Enhances Large Language Model Capabilities" by Wang et al. (2024). This methodology is designed to harness the strengths of multiple large language models (LLMs) for improved performance in natural language understanding and generation tasks. The MoA framework constructs a layered architecture where each layer consists of multiple LLM agents, which iteratively refine responses by using outputs from the previous layer. This approach achieves state-of-the-art performance on benchmarks. The MoA framework is inspired by the Mixture-of-Experts (MoE) technique. It consists of multiple layers, each with several LLMs acting as agents. Each agent generates responses based on inputs from the previous layer's agents. The process iteratively refines responses until a high-quality output is achieved. Two main criteria guide the selection of LLMs for each layer: performance metrics and diversity considerations.

Interestingly, a phenomenon emerges called "collaborativeness", where LLMs generate better responses when given outputs from other models. This improvement occurs even if the auxiliary responses are of lower quality. Through this approach, significant performance gains were observed and can be applied to various LLMs without requiring fine-tuning.
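A minimal sketch of that layered loop might look like the following. The `call_llm(model, prompt)` helper is hypothetical, standing in for whatever API client you use, and the prompt wording is my own, not the paper's aggregation prompt.

```python
def mixture_of_agents(question, layers, aggregator, call_llm):
    """layers: list of layers, each a list of model names acting as agents.
    Each layer sees the previous layer's responses and refines them;
    a final aggregator model synthesizes the last layer's answers."""
    responses = []
    for layer_models in layers:
        prompt = question
        if responses:
            prompt += "\n\nPrevious responses to consider and improve on:\n" + "\n---\n".join(responses)
        responses = [call_llm(m, prompt) for m in layer_models]
    final_prompt = question + "\n\nSynthesize the best possible answer from:\n" + "\n---\n".join(responses)
    return call_llm(aggregator, final_prompt)

# Example wiring (model names are placeholders):
# answer = mixture_of_agents("Explain bridge RNAs simply.",
#                            layers=[["model-a", "model-b"], ["model-a", "model-c"]],
#                            aggregator="model-d",
#                            call_llm=my_api_call)
```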

It's a promising approach that is similar to ideas like chain of thought, but much more flexible.

Final Thoughts

In this discussion, I explore how advancements in AI technology can lead to groundbreaking insights. The progress in large language models (LLMs), with their increasing scale and data quality, demonstrates the potential for AI to generate original ideas and surpass its training data. Integrating techniques beyond transformers, such as state space models and energy-based models, offers promising solutions to current limitations like hallucinations and lack of long-term planning. Additionally, incorporating sophisticated planning approaches, like Monte Carlo Tree Search, allows AI to engage in more complex, second-order thinking, essential for true innovation.

Furthermore, the development of agentic frameworks enables autonomous AI agents to collaborate and refine ideas collectively, enhancing their creative capabilities. The Mixture-of-Agents approach exemplifies this, where multiple AI models work together iteratively to produce high-quality outputs, showing that collaboration among AI agents can lead to superior performance.

As these technologies evolve, AI's role transitions from a supportive tool to a potential partner in discovery, capable of contributing novel insights and driving paradigm shifts. I believe true creativity and novel scientific discovery by AI is possible. And when AI can achieve superior creativity, that is when artificial super intelligence (ASI) becomes possible, because then AI can create the ideas to improve itself.

The future of AI in scientific and artistic endeavors is bright, with the potential to redefine our understanding of creativity and innovation.

So ... Move 37


Sunday, June 23, 2024

Thoughts on the Article: "My Last Five Years of Work"


An interesting article came out last month titled "My Last Five Years of Work" by Avital Balwit, who is Chief of Staff to the CEO at Anthropic. The article is a good companion piece to the recently released paper "Situational Awareness" by Leopold Aschenbrenner, as they both concern the rapid change that AI will bring to society in a few short years - and how most people will not be prepared for it.

The main point of her article is that advancements in AI are rapidly progressing toward a point where they may render many traditional forms of employment obsolete in just a few years. These systems will be capable of performing tasks previously reserved for humans, particularly in knowledge-based fields. And we as a society are going to have to grapple with the potential psychological and social impacts of widespread unemployment. Can people find happiness and meaning in a future where work is no longer central to their lives? Will AI cause an existential crisis for people?

To further explore the implications of this shift, she writes that both too little and too much discretionary time can negatively impact a person's well-being, with moderate amounts being ideal, and cites a paper by Sharif et al. (2021) that concludes that more discretionary time can be beneficial, or at least not harmful, if it is spent on social or productive leisure activities. Of course this would vary widely from individual to individual and would also depend on how much they enjoyed traditional employment. It would also depend on how a person utilizes free time, rather than the amount of free time they have. She gives examples of engaging in activities like exercising, spending time with family, and socializing that can lead to positive well-being.

However, the assumption being made here is that AI will not create new types of jobs. But every major technological change in history has created new jobs. The nature of what types of jobs are valued changes. Many of the jobs created over a long time period are simply not imaginable at the time of the innovation. Also, I believe AI will create some degree of new wealth and, if AI's promises are fulfilled, a more equitable distribution of wealth, which will result in an overall greater demand for goods and services - and some of these goods and services will not be AI oriented.

If I were to speculate on what types of jobs would not be in danger in the first wave of a major AI-driven economic shift, it would be physical jobs where a human touch is valued, and she makes a similar point in her article. So even though an AI-based robot could be created to replace many nursing tasks, and an AI could have an incredibly empathetic demeanor, nursing, and the medical field in general, is an occupation where many people value a human connection.

I also think there will be an increased value placed on human-created work, even when an AI could do something as well as a human or even better. For example, Garry Kasparov, world chess champion, lost to Deep Blue in 1997. Since then chess computers have continued to get better, and no one thinks that a grandmaster today is able to consistently beat the best computer. However, that hasn't ended the popularity of chess. In fact, chess is probably at its highest popularity now with sites like lichess and chess.com, as people want to play other people who have the same skill level as them. And the games of the greatest grandmasters, such as Magnus Carlsen, are followed as closely as those of any great grandmaster in the past.

Another example, I believe, is creative work. AI-created fiction, poetry, music, and soon film has caused much consternation among creatives, but I believe that once the market becomes saturated with AI-created work, humans will create a demand for work that can be verified as human-created. Even though an AI can create art or fiction as well as or better than a human, it was not created from specific human experiences and emotions that happened to real humans. I believe works that have autobiographical content will be especially highly valued - because those experiences cannot be artificially created.

Beyond these examples where people will have access to an AI generated product or service but instead opt, at least in some cases, for the more human experience, I think humans will create interactions that are particularly human in ways that might not be apparent yet.

Another issue I have is the timeline. I don't disagree with the point that both Avital and Leopold make that AGI is very close, but I do disagree with a vision of a rapid transition to a workless future driven by AI advancements, in that it may not fully account for the intricate realities of implementing new technologies across diverse industries.

Stuff just doesn't happen fast across an economy with complex interdependencies. Large company bureaucracies, supply chains, government red tape, and the costs of manufacturing re-tooling can make adoption slower than expected, even when there is an obvious advantage to a new technology.

Historically, the adoption of breakthrough technologies has been a slow and uneven process due to existing complexities and interdependencies within the economy. For instance, the Industrial Revolution saw the gradual spread of steam power and mechanized manufacturing over several decades, hindered by high costs and the need for worker retraining. Similarly, the transition from steam to electric power in factories took many years, with widespread use only achieved in the 1920s and 1930s. The adoption of computer technology, which began in the 1950s, only became widespread in offices by the 1980s and 1990s due to integration challenges and the need for new workflows. The internet, commercially available in the 1990s, took nearly two decades to become a ubiquitous business tool as supporting infrastructure and services developed slowly.

Even with the contemporary example of AI, despite significant advancements, its widespread adoption remains uneven across sectors, constrained by integration challenges and ethical concerns. We would have to believe that the attainment of AGI, leading to the elimination of most employment in a few short years, would be able to sidestep these historical adoption challenges - which seems unlikely.

Okay, so I do think the article is very good and thought provoking and I do think we are in a period of accelerating technological change - which is very exciting. But I don't think that we will all be forced to take up surfing because there will be no work left to do.

PyCon 2024

 

The main days of PyCon 2024 were held in Pittsburgh over the weekend of May 17-19. This year I stayed at the hotel connected to the conference center, which makes everything much more convenient. The hotel was connected to the David L. Lawrence Convention Center by a walkway over Penn Street. I will admit, though, that after getting to the hotel I did get lost trying to find this walkway. But this is pretty much par for the course, as it takes about a day to get used to navigating the hotel and where everything is in the convention center.

Day 1, Friday, May 17th

On Friday, the first talk I attended was by Jeff Glass, who introduced interactive documentation with PyScript. Imagine having a REPL right inside your documentation! Jeff demonstrated how developers can integrate interactive Python sessions into any webpage using PyScript. This not only enhances user engagement but also makes learning more dynamic and practical. It was fascinating to see how tools like Sphinx, MkDocs, and readthedocs can be seamlessly integrated, making the examples in your documentation come alive. The concept of a fully serverless interactive learning session is something that I found particularly interesting. 

Next up was Swarna Swapna Ilamathy’s session on tackling the issue of dead code. She shed light on the importance of identifying and removing unused code to maintain project health and agility. The session was packed with practical solutions, focusing on Python libraries like Vulture and Deadcode. Swarna emphasized the significance of integrating these tools into CI/CD pipelines for streamlined dead code removal. 

By the time of the next talk, I had started to become comfortable with finding my way around the conference center. The next talk was about Ruff, a Python linter and code formatter written in Rust. I have attended previous talks about Ruff and was impressed with it. It promises to replace dozens of static analysis tools while being exponentially faster. The speaker, Charlie Marsh, took us through the internals of Ruff, highlighting specific optimizations and design decisions that contribute to its remarkable performance. Learning about how Ruff powers static analysis for major Python projects like NumPy and Pandas was inspiring. Even though Ruff is written in Rust, Charlie made the session accessible to everyone, focusing on the broader aspects of building performant developer tools.

By this time, I was getting pretty hungry and headed down to the convention hall for lunch. They had a lot of great options. After I finished eating I wandered around the hall listening to the various exhibitors. The big names were there: Snowflake, Microsoft, MongoDB, Nvidia, Google, JetBrains, and many more. Most were doing scheduled quick demos, like Microsoft with GitHub Copilot and Snowflake with Streamlit.

After lunch, the first session I went to was Sydney Runkle's talk on Pydantic. As a software engineer at Pydantic, she was well versed in talking about the optimizations available with Pydantic V2, now leveraging Rust for core validation logic. The performance improvements seemed substantial, with speedups ranging from 5 to 50 times compared to V1. Sydney shared a range of tips, from one-line fixes to larger design modifications, to get the most out of Pydantic. I think her focus on tagged unions for efficient validation of union types was particularly useful. 
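For anyone who hasn't used them, here is a minimal example of the kind of tagged (discriminated) union she was talking about in Pydantic V2; the models are made up for illustration:

```python
from typing import Annotated, Literal, Union
from pydantic import BaseModel, Field, TypeAdapter

class Cat(BaseModel):
    pet_type: Literal["cat"]
    meows: int

class Dog(BaseModel):
    pet_type: Literal["dog"]
    barks: int

# The discriminator tells Pydantic to look only at `pet_type`, so it
# validates against exactly one member instead of trying each in turn.
Pet = Annotated[Union[Cat, Dog], Field(discriminator="pet_type")]

pet = TypeAdapter(Pet).validate_python({"pet_type": "dog", "barks": 3})
print(pet)   # pet_type='dog' barks=3
```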

Next up was a talk on game development by Esther Alter. This is a topic I'm not well versed in but it was fun and informative. She walked us through the process of developing a procedurally-generated monster collecting game, which she successfully released on Steam. Unlike typical game dev talks that focus on minimal examples, Esther covered a wide array of topics including fast development, robust testing, unique graphical effects, and engaging gameplay. Her insights into why Python is an excellent choice for game development were eye-opening, making me consider Python for future game projects.

The next talk I went to was Valerio Maggio's, which expanded on Jeff Glass's earlier session, exploring the use of PyScript for interactive documentation. Valerio emphasized the importance of good documentation in open-source projects and demonstrated how PyScript can enhance the interactivity of software docs. With features like multiple interpreters, a Pythonic API for the DOM, and non-blocking execution threads, PyScript offers a robust platform for creating engaging documentation. Seeing these features in action was particularly useful, showcasing how they can transform static documentation into a dynamic learning environment.

The final talk of the day was by Juan Altmayer Pizzorno, who introduced SlipCover, a new tool for Python code coverage with minimal overhead. Traditional tools like coverage.py can slow down tests significantly, but SlipCover reduces this overhead to just 5%. 

Day 2,  Saturday, May 18th

Day two started with a session by Junya Fukuda, who talked about the intricacies of using asyncio for event-driven programming. Junya shared insights on how Python’s asynchronous capabilities are harnessed to streamline decision-making processes in robotics. The talk covered various asynchronous libraries, including trio, trio-util, anyio, and asyncio. It was cool to see how these tools can operate efficiently within a single thread, making them ideal for resource-constrained environments. 

As a heavy FastAPI user, I was very interested in Mia Bajić’s talk on combining Django ORM with FastAPI. Mia shared her unique experience of using Django ORM within a FastAPI application, a setup not commonly documented. She highlighted the practical implementation, benefits, and challenges of this hybrid approach. The session was aimed at developers of all levels and provided valuable insights into integrating asynchronous and synchronous frameworks. This talk made me think about some possibilities for leveraging the strengths of both Django and FastAPI in a single application.

Next up for me was Sebastián Flores, who presented an engaging session on enhancing presentations with Python and data storytelling. He showcased various tools and libraries, such as Jupyter Notebook, RISE, Streamlit, and Quarto, alongside graphic libraries like matplotlib, seaborn, altair, PowerBI, Tableau, and Vizzu. This talk was great in that it encouraged people to think beyond traditional PowerPoint slides. I personally think PowerPoint is one of those necessary evils because it is what everyone expects, but data communication should be so much more than talking over a bunch of slides.

It was now time for lunch. I quickly grabbed something on the convention floor and headed outside to explore downtown a little bit. I was particularly searching for a bookstore and had seen on my phone that there was a Barnes & Noble within a few blocks of the convention center that I think is associated with Duquesne University. I was looking for something to read on the plane ride home. Unfortunately, there was really nothing in the store. The basement had textbooks and supplies for the school, and the top floor was just some shelves of discounted books. So I left empty handed. I'm sure there's a bookstore somewhere downtown if I had more time to find it.

After lunch I decided to go to Jodie Burchell's session on large language models (LLMs), which addressed the critical issue of hallucinations in AI. I honestly almost skipped this for a talk about using LLMs as part of graph flow, but her talk had a lot of interesting information she had compiled about the statistics of different types of hallucinations. Jodie explained the reasons behind these hallucinations and introduced tools like TruthfulQA, Hugging Face's datasets and transformers packages, and LangChain to measure and mitigate them. The talk also covered retrieval augmented generation (RAG) as a technique to reduce hallucinations and improve the reliability of LLMs.

I'm very familiar with the set of data tools in the next talk and am a big fan of both Dask and Polars. Patrick Hoefler’s talk compared the latest developments in Pandas and Dask with other big data tools like Spark, DuckDB, and Polars. Patrick highlighted the improvements in Dask, including a new shuffle algorithm, a logical query planning layer, and a more efficient data model thanks to Pandas 2.0. The session used TPC-H benchmarks to demonstrate the performance gains and robustness of the new Dask DataFrame. 

The next talk I went to was different. It was a fun mix of art and mathematics. Alastair Stanley demonstrated how the principles of paper folding can be used to perform complex calculations, from basic arithmetic to solving cubic equations and proving irrationality. Using a custom Python library to simulate fold sequences, Alastair showed the surprising power of origami in computational contexts. This session was very cool.

The day concluded with Naveed Mahmud's session on hybrid quantum-classical machine learning using Qiskit. I have had an interest in quantum computing for years, and in Qiskit in particular, so I was very interested to see Naveed's approach. Naveed explained how to build quantum circuits and integrate them with classical models for tasks like text classification and sentiment analysis. I thought sentiment analysis was an interesting test case for quantum computing and machine learning.

Day 3, Sunday, May 19th

The first talk I attended on the third day was by Alla Barbalat, who tackled the critical issue of intellectual property (IP) in the context of AI-generated code. As AI becomes more prevalent in code generation, the question of ownership arises. Alla addressed the legal implications of using open-source code as training data and the potential infringement of IP rights by AI systems. This talk was particularly enlightening, as it demystified the complex legal landscape surrounding AI and IP law. 

Next up was Saksham Sharma’s session on optimizing Python performance with Cython. It was a deep dive into the practicalities of speeding up Python code. Saksham demonstrated how to write Python data analysis code in Cython, explore the generated code, and run microbenchmarks to understand performance bottlenecks. It was interesting to see how simple hints to Cython can significantly enhance performance. 

The final session I went to, Arun Suresh Kumar's talk on synchronous and asynchronous programming in Python, was highly informative. Arun explained the key differences between sync and async programming, explored tools like Uvicorn and uvloop, and provided benchmarks to compare their performance. The session also covered the evolution and importance of ASGI and WSGI servers in Python web development. Practical insights from real-world scenarios highlighted effective strategies for both sync and async programming.

At that point, it was time to head back home, but not before picking up my PyCon 2024 t-shirt. I don't think I have a purple t-shirt, but I really like this one.


Wednesday, June 19, 2024

Ilya Sutskever's New Company

Ilya Sutskever's new company was announced today - Safe Superintelligence Inc. Ilya, as a cofounder of OpenAI, is arguably one of the biggest visionaries around scaling and AI. This one-page site has to be up there as one of the best company launches ever: one page with no styling that looks like it came out of the 90's, with a mission to create "safe" Artificial Super Intelligence.

Thursday, June 13, 2024

Thoughts on the Paper "Situational Awareness" by Leopold Aschenbrenner

So if you haven't read the paper Situational Awareness: The Decade Ahead by Leopold Aschenbrenner, you should. Everyone should. This is not to say you need to agree with every point he makes, but reading papers like this will make you more mentally prepared to think about the accelerating period of change that we are in. We are going through a technological shift of rapid change that people are finding difficult and are not planning for accordingly. Most humans are resistant to change, or if they do expect change, they expect gradual, linear change; they are not prepared for the explosion of change that is possible with the advent of AGI or artificial super intelligence (ASI). There is too much thinking that looks at the current state and believes that upcoming changes will be gradual - that what we will get are improvements to "chatbots" and wrappers around GPT, when the reality is that people need to start thinking about AI becoming "agentic," the impact of "drop in" technical workers, and every part of the landscape changing. So it's okay to disagree with parts of his reasoning or emphasis, but this paper does serve the purpose of helping realign mindsets.

In the paper, he is predicting AGI by 2027 and ASI by 2028-2030. 

But before we get into his predictions, who is Leopold Aschenbrenner? He graduated from Columbia University at the age of 19 as valedictorian and worked at OpenAI until being fired for "leaking" information - the back-and-forth allegations between them are debatable, but both sides were probably at fault, or at least it could have been handled better. Regardless, since then he has been a prolific writer on the long-term future of AI and has founded an investment firm focused on AGI. Here is a very good recent interview.

On the prediction of AGI by 2027: I think a prediction of around 2027, give or take a year, is very reasonable. Part of the issue with any prediction of AGI is that the definition of AGI has always been pretty fuzzy. I'll save my thoughts on the definitions of AGI for another post, but suffice it to say there is no definitive test for something being AGI. What I expect is that around 2025-2026, what many people will experience when working with these models is that they "feel" like AGI. Then there will be further improvements, to the point that around 2027 a majority of reasonable people will agree that AGI is here.

What I do disagree with is the idea that almost immediately after AGI arrives, ASI models will start to show up. This is the hard takeoff scenario that many people talk about. It seems to me that several breakthroughs will be needed before ASI - which could possibly come within a year after AGI, but more likely much longer, maybe 10-20 years after AGI.

Regardless, Aschenbrenner predicts an "intelligence explosion" where AI systems rapidly evolve from human-level intelligence to vastly superhuman capabilities. He says this transition could happen extremely quickly, potentially within less than a year. If true, the implications would be staggering, suggesting a world where AI can automate AI research, compressing decades of progress into mere months.

This superintelligence could be "alien" in its architecture and training processes, potentially being incomprehensible to humans. This could lead to scenarios where humans are entirely reliant on these systems, unable to understand their operations or motivations.

He talks quite a bit about orders of magnitude ("OOMs") to project progress, but my disagreement here is that technological progress of any kind is never smooth curve fitting. There are often plateaus, and sometimes these plateaus can be quite long.

He also makes one of his more astonishing economic claims: the projection of trillion-dollar compute clusters by the end of the decade. This massive industrial mobilization would involve significant increases in U.S. electricity production and the construction of large-scale GPU datacenters, which raises the question of energy as a limiting factor to all of this - a point Mark Zuckerberg has recently made.

As part of OpenAI's "super alignment" team, he worked on ways of understanding how the models work internally and how they can be aligned to meet human objectives. He is of the firm belief that "super alignment" is possible. Much has been made of OpenAI's alignment team falling apart, with people like the brilliant Ilya Sutskever and Jan Leike leaving. I suspect that, yes, they left because of a perceived lack of commitment from OpenAI management to alignment and because they wanted more resources than they were given, but I think the underlying realization is that something like "super alignment" may not be possible. I base this on reading some of their work and listening to them talk about what they had done for "super alignment," and it doesn't seem like much progress. I know their group was formed with the understanding that "super alignment" was a multi-year project, but the progress still doesn't seem to have kept pace with their expectations or with the pace of the models in general. I'm not saying that alignment isn't possible or that guardrails aren't needed, but some grand idea of "super alignment" may not be possible, and that was probably a source of some of the conflict inside OpenAI. So I think he is still overly optimistic in this paper about alignment, and especially about achieving something like "super alignment." 

The paper also underscores the importance of securing AI labs and their research from state-actor threats. He believes that the leading AI labs currently treat security as an afterthought, leaving critical AGI secrets vulnerable. The effort to secure these secrets against espionage and other threats will be immense. Along these lines, he explores the geopolitical ramifications of superintelligence. The economic and military advantages conferred by superintelligent systems could be decisive, raising concerns about an all-out race or even conflict with authoritarian regimes like China. The survival of democracy could hinge on maintaining technological preeminence and avoiding self-destruction. 

He suggests that American AI labs are currently vulnerable, with inadequate security measures akin to "random startup security." The notion that Chinese intelligence could easily infiltrate these labs and exfiltrate critical AGI research is presented as a significant risk. Not to dismiss the threat of China, but he has been criticized for hyper-focusing on China when there is no shortage of governments that could be bad actors. Still, this is probably the best articulation I've read of the possible security threats.

He then continues this line of thought by saying that as the race to AGI intensifies, national security will become increasingly involved. By 2027/28, a government-led AGI project may emerge, as no startup can handle the complexities and risks associated with superintelligence. This scenario will likely involve unprecedented levels of mobilization and coordination within the US national security apparatus.

The paper foresees a scenario where the U.S. government takes over AGI projects, forming a "Manhattan Project" for AI by 2027/28. This would involve national security forces and intense governmental oversight to ensure the United States remains a leader in AGI development, securing AI research against espionage and potential misuse. However, I think this is really optimistic about how quickly government can move to do anything, let alone something as complicated and technical as AI. I do believe the idea of AGI will become a very political issue, and there will be "populist" politicians who will try to take advantage of it as a divisive issue, especially to capitalize on people's fear of change - fear that will be very rational when supported by employment upheavals. But the idea that government will move at the same pace as the accelerating speed of AI seems naive.

Whatever points in this document you agree or disagree with, it's to be commended for its level of detail and for how thought-provoking it is in considering both the extraordinary potential and the significant risks. What is important is that it challenges us to consider the rapid pace of AI development and the profound changes it could bring to society, security, and global dynamics. While some of the claims may seem far-fetched, they underscore the importance of proactive measures and robust strategies to navigate the coming period of AI advancements.

Wednesday, June 12, 2024

GPT-4o and Google I/O 2024

It's been a few weeks since both the OpenAI announcement and Google I/O, which has given us some time to step back and assess what was revealed at both events.

Google introduced several new AI products and features, including an AI-powered search engine, AI helpers in Workspace apps (Gmail, Drive, Docs, etc.), and a future AI vision called Project Astra. Gemini, Google's AI model, was featured prominently with updates like Gemini Live, Gemini Nano, and Gemini 1.5 Pro, showcasing capabilities in voice interaction, image analysis, and on-device AI functionalities. 

They also announced an expanded context window of 2 million tokens that will be rolled out at some point. Right now the context window is 1 million tokens, which is much larger than either OpenAI's or Anthropic's largest context windows of 128,000 and 200,000 tokens respectively. Two million tokens would be roughly 64,000 lines of code, 1.4 million words, or a couple of hours of video. In a blog post in February, they talked about internally testing a 10 million token context window.

If these large context windows can keep track of all of that information without having to do retrieval augmented generation (RAG), then they obviously have major implications for everything from code development to video summarization. For example, a developer tasked with a large feature could give the model a very large code base, design documents, and even mock-ups and have it create the code for that feature, with the developer acting more as a code reviewer. And to work with video, large context windows are a necessity.

However, this expanded context window will only be available in the Gemini 1.5 Pro model, first in private preview and then generally available later in the year. And that's the main problem with most of Google's announcements: they won't be available until some time later in the year. With so many announcements happening, that's far too distant in the future to be touting most of your new AI features.

OpenAI's announcement mainly centered around the multi-modal capabilities and the speed of GPT-4o. After using it regularly since the announcement, it does seem to be slightly better than GPT-4 Turbo. Having it work across voice, vision, and text with the same API endpoint is very welcome, instead of having a separate API model for vision. I have also been using the released Mac app, which is a nice convenience.

By this point, everyone has probably seen the voice and vision capabilities of GPT-4o, which were crazy. Because it's trained on sound, images, and text together, the multi-modal capabilities mean not just improved speed but things like handling variations in sound within a voice context and understanding emotion and speaking speed. As usual with OpenAI, they don't say exactly how they accomplished this, but it clearly looks like a different architecture from GPT-3.5/4.0. To get a rough idea of how this is probably being done, we can look at other projects like Chameleon from Meta, which uses "early fusion" to train on different modalities.
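
To make the "early fusion" idea concrete, here is a toy, purely illustrative sketch of my own (a guess at the general shape, not OpenAI's or Meta's actual code): each modality is mapped to discrete tokens and everything is interleaved into one sequence that a single transformer trains on.

# Toy illustration of early fusion: all modalities share one token sequence.

def tokenize_text(text):
    return [f"<txt:{w}>" for w in text.split()]

def tokenize_image(patch_codes):
    # assume a learned codebook has already mapped image patches to discrete codes
    return [f"<img:{c}>" for c in patch_codes]

def tokenize_audio(frame_codes):
    # likewise for short audio frames
    return [f"<aud:{c}>" for c in frame_codes]

# One fused sequence - the model is never shown the modalities separately
sequence = (
    tokenize_text("describe this picture and sound")
    + tokenize_image([17, 42, 42, 99])
    + tokenize_audio([3, 3, 8])
)
print(sequence)

Because emotion, pacing, and tone survive in the audio tokens themselves rather than being flattened to text first, the model can pick up on them directly.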

However, this new voice capability wasn't released at the announcement; it's supposed to arrive this month. It's halfway through June and it's not here yet, but I'm very much looking forward to it when it is released. I currently have code based on the existing models that does voice->text->voice, which introduces some delay, and getting rid of that delay would be a big deal. Of course, it opens up a lot of possibilities beyond just being faster. 

With both Google's and OpenAI's big announcements, these are exciting times to be working in AI - and that's not to mention the great things happening at Anthropic and in open source. What will happen in the next six months? Who knows? But I expect Google to release what they previewed at I/O and OpenAI to release the voice features. Will OpenAI release a new model before the end of the year? The rumored 5.0, or maybe they call it 4.5? Or will they release incremental improvements to 4o? No one knows for sure, but I believe OpenAI will release a new model by October or November that is a significant jump in capabilities.

Oh, and that's not to ignore the partnership between OpenAI and Apple - "Apple Intelligence" and everything that partnership means, which immediately caused Elon Musk to throw a tantrum. But that would be a topic for another post.

Tuesday, March 12, 2024

Build an Open Source Research Chat Assistant with Ollama and RAG

In my previous post titled, "Build a Chat Application with Ollama and Open Source Models", I went through the steps of how to build a Streamlit chat application that used Ollama to run the open source model Mistral locally on my machine. Refer to that post for help in setting up Ollama and Mistral. In this post, I will extend some of those ideas and show how to create a "Research Assistant" using Ollama, Mistral, RAG, LlamaIndex, and Streamlit.

This application will have two parts:

  1. Document retrieval: I will build a page that will use the arXiv repository API to pull the most relevant documents for a topic into a vector based index using LlamaIndex.
  2. Document chat: Based on all of the documents that have been pulled into the vector database, I will build a chat interface page that allows the user to chat on topics that are in the database using either Mistral or OpenAI - the user will be able to pick which LLM they want to use to chat with all of the documents that have been built up in the database.
Here are screenshots of the two pages in the application:

Document Chat


Data Acquisition (downloads data from arXiv)



But first, what is RAG?

Retrieval augmented generation (RAG) systems are AI systems that enhance an output's relevance and accuracy by combining the strengths of large language models with retrieval from an external knowledge source. The basic idea is to enhance the language model's output by retrieving and incorporating relevant information from a large corpus of text or knowledge base. This approach aims to address the limitations of traditional language models, which can sometimes generate factually incorrect or inconsistent text due to their limited knowledge or understanding of the world. (A minimal sketch of the retrieve-then-generate loop follows the list of advantages below.)

The key advantages of retrieval augmented generation include:

  • Improved factual accuracy and consistency - reducing hallucinations: By incorporating relevant information from external sources, the generated text is more likely to be factually accurate and consistent with real-world knowledge.
  • Enhanced knowledge coverage: The model can leverage a vast amount of information from a knowledge base, effectively expanding its knowledge beyond what is encoded in a language model.
  • Adaptability: The retrieval can be tailored to specific domains or knowledge sources, allowing the model to generate text that is relevant and accurate for a particular domain or task.
  • Overcoming a model's training cutoff date: Language models have an effective cutoff date that they have been trained on and cannot respond accurately on events that happened after that date. By using RAG with new documents, the LLM can have access to knowledge past its cutoff date.
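
Putting those pieces together, the retrieve-then-generate loop looks roughly like this (a schematic sketch with hypothetical helper names - index.similarity_search and llm.complete are placeholders, not a specific library's API):

def rag_answer(question, index, llm, top_k=3):
    # 1. Retrieve: find the chunks whose embeddings are closest to the question
    context_chunks = index.similarity_search(question, k=top_k)

    # 2. Augment: prepend the retrieved text to the prompt
    context = "\n\n".join(chunk.text for chunk in context_chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generate: the LLM answers grounded in the retrieved context
    return llm.complete(prompt)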

In this application I will use LlamaIndex to implement RAG. LlamaIndex is great at ingesting data from a wide variety of sources (PDFs, Word files, images, audio, PPT, etc.). LlamaIndex has a very convenient function called SimpleDirectoryReader that reads through all of the files in a directory and, if a file is one of the many types it supports, loads it. These files will be stored as vector-based embeddings. From the LlamaIndex documentation:

"Embeddings are used in LlamaIndex to represent your documents using a sophisticated numerical representation. Embedding models take text as input, and return a long list of numbers used to capture the semantics of the text. These embedding models have been trained to represent text this way, and help enable many applications, including search!"

But before we can embed some documents to search and chat with, we need to get some documents for our database. This is the data acquisition page from above. On this page, the user will enter a topic such as "Mamba in AI" and the code will use the arXiv API to download the most relevant and recent PDF documents. arXiv is a repository of several million scholarly documents covering everything from computer science to physics.

The code will then create embeddings for those documents and make those embeddings available to chat with. In this application, I am using OpenAI embeddings (text-embedding-ada-002) for the GPT path and a Qdrant vector store with local embeddings for the Mistral path. In a real application you would most likely use only one LLM with one kind of embedding, but for illustration purposes, I'm doing both OpenAI and Mistral. Qdrant is an open source vector database, and the LlamaIndex documentation for Qdrant can be found here. So if you want a completely open source version of both the LLM and the embeddings, and don't want to worry about token pricing, then using Mistral with Qdrant and local embeddings is one of many possible options.

The code to pull the documents for the topic from arXiv is in "research.py." It uses a Python module, appropriately called arxiv, that makes using their API a little easier.

pip install arxiv

research.py:

''' Get papers from arXiv using the arXiv API '''

import os

import arxiv
import pandas as pd


def get_arxiv(query, num_documents):

    search = arxiv.Search(
        query=query,
        max_results=num_documents,
        sort_by=arxiv.SortCriterion.Relevance,
        sort_order=arxiv.SortOrder.Descending,
    )

    titles, summaries, authors, published, links = [], [], [], [], []

    for result in search.results():
        titles.append(result.title)
        summaries.append(result.summary)
        authors.append(', '.join(author.name for author in result.authors))
        published.append(result.published)
        links.append(', '.join(str(link) for link in result.links))

        # Download the PDF for this paper into the local "documents" folder
        result.download_pdf(dirpath="./documents")

    df = pd.DataFrame({'title': titles, 'summary': summaries, 'authors': authors,
                       'published': published, 'links': links})

    df = df.sort_values(by='published', ascending=False)
    df = df.reset_index(drop=True)

    # Append to documents.csv if it already exists, otherwise create it
    if os.path.exists('documents.csv'):
        df.to_csv('documents.csv', mode='a', header=False, index=False)
    else:
        df.to_csv('documents.csv', index=False)

    return df

The code uses the arxiv "Search" function to get the most relevant articles based on the user-entered topic and number of documents to retrieve, and stores those documents in the documents folder. I then write the metadata for the articles, appending it to a "documents.csv" file. One change you could make here is to store this metadata in a database instead.
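
If you went the database route, a small SQLite table would be enough; here is a hypothetical sketch (not part of the repository) that would replace the CSV append:

import sqlite3

def save_metadata(df, db_path="documents.db"):
    # Append the article metadata to a SQLite table instead of documents.csv
    with sqlite3.connect(db_path) as conn:
        df.to_sql("documents", conn, if_exists="append", index=False)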

This arxiv function is called from the Streamlit UI page "1 - Data Acquistion.py." This Streamlit page asks the user for a topic and the maximum number of documents to retrieve. After receiving the documents from the arxiv function, it creates the embeddings and the LlamaIndex query engines in "client.py."

1 - Data Acquistion.py:

import streamlit as st
import pandas as pd
import os

from pages.utilities.research import get_arxiv
from pages.utilities.client import get_mistral_query_engine, get_gpt_query_engine

if __name__ == "__main__":

    st.set_page_config(layout="wide")
    st.title('Research Assistant')

    st.divider()

    with st.sidebar:
        max_documents = st.number_input("Max number of documents:", value=10)

    topic = st.text_input('Research Topic:')

    with st.spinner('Thinking...'):

        if len(topic) > 0:
            # Download the most relevant papers for the topic into ./documents
            get_arxiv(topic, max_documents)

        if os.path.exists('documents.csv'):
            df = pd.read_csv("documents.csv")
            df = df.drop_duplicates(subset=['title'])
            st.dataframe(df)

        if topic:
            # Rebuild both query engines so the newly downloaded papers get indexed
            try:
                query_engine = get_mistral_query_engine(True)
                query_engine = get_gpt_query_engine(True)
            except:
                pass

client.py:

import streamlit as st
import os
import qdrant_client

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import load_index_from_storage, ServiceContext
from llama_index.llms.ollama import Ollama
from llama_index.core.storage.storage_context import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore


@st.cache_resource
def get_mistral_query_engine(data_changed):

    llm_model = Ollama(model="mistral")
    collection_name = "storage"

    # Create the Qdrant client, vector store, and contexts only once per session
    if 'qdrant_client' not in st.session_state:
        st.session_state.qdrant_client = qdrant_client.QdrantClient(path="./qdrant_data")

    if 'vector_store' not in st.session_state:
        st.session_state.vector_store = QdrantVectorStore(
            client=st.session_state.qdrant_client,
            collection_name=collection_name)

    if 'service_context' not in st.session_state:
        st.session_state.service_context = ServiceContext.from_defaults(
            llm=llm_model, embed_model="local")

    if 'storage_context' not in st.session_state:
        st.session_state.storage_context = StorageContext.from_defaults(
            vector_store=st.session_state.vector_store)

    qdrant_persist_dir = "./qdrant_data/collection/storage"

    if not os.path.exists(qdrant_persist_dir) or data_changed:
        # (Re)index everything in the "documents" folder into Qdrant
        documents = SimpleDirectoryReader("documents").load_data()
        index = VectorStoreIndex.from_documents(
            documents,
            service_context=st.session_state.service_context,
            storage_context=st.session_state.storage_context)
        index.storage_context.persist()
    else:
        # Reuse the existing vector store without re-indexing
        index = VectorStoreIndex.from_vector_store(
            vector_store=st.session_state.vector_store,
            service_context=st.session_state.service_context)

    query_engine = index.as_query_engine(streaming=False)

    return query_engine


@st.cache_resource
def get_gpt_query_engine(data_changed):

    gpt_persist_dir = "./storage"

    if not os.path.exists(gpt_persist_dir) or data_changed:
        # (Re)index the documents with the default OpenAI embeddings and persist to ./storage
        documents = SimpleDirectoryReader("documents").load_data()
        index = VectorStoreIndex.from_documents(documents)
        index.storage_context.persist()
    else:
        # Load the previously persisted index from disk
        storage_context = StorageContext.from_defaults(persist_dir=gpt_persist_dir)
        index = load_index_from_storage(storage_context)

    query_engine = index.as_query_engine()

    return query_engine

There are two functions: one for GPT (get_gpt_query_engine) and one for Mistral/QDrant (get_mistral_query_engine). The GPT function is the more straightforward of the two. If the data has changed (which it has when a user has gotten more documents with a new topic), SimpleDirectoryReader will read all of the documents in the "documents" folder, index them, and persist that vector database to the "storage" folder. If the data has not changed, it will not try and re-index all of the documents again, but use the existing vector database. We only want to re-index when necessary - saving time and tokens.

For Mistral/Qdrant we follow the same pattern with a couple of exceptions. One, we save the indexed data in a "qdrant_data" folder. Two, we use QdrantClient and QdrantVectorStore to store our vector embeddings. The code around "data_changed" is important because we only want to create one instance of the QdrantClient per session - if we don't, we will get errors.

Both functions return a "query_engine" that can be used the same way to chat with our vector database of documents.

In our "main" program that has the chat interface called "Reseach Assistant.py" we can use "query_engine."

Research Assistant.py

import streamlit as st

from pages.utilities.client import get_mistral_query_engine, get_gpt_query_engine

if 'message_list' not in st.session_state:
  st.session_state.message_list = []    
      
if __name__ == "__main__":

    st.title('Research Assistant')
    st.divider()
    
    with st.sidebar:
      
      st.markdown('# Models')
      
      selected_model = st.selectbox('s', ['Mistral', 'GPT-4'], label_visibility='hidden')

    message = st.chat_message("assistant")
    message.write("Hello human!")
    
    prompt = st.chat_input("Ask a question")
    
    try:
      if selected_model == 'Mistral':
        query_engine = get_mistral_query_engine(False)
      else:
        query_engine = get_gpt_query_engine(False)
    except:
        with st.chat_message("assistant"):
          st.write(str("Error loading model. Please try again."))
          st.stop()
        
    for l in st.session_state.message_list:
                
      if 'user' in l:
        with st.chat_message("user"):
          st.write(l['user'])
      if 'assistant' in l:
        with st.chat_message("assistant"):
          st.write(l['assistant'])
        
    if prompt:
            
      with st.spinner('Thinking...'):

        response = query_engine.query(prompt)
                
        with st.chat_message("user"):
          st.write(prompt)
        with st.chat_message("assistant"):
          st.write(str(response))

        a = {
          "user": prompt,
          "assistant": str(response)         
        }
            
        st.session_state.message_list.append(a)

We call query_engine.query with the user prompt - the question the user wants to ask of the database - and it uses the chosen LLM and vector database to answer the question.

Let's try it out!

In the data acquisition page I've asked it to get documents related to Mamba in AI and, separately, documents related to QLoRA - both are relatively new topics in AI.


As you can see, it used information from our database and not information it had been trained on. We know this because Mamba came out in December of 2023 - after the training cutoff for both models.

Let's ask it a very specific question that our database knows about:


This answer is not only correct, but it's about a paper that came out in January of 2024.

Even though it does everything we set out to do, there are several improvements that could be made to this application. For example, beyond the excellent ideas laid out in the LlamaIndex documentation, one obvious improvement would be to not re-index all of the documents each time a user enters a new topic and gets a new batch of documents, but instead index just the new documents and add those to the existing embeddings. Furthermore, you could store the metadata in a database instead of a CSV file, and instead of pulling documents only from arXiv you could add a drag-and-drop file dialog that lets the user add their own documents.
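
The incremental indexing idea could look roughly like this sketch (not code from the repository), using the index's insert method so only the newly downloaded papers get embedded:

from llama_index.core import SimpleDirectoryReader

def add_new_documents(index, new_docs_folder):
    # Embed and insert only the new papers, leaving the existing embeddings untouched
    new_documents = SimpleDirectoryReader(new_docs_folder).load_data()
    for doc in new_documents:
        index.insert(doc)
    index.storage_context.persist()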

All of the code for this application can be found at this GitHub repository. Because this is a Streamlit application that uses pages, the directory structure is important, and that structure can be seen in the repository. I also included a notes.md in the repository that lists the pip installs that need to be done.

And that's it!

We now have a Streamlit application that can retrieve documents on topics, build up a vector based embedding database, and allow us to chat with those topics in that database.

"Superhuman" Forecasting?

This just came out from the Center for AI Safety  called Superhuman Automated Forecasting . This is very exciting to me, because I've be...