Sunday, January 12, 2025

Elements of Monte Carlo Tree Search - Typical and Non-typical Applications

Monte Carlo Tree Search (MCTS) offers a very intuitive way of tackling challenging decision making problems. In essence, MCTS combines the structure of a tree search with the power of random sampling to navigate large, complex search spaces effectively. I've been interested in MCTS for a long time, and that interest was cemented in 2016 by AlphaGo's victory over Lee Sedol and the famous "Move 37." By the way, I wrote an entire post about Move 37 and the role of innovation in AI here. At the moment Move 37 was played, it was thought at best to be "unique" and at worst a horrible move, one that no respectable Go player would make. But the move was both effective and innovative. It was this balance between exploiting an immediate advantage and exploring the space for potentially even better solutions that gave AlphaGo the victory in a game that, because of its complexity and need for imagination, was thought impossible for AI. This exploitation vs. exploration dynamic is the crux of MCTS, and it, along with neural network evaluations and reinforcement learning, is what made AlphaGo so strong.

In this post, I want to talk about the main ideas behind MCTS, where MCTS can be used, and then walk through the code for implementing a MCTS in a game of Connect 4. But if you want to skip right to playing the game: it is here and the Github code is here.

So what kind of applications is MCTS appropriate for? Historically the typical scenario is around games and specifically finite two-person zero-sum sequential games. In other words, in these types of games, there are two players competing directly against each other, the gain of one player is exactly the loss of the other (zero-sum), the game progresses through sequential moves, and the game has a finite number of states and actions, leading to a clear end.

These two player games are the typical application of MCTS, but I've long been interested in applications beyond these types of games. MCTS could be applicable to any optimization problem that has a large search space and tree-like decision nodes, where decisions are made sequentially under limited resources or time. Robotics is an obvious example: a robot has to complete some objective, but there are many different paths to that objective. But how about some not so obvious applications? In education, MCTS could help in what I'm calling Personalized Learning Path Optimization (PLPO). As a student engages with learning activities in an adaptive learning platform, a PLPO could customize a personalized curriculum for that student, balancing engagement, challenge, and skill acquisition. In this situation, the state represents the student’s current skill level, engagement, and performance history. Actions include presenting different types of learning activities (e.g., videos, quizzes, projects). The MCTS simulations predict the student’s response to a sequence of learning activities, optimizing for long-term improvement and retention.

In biotech, an example would be to optimize the sequence of biological reactions in a synthetic pathway for producing potential drugs. The states would represent the current pathway, including enzymes and intermediates. Actions involve adding, removing, or modifying reactions in the pathway. Then MCTS simulations predict yield, efficiency, and stability, optimizing for production goals and feasibility.

Another non-typical example I want to talk at length about is optimizing a customer journey designed for e-commerce platforms. In this example, the goal would be to guide consumers through a personalized journey (e.g., product recommendations, promotional offers, and website interactions) to maximize conversions, repeat purchases, and customer satisfaction.

I want to come back to this example of using MCTS in optimizing the customer journey, but first I want to go into a larger explanation of how MCTS works in the typical case of two person games.

MCTS combines the structure of a tree search with random sampling to navigate large, complex search spaces. Unlike traditional search methods that often rely on complete or near complete look-ahead, MCTS focuses on sampling potential outcomes and incrementally refining its estimates of how good or bad certain moves or actions are. This makes MCTS particularly appealing in domains where exhaustive searches become prohibitively large.

One way to understand MCTS is by tracing the iterative loop through four stages:

  1. Selection
  2. Expansion
  3. Simulation
  4. Backpropagation

Each of these stages contributes to building up a more accurate picture of the search space. During selection, the algorithm starts at the root of the search tree (representing the current state) and travels down along existing paths, choosing actions that optimize the balance between exploitation (picking actions that have worked well so far) and exploration (investigating actions that remain relatively untested). A common way to manage this trade off is by using the Upper Confidence Bound applied to Trees (UCT) formula. If \( N \) is the total number of visits to the current node and \( n \) is the number of visits to a child node, while \( \overline{X} \) is the current estimate of the child’s average reward, then the child node’s UCT value can be computed as:

\( \text{UCT} = \overline{X} + c \sqrt{\frac{\ln(N)}{n}} \)

\( \overline{X} \) = current estimate of the child’s average reward
\( c \) = constant controlling the exploration-exploitation balance
\( N \) = total number of visits to the current node
\( n \) = number of visits to a child node

Here a larger \( c \) encourages more exploration of unexplored moves. The goal of this formula is to favor actions that have high average rewards while still allocating some attempts to actions with few visits (because they might lead to surprising or better results once explored more thoroughly). This is how we get moves that might look to a bystander as "unusual" or "non-standard."
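
To make the formula concrete, here is a minimal sketch of the UCT calculation in Python. The function name, the default c of √2 (a common choice), and the convention of giving unvisited children infinite priority are my assumptions for illustration, not something taken from the Connect 4 repository.

```python
import math

def uct_value(avg_reward, parent_visits, child_visits, c=math.sqrt(2)):
    """UCT score of a child: average reward plus an exploration bonus."""
    if child_visits == 0:
        # Assumed convention: unvisited children get infinite priority
        # so that each one is tried at least once.
        return math.inf
    return avg_reward + c * math.sqrt(math.log(parent_visits) / child_visits)

# A well-explored child with a good average vs. a barely explored one.
well_explored = uct_value(avg_reward=0.60, parent_visits=100, child_visits=80)
barely_explored = uct_value(avg_reward=0.40, parent_visits=100, child_visits=5)
print(round(well_explored, 3), round(barely_explored, 3))
```

Even though the second child has a lower average reward, its exploration bonus is large enough that it would be selected next, which is exactly the behavior the formula is designed to produce.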

After reaching a leaf node, either a node that has not been explored yet, or one that is not fully expanded with potential child nodes, the algorithm proceeds to the expansion phase. In this stage, it creates one or more child nodes corresponding to unvisited actions. This expansion step gradually grows the search tree over time, ensuring that new parts of the state space are discovered rather than staying confined to what has already been visited.

The third stage, simulation, which you will often hear referred to as the playout (or rollout), is where the “Monte Carlo” part comes into play. From one of the newly added child nodes, MCTS simulates a sequence of moves (often random or guided by a light heuristic) all the way to a terminal outcome. For example, in a game, that outcome might be a win, a loss, or a draw. In a more general planning or scheduling context, it could be a success or failure, or a particular reward value. These random playouts, repeated many times, serve as samples of what might realistically happen if a particular action path is chosen.
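
As a rough illustration of a playout, the sketch below uses a hypothetical toy game (not Connect 4): players alternate removing 1 to 3 stones from a pile, and whoever takes the last stone wins. The function names and the game itself are assumptions chosen to keep the example short.

```python
import random

def random_playout(stones, player, rng):
    """Play uniformly random moves in the toy take-away game until it ends.

    Assumes stones >= 1. Returns the winner (1 or 2).
    """
    while True:
        take = rng.randint(1, min(3, stones))
        stones -= take
        if stones == 0:
            return player  # this player took the last stone
        player = 2 if player == 1 else 1

def estimate_win_rate(stones, player, playouts=1000, seed=0):
    """Fraction of random playouts won by `player` when moving first."""
    rng = random.Random(seed)
    wins = sum(random_playout(stones, player, rng) == player
               for _ in range(playouts))
    return wins / playouts

print(estimate_win_rate(stones=10, player=1))
```

The printed number is only an estimate under purely random play, which is why the quality of the playout policy matters so much in practice.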

Once the simulation is complete, the algorithm performs backpropagation to propagate the result of the simulation back up the tree to the root. Along the path taken in the search tree, each node’s statistics (like how many simulations passed through that node, how many ended in a favorable outcome, or the average return) are updated. Over many iterations, these accumulated statistics become increasingly meaningful, revealing which branches of the search tree appear promising and which ones do not.
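
A minimal sketch of that bookkeeping might look like the following. The Node class and backpropagate_reward function are hypothetical and simplified: they track a single reward from one fixed perspective, whereas a two player implementation would typically credit the result to the appropriate player at each level.

```python
class Node:
    """Minimal tree node holding the statistics MCTS accumulates."""

    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0          # simulations that passed through this node
        self.total_reward = 0.0  # sum of simulation rewards seen so far

    @property
    def average_reward(self):
        return self.total_reward / self.visits if self.visits else 0.0

def backpropagate_reward(node, reward):
    """Walk from the simulated node up to the root, updating each node."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent

root = Node()
child = Node(parent=root)
leaf = Node(parent=child)
backpropagate_reward(leaf, 1.0)   # a simulated win starting at `leaf`
backpropagate_reward(child, 0.0)  # a simulated loss starting at `child`
print(root.visits, root.average_reward)  # prints: 2 0.5
```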

MCTS is successful in many applications because it doesn’t require a specially devised evaluation function for intermediate states. Given enough playouts, the sampling can approximate the true value of each action. This versatility is why MCTS has been popular in board games like Go, where it formed the backbone of AlphaGo’s decision making system. It is also employed in video game AI, robotics for path planning, and even scheduling problems: anywhere the decision space is complex and uncertain.

Yet, MCTS is not without challenges. Its effectiveness depends significantly on the quality of the playouts, because remember, these playouts take place under a constraint, which is often a time limit. If the simulations are purely random in a very large or complex domain, they might produce misleading estimates. For this reason, many implementations add a small amount of domain knowledge or heuristics to steer simulations toward more likely or more relevant outcomes (I will do this in the Connect 4 example below). Another concern is the size of the search tree itself: while MCTS is more scalable than exhaustive methods, it can still blow up in expansive problem spaces if it does not prune or limit growth effectively. Implementation details such as how states are represented, how child nodes are stored, and how results are backpropagated can significantly impact its performance.

In practice, the so-called “anytime” property of MCTS is one of its greatest strengths. You can let the algorithm run for as many or as few iterations as time allows, and it will offer the best solution it has found up to that point. Longer run times translate into more simulations, deeper exploration, and more refined estimates of action quality, but even with limited time, MCTS can generate a decent approximation of the optimal move.

So by blending random sampling with incremental, iterative updates to a tree of possibilities, MCTS circumvents the need for elaborate heuristics or full look-ahead. Through repeated applications of selection, expansion, simulation, and backpropagation, MCTS transforms what might otherwise be an intractable search into a manageable process, making it a "go to" strategy for complex game spaces like Go, Chess, and Othello.

Before moving on to the Connect 4 code, I should mention minimax search with alpha/beta pruning. Historically, minimax has been the "go to" algorithm for games like Chess, and the best Chess engines in the world, like Stockfish, have been built around it. Recently, though, a MCTS type Chess engine named Leela has competed very well with Stockfish (although technically it's not a pure MCTS: it uses a neural network to evaluate positions in place of random rollouts). Minimax with alpha/beta pruning is a deterministic search. It systematically explores the game tree to a certain depth (or until terminal states). Typically, games like Chess with large branching factors must limit search depth. At the leaf nodes (or at the cutoff depth), an evaluation function (heuristic) estimates the position’s value. Pruning then uses the α (alpha) and β (beta) bounds to skip branches that cannot affect the final minimax decision.
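
For comparison with MCTS, here is a small sketch of minimax with alpha/beta pruning on a toy tree. The nested-list tree representation (numbers are leaf evaluations, lists are internal nodes) is an assumption made for brevity; a real engine would generate moves and call an evaluation function instead.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Minimax value of a toy tree with alpha-beta pruning.

    A node is either a number (leaf evaluation) or a list of child nodes.
    """
    if not isinstance(node, list):
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizer above already has a better option
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizer above already has a better option
    return best

tree = [[3, 5], [2, 9], [0, 1]]
print(alphabeta(tree, maximizing=True))  # prints: 3
```

In this tree the leaves 9 and 1 are never evaluated: once the second and third branches are known to be worth at most 2 and 0, they cannot beat the 3 already guaranteed by the first branch.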

The main difference between MCTS and minimax is that MCTS is probabilistic and minimax is deterministic. Minimax is effective in games with a manageable branching factor like Chess and Checkers. But in games like Go, the branching factor is so large that the number of states to search grows far too quickly with depth for minimax to look ahead very far.

So based on this discussion, you're probably thinking that minimax would be the best algorithm for Connect 4, since the search space of possible moves and the depth of search are very limited - and you would be right. And certainly anyone building a game of Connect 4 would almost automatically use a minimax type algorithm. But with the following code using MCTS, I'm showing that MCTS can do as well as minimax given a time constraint.

The Github repository can be found here. The most important function for the implementation is in the mcts.py file, in a function unsurprisingly called mcts. This function steps through the four stages I outlined above: selection, expansion, simulation, and backpropagation. It does this iteratively under a time constraint. I have set my time constraint to 1 second - it has to come up with a move in under 1 second. In the selection stage, it is looking for the child node with the best UCT value. Once that node is found, the expansion stage expands that node by taking one untried move and creating a child node. Then the simulation stage simulates a random game of moves (rollout) from that child node until there is a winner or a draw. The backpropagation stage propagates the simulation result back up through the nodes on that tree path. Once the while loop is finished, the child of the root with the highest visit count is the move that will be taken (which in the case of Connect 4 is the column in which the piece will be dropped).

import random
import time

# MCTSNode, best_child, expand_node, simulate_game, backpropagate,
# get_valid_moves, and make_move are defined elsewhere in the repository.

def mcts(root_board, current_player, simulations=500, time_limit=1.0):
    """
    Perform MCTS from the root state and return the column of the best move.
    The loop is bounded by time_limit (in seconds); the simulations
    argument is not used in this version.
    """
    start_time = time.time()
    root_node = MCTSNode(root_board, current_player)

    while (time.time() - start_time) < time_limit:
        # 1. Selection: descend while the node is fully expanded and has children
        node = root_node
        while not node.untried_moves and node.children:
            node = best_child(node)

        # 2. Expansion: add one child for an untried move
        if node.untried_moves:
            node = expand_node(node)

        # 3. Simulation: random playout from the new node
        winner = simulate_game(node.board, node.current_player)

        # 4. Backpropagation: update statistics along the path to the root
        backpropagate(node, winner)

    # After time is up, pick the child with the highest visit count.
    best_move_node = max(root_node.children, key=lambda c: c.visits) if root_node.children else None

    if best_move_node is None:
        # fallback if somehow no children
        return random.choice(get_valid_moves(root_board))

    # Return the column that leads to best_move_node
    for col in get_valid_moves(root_board):
        candidate_board = make_move(root_board, col, current_player)
        if candidate_board == best_move_node.board:
            return col

    return random.choice(get_valid_moves(root_board))

When the computer is looking to make its move, it doesn't automatically call this mcts function to get its next move. As I mentioned above, many implementations add heuristics; here, the heuristic is to look first for an immediate Connect 4 win or to block the opponent's immediate win. Chess engines similarly incorporate special case checks for forced checkmates or forced draws before running their main search, such as minimax. So I created a find_immediate_win_or_block function that runs before the mcts function so that the computer doesn't miss an immediate win or loss.

This function makes sure the computer doesn't miss an immediate win or block before it starts its time-budgeted move search:

def find_immediate_win_or_block(board, current_player):
    """
    Check if current_player can immediately win,
    or if the opponent can immediately win next turn (then block).
    """
    # Immediate win
    for col in get_valid_moves(board):
        temp_board = make_move(board, col, current_player)
        if check_winner(temp_board, current_player):
            return col
    
    # Block opponent
    opponent = get_next_player(current_player)
    for col in get_valid_moves(board):
        temp_board = make_move(board, col, opponent)
        if check_winner(temp_board, opponent):
            return col
    
    return None

One more function I need to explain is ai_move, which originally looked like the code below. It calls the find_immediate_win_or_block function to look for an immediate win or block and, if there isn't one, calls the mcts function to choose a column for its piece. However, when I gave the link to family and friends, they complained that they couldn't ever win. So I added a 1 through 5 difficulty setting that the user can set. In the updated Github code, level 5 runs this original logic, but at levels 1-4 the computer will, with some probability, skip the MCTS and just drop a piece randomly instead. There are other ways to handle difficulty level, such as lessening the search time or space, but for Connect 4, mixing in a random move seems to work well.

def ai_move(board, current_player, difficulty):
    """
    Decide AI move:
    1. Check immediate win/block
    2. Otherwise use MCTS
    (difficulty is not used in this original version.)
    """
    col = find_immediate_win_or_block(board, current_player)

    if col is not None:
        return col

    return mcts(board, current_player, simulations=300, time_limit=1.0)
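
To illustrate the difficulty idea, here is a hedged sketch of how a random move could be mixed in. The probability table, the function name pick_move, and the strong_move callable are all hypothetical; the actual values and structure in the updated repository may differ.

```python
import random

# Hypothetical mapping from difficulty level to the chance of a random move.
RANDOM_MOVE_CHANCE = {1: 0.8, 2: 0.6, 3: 0.4, 4: 0.2, 5: 0.0}

def pick_move(valid_moves, strong_move, difficulty, rng=random):
    """Return a random legal column with some probability, else the strong move.

    `strong_move` is a zero-argument callable standing in for the
    immediate win/block check followed by the MCTS search.
    """
    if rng.random() < RANDOM_MOVE_CHANCE[difficulty]:
        return rng.choice(valid_moves)
    return strong_move()

rng = random.Random(42)
moves = [0, 1, 2, 3, 4, 5, 6]
# Level 5 never randomizes, so the strong move (column 3 here) is returned.
print(pick_move(moves, lambda: 3, difficulty=5, rng=rng))  # prints: 3
```

Passing the strong move as a callable means the expensive search only runs when it is actually needed.
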

Initially, I wrote this Connect 4 code so that the output was displayed in the terminal, but I wanted to share it with other people. At the same time I wanted a minimalist implementation, so I took the unusual approach of building it as a Streamlit application using emojis. The end result is here, which you can play.


Non-typical MCTS: Customer Journey

Now that we have an explanation and example code for the typical application of MCTS, a two player game, we can talk about an example of a non-typical application. There are many, many possible applications, but let's just look at one, which is an e-commerce platform where the goal is to guide consumers through a personalized journey (e.g., product recommendations, promotional offers, and website interactions) to maximize conversions, repeat purchases, and customer satisfaction.

In this type of example we have:
  • States in the customer journey:
    • Customer demographics and preferences.
    • Interaction history (e.g., clicks, time spent, abandoned carts).
    • Contextual data (e.g., time of day, device type, or browsing session).
  • Actions
    • Recommending specific products.
    • Sending a targeted promotional email or notification.
    • Offering discounts or free shipping.
    • Displaying upsell or cross-sell opportunities.
  • Reward Function
    • Conversion events (e.g., purchases or sign-ups).
    • Total revenue or profit generated.
    • Customer satisfaction and engagement metrics (e.g., session length, feedback).
  • Simulations
    • Rollouts simulate different sequences of actions to predict customer responses, using probabilistic models of consumer behavior (e.g., likelihood of purchase or churn)
  • Balancing of Exploration and Exploitation
    • MCTS balances exploring new strategies (e.g., trying novel product bundles or marketing messages) with exploiting known successful approaches (e.g., strategies that previously led to high-value purchases).

For example, a customer visits an online store and views a product but doesn’t add it to their cart. The platform needs to decide: should it recommend a similar product, offer a discount on the viewed product, or highlight customer reviews or provide an FAQ link to address hesitation? Using MCTS, the platform simulates potential outcomes of these actions (e.g., conversion likelihood) and dynamically selects the one with the highest long-term reward, considering both immediate revenue and customer loyalty.
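
As a very simplified sketch of that decision, the code below applies the UCB1 selection rule at a single decision point, using simulated customer responses. It is not a full tree search and it ignores long-term reward; the action names, the conversion probabilities, and the choose_action function are all hypothetical stand-ins for a learned model of customer behavior.

```python
import math
import random

# Hypothetical conversion probabilities for each action; in practice these
# would come from a probabilistic model of consumer behavior.
CONVERSION_PROB = {
    "recommend_similar": 0.05,
    "offer_discount": 0.12,
    "show_reviews": 0.08,
}

def choose_action(conversion_prob, iterations=5000, c=math.sqrt(2), seed=0):
    """UCB1 selection over one decision point with simulated outcomes."""
    rng = random.Random(seed)
    visits = {a: 0 for a in conversion_prob}
    reward = {a: 0.0 for a in conversion_prob}
    for t in range(1, iterations + 1):
        def score(a):
            if visits[a] == 0:
                return math.inf  # try every action at least once
            mean = reward[a] / visits[a]
            return mean + c * math.sqrt(math.log(t) / visits[a])
        action = max(conversion_prob, key=score)
        # Simulated customer response: 1.0 for a conversion, else 0.0.
        outcome = 1.0 if rng.random() < conversion_prob[action] else 0.0
        visits[action] += 1
        reward[action] += outcome
    # As in the Connect 4 code, return the most-visited action.
    return max(visits, key=visits.get), visits

best, visits = choose_action(CONVERSION_PROB)
print(best, visits)
```

A full MCTS version would extend this by expanding follow-up actions as child nodes and running rollouts over whole sequences of interactions rather than a single response.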

Using MCTS for the customer journey provides dynamic personalization and long term optimization in a scalable, probabilistic way. The MCTS driven system adapts to real-time customer interactions, creating highly tailored journeys. Instead of focusing on one off conversions, the system optimizes for lifetime customer value. MCTS handles the uncertainty and variability in consumer behavior effectively, especially for large scale platforms. Using a MCTS algorithm could enable smarter decision making and better shopping experiences.

Conclusion

Monte Carlo Tree Search (MCTS) is a robust and adaptable algorithm capable of addressing a diverse range of decision making situations. It has been successfully applied in domains such as board games, but could be applied atypically in areas like robotics, education, biotechnology, and e-commerce. This is primarily because of its versatility and efficacy in complex and uncertain search spaces. By balancing exploration and exploitation, MCTS optimizes decision making in contexts where exhaustive searches are computationally infeasible.

Additionally, there are many other variations of MCTS besides the one I have presented, as outlined in this paper by Browne et al. and organized in a really nice presentation by Bobak Pezeshki here. These variations involve different ways to explore the tree, expand nodes, simulate the playouts, set policies, and backpropagate.

And very, very recently Microsoft came out with this exciting paper called "rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking." The paper shows how a small model could be trained to rival the metrics of the OpenAI o1 model in math. It does this by exercising "deep thinking" through Monte Carlo Tree Search (MCTS), once again showing the importance of MCTS in AI.

MCTS, with its flexibility, probabilistic nature, and adaptability to domain specific heuristics, excels in the typical uses of traditional games and shows promise in atypical, complex real world problems, making it an important part of AI innovation and of optimizing diverse systems.

Wednesday, January 1, 2025

Machine Learning Classification: Scikit-Learn, PyTorch, and TensorFlow Examples

Although one can find machine learning examples using scikit-learn, PyTorch, and TensorFlow separately, there aren't really examples where one can see a comparison for these different frameworks on a standard dataset all in one place. But I think it is instructive to see how to use the same variables from the same dataset to accomplish the same prediction task. And that is what we're going to do in this post. We're going to use a very standard diabetes dataset to create basic example classification models based on seven variables to predict if someone is likely to be diabetic. The Github repository is here, the full notebook used to build the models is here, and the deployed Streamlit app can be accessed here.

The seven predictor variables are:

  • Number of pregnancies (all respondents in the dataset are female)
  • Glucose
  • Skin thickness
  • Age
  • BMI
  • Blood Pressure
  • Insulin
The dataset contains an "outcome" column that indicates the presence or absence of diabetes. We'll build four separate models:
  1. scikit-learn (Random Forest)
  2. scikit-learn (Gradient Boost)
  3. PyTorch (Neural Network)
  4. TensorFlow (Neural Network)
The models we will build in this post will focus on basic implementations emphasizing the mechanics and not on other topics like data cleaning, optimization, or fine tuning - although they are also important.

First let's read in the data and create the train/test splits. We'll use the same splits for all four models.

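import pickle
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay

df = pd.read_csv('diabetes.csv')
X = df.drop("Outcome", axis=1)
y = df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=78)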

Random Forest Classifier:

Create the random forest classifier instance:

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

rf_model = RandomForestClassifier(n_estimators=100, random_state=78)

Fit the model:

rf_model = rf_model.fit(X_train, y_train)

Make predictions using the testing data:

predictions = rf_model.predict(X_test)

Save the model for the Streamlit app:

filename = 'rf.sav'
with open(filename, 'wb') as f:
    pickle.dump(rf_model, f)

Create classification report:

print(classification_report(y_test, predictions))


              precision    recall  f1-score   support

           0       0.80      0.86      0.83       129
           1       0.66      0.56      0.60        63

    accuracy                           0.76       192
   macro avg       0.73      0.71      0.72       192
weighted avg       0.75      0.76      0.75       192

cm = confusion_matrix(y_test, predictions)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()

As you can see the scikit-learn implementation is pretty straightforward and follows a "model, fit, predict" pattern. Likewise, gradient boost follows the same pattern.

Gradient Boost Classifier:

# Create the gradient boost classifier instance
gb_model = GradientBoostingClassifier(random_state=78)
# Fit the model
gb_model = gb_model.fit(X_train, y_train)
# Make predictions using the testing data
predictions = gb_model.predict(X_test)
# Save the model for the Streamlit app
filename = 'gb.sav'
with open(filename, 'wb') as f:
    pickle.dump(gb_model, f)
# Create classification report
print(classification_report(y_test, predictions))

              precision    recall  f1-score   support

           0       0.79      0.87      0.83       129
           1       0.66      0.52      0.58        63

    accuracy                           0.76       192
   macro avg       0.72      0.70      0.71       192
weighted avg       0.75      0.76      0.75       192  

cm = confusion_matrix(y_test, predictions)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()

Neural Network (PyTorch):

For the neural networks in both PyTorch and TensorFlow, we will scale the data. Scaling wasn't really necessary for the tree-based models, but it matters for the neural networks. We use the same number of layers and nodes per layer for both the PyTorch and TensorFlow models: an input layer for the 7 features, 2 hidden layers (16 and 8 nodes with ReLU activation functions), and 1 output node that uses a sigmoid activation function. Each of the two networks will train for 100 epochs. Both use binary cross-entropy for the loss function and Adam for optimization. There is some flexibility in how binary cross-entropy is combined with the sigmoid function in the two frameworks, but the way we are doing it here, it is effectively the same.
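
To show what that flexibility amounts to, here is a small framework-free sketch comparing the two ways of computing binary cross-entropy: from a probability (sigmoid applied first, as in this post) and directly from the raw logit (the approach taken by options such as PyTorch's nn.BCEWithLogitsLoss or Keras's BinaryCrossentropy(from_logits=True)). The helper names are my own, and the code only illustrates the arithmetic, not either framework's internals.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_probability(p, y):
    """Binary cross-entropy given a probability p (sigmoid already applied)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_from_logit(z, y):
    """The same loss computed from the raw logit z in a numerically stable form."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

for z, y in [(2.0, 1), (2.0, 0), (-1.5, 1), (0.0, 0)]:
    a = bce_from_probability(sigmoid(z), y)
    b = bce_from_logit(z, y)
    print(f"z={z:+.1f} y={y}  {a:.6f}  {b:.6f}")
```

The two columns agree, which is why a sigmoid output with a plain binary cross-entropy loss is effectively equivalent to a linear output with a logit-based loss; the logit form is mainly preferred for numerical stability with extreme values.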

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.preprocessing import StandardScaler

Standardize the features:

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

Now I'll convert the data to PyTorch tensors:

X_train_scaled = torch.tensor(X_train_scaled, dtype=torch.float32)
X_test_scaled = torch.tensor(X_test_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32).unsqueeze(1)
y_test_tensor = torch.tensor(y_test.values, dtype=torch.float32).unsqueeze(1)

Define the neural network model:

class DiabetesPTModel(nn.Module):
    def __init__(self):
        super(DiabetesPTModel, self).__init__()
        self.fc1 = nn.Linear(7, 16)
        self.fc2 = nn.Linear(16, 8)
        self.fc3 = nn.Linear(8, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.sigmoid(self.fc3(x))
        return x

Initialize the model, loss function, and optimizer:

pt_model = DiabetesPTModel()
criterion = nn.BCELoss()
optimizer = optim.Adam(pt_model.parameters(), lr=0.001)

Train the model:

num_epochs = 100
for epoch in range(num_epochs):
    pt_model.train()
    optimizer.zero_grad()
    outputs = pt_model(X_train_scaled)
    loss = criterion(outputs, y_train_tensor)
    loss.backward()
    optimizer.step()

    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

Evaluate the model:

pt_model.eval()
with torch.no_grad():
    predictions = pt_model(X_test_scaled)
    predictions = predictions.round()
    accuracy = (predictions.eq(y_test_tensor).sum() / float(y_test_tensor.shape[0])).item()
    print(f'Accuracy: {accuracy:.4f}')

Save the model and scaler - we will need them for the Streamlit app:

torch.save(pt_model.state_dict(), 'diabetes_model_pt.pth')
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)

Get predictions for test set for classification report:

with torch.no_grad():
    predictions = pt_model(X_test_scaled)
    predictions = predictions.round()

print(classification_report(y_test_tensor, predictions))

              precision    recall  f1-score   support

           0       0.79      0.88      0.83       129
           1       0.67      0.52      0.59        63

    accuracy                           0.76       192
   macro avg       0.73      0.70      0.71       192
weighted avg       0.75      0.76      0.75       192

cm = confusion_matrix(y_test_tensor, predictions)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()

Neural Network (TensorFlow):

Define the neural network model:

import tensorflow as tf

tf_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(7,)),
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

Compile the model:

tf_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Train the model. Use the same scaled data used with the PyTorch model:

tf_model.fit(X_train_scaled, y_train, epochs=100, batch_size=32, validation_split=0.2)

Evaluate the model:

loss, accuracy = tf_model.evaluate(X_test_scaled, y_test)
print(f'Accuracy: {accuracy:.4f}')

Save the model for the Streamlit app:

tf_model.save('diabetes_model_tf.h5')

Get predictions for test set for classification report:

predictions = tf_model.predict(X_test_scaled).round()
print(classification_report(y_test, predictions))

              precision    recall  f1-score   support

           0       0.81      0.83      0.82       129
           1       0.63      0.60      0.62        63

    accuracy                           0.76       192
   macro avg       0.72      0.72      0.72       192
weighted avg       0.75      0.76      0.75       192

cm = confusion_matrix(y_test, predictions)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()

Model Comparison:

These examples show basic implementations of scikit-learn, PyTorch, and TensorFlow models. They are not meant to demonstrate the huge number of options that are available, such as for fine tuning, data loading in PyTorch, or the properties of tensors in general. We could also have computed loss curves and done things like adjust for sample imbalance. But from what we did do, we can see the fundamental structure of each of the models.

In comparing the output, we can see confusion matrices that are very similar to each other, which is most likely a function of the data itself or the small size of the data. Normally, the evaluated performance of these models can vary significantly from one to another.

Before moving on to deploy these models to a Streamlit app, we could look at one more characteristic of the models, which is to answer the question of which variables are the most important to the model. We'll do that for one of the models - the random forest model.

We can get and display the most important features like this:

import matplotlib.pyplot as plt

importances = rf_model.feature_importances_
importances_sorted = sorted(zip(rf_model.feature_importances_, X.columns), reverse=True)

# Plot the feature importances
features = sorted(zip(X.columns, importances), key = lambda x: x[1])
cols = [f[0] for f in features]
width = [f[1] for f in features]

fig, ax = plt.subplots()

fig.set_size_inches(8,6)
plt.margins(y=0.001)

ax.barh(y=cols, width=width)

plt.show()



As we can see from the chart: glucose, BMI, and age are the most important variables - at least for the random forest model.

Streamlit Application:

For the Streamlit app, we allow the user to enter values for any of the seven predictor variables, with default values used for any they don't change. They can then select any of the four models we built (and saved) and get a prediction. And very importantly, for the two neural network models, which were trained on scaled data, we load the saved scaler and apply it to the user's inputs.
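
The reason the saved scaler matters is that user inputs must be standardized with the statistics learned from the training data, not recomputed from the single row the user enters. Here is a plain-Python sketch of that arithmetic (mean and population standard deviation per column, which is what standardization amounts to). The function names and the two-feature toy data are hypothetical and are not the app's actual code.

```python
import statistics

def fit_scaler(rows):
    """Per-column mean and population standard deviation of the training rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column would have std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds

def transform(row, means, stds):
    """Standardize one row using statistics learned from the training data."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]

# Hypothetical two-feature training data (glucose, BMI) and one user input.
train_rows = [[100.0, 25.0], [140.0, 35.0], [120.0, 30.0]]
means, stds = fit_scaler(train_rows)
user_row = [120.0, 30.0]
print(transform(user_row, means, stds))  # prints: [0.0, 0.0]
```

A user whose values sit exactly at the training means maps to zeros, which is the scale the neural networks saw during training.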

And that's it! We have a deployed Streamlit app for the four models.



Sunday, December 22, 2024

Implications of a New Tool for Molecular Dynamics: MDGen

With the end of 2024, the holidays, and extra time off, I've had a bunch of time to catch up on recent research papers that I've been meaning to get to. One that I've been very excited about introduces a framework called MDGen, in the recently released paper "Generative Modeling of Molecular Dynamics Trajectories" by Jing et al. I think this paper needs to get much wider reach, which is one of the reasons I'm writing about it here. But I also want to describe how I believe this system could be extended and how, if combined with other systems into a larger pipeline, it could have a significant impact and really stretch the cutting edge in areas like drug discovery.

Before I get into why I think this paper is important, how MDGen could be used and what its implications are beyond those outlined in the paper, I want to go over a very brief description of molecular dynamics. 

Molecular Dynamics:

Molecular dynamics (MD) plays a pivotal role in drug discovery by providing detailed insights into the movement and interactions of molecules over time. Unlike static structural snapshots, MD simulations capture the dynamic behavior of proteins, ligands, and their complexes, revealing conformational changes, binding pathways, and interaction forces. These insights are critical for understanding how drugs interact with their targets, particularly in complex biological environments where flexibility and motion significantly influence binding affinity and specificity.

In drug discovery, MD is invaluable for identifying binding sites, exploring conformations, and predicting the stability of protein-ligand complexes. By simulating the behavior of molecules at atomic resolution, researchers can assess how candidate drugs bind to their targets, optimize lead compounds, and predict resistance mechanisms. MD also aids in exploring challenging targets like intrinsically disordered proteins, which lack stable structures and require dynamic analysis to uncover potential binding sites. The ability to simulate these dynamic processes accelerates the drug development pipeline, reducing reliance on trial and error methods and enabling more precise, mechanism-driven drug design.

MDGen:

While protein structure prediction tools like AlphaFold 3, released this past year, have gotten a lot of attention, a static pose alone does not give you everything you need for tasks such as building a new enzyme that catalyzes a new reaction. MDGen, working with other tools, can help with understanding the dynamics involved.

MDGen is a generative model designed to simulate molecular dynamics (MD) trajectories, offering implications for computational chemistry, biophysics, and AI-driven molecular design. Molecular dynamics simulations, while essential for exploring the behavior of atoms and molecules, are computationally expensive due to the significant disparity between the timescales of integration steps and meaningful molecular phenomena. MDGen addresses this challenge by leveraging deep learning techniques to provide a flexible, multi-task surrogate model capable of tackling diverse downstream tasks.

The generative modeling approach of MDGen diverges from traditional methods, which focus on autoregressive transition density or equilibrium distribution. Instead, MDGen employs end-to-end modeling of complete MD trajectories, enabling applications beyond forward simulation. These include trajectory interpolation (transition path sampling), upsampling molecular dynamics trajectories to capture fast dynamics, and inpainting missing molecular regions for tasks like scaffold design. So this framework expands the scope of MD simulations, making it possible to infer rare molecular events, bridge gaps in trajectory data, and scaffold molecular structures for desired dynamic behaviors.

MDGen is capable of reproducing MD-like outputs for unseen molecules. The model achieves a high degree of accuracy in capturing free energy surfaces, reproducing Markov state fluxes, and predicting torsional relaxation times. Their benchmarks indicate that MDGen can emulate the structural and dynamical content of MD simulations with significant computational efficiency, offering speed-ups of up to 1000x compared to traditional MD methods. Work that was measured in weeks could conceivably be done in hours. This efficiency is particularly advantageous in protein simulation tasks, where MDGen is shown to outperform existing techniques in recovering ensemble statistical properties while being orders of magnitude faster.

You can think of the generative inpainting idea as something like a Sora for molecular dynamics. This inpainting allows for filling in missing molecular regions and generating consistent dynamics for the entire structure. This capability has significant implications for molecular design, particularly in creating new molecules or scaffolding specific dynamics into protein designs. For example, in enzyme engineering, MDGen could generate consistent side-chain configurations and dynamics around a catalytic site, ensuring functional integration into the broader molecular structure.

Not to be too hyperbolic, but I think the implications of this are profound, because inpainting inside of trajectories is wild. By bringing generative modeling to MD trajectory data, MDGen enables rapid exploration of molecular dynamics. The framework’s ability to interpolate trajectories suggests a potential for unique hypothesis generation in molecular mechanisms.

Future Possibilities:

Okay, so the potential of the ideas in this paper is incredibly interesting. But I want to outline some of the ideas not specifically mentioned in the paper, how the model could be improved, and potential applications beyond what is readily apparent in the paper.

First, there's the obvious idea of just fine tuning the model to specific use cases. But beyond applying a fine tuned version of the model, it could be worthwhile to look at different tokenization strategies and definitely it would be worthwhile to retrain the model on more and different types of data. For example, it is trained on single proteins. A model could be created using some of the MDGen ideas but trained on protein complexes.

But beyond those types of data and strategies, MDGen could also be trained with multimodal data sources, such as textual or experimental descriptors, which could be very interesting. Furthermore, it could automate experimental design by proposing dynamic behaviors tailored to experimental conditions, guiding laboratory work with predictive insights. Similarly, MDGen could leverage large scale knowledge graphs of molecular interactions and pathways, refining trajectory predictions to include broader biological contexts. These integrations could position MDGen as a versatile tool that bridges the gap between computational predictions and experimental realities.

Furthermore, MDGen's molecular inpainting feature could provide insight into precise design and repair of molecular structures. It could be useful in applications like mutation repair, where it could predict the impact of a mutation and suggest compensatory structural or dynamic changes to restore function. In synthetic biology, MDGen could be used to engineer entirely new molecular pathways with tailored dynamic behaviors, such as light-activated enzymes or thermally sensitive molecular systems. 

The interpolation capabilities could be reimagined as tools for hypothesis generation, making it possible to uncover unknown intermediate states in biochemical pathways or to explore dynamic transitions in materials science. This could significantly aid in understanding complex processes, such as protein-ligand binding or phase transitions at the molecular level.

MDGen could also provide a platform for studying equilibrium and non-equilibrium dynamics, offering insights into phenomena such as protein folding and misfolding in diseases like Alzheimer’s. Its trajectory generation capabilities could be used to explore how time asymmetry manifests at the molecular level, providing theoretical insights into entropy and energy landscapes. As more high-quality MD trajectory data becomes available, MDGen or a model like it that incorporated that data could model increasingly complex systems, offering new ways to study crowded cellular environments or investigate the limitations of Markovian dynamics in highly dynamic systems.

In very practical applications like drug screening and optimization, MDGen could enhance virtual pipelines by predicting dynamic interaction profiles, especially for targets like intrinsically disordered proteins. Its role in multi-scale modeling could bridge atomic-level changes and mesoscopic behaviors, while molecular AI agents built on MDGen could iteratively explore chemical space, design new molecules, and simulate their dynamics for optimized functionality.

Conclusion:

MDGen presents transformative opportunities in molecular science, building on its ability to generate molecular dynamics (MD) trajectories. By framing MD generation as analogous to video modeling, MDGen potentially offers a unified platform for understanding and designing molecular systems. 


Sunday, December 8, 2024

Eigenvalues and Eigenvectors and their Applications

In linear algebra, eigenvalues and eigenvectors are essential concepts in machine learning and artificial intelligence - and are also important in applications in physics, engineering, and much more. They often seem abstract at first, but we can build up an intuition with examples and practical applications.

What are Eigenvalues and Eigenvectors?

 

Suppose we have a square matrix \( A \), which represents some kind of transformation (e.g., rotation, scaling, or shear). When this transformation is applied to a vector \( \mathbf{v} \), sometimes the vector changes direction, and sometimes it doesn’t. When a vector doesn’t change direction (it might still stretch or shrink), that vector is called an eigenvector. The amount by which the vector is stretched or shrunk is called the eigenvalue.
  • Eigenvector \( \mathbf{v} \): A vector that doesn’t change direction when a transformation is applied.
  • Eigenvalue \( \lambda \): A scalar that tells how much the eigenvector is stretched or shrunk.

Mathematically, this is written as:

\[ A \mathbf{v} = \lambda \mathbf{v} \]

Here:
  • A is the transformation matrix.
  • \( \mathbf{v} \) is the eigenvector.
  • \( \lambda \) is the eigenvalue.

A Simple Example: Stretching Along an Axis:


Imagine a transformation matrix: \(A = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \) This matrix scales vectors along the x-axis by 3 and along the y-axis by 2. If we apply this transformation to the vector \( \mathbf{v} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \) , we get:

\( A \mathbf{v} = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix} \)

Here, \( \mathbf{v} \) doesn’t change direction; it’s simply scaled by 3.
Thus:
  • \( \mathbf{v} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \) is an eigenvector.
  • \( \lambda = 3 \) is the eigenvalue.
Similarly, for the vector \( \mathbf{w} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \), we get:

\( A \mathbf{w} = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 2 \end{bmatrix} \)

And here again, \( \mathbf{w} \) doesn’t change direction; it’s simply scaled by 2.
Thus,
  • \( \mathbf{w} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \) is another eigenvector.
  • \( \lambda = 2 \) is the eigenvalue.
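The two checks above can be reproduced in a few lines of plain Python. This is only a sketch; `matvec` and `is_eigenpair` are helper names I made up for illustration, not functions from any library.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def is_eigenpair(matrix, vector, eigenvalue, tol=1e-9):
    """Return True if matrix * vector equals eigenvalue * vector (within tol)."""
    transformed = matvec(matrix, vector)
    scaled = [eigenvalue * x for x in vector]
    return all(abs(t - s) <= tol for t, s in zip(transformed, scaled))


A = [[3, 0], [0, 2]]
print(is_eigenpair(A, [1, 0], 3))  # True
print(is_eigenpair(A, [0, 1], 2))  # True
print(is_eigenpair(A, [1, 1], 3))  # False: [1, 1] becomes [3, 2], a new direction
```

The last line shows a vector that is not an eigenvector: the transformation scales its two components differently, so it ends up pointing in a new direction.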

Real World Applications


Before we get more into the math, let's talk about why eigenvalues and eigenvectors are important. Eigenvalues and eigenvectors are more than just mathematical abstractions; they play critical roles in real-world applications. Here are some practical examples:
  1. Machine Learning and Principal Component Analysis (PCA)
    In PCA, we compute the covariance matrix of a dataset and find its eigenvalues and eigenvectors. The eigenvectors represent the principal axes of the data, while the eigenvalues indicate the amount of variance explained along each axis. By selecting the top eigenvectors (those with the largest eigenvalues), we can reduce the dataset’s dimensionality while retaining most of its variance.
  2. Graph Theory
    In network analysis, the eigenvalues and eigenvectors of the adjacency matrix of a graph help us understand its structure. For instance, the largest eigenvalue of the adjacency matrix can indicate the network’s connectivity. Another example is that eigenvectors are used in Google’s PageRank algorithm to rank webpages based on their importance.
  3. Quantum Mechanics
    In quantum mechanics, eigenvalues correspond to measurable quantities like energy levels of a system. The Schrödinger equation involves finding eigenvalues and eigenvectors of the Hamiltonian operator, which gives the possible energy states of a particle.
  4. Image Compression
    In image processing, Singular Value Decomposition (SVD), which is a related concept, relies on eigenvalues and eigenvectors: the singular values of a matrix \( M \) are the square roots of the eigenvalues of \( M^T M \). By keeping only the largest singular values and their corresponding singular vectors, we can approximate an image, significantly reducing storage requirements without much loss of quality.
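To make the graph example a little more concrete, here is a minimal sketch of power iteration, the basic idea behind ranking pages with a dominant eigenvector. It is a simplified illustration and not Google's actual algorithm: there is no damping factor, and the three-page link matrix is made up for the example. Column j holds the probabilities of following a link from page j to each other page.

```python
def power_iteration(matrix, steps=100):
    """Repeatedly apply the matrix to a vector so it converges to the dominant eigenvector."""
    n = len(matrix)
    rank = [1.0 / n] * n  # start with equal rank for every page
    for _ in range(steps):
        rank = [sum(matrix[i][j] * rank[j] for j in range(n)) for i in range(n)]
        total = sum(rank)
        rank = [r / total for r in rank]  # keep the ranks summing to 1
    return rank


# Hypothetical link structure: A -> B, A -> C, B -> C, C -> A
links = [
    [0.0, 0.0, 1.0],  # rank flowing into A
    [0.5, 0.0, 0.0],  # rank flowing into B
    [0.5, 1.0, 0.0],  # rank flowing into C
]

print([round(r, 4) for r in power_iteration(links)])  # [0.4, 0.2, 0.4]
```

The resulting vector is an eigenvector of the link matrix with eigenvalue 1: applying the matrix to it again leaves it unchanged.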

How to Compute Eigenvalues and Eigenvectors:


Alright. Back to the math! How do we actually calculate eigenvalues and eigenvectors?

The determinant plays a critical role in identifying eigenvalues, which leads us to their corresponding eigenvectors.

1. Find Eigenvalues: \( \lambda \):
The eigenvalues of a matrix A are found by solving the characteristic equation utilizing the determinant:

\( \text{det}(A - \lambda I) = 0 \)

And here, \( I \) is the identity matrix \( \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \).

To find \( \lambda \), we solve \( \text{det}(A - \lambda I) = 0 \), which expands into a polynomial equation in \( \lambda \) (called the characteristic polynomial). The roots of this polynomial are the eigenvalues.

Example:

Let:

\( A = \begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix} \)

The characteristic equation is:

\( \text{det}(A - \lambda I) = \text{det} \begin{bmatrix} 4-\lambda & 2 \\ 1 & 3-\lambda \end{bmatrix} = 0 \)

Expand the determinant:

\( \text{det} = (4-\lambda)(3-\lambda) - (2)(1) \)

\( \text{det} = \lambda^2 - 7\lambda + 10 = 0 \)

Solving this quadratic equation, we get:

\( \lambda = 5, \quad \lambda = 2 \)

Thus, the eigenvalues are \( \lambda = 5 \) and \( \lambda = 2 \).
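For a 2x2 matrix with rows (a, b) and (c, d), the characteristic polynomial always expands to \( \lambda^2 - (a + d)\lambda + (ad - bc) \), so the eigenvalues come straight from the quadratic formula. Here is a small sketch of that calculation; `eigenvalues_2x2` is a name I chose for illustration, and it only handles the case of real eigenvalues.

```python
import math


def eigenvalues_2x2(matrix):
    """Solve lambda^2 - trace*lambda + det = 0 for a 2x2 matrix with real eigenvalues."""
    (a, b), (c, d) = matrix
    trace = a + d
    det = a * d - b * c
    discriminant = trace * trace - 4 * det
    if discriminant < 0:
        raise ValueError("complex eigenvalues; this sketch handles real ones only")
    root = math.sqrt(discriminant)
    return (trace + root) / 2, (trace - root) / 2


print(eigenvalues_2x2([[4, 2], [1, 3]]))  # (5.0, 2.0)
```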

2. Find Eigenvectors \( \mathbf{v} \):
Once the eigenvalues \( ( \lambda ) \) are identified, we find the corresponding eigenvectors \( ( \mathbf{v} ) \):

For each eigenvalue \( \lambda \), we solve: \( (A - \lambda I) \mathbf{v} = 0 \)

The solution to this equation is a non-zero vector \( \mathbf{v} \).

Example:

For \( A \) and \( \lambda = 5 \) from above:

\( A - 5I = \begin{bmatrix} 4-5 & 2 \\ 1 & 3-5 \end{bmatrix} = \begin{bmatrix} -1 & 2 \\ 1 & -2 \end{bmatrix} \)

Solve:

\( \begin{bmatrix} -1 & 2 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \)

This gives:

\( -1x + 2y = 0 \quad \implies \quad y = \frac{1}{2}x \)

Let \( x = 2 \) (arbitrary scaling), then:

\( \mathbf{v_1} = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \)

For \( A \) and \( \lambda = 2 \) from above:

\( A - 2I = \begin{bmatrix} 4-2 & 2 \\ 1 & 3-2 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix} \)

Solved similarly to find:

\( \mathbf{v_2} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \)
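The same steps can be sketched in code for the 2x2 case. Because \( A - \lambda I \) is singular, any non-zero row \( [p, q] \) of it gives the equation \( px + qy = 0 \), and \( [q, -p] \) is always a solution. The function name `eigenvector_2x2` is my own, and the result is just one of the infinitely many valid scalings.

```python
def eigenvector_2x2(matrix, eigenvalue, tol=1e-9):
    """Return one non-zero solution of (A - lambda*I) v = 0 for a 2x2 matrix."""
    (a, b), (c, d) = matrix
    rows = [(a - eigenvalue, b), (c, d - eigenvalue)]
    for p, q in rows:
        if abs(p) > tol or abs(q) > tol:
            # p*x + q*y = 0 is solved by x = q, y = -p
            return [q, -p]
    # A - lambda*I is the zero matrix, so every vector is an eigenvector
    return [1, 0]


A = [[4, 2], [1, 3]]
print(eigenvector_2x2(A, 5))  # [2, 1]
print(eigenvector_2x2(A, 2))  # [2, -2], a scalar multiple of [-1, 1]
```

The second result looks different from the hand calculation, but [2, -2] is just -2 times [-1, 1], so it points along the same line.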

What does this mean geometrically - and practically?


Given the matrix \( A \) from before:

\( A = \begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix} \)

This matrix represents a linear transformation in 2D space. When applied to a vector, \( A \) performs a combination of scaling, rotation, and/or shearing.

The eigenvalues and eigenvectors from earlier calculations for matrix \( A \):

  • Eigenvalues: \( \lambda_1 = 5 , \lambda_2 = 2 \)
  • Eigenvectors:
    • Corresponding to \( \lambda_1 = 5 : \mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \)
    • Corresponding to \( \lambda_2 = 2 : \mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \)
What These Values Mean
    1. Eigenvalue \( \lambda_1 = 5 \) and Eigenvector \( \mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \)
    • Geometrically: The eigenvector \( \mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \) lies along a specific direction in 2D space. When the transformation \( A \) is applied to any vector along this direction, the vector is stretched by a factor of 5, but its direction remains unchanged.
    • Practically: This tells us that there is a “preferred direction” in the transformation along which vectors are stretched by a factor of 5, the larger of the two eigenvalues. Vectors aligned with \( \mathbf{v}_1 \) are scaled significantly, making this direction dominant in how \( A \) affects space.
    2. Eigenvalue \( \lambda_2 = 2 \) and Eigenvector \( \mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \)
    • Geometrically: The eigenvector \( \mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \) points in another specific direction in 2D space. Vectors along this direction are scaled by a factor of 2 when the transformation \( A \) is applied, with no change in direction.
    • Practically: This direction represents a less dominant feature of the transformation, where stretching occurs to a lesser extent (factor of 2). It shows that the transformation affects different directions differently.
Visually we can imagine a grid of arrows (vectors) in 2D space. Applying the transformation \( A \):
  • Along \( \mathbf{v}_1 \) (eigenvector for \( \lambda_1 = 5 \)), vectors stretch significantly, growing by a factor of 5.
  • Along \( \mathbf{v}_2 \) (eigenvector for \( \lambda_2 = 2 \)), vectors stretch less, growing by a factor of 2.
  • For vectors not aligned with \( \mathbf{v}_1 \) or \( \mathbf{v}_2 \), the transformation involves a combination of scaling and changing direction. These vectors can be expressed as combinations of the eigenvectors, showing how any arbitrary vector is affected.

The eigenvalues and eigenvectors of \( A \) provide insights into its transformation behavior. The eigenvectors \( \mathbf{v}_1 \) and \( \mathbf{v}_2 \) represent the preferred directions in 2D space that remain unchanged in orientation under the transformation. The eigenvalues \( \lambda_1 = 5 \) and \( \lambda_2 = 2 \) indicate how much vectors along these directions are stretched or compressed, with \( \mathbf{v}_1 \) experiencing a larger scaling factor.

Furthermore, any vector in space can be expressed as a combination of the eigenvectors, allowing us to decompose the transformation \( A \) into its fundamental actions - scaling along these specific directions - providing a complete and intuitive understanding of how \( A \) reshapes space.
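Here is a small sketch of that decomposition for an arbitrary vector, using the eigenvectors [2, 1] and [-1, 1] found above. The helper `decompose` is a name I made up; it solves \( \mathbf{x} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 \) with Cramer's rule. Applying \( A \) then just multiplies each coefficient by its eigenvalue.

```python
def decompose(x, v1, v2):
    """Find c1, c2 such that x = c1*v1 + c2*v2 (Cramer's rule for a 2x2 system)."""
    det = v1[0] * v2[1] - v2[0] * v1[1]
    c1 = (x[0] * v2[1] - v2[0] * x[1]) / det
    c2 = (v1[0] * x[1] - x[0] * v1[1]) / det
    return c1, c2


A = [[4, 2], [1, 3]]
v1, v2 = [2, 1], [-1, 1]
lambda1, lambda2 = 5, 2

x = [1, 2]
c1, c2 = decompose(x, v1, v2)

# Apply A by scaling each eigenvector component by its eigenvalue
via_eigen = [lambda1 * c1 * v1[i] + lambda2 * c2 * v2[i] for i in range(2)]

# Apply A directly for comparison
direct = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]

print(c1, c2)             # 1.0 1.0
print(via_eigen, direct)  # [8.0, 7.0] [8, 7]
```

Both routes give the same result, which is the sense in which the eigenvalues and eigenvectors fully describe what \( A \) does.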

Eigenvalues and Eigenvectors in PyTorch


Okay, now that we understand the math, how can we do it programmatically in PyTorch?

import torch
# Define the matrix A
A = torch.tensor([[4.0, 2.0], [1.0, 3.0]])

# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = torch.linalg.eig(A)

# Display the matrix
print("Matrix A:")
print(A)

# Display eigenvalues
print("\nEigenvalues:")
print(eigenvalues)

# Display eigenvectors
print("\nEigenvectors:")
print(eigenvectors)

# Verify the eigenvalue equation A * v = λ * v for the first eigenvalue and eigenvector
v1 = eigenvectors[:, 0] # First eigenvector
lambda1 = eigenvalues[0] # First eigenvalue

verification = torch.matmul(A.to(v1.dtype), v1) # A * v (A is cast to the complex dtype that torch.linalg.eig returns)
expected = lambda1 * v1 # λ * v

print("\nVerification for the first eigenvalue and eigenvector:")
print("A * v1 =", verification)
print("λ1 * v1 =", expected)


The output will look like this:

Matrix A:
tensor([[4., 2.], [1., 3.]])

Eigenvalues:
tensor([5.+0.j, 2.+0.j])

Eigenvectors:
tensor([[ 0.8944, -0.7071], [ 0.4472, 0.7071]])

Verification for the first eigenvalue and eigenvector:
A * v1 = tensor([4.4721+0.j, 2.2361+0.j])
λ1 * v1 = tensor([4.4721+0.j, 2.2361+0.j])


Now if you have been following along closely, you may be wondering why the eigenvalues from PyTorch of 5 and 2 match what we calculated manually, but our eigenvectors \( \mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \) and \( \mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \) do not match PyTorch's \( \mathbf{v}_1 = \begin{bmatrix} 0.8944 \\ 0.4472 \end{bmatrix} \) and \( \mathbf{v}_2 = \begin{bmatrix} -0.7071 \\ 0.7071 \end{bmatrix} \).

This is because when solving the eigenvector equation \( (A - \lambda I)\mathbf{v} = 0 \), we calculate the eigenvectors up to any scalar multiple. For example, if [2, 1] is a solution, then [4, 2] or even [0.2, 0.1] are also valid eigenvectors because eigenvectors are defined up to scaling. And if you look above when we were calculating eigenvectors, we let \( x = 2 \) (arbitrary scaling).

PyTorch automatically normalizes the eigenvectors so that their length (or magnitude) is 1. This is achieved by dividing each eigenvector by its norm:

\( \|\mathbf{v}\| = \sqrt{x_1^2 + x_2^2} \)

For example:

• For the manually calculated eigenvector [2, 1]:

\( \|\mathbf{v}_1\| = \sqrt{2^2 + 1^2} = \sqrt{5} \)

Normalizing it:

\( \mathbf{v}_1 = \left[\frac{2}{\sqrt{5}}, \frac{1}{\sqrt{5}}\right] = [0.8944, 0.4472] \)

• Similarly, for [-1, 1]:

\( \|\mathbf{v}_2\| = \sqrt{(-1)^2 + 1^2} = \sqrt{2} \)

Normalizing it:

\( \mathbf{v}_2 = \left[\frac{-1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right] = [-0.7071, 0.7071] \)

Both the manually calculated eigenvectors ([2, 1] and [-1, 1]) and PyTorch’s normalized eigenvectors ([0.8944, 0.4472] and [-0.7071, 0.7071]) are equally valid. They represent the same direction in space, and the eigenvalues (5 and 2) remain unchanged. The difference is only due to normalization. This ensures a standardized representation of eigenvectors, which is particularly useful in programming and numerical computations. Many applications, such as Principal Component Analysis (PCA), assume normalized eigenvectors for interpretation or computation.
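The normalization step is simple enough to check by hand in plain Python. This is just a sketch of the arithmetic above (the `normalize` helper is my own name), not how PyTorch implements it internally.

```python
import math


def normalize(vector):
    """Divide a vector by its Euclidean length so the result has length 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


print([round(x, 4) for x in normalize([2, 1])])   # [0.8944, 0.4472]
print([round(x, 4) for x in normalize([-1, 1])])  # [-0.7071, 0.7071]
```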

Summary

Eigenvalues and eigenvectors provide a great way to understand transformations in linear algebra. They appear in diverse fields, from machine learning and physics to structural engineering and image processing. We can use these tools to solve complex problems and gain insights into the behavior of systems.



Tuesday, November 19, 2024

The Evolution of Market and Political Research: AI Agents

The landscape of market and political research is about to undergo a significant transformation. Traditional methods like surveys, panels, and in-person focus groups, while long-standing, will increasingly be replaced by AI-driven alternatives. These methods, leveraging autonomous agents to simulate human behavior and attitudes, are proving to be faster, more cost-effective, and potentially more accurate. This shift represents a turning point in how we gather insights, and it is happening much sooner than many anticipated.

There are two papers that have recently come out that I want to use to illustrate what I believe is going to be possible:

  •  "Generative Agent Simulations of 1,000 People" is a paper that just came out of Stanford. The paper presents an architecture for simulating human behavior using generative agents informed by qualitative interviews. What's really interesting is that these agents are modeled after 1,052 real individuals. They replicate attitudes and behaviors across various social science tasks with high accuracy, performing comparably to human self-replication over time. The research demonstrates the agents’ utility in predicting responses to surveys, personality assessments, economic games, and experimental settings. By reducing demographic biases and enabling scalable simulations, the approach offers a powerful tool for understanding individual and collective behaviors in diverse contexts.
  • "Scaling Synthetic Data Creation with 1,000,000,000 Personas" is a paper that introduces Persona Hub, a collection of one billion synthetic personas designed to enhance data diversity and scalability in large language models (LLMs). By associating each persona with unique perspectives and knowledge, the framework enables the generation of highly diverse synthetic data across multiple applications, including math problems, instructions, and knowledge-rich texts. The approach overcomes limitations of previous data synthesis methods by leveraging personas to guide LLMs, demonstrating significant potential for advancing AI research, development, and practical applications.

These are just two examples of the recent creation of agents, but there are many others and I have also been creating my own autonomous agents that can do focus groups and questionnaire research. So I strongly believe that AI and agents will have a large role in the future. But before we can talk about using AI agents, we need to examine the limitations of traditional research.

Limitations of Traditional Methods

I started out in market research many years ago in a part-time job during college, checking data quality in survey questionnaires. When I graduated, I worked as an analyst for a company called Sophisticated Data Research (SDR). I left SDR after a few years, dissatisfied with the state of research software at the time, and went on to join another company to write some of the first statistical software for Windows and for the internet for market research. In the early 2000s, I left to join a start-up to build agents to model marketing effectiveness, so I've been around agents for over 20 years. So I'm very aware of agents, but also of the traditionalism in the industry and its limitations.

Traditional approaches to market and political research have long faced challenges, but these issues have become more pronounced in recent years. Phone-based surveys, once a cornerstone of consumer and political research, have seen their accuracy steadily decline. The widespread use of mobile devices has fundamentally changed how people interact with calls - screening is common, response rates have plummeted, and the pool of reachable participants is increasingly skewed. This has led researchers to rely on heavy weighting of subpopulations to align with presumed demographic truths. However, this practice has become increasingly tenuous, bordering on speculative guesswork, as the assumptions underlying these adjustments often lack a solid foundation. As a result, the reliability of phone surveys is now widely questioned, making them an increasingly impractical method for gathering actionable data.

These challenges extend beyond phone surveys. In-person focus groups and panels also face issues with scalability, cost, and bias. Facilitating these sessions requires significant resources, and their relatively small sample sizes make it difficult to generalize findings. Biases - both from facilitators and participants - can further distort results. Focus groups are increasingly having a difficult time recruiting in some segments - doctors, researchers, engineers, etc. Together, these factors have created a pressing need for new methodologies that are more efficient and reliable.

The Role of AI and Agents in Simulated Research

Recent advancements in artificial intelligence, particularly in the creation and deployment of generative agents, are addressing many of these challenges. By using large libraries of personas, AI systems can simulate the attitudes, preferences, and behaviors of diverse populations - and difficult to reach populations like doctors. Studies have demonstrated that these agents can replicate human responses with a high degree of accuracy. For example, as mentioned earlier, research from Tencent’s Persona Hub highlights the ability to synthesize billions of personas, enabling nuanced and scalable simulations, while Stanford’s work on generative agents shows their effectiveness in predicting individual attitudes and behaviors.

These systems allow for the creation of virtual focus groups and the simulation of surveys in which each participant is an AI-driven persona. In virtual focus groups, the personas can interact dynamically, mimicking the complex interpersonal dynamics found in real-life settings. These approaches don't have to wait to recruit participants or field a study. They can be done immediately and not just done once but repeated hundreds or thousands of times. This approach enables the collection of insights that are not only faster to obtain but also potentially more comprehensive.

Benefits and Broader Implications

Simulated research offers several advantages that are increasingly difficult to ignore. First, it reduces the time and costs associated with traditional methods. Virtual focus groups can be conducted instantaneously and at a fraction of the cost, making it possible to run studies that were previously too expensive or logistically complex.

Second, the accuracy of these methods is rapidly improving. Generative agents have demonstrated their ability to align closely with human responses in studies, offering reliable insights that rival or exceed those obtained through traditional research. This capability challenges the reliance on demographic sampling by using more detailed persona-based approaches, which can reduce biases.

Third, the technology is evolving to overcome limitations such as knowledge cutoffs in large language models. Not many people are talking about this yet, but the authors of a paper titled "Mixture of a Million Experts" discuss the idea of continuous learning. And this is really exciting - AI models are going to be able to be updated continuously. Continuous learning capabilities will enable AI agents to stay updated with real-time information without relying on web searches. This development will further enhance the utility of simulated research by providing more contextually relevant and up-to-date responses. It will enable research based on very recent events - opening up a potentially effective tool to measure economic and political attitudes in practically real time.

Preparing for the Future

The rise of AI-driven research methods signals a need for companies in the market and political research sectors to rethink their approaches. Adapting to this new reality will require investing in AI capabilities and integrating them into existing workflows. Organizations will also need to reconsider their business models, as the cost structures of traditional methods are unlikely to remain competitive against the efficiency of synthetic research.

Some of the larger organizations will be unable or unwilling to adapt as they try to protect the ways they have been doing things going back decades. Most will try to add AI to their current offerings as a sincere but ultimately half-hearted attempt to remain relevant. They will talk about things like their agent-based profiles in their new AI-based consumer segments. First, if you are talking about your new "AI-based insights" as part of your new marketing, well, everyone is saying that now, and how is that any different from saying in the early 2000s that your new solution uses the World Wide Web? How does that excite a customer - when you are stating something obvious that everyone else is saying too? Second, don't just tack AI onto your existing offerings. That's not going to fly in a time of exponential change. In order to adapt to exponential change, you need to think radically.

Because with the coming of agents, there will be some use cases where the cost of doing research will be driven to zero and the barrier to entry will be minimal.

While this transition may not render traditional methods entirely obsolete overnight, it is clear that the trajectory of research is changing. The industry must embrace these advancements to stay relevant in a world where insights will become increasingly instantaneous and accessible. If old value propositions are driven to zero, new value propositions and strategic advantages will need to be identified.

An Inevitable Shift

The adoption of AI-driven research is not a distant prospect; it is already happening. As the tools and techniques improve, they will become integral to understanding consumer and voter behavior. The question for organizations is not whether to adopt these methods but how quickly they can do so and how effectively they can integrate them into their operations.

The AI transformation of market and political research signals that innovation doesn't merely enhance - it redefines and disrupts industries entirely. AI agents are not just an alternative to traditional methods - they are a glimpse into the future of how we understand and engage with the world.



Thursday, October 24, 2024

Building an Artificial User Interface

(Updated on 11/15/24)

As artificial intelligence continues to evolve, AI agents will take on increasingly complex tasks. Although many agent frameworks currently exist - such as AutoGen, LangGraph, and CrewAI - and numerous papers have been written along with some successful proofs of concept, agents have been challenging to move into production with current models. However, agents will soon be deployed across various devices (in retail, labs, banking), automating tasks or enabling things that were previously thought impossible.

The open-source community has been developing agents using LLMs for the past few years, but it was well known that major AI labs were preparing to jump into the agent space. They have been discussing this at length, with companies like Anthropic and OpenAI making it clear that this was a major focus for them. Their upcoming releases, along with contributions from the wider community, should make 2025 the year of AI agents, with many capable of working together beyond just computers connected to the internet.

And just this week, Anthropic announced a version of their AI model, Claude, capable of computer use - translating instructions, checking spreadsheets, scrolling, moving cursors, and executing a series of actions to achieve objectives. Claude does this by analyzing screenshots, counting pixels to determine cursor movements, and clicking in the correct places—a process that, while innovative, underscores a significant inefficiency in how AI interacts with software designed for humans.

This approach requires AI agents to mimic human interactions, essentially teaching them to navigate interfaces built for human senses and motor skills. It’s akin to asking a robot to use a screwdriver designed for human hands instead of giving it a tool tailored to its mechanical capabilities. 

So this raises the question: Why are we making AI conform to human-centric software interfaces when we could design software specifically for AI agents? 

We need to focus on creating software that is data- and task-centric, not human-UI-centric. The AI does not care about its user experience. So user experience needs to shift toward helping the human user express their objectives, helping the user guide or correct the AI, and displaying the outcome of the interactions.

The Inefficiency of Mimicking Human Interactions

Training AI agents to interact with software via graphical user interfaces (GUIs), as Anthropic has done, involves complex image recognition, pixel counting, and simulated mouse movements. This not only consumes computational resources but also introduces potential errors. A slight change in the UI layout or an unexpected pop-up can confuse the AI, leading to failures in task execution.

Consider a scenario where an AI assistant needs to update a spreadsheet. Teaching it to navigate menus, click on cells, and input data as a human would is cumbersome. Instead, if the spreadsheet software provided an API for data manipulation, the AI could perform the task more efficiently and reliably.

Building Software for AI Agents

To overcome these inefficiencies, we should shift towards designing software that AI agents can interact with directly. This means extending existing applications or creating new ones with machine-readable interfaces that AI can access without relying on a GUI.

It doesn't have to be REST APIs running locally; it could also be locally installed command line interfaces (CLIs). But I think the standardization and wide use of REST make it a sensible structure to use. Conceivably, there could be a server running in the background, and software designed for AI would be registered with the OS on installation and run on that server. The agents would then be able to easily see what functionality they had access to in order to accomplish a user's objective, through internal software REST calls as well as external calls to other APIs. The agents would have local tools as well as external internet tools. This could be rolled out in various ways so that software made to this standard could coexist with traditional software (using the "agent computer use" that Anthropic just released), or software could be released in both versions - the traditional software and software designed for AI.
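
To make the registration idea a bit more concrete, here is a minimal sketch of what such a background registry might look like. Everything in it is hypothetical: the `Capability` fields, the `LocalRegistry` class, and the example paths are names I made up for illustration, not an existing standard.

```python
from dataclasses import dataclass, field


@dataclass
class Capability:
    """One operation an installed program exposes to agents (hypothetical schema)."""
    app: str
    name: str
    method: str
    path: str
    description: str


@dataclass
class LocalRegistry:
    """Stand-in for the background server that software registers with on install."""
    capabilities: list[Capability] = field(default_factory=list)

    def register(self, cap: Capability) -> None:
        self.capabilities.append(cap)

    def find(self, keyword: str) -> list[Capability]:
        # Naive keyword match on the description; an agent could instead
        # read every description and decide for itself which tool fits.
        kw = keyword.lower()
        return [c for c in self.capabilities if kw in c.description.lower()]


registry = LocalRegistry()
registry.register(Capability("sheets", "read_range", "GET", "/sheets/{id}/range",
                             "Read cell values from a spreadsheet"))
registry.register(Capability("mail", "send", "POST", "/mail/send",
                             "Send an email message"))

# The agent asks what is available for a spreadsheet-related objective.
matches = registry.find("spreadsheet")
```

The point of the sketch is only that discovery becomes a data lookup rather than a matter of visually exploring menus.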

As I've stated, it doesn't have to be a locally running background server with API endpoints; it could be some other kind of implementation. But there are plenty of examples of software that is already driven through API calls to a locally running instance. For example, you can have locally running versions of PostgreSQL (through a REST layer such as PostgREST), GitLab, and Jenkins accessible through API calls. Another good example is Home Assistant. You can use the Home Assistant API for home automation, controlling lights, thermostats, and other devices.
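
As a rough illustration of the Home Assistant case, the sketch below builds (but does not send) the kind of request an agent would make to turn on a light through Home Assistant's REST API. The host, token, and entity name are placeholders I chose; you would substitute the values from your own installation.

```python
import json
import urllib.request


def build_light_request(base_url: str, token: str, entity_id: str) -> urllib.request.Request:
    """Build (but do not send) a request asking Home Assistant to turn on a light."""
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/services/light/turn_on",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: substitute your own host, access token, and entity.
req = build_light_request("http://localhost:8123", "YOUR_LONG_LIVED_TOKEN", "light.living_room")
# urllib.request.urlopen(req) would send it to a running Home Assistant instance.
```

No screenshots, no cursor movement: the whole interaction is one structured call.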

Example: An AI-Driven Spreadsheet

Here's a specific example of a productivity use case, assuming the software could be controlled through API endpoints (and Microsoft Excel already has a REST API, exposed through Microsoft Graph).

Imagine a spreadsheet application that offers a comprehensive API for data manipulation. An AI agent running locally on a machine could:

  • Read Data: Retrieve cell values, ranges, and metadata directly.
  • Write Data: Update cells, add formulas, and insert data without GUI interaction.
  • Analyze: Perform computations, generate charts, and identify trends through API calls.

The AI wouldn’t need to “see” the spreadsheet; it would understand its structure and content inherently, leading to faster and more accurate task completion. The AI doesn't need to "experience" the interface. It just needs efficient access to software and data.
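
Here is a small sketch of the read, write, and analyze operations as plain function calls. The `SheetAPI` class is a hypothetical in-memory stand-in that I wrote for illustration; it is not Excel's API or any real product's.

```python
from statistics import mean


class SheetAPI:
    """Hypothetical in-memory spreadsheet exposing data operations instead of a GUI."""

    def __init__(self) -> None:
        self._cells: dict[tuple[int, int], float] = {}

    def write_cell(self, row: int, col: int, value: float) -> None:
        self._cells[(row, col)] = value

    def read_range(self, row_start: int, row_end: int, col: int) -> list[float]:
        # Inclusive row range within one column; empty cells are skipped.
        return [
            self._cells[(r, col)]
            for r in range(row_start, row_end + 1)
            if (r, col) in self._cells
        ]

    def analyze(self, row_start: int, row_end: int, col: int) -> dict[str, float]:
        # Assumes the range holds at least one value; mean() raises otherwise.
        values = self.read_range(row_start, row_end, col)
        return {"sum": sum(values), "mean": mean(values)}


sheet = SheetAPI()
# Made-up sample figures, written to column 1, rows 1 through 3.
for row, sales in enumerate([120.0, 80.0, 100.0], start=1):
    sheet.write_cell(row, 1, sales)

summary = sheet.analyze(1, 3, 1)
```

An agent working against something like this never needs to know where a cell sits on the screen, only its row and column.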

Transforming Productivity Software

In this paradigm, productivity software undergoes a significant transformation.

Example: Code Development

Instead of manually writing code, developers could specify functionality and constraints:

  • Intent: “Create a function that sorts a list of customer names alphabetically and removes duplicates.”
  • AI Agent’s Role: The AI generates the code, tests it, and presents it for review. The agent is capable of looking over the entire code base and design documents.
  • User Interaction: The developer reviews the code changes, provides feedback, accepts changes, and iterates as necessary.
We have been seeing this shift over the last few months already with software from Replit, v0 by Vercel, and Cursor.

Example: Research and Analysis

Researchers could leverage AI agents to gather information and synthesize insights:

  • Intent: “Summarize the latest research on renewable energy storage solutions and prepare a presentation.”
  • AI Agent’s Role: The AI collects data from reputable sources, analyzes trends, and generates a presentation. These sources could be local sources or local functionality in other programs as well as internet sources.
  • User Interaction: The researcher reviews the content, adjusts focus areas, and finalizes the presentation.
This would all be done through AI agents working on the computer on data to meet objectives directly and not through manipulating a UI meant for humans.

While productivity software would see dramatic changes, entertainment software might remain largely unaffected. Games and media content are inherently human-centric experiences designed for enjoyment and engagement. However, AI could enhance these experiences by personalizing content or managing in-game assets based on user preferences.

An Operating System Designed for AI Agents

Taking this concept further, envision an operating system (OS) specifically designed for AI agents. AI legend Andrej Karpathy has proposed an AI-based operating system centered around large language models (LLMs) equipped with tools. I'm proposing taking this a step beyond that and saying that an OS should be developed that is explicitly AI-centric and agent-centric. This OS wouldn’t just be a platform for running applications but a dynamic environment where AI agents can perform complex operations seamlessly. 

While languages like Python and others already allow direct execution of OS commands, an API layer for operating system tasks could offer distinct advantages. By exposing OS functionalities—file management, network communication, process control, and more—through standardized RESTful APIs, such a system would provide language-agnostic access and simplify integration for AI agents built with diverse tools and frameworks.

This AI-centric OS would be more than a platform for executing applications; it would be a dynamic, modular environment tailored for agent-based interactions. By introducing an API layer, the OS could ensure consistent and secure access to its capabilities while abstracting the complexity of direct command execution. AI agents could leverage these APIs to interact with the OS in a predictable, scalable, and maintainable way, unlocking a new level of efficiency.
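
A toy version of that API layer might look like the following. It is only a sketch under my own assumptions: the route names (`/files`), the `handle` dispatcher, and the parameter names are invented for illustration, and a real system would sit behind an HTTP server with authentication and permission checks.

```python
import tempfile
from pathlib import Path
from typing import Callable

Handler = Callable[[dict], dict]
ROUTES: dict[tuple[str, str], Handler] = {}


def route(method: str, path: str) -> Callable[[Handler], Handler]:
    """Register a handler under a (method, path) pair, as a REST layer would."""
    def decorator(func: Handler) -> Handler:
        ROUTES[(method, path)] = func
        return func
    return decorator


@route("GET", "/files")
def list_files(params: dict) -> dict:
    directory = Path(params["dir"])
    return {"files": sorted(p.name for p in directory.iterdir() if p.is_file())}


@route("POST", "/files")
def write_file(params: dict) -> dict:
    target = Path(params["dir"]) / params["name"]
    target.write_text(params["content"], encoding="utf-8")
    return {"written": target.name, "bytes": len(params["content"].encode("utf-8"))}


def handle(method: str, path: str, params: dict) -> dict:
    """Dispatch a request; unknown routes return an error instead of raising."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return {"error": f"no route for {method} {path}"}
    return handler(params)


# Try it against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    created = handle("POST", "/files", {"dir": tmp, "name": "notes.txt", "content": "hello"})
    listing = handle("GET", "/files", {"dir": tmp})
missing = handle("DELETE", "/files", {})
```

Because every operation goes through one dispatcher, that is also the natural place to enforce the consistent and secure access described above.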

User Experience in an AI-Centric OS

For human users, interacting with such an OS would be fundamentally different:

  • Expressing Intent: Users would convey their objectives verbally or via text. For example, “Create an analysis of the last three months' advertising effectiveness - make sure to take into account any competitive trends or exogenous variables.”
  • AI Execution: AI agents interpret these intents and execute tasks using the OS’s APIs.
  • Feedback and Control: Users receive updates on task progress and can intervene or adjust objectives as needed.
  • Output Consumption: Once tasks are completed, users engage with the results - organized files, generated reports, or synthesized research findings.
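
The four steps above could be sketched roughly as follows. The `Task` and `AgentSession` classes and the task names are hypothetical, and the planning step (turning the user's sentence into tasks) is assumed to have already happened.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    name: str
    run: Callable[[], str]  # returns a short description of the output it produced


@dataclass
class AgentSession:
    """Hypothetical session: runs planned tasks and reports progress to the user."""
    tasks: list[Task]
    cancelled: bool = False
    log: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)

    def cancel(self) -> None:
        # Feedback and control: the user can stop the remaining tasks.
        self.cancelled = True

    def execute(self) -> list[str]:
        for i, task in enumerate(self.tasks, start=1):
            if self.cancelled:
                self.log.append(f"stopped before '{task.name}'")
                break
            self.outputs.append(task.run())
            self.log.append(f"{i}/{len(self.tasks)} done: {task.name}")
        return self.outputs


# Stub tasks standing in for real work on the advertising analysis.
session = AgentSession(tasks=[
    Task("load ad spend", lambda: "spend.csv"),
    Task("fit model", lambda: "model.json"),
    Task("write report", lambda: "report.pdf"),
])
results = session.execute()
```

The `log` list plays the role of the progress updates, and `results` is what the user ultimately consumes.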

So instead of simply equipping large language models (LLMs) with tools, the OS itself becomes a tool-rich environment specifically designed for agents. This architecture transforms the relationship between software, operating systems, and AI, creating a seamless bridge where agents can efficiently perform complex tasks while remaining secure, scalable, and adaptable to future advancements.

The Human-AI Interface: A New UI Paradigm

The user interface in this AI-centric world shifts from direct manipulation to intent expression and result consumption. Human interaction focuses on expressing goals or objectives, which the AI interprets and executes. Importantly, achieving a single human objective often requires multiple interdependent tasks, such as performing data analysis in Excel, creating a presentation in PowerPoint, and sharing the final output via email or communication platforms like Teams or Slack. However, AI agents are not constrained by the need to open or interact with these applications as humans do. In many cases, the AI doesn’t even require the software to be installed. Instead, it performs the necessary actions behind the scenes, generating the final outputs - such as .xlsx, .csv, .pptx, or PDF files - directly. By bypassing traditional application workflows, AI agents streamline the process, delivering results efficiently without the overhead of navigating human-centric software interfaces.
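
As a small illustration of producing an output file directly, the sketch below renders a .csv using only Python's standard library, with no spreadsheet application involved. The `render_csv` helper and the sample figures are made up for the example.

```python
import csv
import io


def render_csv(rows: list[dict[str, object]]) -> str:
    """Serialize result rows to CSV text without opening any spreadsheet application."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data standing in for the result of an analysis.
report = render_csv([
    {"month": "Aug", "spend": 1200, "conversions": 48},
    {"month": "Sep", "spend": 1500, "conversions": 66},
])
```

The resulting text can be written straight to a file and handed to the user or to the next task in the chain. Formats like .xlsx or .pptx would need a library that knows those formats, but the principle is the same.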

The actual human interface could be reduced to a few elements:

  • Command/Voice Interface: A simple input field or voice interface where users state their objectives.
  • Progress Feedback: Dashboards or notifications that keep users informed about task status.
  • Result Display: Outputs are presented in human-friendly formats - documents, visualizations, or actionable summaries.

Benefits of Designing Software for AI Agents

  • Efficiency: Eliminates the overhead of GUI navigation by AI agents.
  • Reliability: Reduces errors caused by UI changes or unexpected elements.
  • Scalability: AI agents can perform tasks faster and handle larger volumes of data.
  • User Empowerment: Users focus on defining goals rather than executing steps, enhancing productivity.

Embracing the Paradigm Shift

As AI becomes more integrated into our daily workflows, rethinking software design to accommodate AI agents is not just logical - it’s inevitable. This doesn't need to happen all at once. At first, there could be software with dual uses - human interfaces and AI data/functionality access. But by building software in which AI communicates directly with data resources and functionality through APIs, along with an OS designed for AI use, we unlock the full potential of intelligent agents, streamline processes, and create a more efficient partnership between humans and AI agents.

It’s time to move beyond teaching AI to use our tools and start building software and an operating system designed for AI.


