Thursday, February 29, 2024

Rank and Single Value Decomposition in Machine Learning, AI, and Beyond

The rank of a matrix is a fundamental concept in linear algebra with significant applications across various fields, including statistics, engineering, computer science, economics, and found in a variety of machine learning and artificial intelligence topics.

Understanding the rank of a matrix gives us insight into the properties of the system it represents. The concept of rank in the context of Singular Value Decomposition (SVD) is deeply interconnected and plays a crucial role in understanding the structure and properties of a matrix. SVD is a method in linear algebra that decomposes a matrix into three other matrices, revealing its intrinsic geometric and algebraic properties. Let’s break down what rank is first and then what SVD is and how rank fits into the picture, using an analogy to make it more digestible.

To understand rank, imagine you're the coach of a basketball team, and you're trying to figure out how many of your players are truly essential. In this analogy, your basketball team is like a matrix, where each player has their own set of skills (shooting, passing, defense, etc.), represented as numbers in the matrix. The "rank" of your matrix (team) tells you the smallest number of players (rows or columns) you need to keep the team's overall skill set intact.

If the rank of your matrix is high (close to the total number of players), it means every player brings something unique to the game, like a player who's the only one who can dunk. But, if the rank is low, it's like having several players who only know how to do layups; they're kind of redundant, and you probably aren't going to win many games with players that have the same skill sets.

To keep it simple, let's say you are in a 3 on 3 league and with your team of 3 players you're looking at their skill stats in a matrix:

1) Player 1 can score 10 points, make 2 assists, and grab 5 rebounds.

2) Player 2 can score 20 points, make 4 assists, and grab 10 rebounds.

3) Player 3 can score 5 points, make 1 assist, and grab 2.5 rebounds.

Here is their matrix:

[[10    2     5]

[20    4     10]

[ 5    1    2.5]

Okay, okay I know that in real basketball they don't award a half of a rebound, but let's say it's possible in our league so the 2.5 is valid. But notice that Player 3 is doing a quarter of what Player 2 does and Player 1 is doing half of what Player 2 does. In matrix terms, Player 3's and Player 1's skills are not adding anything new to the team; they're like a "linear combination" of Player 2 (or, in simpler terms, Player 1's and Player 3's abilities are just a scaled-down version of Player 2's). The rank of this matrix is 1. A matrix is like counting how many players on your basketball team have unique skills that are essential for the team. A high rank means everyone's a star with their own uniqueness.

In this example with a rank of 1, all of the players' stats are essentially pointing in the same direction. In simpler terms, you could think of every player's performance as being scaled up or down from a single set of stats. This indicates that, despite appearances, there's essentially only one set of unique skills across all players, and everyone else is just varying levels of that skillset.

It's like having a team where every player is a clone of the original in terms of skills, just at different magnitudes. Only one player's unique skill set is truly needed to define the entire team's capabilities.

If we were to do this in Python numpy:

import numpy as np

matrix = np.array([

[10, 2, 5],

[20, 4, 10],

[5, 1, 2.5]

])

rank = np.linalg.matrix_rank(matrix)

We can also see the rank by reducing the matrix to Reduced Row Echelon Form (RREF). Using a series of row swaps, row additions, and row multiplications. RREF requires each leading entry in a row to be 1 through these row operations. We will do these row operations until:

  • - Each non-zero row starts with a leading 1, and these leading 1s shift to the right as you move down the rows.
  • - The columns containing leading 1s have zeros in all other positions.
  • - Any rows consisting entirely of zeros are at the bottom of the matrix.

Lets start by normalizing the first row by dividing by 10 we get:

[[1    .2     .5]

20    4     10]

5     1    2.5]]

Using appropriate multiples of the first row, subtract from the other rows to achieve zeros below this leading 1, i.e.:

2nd row = 2nd row - (20 * 1st row)

3rd row = 3rd row - (5 * 1st row)

After reducing the matrix we obtain:

[[1    1/5  1/2]

 [0     0    0 ]

 [0     0    0 ]]

Only the first row is non-zero, which means only one row is linearly independent. The other rows are zeros, indicating they do not add any new information or dimension to the space spanned by the matrix. The rank of a matrix is the number of linearly independent rows. In this case, since we only have one linearly independent row, the rank of the matrix is 1.

Now let's talk about how rank fits into the concept of Single Value Decomposition (SVD).

Let's use another analogy, and imagine you have a complex robot (your matrix) that can move in many different ways. The rank of a matrix is essentially the count of its non-zero singular values. In our robot example, it’s the number of unique, independent movements the robot can make. If our robot has a lot of movements but many are just variations of others (e.g., moving its arm up a little and up a lot), then the true "rank" is only the count of distinct, fundamental movements.

SVD is like taking this robot apart to understand what makes it work by finding three main components (matrices) that can be combined to replicate all the movements (transformations) the robot can do. You apply SVD to "decompose" the robot into these three main components (matrices called U, Σ, V*). This process is akin to figuring out:

    • U: How the robot sees and plans its movements in its own perspective (the space of features).
    • Σ: The strengths of these movements (singular values), ranking them from most to least impactful. This tells you how important or effective each movement is in achieving the robot’s tasks.
    • V*: How these movements translate into the real world (the space of observations).

After we have these three decomposed matrices, one thing we can do is Principle Components Analysis which is one of the key techniques in feature reduction in machine learning. Principal Component Analysis (PCA) is like analyzing the robot's movements to find the most effective ways it can move that capture the essence of its abilities. You’re looking for a few key movements (principal components) that allow the robot to perform most of its tasks efficiently without needing to use all possible movements.

You can then focus on the strongest movements (rank and singular values) by examining the singular values (Σ), you identify which movements (principal components) are actually significant. The highest singular values represent the robot's most distinct and powerful movements. In PCA, these correspond to the directions in which the data varies the most.

Then you can reconstruct the robot's abilities (dimensionality reduction). Instead of allowing the robot to move in all the original directions (which can be inefficient and redundant), you reconstruct its abilities based on only the most significant movements identified by SVD. This streamlined version of the robot can perform almost as well in its tasks but with less effort and complexity. In data terms, PCA reduces the dimensionality of your dataset while retaining the essence of the information, based on the principal components identified by SVD.

In PCA, using SVD is like breaking down the robot's complex movements into fundamental components, identifying the most significant ones, and reconstructing its capabilities in a more efficient manner. This streamlined approach not only simplifies analysis but also uncovers the most meaningful patterns in the data, analogous to understanding the higher core capabilities of our hypothetical robot.

In machine learning, PCA is often used for feature reduction in that it can reduce the number of variables in your data by transforming them into a new set of variables, called principal components, which are ordered so that the first few retain most of the variation present in all of the original variables. It's like summarizing a long book into a few key sentences that still retains the meaning of the story.

Other uses of PCA in machine learning include noise reduction, visualization, and improving model performance.

Other Applications of SVD

Beside PCA other applications of SVDs in machine learning are:

Data Compression: Just like finding out our robot doesn't need to move in hundreds of ways but only a few, SVD allows us to compress data by keeping only the most significant singular values (movements) and discarding the rest. This is used in image compression, where you keep the essence of the image but reduce its size.

Noise Reduction: Imagine our robot is moving in a noisy environment, and some of its movements are just reactions to this noise. By identifying and keeping only the significant movements (high singular values), we can filter out the noise, similar to how SVD can help remove noise from data, enhancing signal clarity.

Large Language Model Fine Tuning: One of the more recent usages of SVDs is in fine tuning large language models through a method called Low-Rank Adaptation (LoRA). This technique is aimed at reducing the computational complexity and memory requirements of these models. In neural networks, especially with large pre-trained models, the weights matrix can be extremely large, making the model cumbersome to fine-tune and deploy. By applying SVD, one can approximate these weight matrices with lower-rank matrices that significantly reduce the size without substantially compromising the model's performance.

Summary:

Matrix rank and Singular Value Decomposition (SVD) provides the underpinnings of linear algebra and its applications across a wide spectrum of disciplines, from statistics and engineering to computer science and economics, extending into the realms of machine learning and artificial intelligence. Hopefully, the analogy of a basketball team and robots serves to demystify these concepts. By deconstructing matrices into their singular values and vectors, SVD provides a robust framework for data compression, noise reduction, and the extraction of meaningful features, thereby enhancing the interpretability and efficiency of data analysis and model training processes.

Sunday, February 11, 2024

Build a Chat Application with Ollama and Open Source Models

Creating a chat application that is both easy to build and versatile enough to integrate with open source large language models or proprietary systems from giants like OpenAI or Google is a very worthwhile venture. Such flexibility allows developers to choose the best model for their needs, whether prioritizing privacy and security by running models locally or leveraging the advanced capabilities of cloud-based services.

Running open source language models can enhance data security, as sensitive information would never leave the organization, offering a compelling advantage in scenarios where confidentiality is paramount.

Additionally, this approach can lead to substantial cost savings by eliminating the need for purchasing tokens for every interaction, a common expense when using external APIs. Because using propriety models can get expensive - especially in test mode. So it would be great if an engineer could build out the model and test it with an open source large language model and then just by changing a couple of lines of code switch to either a different open source LLM or to a proprietary model.

Using open source models democratizes access to cutting-edge AI technologies but also empower developers and businesses of all sizes to create more personalized and secure communication tools without breaking the bank. Although this has been possible for many months, it has gotten easier and easier.

With the recent release from Ollama, I will show that this can be done with just a few steps and in less than 75 lines of Python code and have a chat application running as a deployable Streamlit application.  

But first, what is Ollama?

Ollama offers a straightforward and user-friendly platform for operating large language models, right now catering especially to MacOS and Linux users, with plans to extend support to Windows in the near future. It accommodates a wide variety of models, such as Lama 2, CodeLlama, Phi, Mixtral, etc. but the one I'll be using in this example is Mistral 7B. It is a relatively simple setup process. (If you have Windows and don't want to wait for Ollama to be available, you can use LM Studio.) Ollama creates a server endpoint that you can use in your application.

Chat Application:

Step 1:

Download and install Ollama.

https://ollama.com/

This is a straightforward installation process and will place an icon on the menu bar that Ollama is available.

Step 2:

Download an LLM model.

In this example, we will be using Mistral 7b. To download the model run this command in the terminal:

ollama pull mistral

The ollama pull command downloads the model. If you want a different model, such as Llama you would type llama2 instead of mistral in the ollama pull command. A full list of available models can be found here.

Step 3:

Run the LLM model Mistral.

To run Mistral 7b type this command in the terminal.

ollama run mistral

This will start Mistral at the endpoint of http://localhost:11434/v1. You could start chatting with the endpoint right in the terminal, but what we are going to do is use this endpoint in our chat application.

Step 4:

Build the chat application.

The newest version of Ollama that came out last week includes OpenAI compatibility as explained here. So if you don't have the latest version go ahead and download it. With OpenAI compatibility we get a standardized output regardless of what LLM model we are using. Before this release, I used LiteLLM to get this compatibility, but this is no longer necessary.

All of the code for this application can get be found at this github repository.

But let's walk through the major parts of the code.

In order to initialize the client and get the OpenAI compatibility, we create a base URL from the Ollama endpoint. For api_key, we put 'ollama', but this could be anything since there's no API key. If we were using the OpenAI API, we would put our API key here.

client = openai.OpenAI(

      base_url = 'http://localhost:11434/v1',

      api_key='ollama', # api_key is required, but unused for local models

  )

We also need to keep track of the conversation and send it to the LLM with each API call, so that the LLM remembers what was said before, i.e. we are sending not just the current prompt but the history of preceding prompts and responses. Since this is Streamlit, I'm using st.session_state.

I append both the prompt question and the response answer:

    q = {

      "role": "assistant",

      "content": response.choices[0].message.content

    }

    st.session_state.message_list.append(q)

Because of how Streamlist rewrites the UI every time the user makes an input change, I loop through everything in session_state and write it to the chat control:

prompt = st.chat_input("Ask a question")
  if prompt:
    
    with st.spinner('Thinking...'):
            
      answer = conversation.message(prompt)
            
      for l in st.session_state.message_list:
        
        if l['role'] == 'user':
          with st.chat_message("user"):
            st.write(l['content'])
        elif l['role'] == 'assistant':
          with st.chat_message("assistant"):
            st.write(l['content'])

Here is the full code:

import openai
import streamlit as st

if 'message_list' not in st.session_state:
st.session_state.message_list = [
{"role": "system", "content": "You are a helpful assistant."}
]

class Conversaion:

client = openai.OpenAI(
base_url = 'http://localhost:11434/v1',
api_key='ollama', # api_key is required, but unused for local models
)
def __init__(self):
pass
def message(self, question):
q = {
"role": "user",
"content": question
}
st.session_state.message_list.append(q)
response = self.client.chat.completions.create(
model="mistral",
messages=st.session_state.message_list
)
q = {
"role": "assistant",
"content": response.choices[0].message.content
}
st.session_state.message_list.append(q)
return response.choices[0].message.content

if __name__ == "__main__":
st.title('Chatbot')

message = st.chat_message("assistant")
message.write("Hello human!")

conversation = Conversaion()
prompt = st.chat_input("Ask a question")
if prompt:
with st.spinner('Thinking...'):
answer = conversation.message(prompt)
for l in st.session_state.message_list:
if l['role'] == 'user':
with st.chat_message("user"):
st.write(l['content'])
elif l['role'] == 'assistant':
with st.chat_message("assistant"):
st.write(l['content'])


Step 5:

Run the application

Before you run the application it is best to create a conda environment. Let’s call it chat:

conda create -n chat python=3.11

conda activate chat

You will also need to install dependencies:

pip install openai

pip install streamlit

Now you can run the streamlit application:

streamlit run app.py

Let's look at the UI and output.

As you can see, it answers the question and then in the followup question, it knows that the question of, "Could you explain more?" refers to the preceding conversation.


And that's it! In a few easy steps and less than 75 lines of code, we now have a chat application running that is using a local open source LLM.


Elements of Monte Carlo Tree Search - Typical and Non-typical Applications

Monte Carlo Tree Search (MCTS) offers a very intuitive way of tackling challenging decision making problems. In essence, MCTS combines the...