Sunday, December 8, 2024

Eigenvalues and Eigenvectors and their Applications

In linear algebra, eigenvalues and eigenvectors are essential concepts in machine learning and artificial intelligence, and they also have important applications in physics, engineering, and many other fields. They often seem abstract at first, but we can build up an intuition with examples and practical applications.

What are Eigenvalues and Eigenvectors?

 

Suppose we have a square matrix \( A \), which represents some kind of transformation (e.g., rotation, scaling, or shear). When this transformation is applied to a vector \( \mathbf{v} \), sometimes the vector changes direction, and sometimes it doesn’t. When a vector doesn’t change direction (it might still stretch or shrink), that vector is called an eigenvector. The amount by which the vector is stretched or shrunk is called the eigenvalue.
  • Eigenvector \( \mathbf{v} \): A vector that doesn’t change direction when a transformation is applied.
  • Eigenvalue \( \lambda \): A scalar that tells how much the eigenvector is stretched or shrunk.

Mathematically, this is written as:

\[ A \mathbf{v} = \lambda \mathbf{v} \]

Here:
  • A is the transformation matrix.
  • \( \mathbf{v} \) is the eigenvector.
  • \( \lambda \) is the eigenvalue.

A Simple Example: Stretching Along an Axis


Imagine a transformation matrix \(A = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \). This matrix scales vectors along the x-axis by 3 and along the y-axis by 2. If we apply this transformation to the vector \( \mathbf{v} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \), we get:

\( A \mathbf{v} = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix} \)

Here, \( \mathbf{v} \) doesn’t change direction; it’s simply scaled by 3.
Thus:
  • \( \mathbf{v} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \) is an eigenvector.
  • \( \lambda = 3 \) is the eigenvalue.
Similarly, if we apply the transformation to the vector \( \mathbf{w} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \), we get:

\( A \mathbf{w} = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 2 \end{bmatrix} \)

And here again, \( \mathbf{w} \) doesn’t change direction; it’s simply scaled by 2.
Thus,
  • \( \mathbf{w} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \) is another eigenvector.
  • \( \lambda = 2 \) is the eigenvalue.
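As a quick sanity check, here is a minimal PyTorch sketch of this diagonal-matrix example (PyTorch is also used later in this post). It simply applies \( A \) to the two axis vectors and confirms that each is only scaled, not rotated:

import torch

# The diagonal scaling matrix from the example above
A = torch.tensor([[3.0, 0.0], [0.0, 2.0]])

v = torch.tensor([1.0, 0.0])  # vector along the x-axis
w = torch.tensor([0.0, 1.0])  # vector along the y-axis

print(A @ v)  # tensor([3., 0.]) -> same direction as v, scaled by 3
print(A @ w)  # tensor([0., 2.]) -> same direction as w, scaled by 2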

Real World Applications


Before we get deeper into the math, let's talk about why eigenvalues and eigenvectors are important. They are more than mathematical abstractions; they play critical roles in real-world applications. Here are some practical examples:
  1. Machine Learning and Principal Component Analysis (PCA)
    In PCA, we compute the covariance matrix of a dataset and find its eigenvalues and eigenvectors. The eigenvectors represent the principal axes of the data, while the eigenvalues indicate the amount of variance explained along each axis. By selecting the top eigenvectors (those with the largest eigenvalues), we can reduce the dataset’s dimensionality while retaining most of its variance (see the sketch after this list).
  2. Graph Theory
    In network analysis, the eigenvalues and eigenvectors of a graph’s adjacency matrix help us understand its structure. For instance, the largest eigenvalue of the adjacency matrix can indicate the network’s connectivity. Eigenvectors are also used in Google’s PageRank algorithm to rank webpages by their importance.
  3. Quantum Mechanics
    In quantum mechanics, eigenvalues correspond to measurable quantities like energy levels of a system. The Schrödinger equation involves finding eigenvalues and eigenvectors of the Hamiltonian operator, which gives the possible energy states of a particle.
  4. Image Compression
    In image processing, Singular Value Decomposition (SVD), which is a related concept, relies on eigenvalues and eigenvectors. By keeping the top eigenvalues and their corresponding eigenvectors, we can approximate an image, significantly reducing storage requirements without much loss of quality.
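To make the PCA example from item 1 concrete, here is a minimal sketch. The toy data and variable names are invented for illustration; the idea is simply to form a covariance matrix, take its eigendecomposition with torch.linalg.eigh (suited to symmetric matrices), and keep the direction with the largest eigenvalue:

import torch

# Toy dataset: 100 samples, 2 correlated features (illustrative only)
torch.manual_seed(0)
X = torch.randn(100, 2)
X[:, 1] = 0.5 * X[:, 0] + 0.1 * torch.randn(100)

# Center the data and form the sample covariance matrix
Xc = X - X.mean(dim=0)
cov = (Xc.T @ Xc) / (Xc.shape[0] - 1)

# eigh returns eigenvalues in ascending order for symmetric matrices
eigenvalues, eigenvectors = torch.linalg.eigh(cov)

# The eigenvector with the largest eigenvalue is the first principal axis
top_component = eigenvectors[:, -1]
explained = eigenvalues[-1] / eigenvalues.sum()
print("Principal axis:", top_component)
print("Fraction of variance explained:", explained.item())

# Project the centered data onto that axis to reduce it to one dimension
X_reduced = Xc @ top_component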

How to Compute Eigenvalues and Eigenvectors


Alright. Back to the math! How do we actually calculate eigenvalues and eigenvectors?

The determinant plays a critical role in identifying eigenvalues, which leads us to their corresponding eigenvectors.

1. Find the Eigenvalues \( \lambda \):
The eigenvalues of a matrix \( A \) are found by solving the characteristic equation, which is built from the determinant:

\( \text{det}(A - \lambda I) = 0 \)

Here, \( I \) is the identity matrix \( \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \).

To find \( \lambda \), we solve \( \text{det}(A - \lambda I) = 0 \), which expands into a polynomial equation in \( \lambda \) (called the characteristic polynomial). The roots of this polynomial are the eigenvalues.
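As a handy shortcut for hand calculations, the characteristic polynomial of any \( 2 \times 2 \) matrix can be written directly in terms of its trace and determinant:

\( \text{det}\left( \begin{bmatrix} a & b \\ c & d \end{bmatrix} - \lambda I \right) = (a-\lambda)(d-\lambda) - bc = \lambda^2 - (a+d)\lambda + (ad-bc) \)

Here \( a+d \) is the trace of the matrix and \( ad-bc \) is its determinant, so the eigenvalues of a \( 2 \times 2 \) matrix depend only on these two numbers.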

Example:

Let:

\( A = \begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix} \)

The characteristic equation is:

\( \text{det}(A - \lambda I) = \text{det} \begin{bmatrix} 4-\lambda & 2 \\ 1 & 3-\lambda \end{bmatrix} = 0 \)

Expand the determinant:

\( \text{det} = (4-\lambda)(3-\lambda) - (2)(1) \)

\( \text{det} = \lambda^2 - 7\lambda + 10 = 0 \)

Solving this quadratic equation (it factors as \( (\lambda - 5)(\lambda - 2) = 0 \)), we get:

\( \lambda = 5, \quad \lambda = 2 \)

Thus, the eigenvalues are \( \lambda = 5 \) and \( \lambda = 2 \).
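If you want a quick programmatic double-check of these roots, NumPy's polynomial root finder works on the coefficients of \( \lambda^2 - 7\lambda + 10 \); this is just an optional sanity check:

import numpy as np

# Roots of the characteristic polynomial λ^2 - 7λ + 10
print(np.roots([1.0, -7.0, 10.0]))  # prints 5 and 2 (order may vary)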

2. Find Eigenvectors \( \mathbf{v} \):
Once the eigenvalues \( \lambda \) are identified, we find the corresponding eigenvectors \( \mathbf{v} \):

For each eigenvalue \( \lambda \), we solve: \( (A - \lambda I) \mathbf{v} = 0 \)

Any non-zero solution \( \mathbf{v} \) of this equation is an eigenvector corresponding to \( \lambda \).

Example:

For \( A \) and \( \lambda = 5 \) from above:

\( A - 5I = \begin{bmatrix} 4-5 & 2 \\ 1 & 3-5 \end{bmatrix} = \begin{bmatrix} -1 & 2 \\ 1 & -2 \end{bmatrix} \)

Solve:

\( \begin{bmatrix} -1 & 2 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \)

This gives:

\( -x + 2y = 0 \quad \implies \quad y = \frac{1}{2}x \)

Let \( x = 2 \) (arbitrary scaling), then:

\( \mathbf{v_1} = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \)
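We can confirm this directly by applying \( A \):

\( A \mathbf{v_1} = \begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 10 \\ 5 \end{bmatrix} = 5 \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \lambda_1 \mathbf{v_1} \)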

For \( A \) and \( \lambda = 2 \) from above:

\( A - 2I = \begin{bmatrix} 4-2 & 2 \\ 1 & 3-2 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix} \)

Solving similarly (the first row gives \( 2x + 2y = 0 \), so \( y = -x \)), we find:

\( \mathbf{v_2} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \)
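Again, a direct check confirms the result:

\( A \mathbf{v_2} = \begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ 2 \end{bmatrix} = 2 \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \lambda_2 \mathbf{v_2} \)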

What does this mean geometrically - and practically?


Given the matrix \( A \) from before:

\( A = \begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix} \)

This matrix represents a linear transformation in 2D space. When applied to a vector, \( A \) performs a combination of scaling, rotation, and/or shearing.

The eigenvalues and eigenvectors from earlier calculations for matrix \( A \):

  • Eigenvalues: \( \lambda_1 = 5 , \lambda_2 = 2 \)
  • Eigenvectors:
    • Corresponding to \( \lambda_1 = 5 : \mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \)
    • Corresponding to \( \lambda_2 = 2 : \mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \)
What These Values Mean
    1. Eigenvalue \( \lambda_1 = 5 \) and Eigenvector \( \mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \)
    • Geometrically: The eigenvector \( \mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \) lies along a specific direction in 2D space. When the transformation \( A \) is applied to any vector along this direction, the vector is stretched by a factor of 5, but its direction remains unchanged.
    • Practically: This tells us that there is a “preferred direction” in the transformation where the stretching is maximized by a factor of 5. Vectors aligned with \( \mathbf{v}_1 \) are scaled significantly, making this direction dominant in how \( A \) affects space.
    2. Eigenvalue \( \lambda_2 = 2 \) and Eigenvector \( \mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \)
    • Geometrically: The eigenvector \( \mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \) points in another specific direction in 2D space. Vectors along this direction are scaled by a factor of 2 when the transformation \( A \) is applied, with no change in direction.
    • Practically: This direction represents a less dominant feature of the transformation, where stretching occurs to a lesser extent (factor of 2). It shows that the transformation affects different directions differently.
Visually, we can imagine a grid of arrows (vectors) in 2D space. Applying the transformation \( A \):
  • Along \( \mathbf{v}_1 \) (the eigenvector for \( \lambda_1 = 5 \)), vectors stretch significantly, growing by a factor of 5.
  • Along \( \mathbf{v}_2 \) (the eigenvector for \( \lambda_2 = 2 \)), vectors stretch less, growing by a factor of 2.
  • For vectors not aligned with \( \mathbf{v}_1 \) or \( \mathbf{v}_2 \), the transformation involves a combination of scaling and changing direction. These vectors can be expressed as combinations of the eigenvectors, showing how any arbitrary vector is affected.

The eigenvalues and eigenvectors of \( A \) provide insights into its transformation behavior. The eigenvectors \( \mathbf{v}_1 \) and \( \mathbf{v}_2 \) represent the preferred directions in 2D space that remain unchanged in orientation under the transformation. The eigenvalues \( \lambda_1 = 5 \) and \( \lambda_2 = 2 \) indicate how much vectors along these directions are stretched or compressed, with \( \mathbf{v}_1 \) experiencing a larger scaling factor.

Furthermore, any vector in space can be expressed as a combination of the eigenvectors, allowing us to decompose the transformation \( A \) into its fundamental actions - scaling along these specific directions - providing a complete and intuitive understanding of how \( A \) reshapes space.
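As a small worked example of this decomposition, take the vector \( \mathbf{u} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \). It happens to equal \( \mathbf{v}_1 + \mathbf{v}_2 \), so applying \( A \) term by term just scales each eigenvector piece by its own eigenvalue:

\( A\mathbf{u} = A(\mathbf{v}_1 + \mathbf{v}_2) = 5\mathbf{v}_1 + 2\mathbf{v}_2 = \begin{bmatrix} 10 \\ 5 \end{bmatrix} + \begin{bmatrix} -2 \\ 2 \end{bmatrix} = \begin{bmatrix} 8 \\ 7 \end{bmatrix} \)

Multiplying \( A \begin{bmatrix} 1 \\ 2 \end{bmatrix} \) directly gives the same \( \begin{bmatrix} 8 \\ 7 \end{bmatrix} \), which is exactly the "scaling along eigen-directions" view described above.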

Eigenvalues and Eigenvectors in PyTorch


Okay, now that we understand the math, how can we do it programmatically in PyTorch?

import torch
# Define the matrix A
A = torch.tensor([[4.0, 2.0], [1.0, 3.0]])

# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = torch.linalg.eig(A)

# Display the matrix
print("Matrix A:")
print(A)

# Display eigenvalues
print("\nEigenvalues:")
print(eigenvalues)

# Display eigenvectors
print("\nEigenvectors:")
print(eigenvectors)

# Verify the eigenvalue equation A * v = λ * v for the first eigenvalue and eigenvector
v1 = eigenvectors[:, 0] # First eigenvector
lambda1 = eigenvalues[0] # First eigenvalue

# torch.linalg.eig returns complex tensors even for a real input matrix,
# so cast A to the same complex dtype before multiplying
verification = torch.matmul(A.to(v1.dtype), v1) # A * v
expected = lambda1 * v1 # λ * v

print("\nVerification for the first eigenvalue and eigenvector:")
print("A * v1 =", verification)
print("λ1 * v1 =", expected)


The output will look like this:

Matrix A:
tensor([[4., 2.],
        [1., 3.]])

Eigenvalues:
tensor([5.+0.j, 2.+0.j])

Eigenvectors:
tensor([[ 0.8944+0.j, -0.7071+0.j],
        [ 0.4472+0.j,  0.7071+0.j]])

Verification for the first eigenvalue and eigenvector:
A * v1 = tensor([4.4721+0.j, 2.2361+0.j])
λ1 * v1 = tensor([4.4721+0.j, 2.2361+0.j])


Now, if you have been following along closely, you may be wondering why the eigenvalues from PyTorch (5 and 2) match what we calculated manually, but the eigenvectors \( \mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \) and \( \mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \) do not match PyTorch’s \( \begin{bmatrix} 0.8944 \\ 0.4472 \end{bmatrix} \) and \( \begin{bmatrix} -0.7071 \\ 0.7071 \end{bmatrix} \).

This is because when solving the eigenvector equation \( (A - \lambda I)\mathbf{v} = 0 \), we calculate the eigenvectors up to any scalar multiple. For example, if [2, 1] is a solution, then [4, 2] or even [0.2, 0.1] are also valid eigenvectors because eigenvectors are defined up to scaling. And if you look above when we were calculating eigenvectors, we let \( x = 2 \) (arbitrary scaling).

PyTorch automatically normalizes the eigenvectors so that their length (or magnitude) is 1. This is achieved by dividing each eigenvector by its norm:

\( \|\mathbf{v}\| = \sqrt{x_1^2 + x_2^2} \)

For example:

• For the manually calculated eigenvector [2, 1]:

\( \|\mathbf{v}_1\| = \sqrt{2^2 + 1^2} = \sqrt{5} \)

Normalizing it:

\( \mathbf{v}_1 = \left[\frac{2}{\sqrt{5}}, \frac{1}{\sqrt{5}}\right] = [0.8944, 0.4472] \)

• Similarly, for [-1, 1]:

\( \|\mathbf{v}_2\| = \sqrt{(-1)^2 + 1^2} = \sqrt{2} \)

Normalizing it:

\( \mathbf{v}_2 = \left[\frac{-1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right] = [-0.7071, 0.7071] \)

Both the manually calculated eigenvectors ([2, 1] and [-1, 1]) and PyTorch’s normalized eigenvectors ([0.8944, 0.4472] and [-0.7071, 0.7071]) are equally valid. They represent the same direction in space, and the eigenvalues (5 and 2) remain unchanged. The difference is only due to normalization. This ensures a standardized representation of eigenvectors, which is particularly useful in programming and numerical computations. Many applications, such as Principal Component Analysis (PCA), assume normalized eigenvectors for interpretation or computation.
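Here is a tiny sketch showing that normalizing the hand-computed eigenvectors reproduces PyTorch's numbers (the values are the ones worked out above):

import torch

v1 = torch.tensor([2.0, 1.0])
v2 = torch.tensor([-1.0, 1.0])

# Divide each eigenvector by its Euclidean norm to get a unit-length vector
print(v1 / torch.linalg.norm(v1))  # tensor([0.8944, 0.4472])
print(v2 / torch.linalg.norm(v2))  # tensor([-0.7071, 0.7071])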

Summary

Eigenvalues and eigenvectors provide a great way to understand transformations in linear algebra. They appear in diverse fields, from machine learning and physics to structural engineering and image processing. We can use these tools to solve complex problems and gain insights into the behavior of systems.


