

Eigenvalues and Eigenvectors



Eigenvalues and Eigenvectors

An eigenvalue of an \(n \times n\) matrix \(A\) is a real or complex scalar \(\lambda\) such that \[ A \boldsymbol{x} = \lambda \boldsymbol{x} \] for some nonzero vector \(\boldsymbol{x}\) (in \(\mathbb{R}^n\), or \(\mathbb{C}^n\) when \(\lambda\) is complex). This equation is called the eigenvalue equation and any such vector \(\boldsymbol{x}\) is called an eigenvector of \(A\) corresponding to \(\lambda\).

The eigenvalue equation can be rearranged to \((A - \lambda I) \boldsymbol{x} = 0\), and because \(\boldsymbol{x}\) is nonzero this has solutions if and only if \(\lambda\) is a root of the characteristic equation: \[ \operatorname{det}(A - \lambda I) = 0. \] The expression \(p(\lambda) = \operatorname{det}(A - \lambda I)\) is called the characteristic polynomial and is a polynomial of degree exactly \(n\), so \(A\) has \(n\) eigenvalues counted with multiplicity.

Although all eigenvalues can in principle be found by solving the characteristic equation, there is no general closed-form solution for the roots of a polynomial of degree \(n \ge 5\), and computing the roots of the characteristic polynomial is numerically ill-conditioned. Solving the characteristic equation is therefore not a good numerical approach for finding eigenvalues.

Unless otherwise specified, we write eigenvalues ordered by decreasing magnitude, so that \[ |\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|, \] and we normalize eigenvectors, so that \(\|\boldsymbol{x}\| = 1\).

Eigenvalues of a Shifted Matrix

Given a matrix \(A\), the shifted matrix is \(A - \sigma I\) for some constant \(\sigma\). If \(\lambda\) is an eigenvalue of \(A\) with eigenvector \(\boldsymbol{x}\) then \(\lambda - \sigma\) is an eigenvalue of the shifted matrix with the same eigenvector. This can be derived by \[ \begin{aligned} (A - \sigma I) \boldsymbol{x} &= A \boldsymbol{x} - \sigma I \boldsymbol{x} \\ &= \lambda \boldsymbol{x} - \sigma \boldsymbol{x} \\ &= (\lambda - \sigma) \boldsymbol{x}. \end{aligned} \]

Eigenvalues of an Inverse

If \(A\) is invertible and \(\lambda\) is an eigenvalue of \(A\) with eigenvector \(\boldsymbol{x}\), then \(1/\lambda\) is an eigenvalue of \(A^{-1}\) with the same eigenvector (note that \(\lambda \neq 0\) because \(A\) is invertible): \[ A \boldsymbol{x} = \lambda \boldsymbol{x}, \\ A^{-1} A \boldsymbol{x} = \lambda A^{-1} \boldsymbol{x}, \\ \boldsymbol{x} = \lambda A^{-1} \boldsymbol{x}, \\ A^{-1} \boldsymbol{x} = \frac{1}{\lambda} \boldsymbol{x}.\]

Eigenvalues of a Shifted Inverse

Similarly, if \(\sigma\) is not an eigenvalue of \(A\), the eigenvalues of the shifted inverse matrix \((A - \sigma I)^{-1}\) satisfy: \[ (A - \sigma I)^{-1} \boldsymbol{x} = \frac{1}{\lambda - \sigma} \boldsymbol{x}.\] It is important to note that the eigenvectors remain unchanged under shifting and/or inversion; only the eigenvalues change.
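
These relationships are easy to check numerically. Below is a minimal sketch (the \(2 \times 2\) matrix and the shift \(\sigma = 1\) are arbitrary illustrative choices) that compares the eigenvalues of \(A - \sigma I\) and \((A - \sigma I)^{-1}\) against \(\lambda - \sigma\) and \(1/(\lambda - \sigma)\):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
sigma = 1.0

lam = np.linalg.eigvals(A)
lam_shift = np.linalg.eigvals(A - sigma * np.eye(2))
lam_shift_inv = np.linalg.eigvals(np.linalg.inv(A - sigma * np.eye(2)))

print(np.sort(lam_shift), np.sort(lam - sigma))              # the two agree
print(np.sort(lam_shift_inv), np.sort(1.0 / (lam - sigma)))  # the two agree

A similar check with np.linalg.eig shows that the eigenvectors of the shifted and inverted matrices span the same directions as those of \(A\), possibly in a different order and with different scaling.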

Arbitrary Vector as a Linear Combination of Eigenvectors

If \(A\) has \(n\) linearly independent eigenvectors \(\boldsymbol{x}_1, \dots, \boldsymbol{x}_n\), we can write any vector \(\boldsymbol{y}\) as a linear combination of them: \[ \boldsymbol{y} = \alpha_1 \boldsymbol{x}_1 + \dots + \alpha_n \boldsymbol{x}_n. \] If we apply the matrix \(A\), \[ \begin{aligned} A\boldsymbol{y} &= \alpha_1 A\boldsymbol{x}_1 + \dots + \alpha_n A\boldsymbol{x}_n \\ &= \alpha_1 \lambda_1 \boldsymbol{x}_1 + \dots + \alpha_n \lambda_n \boldsymbol{x}_n \\ &= \lambda_1 \left(\alpha_1 \boldsymbol{x}_1 + \alpha_2 \frac{\lambda_2}{\lambda_1} \boldsymbol{x}_2 + \dots + \alpha_n \frac{\lambda_n}{\lambda_1} \boldsymbol{x}_n\right). \end{aligned}\] Repeatedly applying \(A\) gives \[ A^k\boldsymbol{y}= \lambda_1^k \left(\alpha_1 \boldsymbol{x}_1 + \alpha_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k \boldsymbol{x}_2 + \dots + \alpha_n \left(\frac{\lambda_n}{\lambda_1}\right)^k \boldsymbol{x}_n\right), \] and if \(|\lambda_1| > |\lambda_j|\) for all \(j \ge 2\), every ratio \((\lambda_j/\lambda_1)^k\) tends to zero, so \[ \lim_{k\to\infty}\lambda_1^{-k} A^k \boldsymbol{y} = \alpha_1 \boldsymbol{x}_1.\]

Diagonalizability

An \(n \times n\) matrix \(A\) with \(n\) linearly independent eigenvectors can be written in terms of its eigenvalues and eigenvectors as: \[ AX = \begin{bmatrix} \vert & & \vert \\ \lambda_1 \boldsymbol{x}_1 & \dots & \lambda_n \boldsymbol{x}_n \\ \vert & & \vert \end{bmatrix} = \begin{bmatrix} \vert & & \vert \\ \boldsymbol{x}_1 & \dots & \boldsymbol{x}_n \\ \vert & & \vert \end{bmatrix} \begin{bmatrix}\lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}. \] Because its columns are linearly independent, the eigenvector matrix \(X\) can be inverted to obtain the following similarity transformation (diagonalization) of \(A\): \[AX = XD \iff A = XDX^{-1}. \]
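
As a quick numerical illustration (a minimal sketch using np.linalg.eig; the matrix is the one from the diagonalizable example below), the factorization can be checked directly:

import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

lam, X = np.linalg.eig(A)   # eigenvalues and eigenvectors (as columns of X)
D = np.diag(lam)

print(np.allclose(A @ X, X @ D))                  # A X = X D
print(np.allclose(A, X @ D @ np.linalg.inv(X)))   # A = X D X^{-1}

Note that np.linalg.eig returns normalized eigenvectors, so its \(X\) differs from the example's \(X\) by column scaling (and possibly ordering), but both diagonalize \(A\).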

Example: Matrix that is diagonalizable

An \(n \times n\) matrix is diagonalizable if and only if it has \(n\) linearly independent eigenvectors. For example: \[ \overbrace{\begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}}^{A} \overbrace{\begin{bmatrix} 1 & -1 & 1 \\ -2 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix}}^{X} = \overbrace{\begin{bmatrix} 1 & -1 & 1 \\ -2 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix}}^{X} \overbrace{\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}}^{D}. \\ \overbrace{\begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}}^{A} = \overbrace{\begin{bmatrix} 1 & -1 & 1 \\ -2 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix}}^{X} \overbrace{\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}}^{D} \overbrace{\begin{bmatrix} 1/6 & -1/3 & 1/6 \\ -1/2 & 0 & 1/2 \\ 1/3 & 1/3 & 1/3 \end{bmatrix}}^{X^{-1}}. \]

Example: Matrix that is not diagonalizable

A matrix that does not have \(n\) linearly independent eigenvectors is not diagonalizable. For example, \[ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \] has the single eigenvalue \(\lambda = 1\) with algebraic multiplicity \(2\), but every eigenvector is a multiple of \(\begin{bmatrix} 1 & 0 \end{bmatrix}^\top\). Any matrix built from its eigenvectors, such as \[ X = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \] is therefore singular, so there is no invertible \(X\) with \(A = XDX^{-1}\), and \(A\) is not diagonalizable.

Power Iteration algorithm

Power iteration finds the eigenvector \(\boldsymbol{x}_1\) corresponding to the dominant eigenvalue (the eigenvalue largest in magnitude) \(\lambda_1\) of a matrix \(A\), provided that \(|\lambda_1|\) is strictly greater than the magnitudes of the other eigenvalues (i.e., \(|\lambda_1| > |\lambda_2| \ge \dots \ge |\lambda_n|\)). It is given by the following recurrence relationship: \[ \boldsymbol{y}_{k+1} = \frac{A\boldsymbol{y}_k}{\|A\boldsymbol{y}_k\|},\] where the initial guess \(\boldsymbol{y}_0 \neq 0\) must have a nonzero component in the direction of the dominant eigenvector \(\boldsymbol{x}_1\).

Power Iteration code

The following code snippet performs power iteration:

import numpy as np

def power_iter(A, y_0, max_iterations=100):
    # Normalized power iteration: repeatedly apply A and re-normalize so the
    # iterates neither overflow nor underflow.
    y_k = y_0 / np.linalg.norm(y_0)
    for i in range(max_iterations):
        y_k = A @ y_k
        y_k = y_k / np.linalg.norm(y_k)
    return y_k
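
As a usage sketch (the matrix and starting vector are taken from the worked example below; the iteration count of 50 is an arbitrary choice), the routine recovers the dominant eigenvector:

A = np.array([[1.0, -2.0],
              [-1.0, 1.0]])
y_0 = np.array([-1.0, 0.0])
print(power_iter(A, y_0, max_iterations=50))
# prints approximately [-0.8165  0.5774], the dominant eigenvector (up to sign)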

Power Iteration two steps example

Let us run two steps of power iteration to estimate the dominant eigenvector of the matrix: \[ A = \begin{bmatrix} 1 & -2 \\ -1 & 1 \end{bmatrix}, \] starting from the initial guess: \[ \boldsymbol{y}_0 = \begin{bmatrix} -1 & 0 \end{bmatrix}^\top. \] First iteration: \[ y_1 = Ay_0 = \begin{bmatrix} 1 & -2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} -1 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \\ y_1 = \frac{y_1}{\|y_1\|} = \begin{bmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}^\top \]

Second iteration: \[ y_2 = Ay_1 = \begin{bmatrix} 1 & -2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}}\end{bmatrix} = \begin{bmatrix} -\frac{3}{\sqrt{2}} \\ \frac{2}{\sqrt{2}} \end{bmatrix} \\ y_2 = \frac{y_2}{\|y_2\|} = \begin{bmatrix} -\frac{3}{\sqrt{13}} & \frac{2}{\sqrt{13}} \end{bmatrix}^\top = \begin{bmatrix} -0.8321 & 0.5547 \end{bmatrix}^\top \]

After just two iterations, this is already numerically close to the dominant eigenvector \(\boldsymbol{x}_1 = \begin{bmatrix} -0.8165 & 0.5774 \end{bmatrix}^\top\).

Rayleigh Quotient

Given an approximate eigenvector \(\boldsymbol{x}\) (for example, the result of power iteration), the corresponding eigenvalue can be estimated by the Rayleigh quotient: \[ \lambda \approx \frac{\boldsymbol{x}^\top A \boldsymbol{x}}{\boldsymbol{x}^\top\boldsymbol{x}}. \]
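
A one-line helper for this estimate (the name rayleigh_quotient is ours, not a library routine), which pairs naturally with the power_iter code above:

def rayleigh_quotient(A, x):
    # Eigenvalue estimate associated with the (approximate) eigenvector x.
    return (x @ A @ x) / (x @ x)

For the \(2 \times 2\) example above, rayleigh_quotient(A, power_iter(A, y_0, max_iterations=50)) evaluates to approximately \(1 + \sqrt{2} \approx 2.414\), the dominant eigenvalue.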

Inverse Iteration

To obtain the eigenvector corresponding to the eigenvalue of \(A\) that is smallest in magnitude, we can apply power iteration to \(A^{-1}\), since the dominant eigenvalue of \(A^{-1}\) is the reciprocal of the smallest-magnitude eigenvalue of \(A\). The following recurrence relationship describes the inverse iteration algorithm: \[\boldsymbol{x}_{k+1} = \frac{A^{-1} \boldsymbol{x}_k}{\|A^{-1} \boldsymbol{x}_k\|}.\]
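
A sketch of this iteration (the function name is ours): rather than forming \(A^{-1}\) explicitly, each application of \(A^{-1}\) is carried out by solving a linear system.

import numpy as np

def inverse_iter(A, y_0, max_iterations=100):
    # Power iteration applied to A^{-1}: converges to the eigenvector of the
    # eigenvalue of A that is smallest in magnitude.
    y_k = y_0 / np.linalg.norm(y_0)
    for i in range(max_iterations):
        y_k = np.linalg.solve(A, y_k)   # applies A^{-1} without forming it
        y_k = y_k / np.linalg.norm(y_k)
    return y_k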

Inverse Iteration with Shift

To obtain the eigenvector corresponding to the eigenvalue of \(A\) closest to some value \(\sigma\), \(A\) can be shifted by \(\sigma\) and inverted, and power iteration applied to the result. The following recurrence relationship describes the shifted inverse iteration algorithm: \[\boldsymbol{x}_{k+1} = \frac{(A - \sigma I)^{-1} \boldsymbol{x}_k}{\|(A - \sigma I)^{-1} \boldsymbol{x}_k\|}.\] Note that this is identical to inverse iteration when the shift is zero.
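
A sketch of the shifted variant (function name ours, assuming \(\sigma\) is not exactly an eigenvalue). Because the shift is fixed, in practice \(A - \sigma I\) would be factored once (e.g. with LU) and the factorization reused at every step; the sketch below simply calls solve each iteration for clarity.

import numpy as np

def shifted_inverse_iter(A, sigma, y_0, max_iterations=100):
    # Power iteration applied to (A - sigma*I)^{-1}: converges to the
    # eigenvector of the eigenvalue of A closest to sigma.
    M = A - sigma * np.eye(A.shape[0])
    y_k = y_0 / np.linalg.norm(y_0)
    for i in range(max_iterations):
        y_k = np.linalg.solve(M, y_k)
        y_k = y_k / np.linalg.norm(y_k)
    return y_k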

Rayleigh Quotient Iteration

The shift \(\sigma\) can be updated at every step, using the Rayleigh quotient of the current iterate, in order to improve the convergence rate. This gives the following recurrence relationship: \[ \sigma_k = \frac{\boldsymbol{x}_k^\top A \boldsymbol{x}_k}{\boldsymbol{x}_k^\top\boldsymbol{x}_k} \\ \boldsymbol{x}_{k+1} = \frac{(A - \sigma_k I)^{-1} \boldsymbol{x}_k}{\|(A - \sigma_k I)^{-1} \boldsymbol{x}_k\|}\]
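
A sketch of Rayleigh quotient iteration (function name ours). As \(\sigma_k\) approaches an eigenvalue, \(A - \sigma_k I\) becomes nearly singular and the solve is ill-conditioned, but the resulting error lies mostly along the desired eigenvector, so the iteration still behaves well in practice.

import numpy as np

def rayleigh_quotient_iter(A, y_0, max_iterations=20):
    # Shifted inverse iteration with the shift re-estimated at every step
    # from the Rayleigh quotient of the current iterate.
    n = A.shape[0]
    y_k = y_0 / np.linalg.norm(y_0)
    for i in range(max_iterations):
        sigma = (y_k @ A @ y_k) / (y_k @ y_k)
        y_k = np.linalg.solve(A - sigma * np.eye(n), y_k)
        y_k = y_k / np.linalg.norm(y_k)
    return y_k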

Convergence properties

The convergence rate of power iteration is linear, with the error recurrence \[ e_{k+1} \approx \frac{|\lambda_2|}{|\lambda_1|}e_k. \] Similarly, the convergence rate of (shifted) inverse iteration is linear, with \[ e_{k+1} \approx \frac{|\lambda_{\text{closest}} - \sigma|}{|\lambda_{\text{second closest}} - \sigma|}e_k, \] where \(\lambda_{\text{closest}}\) and \(\lambda_{\text{second closest}}\) are the eigenvalues of \(A\) closest and second closest to the shift \(\sigma\). Rayleigh quotient iteration, which updates the shift at every step, converges locally quadratically in general and cubically for symmetric matrices.

Orthogonal Matrices

Square matrices are called orthogonal if and only if their columns are mutually orthogonal and each has norm \(1\) (such a set of vectors is formally known as an orthonormal set), i.e.: \[\boldsymbol{c}_i^\top \boldsymbol{c}_j = 0 \quad \forall \ i \neq j, \quad \|\boldsymbol{c}_i\| = 1 \quad \forall \ i \iff A \in \mathcal{O}(n),\] or \[ \langle\boldsymbol{c}_i,\boldsymbol{c}_j \rangle = \begin{cases} 0 \quad \mathrm{if} \ i \neq j, \\ 1 \quad \mathrm{if} \ i = j \end{cases} \iff A \in \mathcal{O}(n),\] where \(\mathcal{O}(n)\) is the set of all \(n \times n\) orthogonal matrices, called the orthogonal group, \(\boldsymbol{c}_i\), \(i=1, \dots, n\), are the columns of \(A\), and \(\langle \cdot, \cdot \rangle\) is the inner product operator. Orthogonal matrices have many desirable properties: \[ A^\top \in \mathcal{O}(n) \\ A^\top A = A A^\top = I \implies A^{-1} = A^\top \\ \det{A} = \pm 1 \\ \kappa_2(A) = 1 \]
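
These properties are easy to verify numerically. The sketch below builds an orthogonal matrix from the QR factorization of a random matrix (an arbitrary way to obtain one) and checks \(Q^\top Q = I\), \(\det Q = \pm 1\), and \(\kappa_2(Q) = 1\):

import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # Q has orthonormal columns

print(np.allclose(Q.T @ Q, np.eye(4)))  # True: Q^T Q = I, so Q^{-1} = Q^T
print(np.linalg.det(Q))                 # +1 or -1
print(np.linalg.cond(Q))                # 1 (up to roundoff)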

Gram-Schmidt

The algorithm used to construct an orthogonal basis from a set of linearly independent vectors is called the Gram-Schmidt process. For a basis set \(\{x_1, x_2, \dots x_n\}\), we can form an orthogonal set \(\{v_1, v_2, \dots v_n\}\) given by the following transformation: \[\begin{aligned} \boldsymbol{v}_1 &= \boldsymbol{x}_1, \\ \boldsymbol{v}_2 &= \boldsymbol{x}_2 - \frac{\langle\boldsymbol{v}_1,\boldsymbol{x}_2\rangle}{\|\boldsymbol{v}_1\|^2}\boldsymbol{v}_1,\\ \boldsymbol{v}_3 &= \boldsymbol{x}_3 - \frac{\langle\boldsymbol{v}_1,\boldsymbol{x}_3\rangle}{\|\boldsymbol{v}_1\|^2}\boldsymbol{v}_1 - \frac{\langle\boldsymbol{v}_2,\boldsymbol{x}_3\rangle}{\|\boldsymbol{v}_2\|^2}\boldsymbol{v}_2,\\ \vdots &= \vdots\\ \boldsymbol{v}_n &= \boldsymbol{x}_n - \sum_{i=1}^{n-1}\frac{\langle\boldsymbol{v}_i,\boldsymbol{x}_n\rangle}{\|\boldsymbol{v}_i\|^2}\boldsymbol{v}_i, \end{aligned}\] where \(\langle \cdot, \cdot \rangle\) is the inner product operator. Each of the vectors in the orthogonal set can be normalized independently to obtain an orthonormal basis.
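
A sketch of the classical Gram-Schmidt process as written above (the function name is ours; the input columns are assumed linearly independent). Each \(\boldsymbol{v}_j\) is normalized as soon as it is produced, so the projection coefficients simplify to \(\langle\boldsymbol{v}_i, \boldsymbol{x}_j\rangle\):

import numpy as np

def gram_schmidt(X):
    # Columns of X are the linearly independent vectors x_1, ..., x_n.
    # Returns a matrix whose columns form an orthonormal set.
    n = X.shape[1]
    V = np.zeros(X.shape)
    for j in range(n):
        v = X[:, j].astype(float)
        for i in range(j):
            # Subtract the projection of x_j onto v_i (||v_i|| = 1 already).
            v = v - (V[:, i] @ X[:, j]) * V[:, i]
        V[:, j] = v / np.linalg.norm(v)
    return V

In floating point, the modified Gram-Schmidt variant, which projects against the partially orthogonalized vector instead of the original \(\boldsymbol{x}_j\), is preferred for numerical stability.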

Review Questions

  1. What is the definition of an eigenvalue/eigenvector pair?
  2. If \(\mathbf{v}\) is an eigenvector of \(A\), what can we say about \(c\mathbf{v}\) for any nonzero scalar \(c\)?
  3. What is the relationship between the eigenvalues of \(A\) and the eigenvalues of 1) \(cA\) for some scalar \(c\), 2) \((A - \sigma I)\) for some scalar \(\sigma\), and 3) \(A^{-1}\)?
  4. What is the relationship between the eigenvectors of \(A\) and the eigenvectors of 1) \(cA\) for some scalar \(c\), 2) \((A - \sigma I)\) for some scalar \(\sigma\), and 3) \(A^{-1}\)?
  5. Be able to run a few steps of normalized power method.
  6. To what eigenvector of \(A\) does power method converge?
  7. To what eigenvector of \(A\) does inverse power method converge?
  8. To what eigenvector of \(A\) does inverse power method with a shift converge?
  9. Describe the cost of inverse iteration.
  10. Describe the cost of inverse iteration if we are given an LUP factorization of \((A - \sigma I)\).
  11. When can power method (or normalized power method) fail?
  12. How can we approximate an eigenvalue of \(A\) given an approximate eigenvector?
  13. What happens if we do not normalize our iterates in power method?
  14. What is the Rayleigh quotient?
  15. What happens to the result of power method if the initial guess does not have any components of the dominant eigenvector? Does this depend on whether we are using finite or infinite precision?
  16. What is the convergence rate of power method?
  17. How does the convergence of power method depend on the eigenvalues?
  18. How can we find eigenvalues of a matrix other than the dominant eigenvalue?
  19. What does it mean for a matrix to be diagonalizable?
  20. Are all matrices diagonalizable?
  21. What is an orthogonal matrix?
