
Eigenvalues and Eigenvectors


Eigenvalues and Eigenvectors

An eigenvalue of an $n \times n$ matrix $A$ is a scalar $\lambda$ such that $Ax = \lambda x$ for some non-zero vector $x$. The eigenvalue $\lambda$ can be any real or complex scalar (we write $\lambda \in \mathbb{R}$ or $\lambda \in \mathbb{C}$). Eigenvalues can be complex even if all the entries of the matrix $A$ are real; in this case, the corresponding vector $x$ must have complex-valued components (we write $x \in \mathbb{C}^n$). The equation $Ax = \lambda x$ is called the eigenvalue equation, and any such non-zero vector $x$ is called an eigenvector of $A$ corresponding to $\lambda$.

The eigenvalue equation can be rearranged to $(A - \lambda I)x = 0$, and because $x$ is non-zero this has solutions if and only if $\lambda$ is a solution of the characteristic equation:

$$\det(A - \lambda I) = 0.$$

The expression $p(\lambda) = \det(A - \lambda I)$ is called the characteristic polynomial and is a polynomial of degree $n$.

Although all eigenvalues can in principle be found by solving the characteristic equation, there is no general, closed-form analytical solution for the roots of polynomials of degree $n \geq 5$, and root-finding on the characteristic polynomial is not a good numerical approach for computing eigenvalues.

Unless otherwise specified, we write eigenvalues ordered by magnitude, so that

$$|\lambda_1| \geq |\lambda_2| \geq \dots \geq |\lambda_n|,$$

and we normalize eigenvectors, so that $\|x\| = 1$.

Eigenvalues of a Shifted Matrix

Given a matrix $A$ and a constant scalar $\sigma$, we define the shifted matrix $A - \sigma I$. If $\lambda$ is an eigenvalue of $A$ with eigenvector $x$, then $\lambda - \sigma$ is an eigenvalue of the shifted matrix with the same eigenvector. This can be derived by

$$(A - \sigma I)x = Ax - \sigma I x = \lambda x - \sigma x = (\lambda - \sigma)x.$$

Eigenvalues of an Inverse

An invertible matrix cannot have an eigenvalue equal to zero. Furthermore, the eigenvalues of the inverse matrix are the reciprocals of the eigenvalues of the original matrix:

$$Ax = \lambda x \;\Longrightarrow\; A^{-1}Ax = \lambda A^{-1}x \;\Longrightarrow\; x = \lambda A^{-1}x \;\Longrightarrow\; A^{-1}x = \frac{1}{\lambda}x.$$

Eigenvalues of a Shifted Inverse

Similarly, we can describe the eigenvalues of a shifted inverse matrix as:

$$(A - \sigma I)^{-1}x = \frac{1}{\lambda - \sigma}x.$$

It is important to note that the eigenvectors remain unchanged under shifting and/or inversion.
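These relationships are easy to spot-check numerically. The sketch below, using an arbitrarily chosen example matrix and shift, verifies the shift, inverse, and shifted-inverse rules for a single eigenpair:

```python
import numpy as np

# Arbitrary example matrix and shift (chosen for illustration only)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
sigma = 0.5
I = np.eye(2)

lam, V = np.linalg.eig(A)
l, x = lam[0], V[:, 0]          # one eigenpair of A

# Shift: (A - sigma*I) x = (lambda - sigma) x
assert np.allclose((A - sigma * I) @ x, (l - sigma) * x)

# Inverse: A^{-1} x = (1/lambda) x
assert np.allclose(np.linalg.inv(A) @ x, x / l)

# Shifted inverse: (A - sigma*I)^{-1} x = x / (lambda - sigma)
assert np.allclose(np.linalg.inv(A - sigma * I) @ x, x / (l - sigma))
```

In each case the eigenvector $x$ is reused unchanged; only the eigenvalue transforms.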

Diagonalizability

An $n \times n$ matrix with $n$ linearly independent eigenvectors can be expressed in terms of its eigenvalues and eigenvectors as:

$$AX = \begin{bmatrix} | & & | \\ \lambda_1 x_1 & \cdots & \lambda_n x_n \\ | & & | \end{bmatrix} = \begin{bmatrix} | & & | \\ x_1 & \cdots & x_n \\ | & & | \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} = XD$$

The eigenvector matrix can be inverted to obtain the following similarity transformation of A:

$$AX = XD \;\Longrightarrow\; A = XDX^{-1} \;\Longrightarrow\; X^{-1}AX = D$$

Multiplying the matrix $A$ by $X^{-1}$ on the left and $X$ on the right transforms it into a diagonal matrix; it has been "diagonalized".

Example: Matrix that is diagonalizable

An $n \times n$ matrix is diagonalizable if and only if it has $n$ linearly independent eigenvectors. For example:

$$\underbrace{\begin{bmatrix} \tfrac{1}{6} & -\tfrac{1}{3} & \tfrac{1}{6} \\ \tfrac{1}{2} & 0 & -\tfrac{1}{2} \\ \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \end{bmatrix}}_{X^{-1}} \underbrace{\begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} 1 & 1 & 1 \\ -2 & 0 & 1 \\ 1 & -1 & 1 \end{bmatrix}}_{X} = \underbrace{\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}}_{D}$$
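This factorization can be checked numerically with `numpy.linalg.eig` (a sketch; note that `eig` may order eigenvalues and scale eigenvectors differently than the hand computation above):

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

lam, X = np.linalg.eig(A)       # columns of X are eigenvectors of A
D = np.diag(lam)

# n linearly independent eigenvectors => X is invertible and X^{-1} A X = D
assert np.allclose(np.linalg.inv(X) @ A @ X, D)
```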

Example: Matrix that is not diagonalizable

A matrix $A$ with fewer than $n$ linearly independent eigenvectors is not diagonalizable. For example, while it is true that

$$\underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}}_{X} = \underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}}_{X} \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}}_{D},$$

the matrix $X$ does not have an inverse, so we cannot diagonalize $A$ by applying an inverse. In fact, for any non-singular matrix $P$, the product $P^{-1}AP$ is not diagonal.

Expressing an Arbitrary Vector as a Linear Combination of Eigenvectors

If an $n \times n$ matrix $A$ is diagonalizable, then we can write an arbitrary vector as a linear combination of the eigenvectors of $A$. Let $u_1, u_2, \dots, u_n$ be $n$ linearly independent eigenvectors of $A$; then an arbitrary vector $x_0$ can be written:

$$x_0 = \alpha_1 u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n.$$

If we apply the matrix $A$ to $x_0$:

$$\begin{aligned} Ax_0 &= \alpha_1 A u_1 + \alpha_2 A u_2 + \dots + \alpha_n A u_n \\ &= \alpha_1 \lambda_1 u_1 + \alpha_2 \lambda_2 u_2 + \dots + \alpha_n \lambda_n u_n \\ &= \lambda_1 \left( \alpha_1 u_1 + \alpha_2 \frac{\lambda_2}{\lambda_1} u_2 + \dots + \alpha_n \frac{\lambda_n}{\lambda_1} u_n \right). \end{aligned}$$

If we repeatedly apply $A$ we have

$$A^k x_0 = \lambda_1^k \left( \alpha_1 u_1 + \alpha_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k u_2 + \dots + \alpha_n \left(\frac{\lambda_n}{\lambda_1}\right)^k u_n \right).$$

In the case where one eigenvalue has magnitude that is strictly greater than all the others, i.e.

$$|\lambda_1| > |\lambda_2| \geq |\lambda_3| \geq \dots \geq |\lambda_n|,$$

this implies

$$\lim_{k\to\infty} \frac{A^k x_0}{\lambda_1^k} = \alpha_1 u_1.$$

This observation motivates the algorithm known as power iteration, which is the topic of the next section.

Power Iteration algorithm

For a matrix $A$, power iteration will find a scalar multiple of an eigenvector $u_1$ corresponding to the dominant eigenvalue (largest in magnitude) $\lambda_1$, provided that $|\lambda_1|$ is strictly greater than the magnitudes of the other eigenvalues, i.e., $|\lambda_1| > |\lambda_2| \geq \dots \geq |\lambda_n|$.

Suppose

$$x_0 = \alpha_1 u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n, \quad \text{with } \alpha_1 \neq 0.$$

From the previous section, the iterative sequence

$$x_k = A x_{k-1} \quad \text{for } k = 1, 2, 3, \dots$$

satisfies

$$x_k = A^k x_0 \;\Longrightarrow\; \lim_{k\to\infty} \frac{x_k}{\lambda_1^k} = \alpha_1 u_1.$$

Thus, for large $k$, $x_k \approx \lambda_1^k \alpha_1 u_1$. Unfortunately, this means that $\|x_k\| \approx |\lambda_1|^k \|\alpha_1 u_1\|$, which will be very large if $|\lambda_1| > 1$, or very small if $|\lambda_1| < 1$. For this reason, we use normalized power iteration.

Normalized power iteration is given by the following. Let $x_0$ be a vector with unit norm, $\|x_0\| = 1$ (any norm is fine), with $x_0 = \alpha_1 u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n$ and $\alpha_1 \neq 0$.

Normalized power iteration is defined by the following iterative sequence for $k = 1, 2, 3, \dots$:

$$y_k = A x_{k-1}, \qquad x_k = \frac{y_k}{\|y_k\|},$$

where the norm is the same norm used when we assumed $\|x_0\| = 1$.

It can be shown that this sequence satisfies

$$x_k = \frac{A^k x_0}{\|A^k x_0\|}.$$

This means that for large values of $k$, we have

$$x_k \approx \left(\frac{\lambda_1}{|\lambda_1|}\right)^k \frac{\alpha_1 u_1}{\|\alpha_1 u_1\|}.$$

The largest eigenvalue could be positive, negative, or a complex number. In each case we will have:

$$\lambda_1 > 0: \quad x_k \to \frac{\alpha_1 u_1}{\|\alpha_1 u_1\|}, \;\text{ i.e., } x_k \text{ converges};$$

$$\lambda_1 < 0: \quad x_k \approx (-1)^k \frac{\alpha_1 u_1}{\|\alpha_1 u_1\|}, \;\text{ i.e., in the limit } x_k \text{ alternates between } \pm\frac{\alpha_1 u_1}{\|\alpha_1 u_1\|};$$

$$\lambda_1 = r e^{i\theta}: \quad x_k \approx e^{ik\theta} \frac{\alpha_1 u_1}{\|\alpha_1 u_1\|}, \;\text{ i.e., in the limit } x_k \text{ is a scalar multiple of } u_1 \text{ with a coefficient that rotates around the unit circle}.$$

Strictly speaking, normalized power iteration only converges to a single vector if $\lambda_1 > 0$, but $x_k$ will be close to a scalar multiple of the eigenvector $u_1$ for large values of $k$, regardless of whether the dominant eigenvalue is positive, negative, or complex. So normalized power iteration will work for any value of $\lambda_1$, as long as it is strictly larger in magnitude than the other eigenvalues.

Power Iteration code

The following code snippet performs normalized power iteration:

import numpy as np

def power_iter(A, x_0, p, max_iter):
    # A: n x n matrix, x_0: initial guess, p: order of the norm,
    # max_iter: number of iterations to perform
    x_k = x_0 / np.linalg.norm(x_0, p)   # normalize the initial guess
    for i in range(max_iter):
        y_k = A @ x_k                    # apply A
        x_k = y_k / np.linalg.norm(y_k, p)
    return x_k

Example: Two Steps of Power Iteration

We'll use normalized power iteration (with the infinity norm) to approximate an eigenvector of the matrix $A = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}$, starting from the initial guess $x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.

First Iteration:

$$y_1 = A x_0 = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad x_1 = \frac{y_1}{\|y_1\|_\infty} = y_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$

Second Iteration:

$$y_2 = A x_1 = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}, \qquad x_2 = \frac{y_2}{\|y_2\|_\infty} = \frac{1}{3} y_2 = \begin{bmatrix} 1 \\ 2/3 \end{bmatrix} \approx \begin{bmatrix} 1 \\ 0.6667 \end{bmatrix}.$$

Even after only two iterations, we are getting close to a corresponding eigenvector:

$$u_1 = \begin{bmatrix} 1 \\ 1/\sqrt{2} \end{bmatrix} \approx \begin{bmatrix} 1 \\ 0.7071 \end{bmatrix},$$

with relative error about 4 percent when measured in the infinity norm.
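Carrying the same iteration further in code (a sketch reusing the matrix, initial guess, and infinity norm from this example) confirms the convergence:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [1.0, 1.0]])
x = np.array([1.0, 0.0])        # initial guess x_0

for k in range(20):
    y = A @ x                    # y_{k+1} = A x_k
    x = y / np.linalg.norm(y, np.inf)

# x approaches the dominant eigenvector (1, 1/sqrt(2))
assert np.allclose(x, [1.0, 1.0 / np.sqrt(2)], atol=1e-6)
```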

Computing Eigenvalues from Eigenvectors

Power iteration allows us to find an approximate eigenvector corresponding to the largest eigenvalue in magnitude. How can we compute the actual eigenvalue from this? If λ is an eigenvalue of A, with corresponding eigenvector u, then we can compute the value of λ using the Rayleigh Quotient:

$$\lambda = \frac{u^T A u}{u^T u}.$$

Thus, one can compute an approximate eigenvalue using the approximate eigenvector found during power iteration.
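For instance, plugging the dominant eigenvector of the matrix from the power iteration example above into the Rayleigh quotient recovers its eigenvalue, $1 + \sqrt{2}$ (a sketch):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [1.0, 1.0]])
u = np.array([1.0, 1.0 / np.sqrt(2)])   # (approximate) dominant eigenvector

lam = (u @ A @ u) / (u @ u)             # Rayleigh quotient

assert np.isclose(lam, 1.0 + np.sqrt(2))
```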

Power Iteration and Floating-Point Arithmetic

Recall that we made the assumption that the initial guess satisfies

$$x_0 = \alpha_1 u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n, \quad \text{with } \alpha_1 \neq 0.$$

What happens if we choose an initial guess where $\alpha_1 = 0$? If we further assume that $|\lambda_2| > |\lambda_3| \geq |\lambda_4| \geq \dots \geq |\lambda_n|$, then in theory

$$A^k x_0 = \lambda_2^k \left( \alpha_2 u_2 + \alpha_3 \left(\frac{\lambda_3}{\lambda_2}\right)^k u_3 + \dots + \alpha_n \left(\frac{\lambda_n}{\lambda_2}\right)^k u_n \right),$$

and we would expect that

$$\lim_{k\to\infty} \frac{A^k x_0}{\lambda_2^k} = \alpha_2 u_2.$$

In practice, this does not happen. For one thing, choosing an initial guess such that $\alpha_1 = 0$ is extremely unlikely if we have no prior knowledge about the eigenvector $u_1$. Since power iteration is performed numerically, using finite precision arithmetic, we will encounter rounding error in every iteration. This means that at every iteration $k$, including $k = 0$, we will instead have

$$A^k \hat{x}_0 = \lambda_1^k \left( \hat{\alpha}_1 u_1 + \hat{\alpha}_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k u_2 + \dots + \hat{\alpha}_n \left(\frac{\lambda_n}{\lambda_1}\right)^k u_n \right),$$

where the $\hat{\alpha}_i$ are the approximate expansion coefficients of the rounded result. Even if $\alpha_1 = 0$, the finite precision representation $\hat{x}_0$ will very likely have expansion coefficient $\hat{\alpha}_1 \neq 0$. Even in the case where rounding the initial guess does not introduce a non-zero $\hat{\alpha}_1$, rounding after applying the matrix $A$ will almost certainly introduce a non-zero component in the dominant eigenvector direction after enough iterations. The probability of choosing a starting guess $x_0$ such that $\hat{\alpha}_1 = 0$ for all iterations is vanishingly small.

Power Iteration without a Dominant Eigenvalue

Above, we assumed that one eigenvalue had magnitude strictly larger than all the others: $|\lambda_1| > |\lambda_2| \geq |\lambda_3| \geq \dots \geq |\lambda_n|$. What happens if $|\lambda_1| = |\lambda_2|$?

If $\lambda_1 = \lambda_2 = \lambda \in \mathbb{R}$, then:

$$x_k = A^k x_0 \approx \alpha_1 \lambda^k u_1 + \alpha_2 \lambda^k u_2 = \lambda^k (\alpha_1 u_1 + \alpha_2 u_2),$$

hence

$$\lim_{k\to\infty} \lambda^{-k} A^k x_0 = \alpha_1 u_1 + \alpha_2 u_2.$$

The quantity $\alpha_1 u_1 + \alpha_2 u_2$ is still an eigenvector corresponding to $\lambda$, so power iteration will still approach a dominant eigenvector.

If the dominant eigenvalues have opposite signs, i.e., $\lambda_1 = -\lambda_2 = \lambda \in \mathbb{R}$, then

$$x_k = A^k x_0 \approx \alpha_1 \lambda^k u_1 + \alpha_2 (-\lambda)^k u_2 = \lambda^k (\alpha_1 u_1 + (-1)^k \alpha_2 u_2).$$

For large $k$, we will have $\lambda^{-k} A^k x_0 \approx \alpha_1 u_1 + (-1)^k \alpha_2 u_2$, which, although a linear combination of two eigenvectors, is not itself an eigenvector of $A$.

Finally, if the two dominant eigenvalues are a complex-conjugate pair, $\lambda_1 = r e^{i\theta}$, $\lambda_2 = r e^{-i\theta}$, then

$$x_k = A^k x_0 \approx \alpha_1 \lambda^k u_1 + \alpha_2 (\bar{\lambda})^k u_2 = \lambda^k \left( \alpha_1 u_1 + \left(\frac{\bar{\lambda}}{\lambda}\right)^k \alpha_2 u_2 \right) = \lambda^k \left( \alpha_1 u_1 + e^{-i2k\theta} \alpha_2 u_2 \right).$$

For large $k$, $\lambda^{-k} A^k x_0$ approximates a linear combination of two eigenvectors, but this linear combination is not itself an eigenvector.

Inverse Iteration

To obtain an eigenvector corresponding to the smallest eigenvalue (in magnitude) $\lambda_n$ of a non-singular matrix, we can apply power iteration to $A^{-1}$. The following recurrence relationship describes the inverse iteration algorithm:

$$x_{k+1} = \frac{A^{-1} x_k}{\|A^{-1} x_k\|}.$$

Inverse Iteration with Shift

To obtain an eigenvector corresponding to the eigenvalue closest to some value $\sigma$, we can shift $A$ by $\sigma$, invert the result, and apply power iteration to it. The following recurrence relationship describes the shifted inverse iteration algorithm:

$$x_{k+1} = \frac{(A - \sigma I)^{-1} x_k}{\|(A - \sigma I)^{-1} x_k\|}.$$

Note that this reduces to inverse iteration when the shift is zero.
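A sketch of shifted inverse iteration (the function name and iteration cap are illustrative choices; each step solves a linear system rather than forming the inverse explicitly):

```python
import numpy as np

def shifted_inverse_iter(A, sigma, x_0, max_iter=50):
    # Power iteration applied to (A - sigma*I)^{-1}; each step solves
    # (A - sigma*I) y = x instead of computing the inverse.
    M = A - sigma * np.eye(A.shape[0])
    x = x_0 / np.linalg.norm(x_0)
    for _ in range(max_iter):
        y = np.linalg.solve(M, x)
        x = y / np.linalg.norm(y)
    return x
```

Since the shift is fixed, an LU factorization of $A - \sigma I$ could be computed once and reused in every iteration.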

Rayleigh Quotient Iteration

The shift σ can be updated based on a current estimate of the eigenvalue in order to improve convergence rate. Such an estimate can be found using the Rayleigh Quotient. Rayleigh Quotient Iteration is given by the following recurrence relation:

$$\sigma_k = \frac{x_k^T A x_k}{x_k^T x_k}, \qquad x_{k+1} = \frac{(A - \sigma_k I)^{-1} x_k}{\|(A - \sigma_k I)^{-1} x_k\|}.$$
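A sketch of this recurrence (the function name and iteration count are illustrative; note that $A - \sigma_k I$ becomes nearly singular as $\sigma_k$ approaches an eigenvalue, so a practical implementation would stop once converged):

```python
import numpy as np

def rayleigh_quotient_iter(A, x_0, num_iter=5):
    # Shifted inverse iteration where the shift is updated each step
    # using the Rayleigh quotient of the current iterate.
    n = A.shape[0]
    x = x_0 / np.linalg.norm(x_0)
    for _ in range(num_iter):
        sigma = x @ A @ x                          # Rayleigh quotient (x has unit norm)
        y = np.linalg.solve(A - sigma * np.eye(n), x)
        x = y / np.linalg.norm(y)
    return x, x @ A @ x                            # eigenvector, eigenvalue estimates
```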

Convergence properties

The convergence rate for power iteration is linear, and the recurrence relationship for the error between the current iterate and a dominant eigenvector is given by

$$e_{k+1} \approx \frac{|\lambda_2|}{|\lambda_1|} e_k.$$

The convergence rate for (shifted) inverse iteration is also linear, but now depends on the two eigenvalues closest to the shift $\sigma$ (standard inverse iteration corresponds to the shift $\sigma = 0$). The recurrence relationship for the errors is given by

$$e_{k+1} \approx \frac{|\lambda_{\text{closest}} - \sigma|}{|\lambda_{\text{second-closest}} - \sigma|} e_k.$$
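This linear rate can be observed directly (a sketch with an assumed diagonal example, for which $|\lambda_2|/|\lambda_1| = 1/4$):

```python
import numpy as np

A = np.diag([4.0, 1.0])          # eigenvalues 4 and 1: ratio 1/4
u1 = np.array([1.0, 0.0])        # dominant eigenvector

x = np.array([1.0, 1.0]) / np.sqrt(2.0)
errors = []
for _ in range(5):
    y = A @ x
    x = y / np.linalg.norm(y)
    errors.append(np.linalg.norm(x - u1))

# each successive error ratio approaches |lambda_2| / |lambda_1| = 0.25
ratios = [errors[k + 1] / errors[k] for k in range(4)]
```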

Orthogonal Matrices

Square matrices are called orthogonal if and only if their columns are mutually orthogonal and have unit norm (such a set of vectors is formally known as an orthonormal set), i.e.:

$$c_i^T c_j = 0 \;\;\forall\, i \neq j, \quad \|c_i\| = 1 \;\;\forall\, i \;\Longleftrightarrow\; A \in O(n),$$

or equivalently

$$\langle c_i, c_j \rangle = \begin{cases} 0 & \text{if } i \neq j, \\ 1 & \text{if } i = j \end{cases} \;\Longleftrightarrow\; A \in O(n),$$

where $O(n)$, called the orthogonal group, is the set of all $n \times n$ orthogonal matrices, $c_i$, $i = 1, \dots, n$, are the columns of $A$, and $\langle \cdot, \cdot \rangle$ is the inner product operator. Orthogonal matrices have many desirable properties:

$$A^T \in O(n), \qquad A^T A = A A^T = I, \qquad A^{-1} = A^T, \qquad \det A = \pm 1, \qquad \kappa_2(A) = 1.$$
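These properties can be verified on a simple example, e.g. a 2D rotation matrix (an assumed example; any rotation is orthogonal):

```python
import numpy as np

theta = 0.3                       # arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(Q.T @ Q, np.eye(2))            # Q^T Q = I
assert np.allclose(np.linalg.inv(Q), Q.T)         # Q^{-1} = Q^T
assert np.isclose(abs(np.linalg.det(Q)), 1.0)     # det Q = +/- 1
assert np.isclose(np.linalg.cond(Q, 2), 1.0)      # kappa_2(Q) = 1
```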

Gram-Schmidt

The algorithm to construct an orthogonal basis from a set of linearly independent vectors is called the Gram-Schmidt process. For a basis set $\{x_1, x_2, \dots, x_n\}$, we can form an orthogonal set $\{v_1, v_2, \dots, v_n\}$ given by the following transformation:

$$\begin{aligned} v_1 &= x_1, \\ v_2 &= x_2 - \frac{\langle v_1, x_2 \rangle}{\|v_1\|^2} v_1, \\ v_3 &= x_3 - \frac{\langle v_1, x_3 \rangle}{\|v_1\|^2} v_1 - \frac{\langle v_2, x_3 \rangle}{\|v_2\|^2} v_2, \\ &\;\;\vdots \\ v_n &= x_n - \sum_{i=1}^{n-1} \frac{\langle v_i, x_n \rangle}{\|v_i\|^2} v_i, \end{aligned}$$

where $\langle \cdot, \cdot \rangle$ is the inner product operator. Each of the vectors in the orthogonal set can be normalized independently to obtain an orthonormal basis.
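A sketch of the process (the function name is an illustrative choice; projecting onto the already-normalized vectors $q_i$ is equivalent to dividing by $\|v_i\|^2$ above):

```python
import numpy as np

def gram_schmidt(X):
    # X: matrix whose columns are linearly independent vectors.
    # Returns Q with orthonormal columns spanning the same space.
    Q = np.zeros_like(X, dtype=float)
    for j in range(X.shape[1]):
        v = X[:, j].astype(float)
        for i in range(j):
            v = v - (Q[:, i] @ X[:, j]) * Q[:, i]   # subtract projection onto q_i
        Q[:, j] = v / np.linalg.norm(v)             # normalize
    return Q
```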

Review Questions

  1. What is the definition of an eigenvalue/eigenvector pair?
  2. If v is an eigenvector of A, what can we say about cv for any nonzero scalar c?
  3. What is the relationship between the eigenvalues of $A$ and the eigenvalues of 1) $cA$ for some scalar $c$, 2) $A - \sigma I$ for some scalar $\sigma$, and 3) $A^{-1}$?
  4. What is the relationship between the eigenvectors of $A$ and the eigenvectors of 1) $cA$ for some scalar $c$, 2) $A - \sigma I$ for some scalar $\sigma$, and 3) $A^{-1}$?
  5. Be able to run a few steps of normalized power iteration.
  6. To what eigenvector of A does power iteration converge?
  7. To what eigenvector of A does inverse power iteration converge?
  8. To what eigenvector of A does inverse power iteration with a shift converge?
  9. Describe the cost of inverse iteration.
  10. Describe the cost of inverse iteration if we are given an LU-factorization of (AσI).
  11. When can power iteration (or normalized power iteration) fail?
  12. How can we approximate an eigenvalue of A given an approximate eigenvector?
  13. What happens if we do not normalize our iterates during power iteration?
  14. What is the Rayleigh quotient?
  15. What happens to the result of power iteration if the initial guess does not have any components of the dominant eigenvector? Does this depend on whether we are using finite or infinite precision?
  16. What is the convergence rate of power iteration?
  17. How does the convergence of power iteration depend on the eigenvalues?
  18. How can we find eigenvalues of a matrix other than the dominant eigenvalue?
  19. What does it mean for a matrix to be diagonalizable?
  20. Are all matrices diagonalizable?
