Singular Value Decompositions


Learning Objectives

Singular Value Decomposition

An \(m \times n\) real matrix \({\bf A}\) has a singular value decomposition of the form

\[ {\bf A} = {\bf U \Sigma V}^T \]

where

- \({\bf U}\) is an \(m \times m\) orthogonal matrix whose columns are eigenvectors of \({\bf A}{\bf A}^T\) (the left singular vectors of \({\bf A}\)),
- \({\bf V}\) is an \(n \times n\) orthogonal matrix whose columns are eigenvectors of \({\bf A}^T{\bf A}\) (the right singular vectors of \({\bf A}\)), and
- \({\bf \Sigma}\) is an \(m \times n\) diagonal matrix of the form \({\bf \Sigma} = \text{diag}(\sigma_1, \sigma_2, \dots, \sigma_s)\),

where \(s = \min(m,n)\) and \(\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_s \ge 0\) are the square roots of the eigenvalues of \({\bf A}^T {\bf A}\). The diagonal entries are called the singular values of \({\bf A}\).

Note that if \({\bf A}^T {\bf A}\,{\bf x} = \lambda {\bf x}\), then \({\bf A}^T {\bf A}\) and \({\bf A} {\bf A}^T\) both have the same non-zero eigenvalues:

\[ {\bf A} {\bf A}^T ({\bf A}{\bf x}) = \lambda \, ({\bf A}{\bf x}) \]

(left-multiply both sides by \({\bf A}\)).
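As a quick numerical check, the short NumPy sketch below (using the \(5 \times 3\) matrix from the worked example later in this section) verifies that the singular values of \({\bf A}\) are the square roots of the eigenvalues of \({\bf A}^T{\bf A}\), and that \({\bf A}^T{\bf A}\) and \({\bf A}{\bf A}^T\) share their non-zero eigenvalues:

```python
import numpy as np

A = np.array([[3.0, 2.0, 3.0],
              [8.0, 8.0, 2.0],
              [8.0, 7.0, 4.0],
              [1.0, 8.0, 7.0],
              [6.0, 4.0, 7.0]])

sigma = np.linalg.svd(A, compute_uv=False)             # singular values, descending
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # eigenvalues of A^T A, descending
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]   # eigenvalues of A A^T, descending

print(np.allclose(sigma, np.sqrt(eig_AtA)))            # True
print(np.allclose(eig_AtA, eig_AAt[:3]))               # non-zero eigenvalues agree
```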

Time Complexity

The time-complexity for computing the SVD factorization of an arbitrary \(m \times n\) matrix is proportional to \(m n^2 + n^3\), where the constant of proportionality ranges from 4 to 10 (or more) depending on the algorithm.

In general, we can define the cost as:

\[ \mathcal{O}(m n^2 + n^3) \]
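The following rough timing sketch (not a rigorous benchmark; the sizes and random seed are arbitrary) illustrates the roughly cubic growth of the SVD cost for square matrices:

```python
import time
import numpy as np

# Doubling n for square n x n matrices should increase the SVD time by
# roughly a factor of 8, consistent with O(n^3) cost.
rng = np.random.default_rng(0)
for n in (200, 400, 800):
    A = rng.standard_normal((n, n))
    t0 = time.perf_counter()
    np.linalg.svd(A)
    print(n, round(time.perf_counter() - t0, 4), "seconds")
```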

Reduced SVD

The SVD factorization of a non-square matrix \({\bf A}\) of size \(m \times n\) can be represented in a reduced format:

- For \(m \ge n\): \({\bf U}\) is \(m \times n\), \({\bf \Sigma}\) is \(n \times n\), and \({\bf V}\) is \(n \times n\).
- For \(m \le n\): \({\bf U}\) is \(m \times m\), \({\bf \Sigma}\) is \(m \times m\), and \({\bf V}\) is \(n \times m\).


The following figure depicts the reduced SVD factorization (in red) against the full SVD factorizations (in gray).

In general, we will represent the reduced SVD as:

\[ {\bf A} = {\bf U}_R {\bf \Sigma}_R {\bf V}_R^T \]

where \({\bf U}_R\) is an \(m \times s\) matrix, \({\bf V}_R\) is an \(n \times s\) matrix, \({\bf \Sigma}_R\) is an \(s \times s\) matrix, and \(s = \min(m,n)\).
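A short NumPy sketch (with an arbitrary \(5 \times 3\) random matrix) showing the shapes returned for the full and the reduced SVD:

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((5, 3))            # a tall matrix, m > n

U, s, Vt = np.linalg.svd(A, full_matrices=True)      # full SVD
Ur, sr, Vtr = np.linalg.svd(A, full_matrices=False)  # reduced SVD

print(U.shape, s.shape, Vt.shape)      # (5, 5) (3,) (3, 3)
print(Ur.shape, sr.shape, Vtr.shape)   # (5, 3) (3,) (3, 3)
```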

Example: Computing the SVD

We begin with the following non-square matrix, \({\bf A}\)

\[ {\bf A} = \left[ \begin{array}{ccc} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \\ \end{array} \right] \]

and we will compute the reduced form of the SVD (where here \(m > n\)):

(1) Compute \({\bf A}^T {\bf A}\):

\[ {\bf A}^T {\bf A} = \left[ \begin{array}{ccc} 174 & 158 & 106 \\ 158 & 197 & 134 \\ 106 & 134 & 127 \\ \end{array} \right]. \]

(2) Compute the eigenvectors and eigenvalues of \({\bf A}^T {\bf A}\). The eigenvalues, in decreasing order, are

\[ \lambda_1 \approx 437.479, \quad \lambda_2 \approx 42.668, \quad \lambda_3 \approx 17.877, \]

and the corresponding normalized eigenvectors are the columns of \({\bf V}_R\) in the next step.

(3) Construct \({\bf V}_R\) from the eigenvectors of \({\bf A}^T {\bf A}\):

\[ {\bf V}_R = \left[ \begin{array}{ccc} 0.585051 & -0.710399 & 0.391212 \\ 0.652648 & 0.126068 & -0.747098 \\ 0.481418 & 0.692415 & 0.537398 \\ \end{array} \right]. \]

(4) Construct \({\bf \Sigma}_R\) from the square roots of the eigenvalues of \({\bf A}^T {\bf A}\):

\[ {\bf \Sigma}_R = \begin{bmatrix} 20.916 & 0 & 0 \\ 0 & 6.53207 & 0 \\ 0 & 0 & 4.22807 \end{bmatrix} \]

(5) Find \({\bf U}\) by solving \({\bf U}{\bf\Sigma} = {\bf A}{\bf V}\). For our reduced case, we can find \({\bf U}_R = {\bf A}{\bf V}_R {\bf \Sigma}_R^{-1}\). You could also find \({\bf U}\) by computing the eigenvectors of \({\bf AA}^T\).

We obtain the following singular value decomposition for \({\bf A}\):

\[ \overbrace{\left[ \begin{array}{ccc} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \\ \end{array} \right]}^{A} = \overbrace{\left[ \begin{array}{ccc} 0.215371 & 0.030348 & 0.305490 \\ 0.519432 & -0.503779 & -0.419173 \\ 0.534262 & -0.311021 & 0.011730 \\ 0.438715 & 0.787878 & -0.431352\\ 0.453759 & 0.166729 & 0.738082\\ \end{array} \right]}^{U} \overbrace{\left[ \begin{array}{ccc} 20.916 & 0 & 0 \\ 0 & 6.53207 & 0 \\ 0 & 0 & 4.22807 \\ \end{array} \right]}^{\Sigma} \overbrace{\left[ \begin{array}{ccc} 0.585051 & 0.652648 & 0.481418 \\ -0.710399 & 0.126068 & 0.692415\\ 0.391212 & -0.747098 & 0.537398\\ \end{array} \right]}^{V^T} \]

Recall that we computed the reduced SVD factorization (i.e. \({\bf \Sigma}\) is square, \({\bf U}\) is non-square) here.
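The computation above can be reproduced with NumPy; note that individual columns of \({\bf U}\) and \({\bf V}\) are only determined up to a sign, so they may differ from the values shown by a factor of \(-1\):

```python
import numpy as np

A = np.array([[3.0, 2.0, 3.0],
              [8.0, 8.0, 2.0],
              [8.0, 7.0, 4.0],
              [1.0, 8.0, 7.0],
              [6.0, 4.0, 7.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # reduced SVD
print(s)                                           # ~ [20.916  6.532  4.228]
print(np.allclose(U @ np.diag(s) @ Vt, A))         # True: U Sigma V^T reconstructs A
```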

Rank, null space and range of a matrix

Suppose \({\bf A}\) is an \(m \times n\) matrix where \(m > n\) (without loss of generality):

\[ {\bf A} = {\bf U \Sigma V}^T \]

We can re-write the above as:

\[ {\bf A} = \left[ \begin{array}{cccc} {\bf u}_1 & {\bf u}_2 & \dots & {\bf u}_n \end{array} \right] \left[ \begin{array}{c} \sigma_1 {\bf v}_1^T \\ \sigma_2 {\bf v}_2^T \\ \vdots \\ \sigma_n {\bf v}_n^T \end{array} \right]. \]

Furthermore, the product of two matrices can be written as a sum of outer products:

\[ {\bf A} = \sigma_1 {\bf u}_1 {\bf v}_1^T + \sigma_2 {\bf u}_2 {\bf v}_2^T + \dots + \sigma_n {\bf u}_n {\bf v}_n^T \]

For a general rectangular matrix, we have:

\[ {\bf A} = \sum_{i=1}^{s} \sigma_i {\bf u}_i {\bf v}_i^T, \]

where \(s = \min(m,n)\).

If \({\bf A}\) has \(s\) non-zero singular values, the matrix is full rank, i.e. \(\text{rank}({\bf A}) = s\).

If \({\bf A}\) has \(r\) non-zero singular values, and \(r < s\), the matrix is rank deficient, i.e. \(\text{rank}({\bf A}) = r < s\).

In other words, the rank of \({\bf A}\) equals the number of non-zero singular values, which is the same as the number of non-zero diagonal elements in \({\bf \Sigma}\).

Rounding errors may lead to small but non-zero singular values in a rank deficient matrix. Singular values that are smaller than a given tolerance are assumed to be numerically equivalent to zero, defining what is sometimes called the effective rank.

The right-singular vectors (columns of \({\bf V}\)) corresponding to the vanishing singular values of \({\bf A}\) span the null space of \({\bf A}\), i.e. \(\text{null}({\bf A}) = \text{span}\{{\bf v}_{r+1}, {\bf v}_{r+2}, \dots, {\bf v}_n\}\).

The left-singular vectors (columns of \({\bf U}\)) corresponding to the non-zero singular values of \({\bf A}\) span the range of \({\bf A}\), i.e. \(\text{range}({\bf A}) = \text{span}\{{\bf u}_1, {\bf u}_2, \dots, {\bf u}_r\}\).

Example:

Suppose \({\bf A} = {\bf U \Sigma V}^T\) is a matrix with three columns and singular values \(\sigma_1 \ge \sigma_2 > 0\) and \(\sigma_3 = 0\). Then:

The rank of \({\bf A}\) is 2.

The vectors \({\bf u}_1\) and \({\bf u}_2\) provide an orthonormal basis for the range of \({\bf A}\).

The vector \({\bf v}_3\) provides an orthonormal basis for the null space of \({\bf A}\).
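The following NumPy sketch (using an arbitrary rank-deficient \(3 \times 3\) matrix and an assumed tolerance) extracts the effective rank, an orthonormal basis for the range, and an orthonormal basis for the null space from the SVD:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])   # a rank-2 matrix

U, s, Vt = np.linalg.svd(A)
# Assumed tolerance for "numerically zero" singular values (effective rank).
tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]
r = int(np.sum(s > tol))          # effective rank

range_basis = U[:, :r]            # columns u_1, ..., u_r span range(A)
null_basis = Vt[r:, :].T          # columns v_{r+1}, ..., v_n span null(A)

print(r)                                  # 2
print(np.allclose(A @ null_basis, 0))     # True: null-space vectors are mapped to zero
```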

(Moore-Penrose) Pseudoinverse

If the matrix \({\bf \Sigma}\) is rank deficient, we cannot get its inverse. We define instead the pseudoinverse:

\[ ({\bf \Sigma}^{+})_{ii} = \begin{cases} \dfrac{1}{\sigma_i} & \sigma_i \neq 0 \\ 0 & \sigma_i = 0 \end{cases} \]

For a general non-square matrix \({\bf A}\) with known SVD (\({\bf A} = {\bf U\Sigma V}^T\)), the pseudoinverse is defined as:

\[ {\bf A}^{+} = {\bf V}{\bf \Sigma}^{+}{\bf U}^T \]

For example, if we consider an \(m \times n\) full rank matrix where \(m > n\), then using the reduced SVD:

\[ {\bf A}^{+} = {\bf V}_R \left[ \begin{array}{ccc} \frac{1}{\sigma_1} & & \\ & \ddots & \\ & & \frac{1}{\sigma_n} \end{array} \right] {\bf U}_R^T \]
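A small NumPy sketch computing the pseudoinverse from the SVD and comparing it against numpy.linalg.pinv (the threshold used to decide which singular values are treated as zero is an assumption):

```python
import numpy as np

A = np.array([[3.0, 2.0, 3.0],
              [8.0, 8.0, 2.0],
              [8.0, 7.0, 4.0],
              [1.0, 8.0, 7.0],
              [6.0, 4.0, 7.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
sigma_plus = np.where(s > 1e-12, 1.0 / s, 0.0)   # invert only the non-zero singular values
A_plus = Vt.T @ np.diag(sigma_plus) @ U.T

print(np.allclose(A_plus, np.linalg.pinv(A)))    # True
print(np.allclose(A_plus @ A, np.eye(3)))        # True here, since A has full column rank
```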

Euclidean norm of matrices

The induced 2-norm of a matrix \({\bf A}\) can be obtained using the SVD of the matrix:

\[ \|{\bf A}\|_2 = \max_{\|{\bf x}\|=1} \|{\bf A}{\bf x}\| = \max_{\|{\bf x}\|=1} \|{\bf U \Sigma V}^T{\bf x}\| = \max_{\|{\bf x}\|=1} \|{\bf \Sigma V}^T{\bf x}\| = \max_{\|{\bf V}^T{\bf x}\|=1} \|{\bf \Sigma V}^T{\bf x}\| = \max_{\|{\bf y}\|=1} \|{\bf \Sigma}{\bf y}\| \]

And hence,

\[ \|{\bf A}\|_2 = \sigma_1, \]

where \(\sigma_1\) is the largest singular value of \({\bf A}\).

In the above equations, all the notations for the norm refer to the Euclidean norm, and we used the fact that \({\bf U}\) and \({\bf V}\) are orthogonal matrices and hence \(\|{\bf U}{\bf x}\|_2 = \|{\bf x}\|_2\) and \(\|{\bf V}^T{\bf x}\|_2 = \|{\bf x}\|_2\).

Example:

We begin with the same non-square matrix \({\bf A}\) from the earlier example:

\[ {\bf A} = \left[ \begin{array}{ccc} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \\ \end{array} \right] \]

The matrix of singular values, \({\bf \Sigma}\), computed from the SVD factorization is:

\[ \Sigma = \left[ \begin{array}{ccc} 20.916 & 0 & 0 \\ 0 & 6.53207 & 0 \\ 0 & 0 & 4.22807 \\ \end{array} \right]. \]

Consequently the 2-norm of \({\bf A}\) is

\[ \|{\bf A}\|_2 = 20.916.\]
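This can be verified numerically; numpy.linalg.norm with ord=2 computes the induced 2-norm, which matches the largest singular value:

```python
import numpy as np

A = np.array([[3.0, 2.0, 3.0],
              [8.0, 8.0, 2.0],
              [8.0, 7.0, 4.0],
              [1.0, 8.0, 7.0],
              [6.0, 4.0, 7.0]])

sigma = np.linalg.svd(A, compute_uv=False)
print(np.linalg.norm(A, 2), sigma[0])               # both ~ 20.916
print(np.isclose(np.linalg.norm(A, 2), sigma[0]))   # True
```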

Euclidean norm of the inverse of matrices

Following the same derivation as above, we can show that for a full rank matrix we have:

\[ \|{\bf A}^{-1}\|_2 = \frac{1}{\sigma_n}, \]

where \(\sigma_n\) is the smallest singular value.

For non-square matrices, we can use the definition of the pseudoinverse (regardless of the rank):

\[ \|{\bf A}^{+}\|_2 = \frac{1}{\sigma_r}, \]

where \(\sigma_r\) is the smallest non-zero singular value. Note that for a full rank square matrix, we have \(\|{\bf A}^{+}\|_2 = \|{\bf A}^{-1}\|_2\). An exception to the definition above is the zero matrix. In this case, \(\|{\bf A}^{+}\|_2 = 0\).

2-Norm Condition Number

The 2-norm condition number of a matrix \({\bf A}\) is given by the ratio of its largest singular value to its smallest singular value:

\[ \text{cond}_2({\bf A}) = \frac{\sigma_{\max}}{\sigma_{\min}}. \]

If the matrix \({\bf A}\) is rank deficient, i.e. \(\text{rank}({\bf A}) < \min(m,n)\), then \(\text{cond}_2({\bf A}) = \infty\).
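A short NumPy check of the 2-norm condition number for the example matrix above:

```python
import numpy as np

A = np.array([[3.0, 2.0, 3.0],
              [8.0, 8.0, 2.0],
              [8.0, 7.0, 4.0],
              [1.0, 8.0, 7.0],
              [6.0, 4.0, 7.0]])

sigma = np.linalg.svd(A, compute_uv=False)
print(sigma[0] / sigma[-1])     # ~ 4.95, the ratio of largest to smallest singular value
print(np.linalg.cond(A, 2))     # same value, computed by numpy from the singular values
```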

Low-rank Approximation

The best rank-\(k\) approximation for an \(m \times n\) matrix \({\bf A}\), where \(k \le s = \min(m,n)\), for some matrix norm \(\|\cdot\|\), is one that minimizes the following problem:

\[ \begin{aligned} & \min_{{\bf A}_k} \ \|{\bf A} - {\bf A}_k\| \\ & \text{such that} \quad \text{rank}({\bf A}_k) \le k. \end{aligned} \]

Under the induced \(2\)-norm, the best rank-\(k\) approximation is given by the sum of the first \(k\) outer products of the left and right singular vectors scaled by the corresponding singular value (where, \(\sigma_1 \ge \dots \ge \sigma_s\)):

\[ {\bf A}_k = \sigma_1 {\bf u}_1 {\bf v}_1^T + \dots + \sigma_k {\bf u}_k {\bf v}_k^T \]

Observe that the norm of the difference between the best approximation and the matrix under the induced \(2\)-norm condition is the magnitude of the \((k+1)^\text{th}\) singular value of the matrix:

\[ \|{\bf A} - {\bf A}_k\|_2 = \left\| \sum_{i=k+1}^{s} \sigma_i {\bf u}_i {\bf v}_i^T \right\|_2 = \sigma_{k+1} \]

Note that the best rank-\(k\) approximation to \({\bf A}\) can be stored efficiently by only storing the \(k\) singular values \(\sigma_1, \dots, \sigma_k\), the \(k\) left singular vectors \({\bf u}_1, \dots, {\bf u}_k\), and the \(k\) right singular vectors \({\bf v}_1, \dots, {\bf v}_k\).

The figure below shows best rank-\(k\) approximations of an image (you can find the code snippet that generates these images in the IPython notebook):
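The notebook code is not reproduced here, but the following minimal sketch shows how such a rank-\(k\) approximation is formed (a real image would be loaded as a 2-D array of pixel intensities; here an arbitrary random matrix stands in):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 80))   # stand-in for an image (2-D array)
k = 10

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # sum of the first k outer products

# ||A - A_k||_2 equals sigma_{k+1}, the (k+1)-th singular value.
print(np.linalg.norm(A - A_k, 2), s[k])
```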

Review Questions

ChangeLog