# Singular Value Decomposition

This is the first in a short series of posts on an application of matrix theory to the mechanics of continuous media.  I’m planning my Vector and Tensor Analysis course for the winter quarter, and wanted to include a discussion of stress in elastic media as an application of tensors interesting to the engineering students.  Identifying the prerequisite material for this topic, and determining to what extent I can expect the students to have met it in our curriculum, has been an interesting exercise.  The physical interpretation of the stress tensor (which I will define in a later post) comes through the real polar decomposition of a matrix.  The real polar decomposition can be deduced from the singular value decomposition, which in turn can be deduced from the real spectral theorem.  Here’s how that last step goes.

Theorem. (Singular Value Decomposition)  Every $m\times n$ real matrix $A$ can be factored as $A=USV^T$ where $U$ is an $m\times m$ orthogonal matrix, $S$ is an $m\times n$ matrix whose only non-zero entries are non-negative real numbers on the diagonal (these diagonal entries are called the singular values of $A$), and $V$ is an $n\times n$ orthogonal matrix.
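Before deriving anything, it may help to see the shapes involved.  Here is a quick NumPy illustration of the statement (NumPy’s np.linalg.svd returns $V^T$ and the singular values as a vector, so we assemble $S$ ourselves; the $4\times 3$ matrix is an arbitrary example):

```python
import numpy as np

A = np.random.default_rng(5).standard_normal((4, 3))  # m = 4, n = 3
U, s, Vt = np.linalg.svd(A)  # U is 4 x 4, s has min(m, n) = 3 entries, Vt is 3 x 3
S = np.zeros((4, 3))
S[:3, :3] = np.diag(s)       # non-negative singular values on the diagonal
print(np.allclose(A, U @ S @ Vt))  # True: A = U S V^T
```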

With an understanding of the spectral theorem and how to interpret matrix multiplication one can actually derive a procedure for computing this factorization.  The important things to remember are:

• if $M$ is an $m\times n$ matrix then $M^TM$ is a symmetric $n\times n$ matrix and $MM^T$ is a symmetric $m\times m$ matrix because of the two properties of the transpose that $(CD)^T=D^TC^T$ and $(C^T)^T=C$;
• what it means for a matrix $W$ to be orthogonal is that $W$ is square and $W^TW=I=WW^T$, i.e., $W$ is invertible and $W^T=W^{-1}$;
• the condition $W^TW=I$ also says that the columns of $W$ are orthonormal vectors since the $(i,j)$-entry of the product $W^TW$ is precisely the dot product of row $i$ of $W^T$ (i.e., column $i$ of $W$) with column $j$ of $W$;
• if $D=[d_{ij}]$ is a diagonal matrix of size $m\times m$ then $DM$ is the result of multiplying the rows of $M$ by the corresponding diagonal entries of $D$, while if $D$ is diagonal of size $n\times n$ then $MD$ is the result of multiplying the columns of $M$ by the corresponding entries of $D$.
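None of these facts requires anything special about the matrices involved, and all four are easy to check numerically.  A minimal NumPy sketch, with arbitrary test matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))  # an arbitrary 4 x 3 matrix

# M^T M is a symmetric 3 x 3 matrix; M M^T is a symmetric 4 x 4 matrix.
print(np.allclose(M.T @ M, (M.T @ M).T))  # True
print(np.allclose(M @ M.T, (M @ M.T).T))  # True

# The Q factor of a QR factorization is orthogonal: W^T W = I = W W^T.
W, _ = np.linalg.qr(rng.standard_normal((4, 4)))
print(np.allclose(W.T @ W, np.eye(4)))  # columns of W are orthonormal
print(np.allclose(W @ W.T, np.eye(4)))  # W^T is the inverse of W

# A diagonal D scales the rows of M in DM and the columns of M in MD.
D = np.diag([1.0, 2.0, 3.0, 4.0])
print(np.allclose(D @ M, M * np.array([1.0, 2.0, 3.0, 4.0])[:, None]))
E = np.diag([5.0, 6.0, 7.0])
print(np.allclose(M @ E, M * np.array([5.0, 6.0, 7.0])))
```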

Let’s suppose for a moment that $A=USV^T$ with $U$, $S$, and $V$ as in the statement of the theorem.  Then

$A^TA=(USV^T)^T(USV^T)=VS^TU^TUSV^T=VS^TSV^T\Rightarrow A^TAV=VS^TS$

and

$AA^T=(USV^T)(USV^T)^T=USV^TVS^TU^T=USS^TU^T\Rightarrow AA^TU=USS^T$

by the orthogonality of $U$ and $V$.  Due to the structure of $S$, the matrix $S^TS$ is an $n\times n$ diagonal matrix whose first $\ell=\min\{m,n\}$ diagonal entries are the squares of the $\ell$ diagonal entries of $S$ (any remaining diagonal entries are zero).  Likewise, the matrix $SS^T$ is an $m\times m$ diagonal matrix whose first $\ell$ diagonal entries are the squares of the $\ell$ diagonal entries of $S$, in the same order.  The matrix equation $(A^TA)V=V(S^TS)$ then says that the columns of $V$ are eigenvectors of $A^TA$.  Similarly, the equation $(AA^T)U=U(SS^T)$ says that the columns of $U$ are eigenvectors of $AA^T$.
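Both eigenvector relations are visible numerically using NumPy’s built-in decomposition; a quick sketch with an arbitrary $4\times 3$ matrix (so $m=4$, $n=3$, $\ell=3$):

```python
import numpy as np

A = np.random.default_rng(1).standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A)  # np.linalg.svd returns V^T, not V
V = Vt.T

# Columns of V are eigenvectors of A^T A with eigenvalues s_i^2 ...
print(np.allclose(A.T @ A @ V, V * s**2))
# ... and columns of U are eigenvectors of A A^T with the same
# eigenvalues, padded with a zero (here m = 4 > 3 = n, so ell = 3).
lam = np.append(s**2, 0.0)
print(np.allclose(A @ A.T @ U, U * lam))
```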

Fortunately, the real spectral theorem guarantees that eigenvectors of the symmetric matrices $A^TA$ and $AA^T$ belonging to distinct eigenvalues are orthogonal, and that a full set of eigenvectors can be chosen to be orthonormal.  If $u$ is a normalized eigenvector of $AA^T$ with associated eigenvalue $\lambda$ (i.e., $AA^Tu=\lambda u$) then $\lambda=\lambda u^Tu=u^T\lambda u=u^T(AA^Tu)=(A^Tu)^T(A^Tu)\ge 0$, being the dot product of a vector with itself.  Similarly, the eigenvalues of $A^TA$ are non-negative.  So it is at least plausible that the non-negative diagonal entries of $SS^T$ and $S^TS$ are the eigenvalues of $AA^T$ and $A^TA$, respectively.  But it must be the case that the $i^{th}$ eigenvalue of $AA^T$ equals the $i^{th}$ eigenvalue of $A^TA$ for $1\le i\le \ell$ in order for our factorization to work, since both must equal the square of the $i^{th}$ diagonal entry of $S$.  This is where a clever observation saves the day.  The matrix $A$ converts eigenvectors of $A^TA$ into eigenvectors of $AA^T$!  Indeed, if $v$ is an eigenvector of $A^TA$ with eigenvalue $\lambda$, then

$AA^T(Av)=A(A^TAv)=A\lambda v=\lambda (Av).$

Hence, if $Av\not=0$ then it is an eigenvector of $AA^T$ with the same eigenvalue $\lambda$!  Of course, if $Av=0$ but $v$ is non-zero (being an eigenvector of $A^TA$) then $v\in\ker A$ and so the associated eigenvalue of $A^TA$ is zero.  Likewise, if $u$ is an eigenvector of $AA^T$ and $A^Tu$ is non-zero, then $A^Tu$ is an eigenvector of $A^TA$ with the same eigenvalue.  It follows that at most $\ell$ of the eigenvalues of $AA^T$ and of $A^TA$ are non-zero and that, counted with multiplicity, the non-zero eigenvalues of the two matrices coincide; listed in descending order, the first $\ell$ eigenvalues of each agree.  The square roots of these $\ell$ joint eigenvalues of $A^TA$ and $AA^T$ are called the singular values of $A$.
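This transfer of eigenvectors is also easy to watch in NumPy (np.linalg.eigh is the symmetric eigensolver; it returns eigenvalues in ascending order):

```python
import numpy as np

A = np.random.default_rng(2).standard_normal((4, 3))
lam, V = np.linalg.eigh(A.T @ A)  # eigenpairs of the symmetric A^T A
v = V[:, -1]                      # eigenvector for the largest eigenvalue
u = A @ v                         # apply A to it
# A v is an eigenvector of A A^T with the very same eigenvalue.
print(np.allclose(A @ A.T @ u, lam[-1] * u))  # True
```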

The upshot is a procedure.  Compute an orthonormal set of eigenvectors of $A^TA$, order them in descending order of their associated eigenvalues, and use them as the columns $v_1,\dots,v_n$ of the orthogonal matrix $V$.  Let $S$ be the $m\times n$ matrix whose off-diagonal entries are zero and whose diagonal entries are the singular values $\sigma_1\ge\sigma_2\ge\cdots$ of $A$, listed in the same descending order.  One subtlety: the columns of $U$ cannot simply be chosen as an independent set of eigenvectors of $AA^T$, because an eigenvector is only determined up to sign (and, for a repeated eigenvalue, only up to a choice of orthonormal basis of its eigenspace), and an incompatible choice ruins the factorization.  Instead, for each non-zero singular value set $u_i=Av_i/\sigma_i$.  By the observation above these are eigenvectors of $AA^T$, and they are orthonormal since $u_i^Tu_j=(Av_i)^T(Av_j)/(\sigma_i\sigma_j)=v_i^T(A^TAv_j)/(\sigma_i\sigma_j)=(\sigma_j/\sigma_i)\,v_i^Tv_j$.  Completing $u_1,u_2,\dots$ to an orthonormal basis of $\mathbb{R}^m$ (the added vectors span $\ker A^T$, corresponding to the zero eigenvalues of $AA^T$) gives the columns of the orthogonal matrix $U$.  Then $Av_i=\sigma_iu_i$ for every $i$ (when $\sigma_i=0$ we have $\|Av_i\|^2=v_i^TA^TAv_i=0$), which says exactly that $AV=US$, i.e., $A=USV^T$.  This constructs the singular value decomposition of $A$.
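Here is a from-scratch sketch of this construction in NumPy.  The function name svd_from_spectral, the tolerance, and the random-matrix trick for completing the basis of $\mathbb{R}^m$ are my own choices; it is meant as an illustration of the procedure above, not a numerically robust implementation:

```python
import numpy as np

def svd_from_spectral(A, tol=1e-12):
    """Build A = U @ S @ V.T from the spectral decomposition of A^T A."""
    m, n = A.shape
    # Spectral theorem: A^T A is symmetric, so eigh returns real
    # eigenvalues and orthonormal eigenvectors (in ascending order).
    lam, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(lam)[::-1]  # re-sort in descending order
    lam, V = lam[order], V[:, order]
    # Singular values: square roots of the joint eigenvalues; clip
    # the tiny negatives that round-off can introduce.
    sigma = np.sqrt(np.clip(lam, 0.0, None))
    r = int(np.sum(sigma > tol))   # number of non-zero singular values
    # For each non-zero singular value, u_i = A v_i / sigma_i.
    U = np.zeros((m, m))
    U[:, :r] = (A @ V[:, :r]) / sigma[:r]
    if r < m:
        # Complete to an orthonormal basis of R^m: QR of [U_r | random]
        # leaves the span of the first r columns alone and fills in the
        # rest with an orthonormal basis of the complement (ker A^T).
        G = np.random.default_rng(3).standard_normal((m, m - r))
        Q, _ = np.linalg.qr(np.hstack([U[:, :r], G]))
        U[:, r:] = Q[:, r:]
    ell = min(m, n)
    S = np.zeros((m, n))
    S[:ell, :ell] = np.diag(sigma[:ell])
    return U, S, V

A = np.random.default_rng(4).standard_normal((5, 3))
U, S, V = svd_from_spectral(A)
print(np.allclose(A, U @ S @ V.T))      # True
print(np.allclose(U.T @ U, np.eye(5)))  # True
print(np.allclose(V.T @ V, np.eye(3)))  # True
```

Comparing the output with np.linalg.svd(A) is a good sanity check, keeping in mind that the two factorizations may legitimately differ by matching sign flips in corresponding columns of $U$ and $V$.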

Posted on July 20, 2012, in Matrix Theory.

### Comments (1)

1. Jeff Diller

A quick couple of things you probably already know:

1) there are some java demos out there illustrating svd used to do image compression. Nice eye-candy!

2) also, svd gives a great way to visualize linear transformations from R^n to R^m, i.e., the vectors in some privileged o.n. basis are first scaled individually (the ones scaled by 0 span the kernel) and then the resulting orthogonal set is mapped isometrically into the target (by a rotation/reflection if target = source)

Finally, a couple of my honors alg students last year took on the project of programming svd from scratch. The business of finding evals/evecs of a largish symmetric matrix was pretty enlightening.

-Jeff