The singular value decomposition (SVD) is one of the most important matrix decompositions in data science and machine learning.
Eigenvalues are great for understanding the behavior of the iterates $A^k x$. Does the sequence grow? Does it shrink? How can it be easily modeled?
But they say nothing about the size of $Ax$ itself. How is $A$ transforming and rescaling the vector $x$?
Because $A$ is a matrix, multiple scales are involved when applying $A$. These scales can be represented systematically using the singular value decomposition.
- $A = U \Sigma V^T$, with $A$ of size $m \times n$.
- $U$: orthogonal, $m \times m$. Left singular vectors.
- $V$: orthogonal, $n \times n$. Right singular vectors.
- $\Sigma$: $m \times n$, diagonal matrix with real, non-negative entries = pure scaling = singular values.
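As a quick numerical sanity check of these shapes and properties, here is a minimal NumPy sketch (the $4 \times 3$ matrix below is an arbitrary example):

```python
import numpy as np

# Arbitrary 4 x 3 example matrix (m = 4, n = 3).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# full_matrices=True returns the square orthogonal factors: U is m x m, V is n x n.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Assemble the m x n "diagonal" matrix Sigma from the vector of singular values.
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)

print(U.shape, Sigma.shape, Vt.shape)       # (4, 4) (4, 3) (3, 3)
print(np.allclose(U.T @ U, np.eye(4)))      # True: U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(3)))    # True: V is orthogonal
print(np.allclose(A, U @ Sigma @ Vt))       # True: A = U Sigma V^T
```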
Consider a ball in $\mathbb{R}^n$. $A$ transforms this ball into an ellipsoid.
- $A x = U \Sigma V^T x$; apply the factors from right to left:
- $V^T x$: point on the unit ball.
- $\Sigma V^T x$: point on an ellipsoid; the axes are aligned with the coordinate axes.
- $U \Sigma V^T x$: rotate/reflect the ellipsoid.
The lengths of the semi-axes of this ellipsoid are the singular values of $A$.
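A small two-dimensional sketch of this picture (the $2 \times 2$ matrix is an arbitrary example): push points of the unit circle through $A$ and compare the extreme lengths with the singular values.

```python
import numpy as np

# Arbitrary 2 x 2 example matrix.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

# Sample points on the unit circle in R^2.
t = np.linspace(0.0, 2.0 * np.pi, 2000)
circle = np.vstack([np.cos(t), np.sin(t)])   # shape (2, 2000)

# The image of the circle under A is an ellipse.
ellipse = A @ circle
lengths = np.linalg.norm(ellipse, axis=0)

s = np.linalg.svd(A, compute_uv=False)       # singular values, largest first
print(lengths.max(), s[0])                   # longest  semi-axis ~ sigma_1
print(lengths.min(), s[1])                   # shortest semi-axis ~ sigma_2
```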
As we can expect, the size of a matrix can be related to its singular values:
$$
\|A\|_2 = \sigma_1, \qquad \|A\|_F = \Big( \sum_i \sigma_i^2 \Big)^{1/2},
$$
where $\sigma_1$ is the largest singular value.
We can also define a new matrix norm using the SVD, the Schatten $p$-norm:
$$
\|A\|_{S,p} = \Big( \sum_{i=1}^{r} \sigma_i^p \Big)^{1/p} = \big\| (\sigma_1, \ldots, \sigma_r) \big\|_p,
$$
where $r$ is the rank and $\|\cdot\|_p$ is the vector $p$-norm.
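These identities are easy to verify numerically; here is a sketch (the matrix and the choice $p = 3$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))              # arbitrary example matrix
s = np.linalg.svd(A, compute_uv=False)       # singular values, largest first

# 2-norm and Frobenius norm in terms of singular values.
print(np.linalg.norm(A, 2), s[0])                        # ||A||_2 = sigma_1
print(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))   # ||A||_F = (sum sigma_i^2)^(1/2)

# Schatten p-norm = vector p-norm of the singular values (here p = 3).
p = 3
print(np.sum(s**p) ** (1.0 / p), np.linalg.norm(s, p))
```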
The four fundamental spaces. Assume $A$ is $m \times n$ and let $r$ be the number of non-zero singular values = rank of the matrix. Then:
$$
\begin{aligned}
\operatorname{range}(A) &= \operatorname{span}(u_1, \ldots, u_r), & \operatorname{null}(A^T) &= \operatorname{span}(u_{r+1}, \ldots, u_m), \\
\operatorname{range}(A^T) &= \operatorname{span}(v_1, \ldots, v_r), & \operatorname{null}(A) &= \operatorname{span}(v_{r+1}, \ldots, v_n),
\end{aligned}
$$
where $u_i$ and $v_i$ are the columns of $U$ and $V$.
We recover the four fundamental spaces and the rank-nullity theorem.
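As an illustration (a sketch with a deliberately rank-deficient example matrix), the trailing columns of $U$ and $V$ give bases for the two null spaces:

```python
import numpy as np

# Build a 5 x 4 example matrix of rank 2 (product of 5 x 2 and 2 x 4 factors).
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=True)
r = int(np.sum(s > 1e-12 * s[0]))            # numerical rank = # non-zero singular values
print(r)                                     # 2

# null(A)   = span(v_{r+1}, ..., v_n): A kills the trailing right singular vectors.
# null(A^T) = span(u_{r+1}, ..., u_m): A^T kills the trailing left singular vectors.
print(np.allclose(A @ Vt[r:].T, 0.0))        # True
print(np.allclose(A.T @ U[:, r:], 0.0))      # True
```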
Connection with eigenvalues. The eigenvalues of $A^T A$ and $A A^T$ are equal to $\sigma_1^2, \ldots, \sigma_r^2$, or 0. The eigenvectors of $A^T A$ are given by $V$, and those of $A A^T$ by $U$:
$$
A^T A = V \Sigma^T \Sigma V^T, \qquad A A^T = U \Sigma \Sigma^T U^T.
$$
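A quick numerical check of this connection (a sketch; the matrix is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))              # arbitrary example matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Eigendecomposition of the symmetric matrix A^T A.
evals, evecs = np.linalg.eigh(A.T @ A)       # eigenvalues in ascending order
print(np.allclose(np.sort(evals), np.sort(s**2)))     # True: eigenvalues are sigma_i^2

# The eigenvector for the largest eigenvalue matches v_1 up to a sign.
print(np.isclose(abs(Vt[0] @ evecs[:, -1]), 1.0))     # True
```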
The computational cost of computing the singular value decomposition of an $m \times n$ matrix is $O(mn \min(m, n))$.
Proof of the existence of the SVD. Multiple proofs are possible. Let's look at one of them. Define the matrix
$$
B = \begin{pmatrix} 0 & A \\ A^T & 0 \end{pmatrix}.
$$
This matrix is symmetric and therefore is unitarily diagonalizable:
$$
B = Q \Lambda Q^T,
$$
with $Q$ orthogonal and $\Lambda$ diagonal.
We can restrict this factorization such that $\Lambda$ has only non-zero entries on the diagonal. Assume that $(u, v)$ is an eigenvector of $B$ associated with $\lambda \neq 0$. Then
$$
B \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} A v \\ A^T u \end{pmatrix} = \lambda \begin{pmatrix} u \\ v \end{pmatrix}
\quad\Longrightarrow\quad
A v = \lambda u, \quad A^T u = \lambda v,
$$
and
$$
B \begin{pmatrix} u \\ -v \end{pmatrix} = \begin{pmatrix} -A v \\ A^T u \end{pmatrix} = -\lambda \begin{pmatrix} u \\ -v \end{pmatrix}.
$$
So $-\lambda$ is also an eigenvalue. Since the eigenvectors must be orthogonal we have
$$
\begin{pmatrix} u \\ v \end{pmatrix}^T \begin{pmatrix} u \\ -v \end{pmatrix} = \|u\|_2^2 - \|v\|_2^2 = 0.
$$
Let's normalize our eigenvector such that
$$
\|u\|_2^2 + \|v\|_2^2 = 2.
$$
This implies that $\|u\|_2 = \|v\|_2 = 1$.
Note that since $A v = \lambda u$ and $A^T u = \lambda v$, we have
$$
A^T A \, v = \lambda \, A^T u = \lambda^2 v, \qquad A A^T u = \lambda \, A v = \lambda^2 u.
$$
So $\lambda^2$ is an eigenvalue of $A^T A$ and $A A^T$, with eigenvectors $v$ and $u$.
Denote by $u_1, \ldots, u_r$ all the eigenvectors of $A A^T$ and by $v_1, \ldots, v_r$ those of $A^T A$ (keeping only the non-zero eigenvalues). We have shown that
$$
A v_i = \sigma_i u_i, \qquad A^T u_i = \sigma_i v_i, \qquad \sigma_i = \lambda_i > 0,
$$
that is, in matrix form, $A V = U \Sigma$ and $A^T U = V \Sigma$ with $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)$.
Using
$$
A V = U \Sigma
$$
and the fact that any $x$ can be written as $x = V V^T x + w$ with $A w = 0$ (since $w$ is orthogonal to $v_1, \ldots, v_r$, the eigenvectors of $A^T A$ with non-zero eigenvalues), we find that
$$
A x = A \, V V^T x = U \Sigma V^T x \quad \text{for all } x, \qquad \text{i.e.,} \quad A = U \Sigma V^T.
$$
This is the SVD of $A$ (in its reduced, or thin, form; padding $U$ and $V$ with orthonormal bases of the null spaces and $\Sigma$ with zeros gives the full form).
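A numerical illustration of the matrix used in this proof (a sketch; $A$ is an arbitrary full-rank example): the eigenvalues of $B$ are $\pm\sigma_i$ together with $|m - n|$ zeros.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))              # arbitrary full-rank example (m = 4, n = 3)
m, n = A.shape

# The symmetric block matrix used in the proof: B = [[0, A], [A^T, 0]].
B = np.block([[np.zeros((m, m)), A],
              [A.T, np.zeros((n, n))]])

s = np.linalg.svd(A, compute_uv=False)
evals = np.linalg.eigvalsh(B)

# For a full-rank A, the eigenvalues of B are +sigma_i, -sigma_i, and |m - n| zeros.
expected = np.sort(np.concatenate([s, -s, np.zeros(abs(m - n))]))
print(np.allclose(np.sort(evals), expected))          # True
```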
The SVD is unique if and only if all the singular values are distinct (up to a sign change of matching columns in $U$ and $V$).
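For instance (a sketch; the matrix is an arbitrary example), flipping the signs of a matching left/right singular vector pair yields another valid factorization of the same $A$:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))              # arbitrary example matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Flip the signs of the first left singular vector and first right singular vector.
U2 = U.copy();  U2[:, 0] *= -1
Vt2 = Vt.copy(); Vt2[0, :] *= -1

# The product is unchanged, so (U2, Sigma, V2) is another valid SVD of A.
print(np.allclose(A, U2 @ np.diag(s) @ Vt2))          # True
```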
See also: The four fundamental spaces, Eigenvalues, Operator and matrix norms, Orthogonal matrix and projector.