SVD for Rank-Deficient Least-Squares#

In our previous discussions, we assumed \(A\) had full column rank. But what happens if it doesn’t?

  • If rank\((A) = r < n\), the matrix \(A\) is rank-deficient.

  • This implies \(A\) has a non-trivial null space, \(N(A)\).

  • If \(x_p\) is a solution to \(\text{argmin}_x \|Ax - b\|_2\), then \(x_p + z\) is also a solution for any vector \(z \in N(A)\), because \(A(x_p + z) - b = (Ax_p - b) + Az = (Ax_p - b)\), since \(Az = 0\).

  • This means the least-squares solution is not unique.

Our previous methods break down in this case. For example, in the QR factorization \(A=QR\), the upper-triangular matrix \(R\) will have at least one zero on its diagonal, making it singular and therefore not invertible.
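To see this concretely, here is a minimal NumPy sketch (the matrix is a made-up example, not one from the text): a \(4 \times 3\) matrix whose third column is the sum of the first two, so its rank is \(2 < n = 3\), and the \(R\) factor from QR has a diagonal entry that is numerically zero.

```python
import numpy as np

# Illustrative rank-deficient matrix: the third column equals col1 + col2,
# so rank(A) = 2 while n = 3.
A = np.array([[1.0, 2.0,  3.0],
              [4.0, 5.0,  9.0],
              [7.0, 8.0, 15.0],
              [1.0, 0.0,  1.0]])

print(np.linalg.matrix_rank(A))   # 2

# QR factorization: R is upper triangular, but one diagonal entry is
# essentially zero (on the order of machine epsilon), so R is singular
# and back-substitution against R breaks down.
Q, R = np.linalg.qr(A)
print(np.abs(np.diag(R)))
```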

The Minimum Norm Solution#

When the solution is not unique, we must add a second criterion to select one of the many possible solutions. The standard choice is the minimum norm solution, which is the unique vector \(x^*\) that satisfies two conditions:

  1. It solves the LS problem: \(x^* = \text{argmin}_x \|Ax - b\|_2\).

  2. It has the minimum 2-norm: \(\|x^*\|_2 \le \|x\|_2\) for any other LS solution \(x\).

This second condition is equivalent to requiring the solution to be orthogonal to the null space: \(x^* \perp N(A)\). This unique solution can be found robustly using the Singular Value Decomposition (SVD).
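As a quick numerical illustration (reusing the made-up rank-deficient \(A\) from the sketch above and an arbitrary \(b\)), NumPy's `np.linalg.lstsq` is SVD-based and returns the minimum-norm solution for rank-deficient problems, so both conditions can be checked directly: shifting by a null-space vector leaves the residual unchanged but increases the norm, and the returned solution is orthogonal to \(N(A)\).

```python
import numpy as np

A = np.array([[1.0, 2.0,  3.0],
              [4.0, 5.0,  9.0],
              [7.0, 8.0, 15.0],
              [1.0, 0.0,  1.0]])    # rank 2: col3 = col1 + col2
b = np.array([1.0, 2.0, 3.0, 4.0])  # arbitrary right-hand side

# SVD-based solver: returns the minimum-norm LS solution for rank-deficient A.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)

z = np.array([1.0, 1.0, -1.0])      # A @ z = 0, so z is in N(A)
x_other = x_star + z                # another LS solution

print(np.linalg.norm(A @ x_star - b), np.linalg.norm(A @ x_other - b))  # equal residuals
print(np.linalg.norm(x_star), np.linalg.norm(x_other))   # x_star has the smaller norm
print(x_star @ z)                                         # ~0: orthogonal to N(A)
```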

Proof. We prove that the minimum-norm least-squares solution \(x^*\) is orthogonal to the null space, \(x^* \perp N(A)\).

Let \(S\) be the set of all vectors \(x\) that solve the least-squares problem. We are looking for a unique vector \(x^* \in S\) such that \(\|x^*\|_2\) is minimized.

  1. Decompose any LS solution: Let \(x_p\) be any solution to the least-squares problem. As we’ve established, the set of all solutions \(S\) can be written as:

    \[ S = \{ x_p + z \mid z \in N(A) \} \]

    where \(N(A)\) is the null space of \(A\). Our goal is to find the vector in this set with the smallest 2-norm.

  2. Decompose \(x_p\) using Fundamental Subspaces: By the fundamental theorem of linear algebra, any vector in \(\mathbb{R}^n\) (like \(x_p\)) can be uniquely decomposed into a component in the row space, \(\text{span}(A^T)\), and a component in the null space, \(N(A)\).

    \[ x_p = x_r + x_n \]

    where \(x_r \in \text{span}(A^T)\) and \(x_n \in N(A)\).

  3. Rewrite the set of all solutions: Let’s substitute this decomposition back into our set \(S\):

    \[ S = \{ x_r + z' \mid z' \in N(A) \} \]

    where \(z' = x_n + z\). Since \(x_n\) is itself in \(N(A)\), absorbing it into \(z\) still sweeps out all of \(N(A)\), so the set is unchanged.

  4. Minimize the Norm using Orthogonality: We want to find \(\text{argmin}_{x \in S} \|x\|_2^2\), which is now:

    \[ \min_{z' \in N(A)} \|x_r + z'\|_2^2 \]

    A key part of the fundamental theorem is that the row space and the null space are orthogonal complements. This means that \(x_r \perp z'\) for any \(z' \in N(A)\). By the Pythagorean theorem, the squared norm becomes:

    \[ \|x_r + z'\|_2^2 = \|x_r\|_2^2 + \|z'\|_2^2 \]

  5. Find the Minimum: Our minimization problem is now:

    \[ \min_{z' \in N(A)} \left( \|x_r\|_2^2 + \|z'\|_2^2 \right) \]

    The smallest possible value for \(\|z'\|_2^2\) is 0, which is achieved when \(z' = 0\).

  6. Identify the Solution: The unique minimum-norm solution \(x^*\) occurs when \(z' = 0\). This gives:

    \[ x^* = x_r + 0 = x_r \]

    By definition, \(x_r\) is in the row space, \(x_r \in \text{span}(A^T)\). Since the row space is the orthogonal complement of the null space (\(\text{span}(A^T) = N(A)^\perp\)), this proves that the minimum-norm solution \(x^*\) must be orthogonal to the null space \(N(A)\).

Derivation using SVD#

Let’s now use the thin SVD of \(A\), keeping only the \(r\) non-zero singular values (sometimes called the compact SVD), which is a factorization \(A = U \Sigma V^T\) where:

  • \(A\) is \(m \times n\) with rank\((A) = r < n\).

  • \(U\) is an \(m \times r\) matrix with orthonormal columns. Its columns form a basis for \(\text{span}(A)\).

  • \(\Sigma\) is an \(r \times r\) diagonal matrix containing the \(r\) non-zero singular values, \(\sigma_1 \ge \dots \ge \sigma_r > 0\). It is invertible.

  • \(V\) is an \(n \times r\) matrix with orthonormal columns. Its columns form a basis for the row space, \(\text{span}(A^T)\). (See the sketch after this list for how these factors can be extracted numerically.)
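A minimal sketch, using the same made-up rank-2 matrix as before: take NumPy's thin SVD and keep only the singular values above a small tolerance.

```python
import numpy as np

A = np.array([[1.0, 2.0,  3.0],
              [4.0, 5.0,  9.0],
              [7.0, 8.0, 15.0],
              [1.0, 0.0,  1.0]])    # rank 2

# Thin SVD, then truncate to the r singular values above a tolerance.
U_full, s, Vt_full = np.linalg.svd(A, full_matrices=False)
tol = max(A.shape) * np.finfo(float).eps * s[0]
r = int(np.sum(s > tol))            # numerical rank, here r = 2

U = U_full[:, :r]                   # m x r, orthonormal basis for span(A)
Sigma = np.diag(s[:r])              # r x r, diagonal and invertible
V = Vt_full[:r, :].T                # n x r, orthonormal basis for span(A^T)

print(r, np.allclose(A, U @ Sigma @ V.T))   # 2 True
```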

We will now apply our two conditions.

1. The Least-Squares Condition#

The LS solution must make the residual \(Ax - b\) orthogonal to the column space, \(\text{span}(A)\).

  • Since \(\text{span}(A) = \text{span}(U)\), this orthogonality condition is:

    \[ U^T (Ax - b) = 0 \]
  • Substitute \(A = U \Sigma V^T\):

    \[ U^T ( (U \Sigma V^T)x - b ) = 0 \]
  • Distribute \(U^T\). Since \(U\) has orthonormal columns, \(U^T U = I_r\) (the \(r \times r\) identity):

    \[\begin{split} \begin{aligned} (U^T U) \Sigma V^T x - U^T b &= 0 \\ \Sigma V^T x &= U^T b \end{aligned} \end{split}\]

    This equation defines all LS solutions. As \(V^T\) is a “wide” \(r \times n\) matrix, this system is underdetermined for \(x\) and has infinitely many solutions. (The sketch below checks this characterization numerically.)
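Here is that numerical check (same made-up example): \(\Sigma V^T x = U^T b\) holds both for the minimum-norm solution and for that solution shifted by a null-space vector, because \(V^T z = 0\) for any \(z \in N(A)\).

```python
import numpy as np

A = np.array([[1.0, 2.0,  3.0],
              [4.0, 5.0,  9.0],
              [7.0, 8.0, 15.0],
              [1.0, 0.0,  1.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

U_full, s, Vt_full = np.linalg.svd(A, full_matrices=False)
r = 2                                       # rank of this example
U, Sigma, V = U_full[:, :r], np.diag(s[:r]), Vt_full[:r, :].T

x_min, *_ = np.linalg.lstsq(A, b, rcond=None)   # one LS solution
z = np.array([1.0, 1.0, -1.0])                  # A @ z = 0

# Sigma V^T x = U^T b is satisfied by every LS solution, since V^T z = 0.
for x in (x_min, x_min + z):
    print(np.allclose(Sigma @ V.T @ x, U.T @ b))   # True, True
```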

2. The Minimum Norm Condition#

Now we apply our second condition: the solution \(x\) must be orthogonal to the null space, \(x \perp N(A)\).

  • From the SVD, we know the null space \(N(A)\) is the orthogonal complement of the row space \(\text{span}(A^T)\).

  • Since \(\text{span}(A^T) = \text{span}(V)\), the condition \(x \perp N(A)\) is equivalent to \(x \in \text{span}(V)\).

  • This means \(x\) must be a linear combination of the columns of \(V\); equivalently, projecting any LS solution onto \(\text{span}(V)\) yields the minimum-norm one (see the sketch after this list). We can express this by defining a new, unique vector \(y \in \mathbb{R}^r\) such that:

    \[ x = V y \]
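As promised above, a small sketch (same assumed example) of the projection view: projecting any LS solution onto \(\text{span}(V)\) with \(V V^T\) strips away the null-space component and recovers the minimum-norm solution.

```python
import numpy as np

A = np.array([[1.0, 2.0,  3.0],
              [4.0, 5.0,  9.0],
              [7.0, 8.0, 15.0],
              [1.0, 0.0,  1.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

_, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt[:2, :].T                                 # n x r basis for span(A^T), r = 2

x_min, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimum-norm LS solution
x_any = x_min + np.array([1.0, 1.0, -1.0])      # shifted by a null-space vector

# Orthogonal projection onto span(V) removes the null-space component.
print(np.allclose(V @ (V.T @ x_any), x_min))    # True
```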

3. Combining and Solving#

We now substitute our “minimum norm” form \(x = Vy\) into our “least-squares” equation from Step 1:

\[ \Sigma V^T (Vy) = U^T b \]
  • Since \(V\) has orthonormal columns, \(V^T V = I_r\). We can simplify the equation:

    \[\begin{split} \begin{aligned} \Sigma (V^T V) y &= U^T b \\ \Sigma y &= U^T b \end{aligned} \end{split}\]
  • This is now a simple \(r \times r\) diagonal system. Since \(\Sigma\) is invertible (it only contains non-zero singular values), we can solve for the unique \(y\):

    \[ y = \Sigma^{-1} U^T b \]
  • Finally, we substitute \(y\) back into \(x = Vy\) to get our unique, minimum-norm LS solution:

    \[ x = V y = V \Sigma^{-1} U^T b \]

This solution

\[ x = \sum_{i=1}^r \frac{u_i^T b}{\sigma_i} \; v_i \]

is the minimum-norm least-squares solution written out in terms of the singular vectors \(u_i\) and \(v_i\). Because it involves only the \(r\) non-zero singular values, it applies whether or not \(A\) has full column rank, which makes it the most robust and general form of the solution to the linear least-squares problem.
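A short sketch (same made-up example as before) that evaluates this sum directly and confirms it agrees with NumPy's SVD-based least-squares solver.

```python
import numpy as np

A = np.array([[1.0, 2.0,  3.0],
              [4.0, 5.0,  9.0],
              [7.0, 8.0, 15.0],
              [1.0, 0.0,  1.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

U_full, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 2                       # number of non-zero singular values

# x = sum_i (u_i^T b / sigma_i) v_i over the r non-zero singular values.
x = sum((U_full[:, i] @ b) / s[i] * Vt[i, :] for i in range(r))

x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ref))    # True
```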

Tip

The Moore-Penrose Pseudoinverse

The matrix we constructed, \(A^\dagger = V \Sigma^{-1} U^T\), is known as the Moore-Penrose Pseudoinverse.

The minimum-norm least-squares solution to \(Ax=b\) is always given by:

\[ x = A^\dagger b \]

The SVD provides the most general and numerically stable way to compute \(A^\dagger\). It handles all cases:

  • \(A\) is invertible: \(A^\dagger = A^{-1}\).

  • \(A\) is overdetermined with full column rank: \(A^\dagger = (A^T A)^{-1} A^T\).

  • \(A\) is rank-deficient (our case): \(A^\dagger\) gives the unique minimum-norm solution. (A short numerical check follows.)
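The promised check (same made-up example): `np.linalg.pinv` agrees with the explicit \(V \Sigma^{-1} U^T\) construction, and \(A^\dagger b\) is the minimum-norm LS solution.

```python
import numpy as np

A = np.array([[1.0, 2.0,  3.0],
              [4.0, 5.0,  9.0],
              [7.0, 8.0, 15.0],
              [1.0, 0.0,  1.0]])    # rank-deficient
b = np.array([1.0, 2.0, 3.0, 4.0])

U_full, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 2
A_dag = Vt[:r, :].T @ np.diag(1.0 / s[:r]) @ U_full[:, :r].T   # V Sigma^{-1} U^T

# rcond sets the relative cutoff below which singular values are treated as zero,
# so pinv discards the numerically zero third singular value.
print(np.allclose(A_dag, np.linalg.pinv(A, rcond=1e-10)))      # True

x_min, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(A_dag @ b, x_min))                           # True
```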

Equivalence with the Normal Equations Solution in the Full-Rank Case#

We will now prove that the SVD-based minimum-norm least-squares solution, \(x = V \Sigma^{-1} U^T b\), is the same as the normal equations solution \(x = (A^T A)^{-1} A^T b\) when \(A\) has full column rank.

This proof is valid under the assumption that \(A\) has full column rank (rank\((A) = n\)), which is the same assumption required for \((A^T A)^{-1}\) to exist in the first place.

We will use the thin SVD, where \(A = U \Sigma V^T\):

  • \(A\) is \(m \times n\).

  • \(U\) is \(m \times n\) with orthonormal columns, so \(U^T U = I_n\).

  • \(\Sigma\) is \(n \times n\), diagonal, and invertible (since rank\(=n\), all \(n\) singular values are non-zero).

  • \(V\) is \(n \times n\) and orthogonal, so \(V^T V = V V^T = I_n\) and \(V^{-1} = V^T\).

Proof. We will start with the normal equations form \(A^\dagger = (A^T A)^{-1} A^T\) and substitute \(A = U \Sigma V^T\) into it.

1. Substitute \(A\) into \(A^T A\): First, let’s find the expression for \(A^T A\):

\[ A^T = (U \Sigma V^T)^T = V \Sigma^T U^T \]

Since \(\Sigma\) is a square diagonal matrix, \(\Sigma^T = \Sigma\). So, \(A^T = V \Sigma U^T\). Now, multiply \(A^T\) and \(A\):

\[ A^T A = (V \Sigma U^T) (U \Sigma V^T) \]

Group the middle terms: \(A^T A = V \Sigma (U^T U) \Sigma V^T\). Since \(U^T U = I_n\) (the \(n \times n\) identity), this simplifies to:

\[ A^T A = V \Sigma (I_n) \Sigma V^T = V \Sigma^2 V^T \]

2. Invert \(A^T A\): Now we find the inverse of the expression from Step 1:

\[ (A^T A)^{-1} = (V \Sigma^2 V^T)^{-1} \]

Using the inverse rule \((XYZ)^{-1} = Z^{-1} Y^{-1} X^{-1}\):

\[ (A^T A)^{-1} = (V^T)^{-1} (\Sigma^2)^{-1} V^{-1} \]
  • Since \(V\) is orthogonal, \(V^{-1} = V^T\) and \((V^T)^{-1} = V\).

  • Since \(\Sigma^2\) is diagonal, \((\Sigma^2)^{-1} = \Sigma^{-2}\). Substituting these in, we get:

\[ (A^T A)^{-1} = V \Sigma^{-2} V^T \]

3. Put It All Together: Now we go back to the original formula and substitute our results from Step 1 and Step 2:

\[\begin{split} \begin{aligned} A^\dagger &= (A^T A)^{-1} A^T \\ &= (V \Sigma^{-2} V^T) (V \Sigma U^T) \end{aligned} \end{split}\]

Group the middle terms: \(A^\dagger = V \Sigma^{-2} (V^T V) \Sigma U^T\). Since \(V^T V = I_n\):

\[ A^\dagger = V \Sigma^{-2} (I_n) \Sigma U^T = V (\Sigma^{-2} \Sigma) U^T \]

Finally, simplify the diagonal matrices: \(\Sigma^{-2} \Sigma = \Sigma^{-1}\).

\[ A^\dagger = V \Sigma^{-1} U^T \]

This completes the proof, showing the two expressions are identical.
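As a final sanity check of this algebra, the sketch below uses a random full-column-rank matrix (a hypothetical example): the normal-equations formula, the explicit \(V \Sigma^{-1} U^T\) construction, and NumPy's `pinv` all produce the same matrix.

```python
import numpy as np

# Hypothetical full-column-rank example (m = 5, n = 3).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

A_dag_svd = Vt.T @ np.diag(1.0 / s) @ U.T          # V Sigma^{-1} U^T
A_dag_ne  = np.linalg.inv(A.T @ A) @ A.T           # (A^T A)^{-1} A^T

print(np.allclose(A_dag_svd, A_dag_ne))            # True
print(np.allclose(A_dag_svd, np.linalg.pinv(A)))   # True
```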