Projection onto Krylov Subspaces
We established that for large, sparse matrices, our only computationally feasible tool is the sparse matrix-vector product, \(v \mapsto Av\). Any method that modifies the matrix \(A\) itself, like the dense QR algorithm, is unusable due to “fill-in.”
This forces us to ask a new question: How can we find eigenvalues using only matrix-vector products?
The answer lies in building a small, dense matrix \(H_k\) whose eigenvalues approximate the eigenvalues of \(A\). We do this by projecting \(A\) onto a carefully chosen low-dimensional subspace.
Revisiting the Hessenberg Reduction
Let’s reconsider the full Hessenberg reduction from our study of direct methods:

\[
A Q = Q H \quad \Longleftrightarrow \quad H = Q^T A Q,
\]

where \(Q \in \mathbb{R}^{n \times n}\) is orthogonal and \(H \in \mathbb{R}^{n \times n}\) is upper Hessenberg.
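As a quick numerical sanity check, here is a minimal NumPy/SciPy sketch that computes a dense Hessenberg reduction of a small random test matrix with `scipy.linalg.hessenberg` and verifies the relation \(AQ = QH\) together with the Hessenberg zero pattern (the random matrix and its size are illustrative choices):

```python
import numpy as np
from scipy.linalg import hessenberg

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))

# Dense Hessenberg reduction: A = Q H Q^T with Q orthogonal, H upper Hessenberg.
H, Q = hessenberg(A, calc_q=True)

print(np.allclose(A @ Q, Q @ H))         # True: A Q = Q H
print(np.allclose(np.tril(H, -2), 0.0))  # True: zero below the first subdiagonal
```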
Let’s look at this equation one column at a time. Let \(q_j\) be the \(j\)-th column of \(Q\) and \(h_j\) be the \(j\)-th column of \(H\). The \(j\)-th column of the equation \(AQ = QH\) reads:

\[
A q_j = Q h_j = \sum_{i=1}^{n} h_{ij} q_i.
\]
Because \(H\) is upper Hessenberg, \(h_{ij} = 0\) for all \(i > j+1\). This simplifies the sum dramatically:

\[
A q_j = \sum_{i=1}^{j+1} h_{ij} q_i.
\]
We can rearrange this to solve for the “new” vector, \(q_{j+1}\):

\[
h_{j+1,j}\, q_{j+1} = A q_j - \sum_{i=1}^{j} h_{ij} q_i.
\]
This equation reveals that the Hessenberg reduction is actually a Gram-Schmidt-like iterative process. Starting with an arbitrary unit vector \(q_1\):
Compute the vector \(v = A q_j\).
Orthogonalize \(v\) against the “old” basis vectors \(q_1, \dots, q_j\). The coefficients of this orthogonalization are \(h_{ij} = q_i^T v\) (since the \(q_i\) are orthonormal, \(q_i^T q_k = \delta_{ik}\)).
The remaining vector, \(w = v - \sum_{i=1}^{j} h_{ij} q_i\), is, by construction, orthogonal to \(q_1, \dots, q_j\).
Normalize this vector: \(h_{j+1,j} = \|w\|_2\) and \(q_{j+1} = w / h_{j+1,j}\).
This is the Arnoldi iteration.
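To make the recipe concrete, here is a minimal NumPy sketch of \(k\) steps of the Arnoldi iteration. The function name `arnoldi` and the breakdown tolerance are illustrative choices; a production implementation (such as ARPACK) would add reorthogonalization and restarting.

```python
import numpy as np

def arnoldi(A, q1, k):
    """k steps of the Arnoldi iteration (modified Gram-Schmidt form).

    Returns Q (n x (k+1), orthonormal columns) and H ((k+1) x k, upper
    Hessenberg) satisfying A @ Q[:, :k] == Q @ H up to roundoff.
    """
    n = q1.shape[0]
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for j in range(k):
        v = A @ Q[:, j]                # the only use of A: one matrix-vector product per step
        for i in range(j + 1):         # orthogonalize v against all previous basis vectors
            H[i, j] = Q[:, i] @ v
            v = v - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)
        if H[j + 1, j] < 1e-12:        # "happy breakdown": span(Q) is already A-invariant
            return Q[:, : j + 1], H[: j + 1, : j + 1]
        Q[:, j + 1] = v / H[j + 1, j]
    return Q, H
```

Note that \(A\) enters only through the product `A @ Q[:, j]`, so `A` could just as well be a `scipy.sparse` matrix or any object supporting a matrix-vector product.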
If we run this process for \(n\) steps, we build the full matrices \(Q\) and \(H\). The cost is dominated by the \(n\) sparse matvecs and the \(O(n^3)\) work of the \(n\) growing orthogonalizations. We have simply re-derived the \(O(n^3)\) dense Hessenberg reduction, and the sparsity of \(A\) has not saved us.
The Iterative Truncation
The key insight is that it is possible to stop this process early.
What if we only run the Arnoldi iteration for \(k\) steps, where \(k \ll n\)?
After \(k\) steps, we have produced an orthonormal set of vectors \(Q_k = [q_1, \dots, q_k]\) and a small \(k \times k\) upper Hessenberg matrix \(H_k\), which is the top-left block of the full \(H\).
The process generates the following relationship, which is the key to all iterative methods:

\[
A Q_k = Q_k H_k + h_{k+1,k}\, q_{k+1} e_k^T,
\]

where \(e_k^T = [0, \dots, 0, 1]\) is the transpose of the \(k\)-th standard basis vector in \(\mathbb{R}^k\). This is known as the Arnoldi relation.
The term \(Q_k H_k\) exactly represents the components of \(A Q_k\) that lie within the subspace spanned by \(Q_k\).
The term \(h_{k+1,k} q_{k+1} e_k^T\) is the “remainder” or “residual”—the part of \(A Q_k\) that points “out” of the subspace.
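Reusing the illustrative `arnoldi` sketch from above, the relation is easy to verify numerically on a random test matrix (the sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 10
A = rng.standard_normal((n, n))
q1 = rng.standard_normal(n)

Q, H = arnoldi(A, q1, k)              # Q: n x (k+1), H: (k+1) x k
Qk, Hk = Q[:, :k], H[:k, :k]

ek = np.zeros(k); ek[-1] = 1.0        # e_k, the k-th standard basis vector
residual_term = H[k, k - 1] * np.outer(Q[:, k], ek)
print(np.allclose(A @ Qk, Qk @ Hk + residual_term))   # True, up to roundoff
```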
The Projection and the Approximation
If we simply ignore the residual term (which we hope is small), we get an approximation:

\[
A Q_k \approx Q_k H_k.
\]
This suggests that \(A\) behaves like \(H_k\) when restricted to the subspace \(\text{span}(Q_k)\). Let’s make this formal by left-multiplying the Arnoldi relation by \(Q_k^T\):

\[
Q_k^T A Q_k = Q_k^T Q_k H_k + h_{k+1,k}\, (Q_k^T q_{k+1}) e_k^T.
\]
By construction, the columns of \(Q_k\) are orthonormal, so \(Q_k^T Q_k = I_k\). Also, \(q_{k+1}\) is orthogonal to every column of \(Q_k\), so \(Q_k^T q_{k+1} = \mathbf{0}\). The equation simplifies perfectly:

\[
H_k = Q_k^T A Q_k.
\]
This \(k \times k\) matrix \(H_k\) is the Rayleigh-Ritz projection of \(A\) onto the subspace spanned by \(Q_k\).
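Continuing the numerical example above, both of these facts, and the projection identity itself, are one-line checks:

```python
# Orthonormality of the basis and the Rayleigh-Ritz projection identity.
print(np.allclose(Qk.T @ Qk, np.eye(k)))   # Q_k^T Q_k = I_k
print(np.allclose(Qk.T @ Q[:, k], 0.0))    # Q_k^T q_{k+1} = 0
print(np.allclose(Qk.T @ A @ Qk, Hk))      # H_k = Q_k^T A Q_k
```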
The Krylov Subspace
The projection is onto the subspace spanned by the columns of \(Q_k\). What is this subspace?
The Arnoldi process starts with \(q_1\) and generates subsequent vectors using matrix-vector products with \(A\). By induction, one can see that the basis vectors \(\{q_1, \dots, q_k\}\) form an orthonormal basis for the space spanned by the vectors \(\{q_1, Aq_1, A^2q_1, \dots, A^{k-1}q_1\}\).
This subspace, \(\mathcal{K}_k(A, q_1) = \text{span}\{q_1, Aq_1, A^2q_1, \dots, A^{k-1}q_1\}\), is called the Krylov subspace of degree \(k\) generated by \(A\) and \(q_1\).
The Arnoldi iteration is a numerically stable way (essentially a modified Gram-Schmidt process) to build an orthonormal basis for this subspace. The matrix \(H_k\) is therefore the projection of \(A\) onto the \(k\)-th Krylov subspace.
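A quick way to see this numerically, again continuing the example above: each power \(A^{j} q_1\), suitably normalized, should be fully reproduced by its orthogonal projection onto the columns of \(Q_k\).

```python
# Build normalized Krylov vectors q1, A q1, ..., A^{k-1} q1 and check that each
# lies (numerically) in the span of the Arnoldi basis Q_k.
K = np.empty((n, k))
K[:, 0] = q1 / np.linalg.norm(q1)
for j in range(1, k):
    w = A @ K[:, j - 1]
    K[:, j] = w / np.linalg.norm(w)

print(np.allclose(Qk @ (Qk.T @ K), K))     # True: span(Q_k) contains every Krylov vector
```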
The Iterative Eigensolver Strategy
This defines the general strategy for all modern iterative eigensolvers:
Build Subspace: Choose a starting vector \(q_1\) and run the Arnoldi iteration for \(k\) steps to build an orthonormal basis \(Q_k\) for the Krylov subspace \(\mathcal{K}_k\) and the \(k \times k\) Hessenberg projection \(H_k = Q_k^T A Q_k\).
Solve Small Problem: Compute the eigenvalues of the small, dense matrix \(H_k\) using a standard method (e.g., the QR iteration algorithm). Let these eigenvalues be \(\theta_1, \dots, \theta_k\). These are called the Ritz values.
Check Convergence: These Ritz values are our approximations for the eigenvalues of \(A\). We can check their associated residual error (which, as we will see, is related to the \(h_{k+1,k}\) term) to see if they are “good enough.”
Iterate: If not converged, we can expand the subspace (increase \(k\)) and repeat.
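Here is a sketch of this loop, reusing the illustrative `arnoldi` helper from above on a dense test matrix small enough to check against a direct eigensolver. The residual estimate uses the fact, which follows from the Arnoldi relation, that a Ritz pair \((\theta, x = Q_k y)\) with \(\|y\|_2 = 1\) satisfies \(\|Ax - \theta x\|_2 = |h_{k+1,k}|\,|e_k^T y|\). The test matrix, subspace sizes, and output format are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
# A test matrix with well-separated extreme eigenvalues (roughly 1 ... 100).
A = np.diag(np.linspace(1.0, 100.0, n)) + 1e-2 * rng.standard_normal((n, n))
q1 = rng.standard_normal(n)

for k in (5, 10, 20, 40):
    Q, H = arnoldi(A, q1, k)                   # 1. build the subspace
    Hk = H[:k, :k]
    theta, Y = np.linalg.eig(Hk)               # 2. solve the small problem: Ritz values/vectors
    i = int(np.argmax(np.abs(theta)))          # track the largest Ritz value
    resid = abs(H[k, k - 1]) * abs(Y[-1, i])   # 3. residual ||A x - theta x|| for that Ritz pair
    print(f"k={k:3d}  theta={theta[i].real:10.6f}  residual={resid:.2e}")

print("largest true eigenvalue:", np.linalg.eigvals(A).real.max())
```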
The total cost is dominated by the \(k\) sparse matrix-vector products, at \(O(k \cdot \mathrm{nnz}(A))\), which is roughly \(O(kn)\) when \(A\) has a bounded number of nonzeros per row, and the \(k\) orthogonalization steps, at \(O(k^2 n)\). As long as \(k \ll n\), this cost is vastly lower than the \(O(n^3)\) cost of a direct method.
In the special (and very common) case that \(A\) is symmetric, the projected matrix \(H_k\) will also be symmetric. A symmetric Hessenberg matrix is tridiagonal. This simplification of the Arnoldi iteration is known as the Lanczos iteration, which is even more efficient: each new vector only needs to be orthogonalized against the two most recent basis vectors, giving a three-term recurrence.
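In practice you rarely code this loop yourself: SciPy wraps ARPACK, whose `eigs` routine implements an implicitly restarted Arnoldi iteration for general matrices and whose `eigsh` routine implements the corresponding Lanczos iteration for symmetric ones. A minimal usage sketch on a sparse symmetric test matrix (the matrix and size are illustrative):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Sparse symmetric test matrix: the 1-D discrete Laplacian (tridiagonal).
n = 100_000
L = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")

# Six largest-magnitude eigenvalues, computed using only sparse matvecs with L.
vals = eigsh(L, k=6, which="LM", return_eigenvectors=False)
print(np.sort(vals))       # close to 4.0, the top of the Laplacian's spectrum
```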