The Generalized Minimal Residual Method (GMRES)#

From Conjugate Gradients to GMRES#

In our study of iterative methods, we have seen that the Conjugate Gradient (CG) algorithm is a powerful and efficient tool. However, its use is strictly limited to systems where the matrix \(A\) is symmetric positive definite (SPD). When we face a general, non-symmetric square matrix, CG is no longer applicable, and we need a different approach.

This is the problem that the Generalized Minimal Residual (GMRES) method solves.

The Arnoldi Analogy and Computational Cost#

GMRES is a Krylov subspace method, just like CG. The relationship between them is best understood through the lens of the processes that build their respective subspaces:

  • CG is based on the Lanczos process, which leverages the symmetry of \(A\) to build an orthonormal basis using a short-term (three-term) recurrence. This is computationally cheap and requires minimal storage.

  • GMRES is based on the Arnoldi process, which applies to any matrix. This generality comes at a price: the Arnoldi process requires a long-term recurrence. To maintain orthogonality, each new vector must be explicitly orthogonalized against all previously generated basis vectors.

This distinction in the underlying process has a direct impact on the algorithms:

  1. Applicability: CG is restricted to SPD matrices, while GMRES can be applied to any square matrix.

  2. Cost: GMRES has a significantly higher computational and storage cost per iteration. It must store a complete basis \(Q_k\) for the Krylov subspace \(\mathcal{K}_k\), and the work to generate \(q_{k+1}\) grows with each step \(k\).

Conceptually, the GMRES approach is more straightforward than CG. Because it cannot rely on the special properties of an \(A\)-conjugate basis, it lacks the optimization opportunities that make CG so efficient (like short-term recurrences). Instead, GMRES “just solves a least-squares problem” at each step.

A New Objective: Minimizing the Residual#

The fundamental difference between CG and GMRES lies in the cost function they are designed to optimize.

Recall that CG is built to find the approximation \(x^{(k)} \in \mathcal{K}_k\) that minimizes the \(A\)-norm of the error (\(e^{(k)} = x - x^{(k)}\)):

Important

CG Objective: \(\min_{x^{(k)} \in \mathcal{K}_k} \| x - x^{(k)} \|_A^2\), where \(\| z \|_A^2 = z^T A z\)

This objective is computationally convenient only when \(A\) is SPD, since only then does \(\| \cdot \|_A\) define a valid norm.

Important

GMRES takes a more direct and intuitive approach. It abandons the \(A\)-norm and instead seeks the approximation \(x^{(k)} \in \mathcal{K}_k\) that minimizes the 2-norm of the residual \(r^{(k)} = b - Ax^{(k)}\):

GMRES Objective: \(\min_{x^{(k)} \in \mathcal{K}_k} \| b - A x^{(k)} \|_2\)

This new objective function is precisely what gives the method its name: Generalized Minimal RESidual.

This choice makes sense. The residual is the only quantity we can easily measure without knowing the true solution \(x\). Minimizing the residual’s norm is the most natural goal for finding an approximate solution.

This choice of minimizing \(\| r^{(k)} \|_2\) is formally equivalent to minimizing the \(A^TA\)-norm of the error. If we let our approximation \(x^{(k)}\) be represented by \(Q_k y\) (where \(Q_k\) is the Arnoldi basis for \(\mathcal{K}_k\)), we see the direct connection:

\[\begin{split} \begin{align} \| x - x^{(k)} \|_{A^TA}^2 & = (x - Q_k y)^T A^TA (x - Q_k y) \\[.5em] & = \| A(x - Q_k y) \|_2^2 \\[.5em] & = \| b - A Q_k y \|_2^2 = \| r^{(k)} \|_2^2 \end{align} \end{split}\]

Thus, at each step \(k\), GMRES finds the vector \(x^{(k)} \in \mathcal{K}_k\) that makes the residual \(r^{(k)}\) as small as possible in the Euclidean 2-norm. In the following sections, we will explore how this minimization problem is solved efficiently.
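To make the identity above concrete, here is a quick numerical check; it is a minimal NumPy sketch with a randomly generated system and an arbitrary trial vector (not part of GMRES itself), confirming that the squared 2-norm of the residual equals the squared \(A^TA\)-norm of the error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))      # a general (non-symmetric) matrix
x_true = rng.standard_normal(n)      # the "exact" solution
b = A @ x_true                       # right-hand side
x_approx = rng.standard_normal(n)    # an arbitrary trial approximation

err = x_true - x_approx              # error e
res = b - A @ x_approx               # residual r

# ||r||_2^2  versus  ||e||_{A^T A}^2 = e^T (A^T A) e
print(res @ res)
print(err @ (A.T @ A) @ err)         # agrees with the line above to rounding error
```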

Deriving the GMRES Least-Squares Problem#

In the previous section, we established that the core task of GMRES is to find the approximation \(x^{(k)} = Q_k y\) that minimizes the 2-norm of the residual. This gives us the following formal optimization problem for the coefficients \(y \in \mathbb{R}^k\):

\[ \min_{y \in \mathbb{R}^k} \| b - A Q_k y \|_2 \]

with \(x^{(k)} = Q_k y\).

At first glance, this appears to be a large \(N \times k\) least-squares problem, where \(N\) is the dimension of \(A\). The key to solving it efficiently lies in the Arnoldi process used to build the \(Q_k\) basis.

The Arnoldi Relation#

Recall the fundamental Arnoldi relation, which connects \(A\), \(Q_k\), and \(Q_{k+1}\):

\[ A Q_k = Q_{k+1} \underline{H}_k \]

Here, \(\underline{H}_k\) is the \((k+1) \times k\) upper-Hessenberg matrix of coefficients generated by the process. This matrix is precisely \(Q_{k+1}^T A Q_k\).
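This relation is easy to verify numerically. The sketch below uses a minimal Arnoldi routine of our own (modified Gram-Schmidt, no breakdown handling; the name `arnoldi` is purely illustrative, not a library function) and checks that \(A Q_k = Q_{k+1} \underline{H}_k\) holds to rounding error.

```python
import numpy as np

def arnoldi(A, b, k):
    """Build Q (n x (k+1), orthonormal columns) and the (k+1) x k Hessenberg H."""
    n = b.shape[0]
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = A @ Q[:, j]
        for i in range(j + 1):            # orthogonalize against q_1, ..., q_{j+1}
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]     # assumes no breakdown (H[j+1, j] != 0)
    return Q, H

rng = np.random.default_rng(1)
n, k = 20, 5
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

Q, H = arnoldi(A, b, k)
print(np.linalg.norm(A @ Q[:, :k] - Q @ H))   # ~1e-15: the Arnoldi relation holds
```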

Transforming the Problem#

We can substitute this relation directly into our minimization problem:

\[ \min_y \| b - Q_{k+1} \underline{H}_k y \|_2 \]

Next, we must handle the vector \(b\). The Arnoldi process was started with the normalized initial residual, \(q_1 = b / \|b\|_2\) (we assume the initial guess is \(x^{(0)} = 0\), so that \(r^{(0)} = b\)). This means \(b = \|b\|_2 \, q_1\). The vector \(q_1\) is, by definition, the first column of the \(Q_{k+1}\) basis.

Therefore, we can express the vector \(b\) perfectly within the \(Q_{k+1}\) basis:

\[ b = Q_{k+1} (\|b\|_2 \, e_1) \]

where \(e_1 = [1, 0, \dots, 0]^T\) is the first standard basis vector in \(\mathbb{R}^{k+1}\).

Let’s substitute this final piece back into our problem. We are now minimizing:

\[ \| Q_{k+1} (\|b\|_2 \, e_1) - Q_{k+1} \underline{H}_k y \|_2 \]

We can factor out the \(Q_{k+1}\) matrix:

\[ \min_y \left\| Q_{k+1} \left( (\|b\|_2 \, e_1) - \underline{H}_k y \right) \right\|_2 \]

This is the crucial step. Because \(Q_{k+1}\) is an \(N \times (k+1)\) matrix with orthonormal columns, it acts as an isometry. This means it preserves the 2-norm of any vector it multiplies (i.e., \(\| Q_{k+1} z \|_2 = \|z\|_2\) for any \(z \in \mathbb{R}^{k+1}\)).

This property allows us to “remove” \(Q_{k+1}\) from the outside of the norm, transforming our large \(N \times k\) problem into a small \((k+1) \times k\) problem:

\[ \min_{y \in \mathbb{R}^k} \big\| \|b\|_2 \, e_1 - \underline{H}_k \, y \big\|_2, \qquad x^{(k)} = Q_k y. \]

This is the central least-squares problem that GMRES solves at each iteration. It is a small, \((k+1) \times k\) problem that can be solved very efficiently, typically using a QR factorization of \(\underline{H}_k\) (which is itself cheap, as \(\underline{H}_k\) is upper Hessenberg).
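To see that this reduction loses nothing, the sketch below builds \(Q_{k+1}\) and \(\underline{H}_k\) with a minimal inline Arnoldi loop (illustrative only, no breakdown handling), solves the small \((k+1) \times k\) problem with np.linalg.lstsq, and confirms that \(x^{(k)} = Q_k y\) attains exactly the residual norm reported by the small problem.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 6
A = rng.standard_normal((n, n)) + 5 * np.eye(n)
b = rng.standard_normal(n)

# minimal Arnoldi loop: Q is n x (k+1), H is (k+1) x k
Q = np.zeros((n, k + 1))
H = np.zeros((k + 1, k))
Q[:, 0] = b / np.linalg.norm(b)
for j in range(k):
    w = A @ Q[:, j]
    for i in range(j + 1):
        H[i, j] = Q[:, i] @ w
        w -= H[i, j] * Q[:, i]
    H[j + 1, j] = np.linalg.norm(w)
    Q[:, j + 1] = w / H[j + 1, j]

# small (k+1) x k least-squares problem:  min_y || ||b|| e_1 - H y ||_2
rhs = np.zeros(k + 1)
rhs[0] = np.linalg.norm(b)
y, *_ = np.linalg.lstsq(H, rhs, rcond=None)

x_k = Q[:, :k] @ y
print(np.linalg.norm(b - A @ x_k))    # residual of the large N x k problem
print(np.linalg.norm(rhs - H @ y))    # residual of the small problem -- identical
```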

The GMRES Algorithm: Computational Steps#

The derivation in the previous section established that at each step \(k\), we must solve the small least-squares problem:

\[ \min_{y \in \mathbb{R}^k} \big\| \beta e_1 - \underline{H}_k \, y \big\|_2 \]

where \(\beta = \|b\|_2 = \|r^{(0)}\|_2\).

Solving this from scratch at every iteration would be inefficient. A practical GMRES implementation is incremental. It cleverly updates the solution from step \(k-1\) to step \(k\). This is achieved by maintaining a QR factorization of the \(\underline{H}_k\) matrix, which is updated at each step using Givens rotations.

The algorithm proceeds in the following three stages at each iteration \(k\).

The Arnoldi Step#

We first run one step of the Arnoldi process. This involves:

  1. Computing \(w = A q_k\).

  2. Orthogonalizing \(w\) against all previous basis vectors \(q_1, \dots, q_k\). The coefficients from this orthogonalization form the \(k\)-th column of the Hessenberg matrix \(\underline{H}_k\).

  3. Normalizing the resulting vector to get the new basis vector, \(q_{k+1}\).

This single step provides us with the \(k\)-th column of \(\underline{H}_k\) and the new basis vector \(q_{k+1}\).
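In code, one such step might look like the following sketch (the function name `arnoldi_step` and the convention of passing the current basis as the columns of `Q` are our own illustrative choices; breakdown, i.e. a zero norm in step 3, is not handled):

```python
import numpy as np

def arnoldi_step(A, Q):
    """One Arnoldi step: given Q with columns q_1, ..., q_k, return the k-th
    Hessenberg column h (length k+1) and the next basis vector q_{k+1}."""
    k = Q.shape[1]
    h = np.zeros(k + 1)
    w = A @ Q[:, -1]              # 1. w = A q_k
    for i in range(k):            # 2. orthogonalize against q_1, ..., q_k
        h[i] = Q[:, i] @ w
        w -= h[i] * Q[:, i]
    h[k] = np.linalg.norm(w)      # 3. normalize to obtain q_{k+1}
    return h, w / h[k]            #    (assumes no breakdown: h[k] != 0)
```

In a full implementation, `h` becomes the \(k\)-th column of \(\underline{H}_k\) (padded with zeros below) and the returned vector is appended to the basis as \(q_{k+1}\).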

Update the QR Factorization and Solve#

This is the core of the GMRES computation. We have the \((k+1) \times k\) least-squares problem. We solve it by maintaining the QR factorization of \(\underline{H}_k\).

Because \(\underline{H}_k\) is upper Hessenberg, its QR factorization is cheap to compute and, more importantly, cheap to update. When we add column \(k\) (which we just computed in the Arnoldi step), we only need to introduce a single new Givens rotation, \(G_k\), to zero out the new subdiagonal element.

Let \(G_1, \dots, G_k\) be the sequence of Givens rotations that transforms \(\underline{H}_k\) into an upper-triangular matrix \(R_k\) (padded with a row of zeros).

\[\begin{split} G_k^T \cdots G_1^T \underline{H}_k = \begin{pmatrix} R_k \\ 0 \end{pmatrix} \end{split}\]

where \(R_k\) is a \(k \times k\) upper-triangular matrix.

To solve the least-squares problem, we must apply the exact same sequence of rotations to the right-hand-side vector, \(\beta e_1\):

\[\begin{split} \min_y \left\| \left( G_k^T \cdots G_1^T \right) (\beta e_1) - \begin{pmatrix} R_k \\ 0 \end{pmatrix} y \right\|_2 \end{split}\]

Let’s define the rotated right-hand-side vector as \(g_k\):

\[\begin{split} g_k = \left( G_k^T \cdots G_1^T \right) (\beta e_1) = \begin{pmatrix} p_k \\ \rho_k \end{pmatrix} \end{split}\]

Here, \(p_k\) is a vector of length \(k\), and \(\rho_k\) is a single scalar. Our minimization problem is now:

\[\begin{split} \min_y \left\| \begin{pmatrix} p_k \\ \rho_k \end{pmatrix} - \begin{pmatrix} R_k \\ 0 \end{pmatrix} y \right\|_2^2 = \min_y \left\| \begin{pmatrix} p_k - R_k y \\ \rho_k \end{pmatrix} \right\|_2^2 = \min_y \| p_k - R_k y \|_2^2 + |\rho_k|^2 \end{split}\]

The minimum is achieved when we make the first term zero. We find our coefficient vector \(y^{(k)}\) by solving the \(k \times k\) upper-triangular system:

\[ R_k \, y^{(k)} = p_k \]

This is easily solved by back-substitution.

A key result of this approach is that the 2-norm of the residual at this step is exactly the magnitude of the leftover scalar, \(|\rho_k|\).

\[ \| r^{(k)} \|_2 = \| \beta e_1 - \underline{H}_k y^{(k)} \|_2 = |\rho_k| \]

This gives us a cheap way to monitor the residual norm at every single iteration without computing the solution \(x^{(k)}\) or the residual \(r^{(k)}\). We use \(|\rho_k|\) as our stopping criterion.

Compute the Final Solution#

We only perform this final step once the residual norm \(|\rho_k|\) is small enough (e.g., \(|\rho_k| < \text{tol} \cdot \beta\)).

Once the loop terminates at iteration \(k\), we have the coefficient vector \(y^{(k)}\) from solving \(R_k y^{(k)} = p_k\). The final approximate solution is the linear combination of the Krylov basis vectors defined by \(y^{(k)}\):

\[ x^{(k)} = Q_k \, y^{(k)} \]

This final matrix-vector product is the last computation required to get our answer.
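Putting the three stages together, a complete (unpreconditioned, unrestarted) GMRES might look like the sketch below. It is a minimal illustration, not a production implementation: it assumes a zero initial guess and no Arnoldi breakdown, stores each Givens rotation as a cosine/sine pair, applies the rotations incrementally, and uses \(|\rho_k|\) (the entry `g[k+1]` below) as the stopping test, exactly as described above.

```python
import numpy as np

def gmres_sketch(A, b, tol=1e-10, max_iter=None):
    """Minimal GMRES: zero initial guess, full recurrence, Givens-updated QR."""
    n = b.shape[0]
    max_iter = n if max_iter is None else max_iter
    beta = np.linalg.norm(b)

    Q = np.zeros((n, max_iter + 1))          # Arnoldi basis
    H = np.zeros((max_iter + 1, max_iter))   # Hessenberg matrix, built column by column
    cs = np.zeros(max_iter)                  # Givens cosines
    sn = np.zeros(max_iter)                  # Givens sines
    g = np.zeros(max_iter + 1)               # rotated right-hand side, starts as beta*e_1
    g[0] = beta
    Q[:, 0] = b / beta

    for k in range(max_iter):
        # --- 1. Arnoldi step (modified Gram-Schmidt) ---
        w = A @ Q[:, k]
        for i in range(k + 1):
            H[i, k] = Q[:, i] @ w
            w -= H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        Q[:, k + 1] = w / H[k + 1, k]        # assumes no breakdown

        # --- 2a. apply the previous rotations to the new column of H ---
        for i in range(k):
            hi, hj = H[i, k], H[i + 1, k]
            H[i, k]     = cs[i] * hi + sn[i] * hj
            H[i + 1, k] = -sn[i] * hi + cs[i] * hj

        # --- 2b. new rotation G_k zeroes out the subdiagonal entry H[k+1, k] ---
        r = np.hypot(H[k, k], H[k + 1, k])
        cs[k], sn[k] = H[k, k] / r, H[k + 1, k] / r
        H[k, k], H[k + 1, k] = r, 0.0

        # --- 2c. apply G_k to the right-hand side; |g[k+1]| is the residual norm rho_k ---
        g[k + 1] = -sn[k] * g[k]
        g[k]     = cs[k] * g[k]

        if abs(g[k + 1]) < tol * beta:       # cheap stopping criterion
            k += 1
            break
    else:
        k = max_iter

    # --- 3. solve R_k y = p_k (back-substitution in practice) and form x = Q_k y ---
    y = np.linalg.solve(H[:k, :k], g[:k])    # H[:k, :k] is now upper triangular
    return Q[:, :k] @ y, abs(g[k])

# usage on a random, well-conditioned test system
rng = np.random.default_rng(3)
n = 50
A = rng.standard_normal((n, n)) + 10 * np.eye(n)
b = rng.standard_normal(n)
x, rho = gmres_sketch(A, b)
print(np.linalg.norm(b - A @ x), rho)        # the true residual norm agrees with rho (up to rounding)
```

For real problems one would normally reach for a library routine such as scipy.sparse.linalg.gmres, which adds restarting and preconditioning on top of the same core iteration.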

Convergence of GMRES#

Understanding when GMRES converges is a complex topic, as convergence depends on the full spectral structure of \(A\), not on a single scalar quantity such as the condition number (which governs the convergence of CG for SPD matrices).

The key to understanding its convergence lies in the polynomial approximation problem that GMRES implicitly solves.

The GMRES Polynomial Problem#

Recall that at step \(k\), GMRES finds the approximation \(x^{(k)} = Q_k y\) in the Krylov subspace \(\mathcal{K}_k\) that minimizes the 2-norm of the residual. Any vector \(x^{(k)} \in \mathcal{K}_k\) can be written as \(x^{(k)} = q(A)b\) for some polynomial \(q\) of degree at most \(k-1\).

Therefore, the residual \(r^{(k)}\) can be expressed as:

\[ r^{(k)} = b - A x^{(k)} = b - A q(A) b = (I - A q(A)) b \]

If we define a new polynomial \(p(z) = 1 - z q(z)\), we can see that:

  1. \(p(z)\) is a polynomial of degree at most \(k\).

  2. \(p(0) = 1 - 0 \cdot q(0) = 1\).

Thus, GMRES finds the \(x^{(k)}\) corresponding to the polynomial \(p \in \mathcal{P}_k\) (the set of polynomials of degree \(\le k\)) that satisfies \(p(0)=1\) and minimizes the norm of the resulting residual:

\[\begin{split} \|r^{(k)}\|_2 = \min_{ \substack{p \in \mathcal{P}_k \\ p(0) = 1} } \| p(A) b \|_2 \end{split}\]

This tells us that GMRES converges quickly if there exists a low-degree polynomial \(p\) such that \(p(0)=1\) and \(p(A)b\) is small.

A Formal Error Bound#

This polynomial formulation leads to a famous (though often pessimistic) upper bound on the GMRES residual.

Theorem 40 (GMRES Residual Bound)

Suppose that \(A\) is diagonalizable with an eigenvalue decomposition \(A = X \Lambda X^{-1}\). Then the GMRES residual at step \(k\) is bounded by:

\[\begin{split} \|r^{(k)}\|_2 \le \|b\|_2 \cdot \kappa(X) \cdot \min_{ \substack{p \in \mathcal{P}_k \\p(0) = 1} } \max_{\lambda \in \Lambda(A)} |p(\lambda)| \end{split}\]

where \(\Lambda(A)\) is the set of \(A\)’s eigenvalues and \(\kappa(X) = \|X\|_2 \|X^{-1}\|_2\) is the condition number of the eigenvector matrix \(X\).

This bound reveals the two key factors governing GMRES convergence:

  1. Eigenvalue Distribution: The term \(\min \max |p(\lambda)|\). We need to find a low-degree polynomial \(p\) (with \(p(0)=1\)) that is as small as possible on all eigenvalues of \(A\).

    • Good Case: If the eigenvalues are clustered in a few compact groups, all far from the origin, it is easy to construct such a polynomial.

    • Bad Case: If the eigenvalues are distributed widely, or worse, accumulate near the origin, it is very difficult for a low-degree polynomial to be small on all of them simultaneously.

  2. Non-Normality of \(A\): The term \(\kappa(X)\).

    • If \(A\) is a normal matrix (e.g., symmetric, skew-symmetric, or unitary), it has an orthonormal basis of eigenvectors, so we can take \(X\) unitary and \(\kappa(X) = 1\). In this case, convergence is governed only by the eigenvalue distribution.

    • If \(A\) is non-normal, its eigenvectors can be nearly linearly dependent, making \(\kappa(X)\) very large. A large \(\kappa(X)\) can permit slow convergence even when the eigenvalue distribution appears favorable. This is the most significant departure from the convergence behavior of CG.
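The influence of the eigenvalue distribution is easy to observe experimentally. The sketch below compares two normal matrices (diagonal, so \(\kappa(X) = 1\) and only the spectrum matters): one with eigenvalues clustered around 2, and one with eigenvalues spread over \([0.01, 10]\), crowding the origin. It uses a deliberately naive residual computation (Arnoldi plus a dense least-squares solve at every step), purely for illustration; the helper name is our own.

```python
import numpy as np

def gmres_residual_history(A, b, kmax):
    """||r^(k)||_2 for k = 1..kmax, via Arnoldi plus a dense least-squares solve.
    Naive (re-solves from scratch each step) but fine for a small demonstration."""
    n = b.shape[0]
    Q = np.zeros((n, kmax + 1))
    H = np.zeros((kmax + 1, kmax))
    beta = np.linalg.norm(b)
    Q[:, 0] = b / beta
    hist = []
    for j in range(kmax):
        w = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(j + 2)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:j + 2, :j + 1], e1, rcond=None)
        hist.append(np.linalg.norm(e1 - H[:j + 2, :j + 1] @ y))
    return np.array(hist)

rng = np.random.default_rng(4)
n = 100
b = rng.standard_normal(n)

A_clustered = np.diag(2.0 + 0.1 * rng.standard_normal(n))   # eigenvalues near 2
A_spread    = np.diag(np.linspace(0.01, 10.0, n))           # eigenvalues crowd the origin

print(gmres_residual_history(A_clustered, b, 10)[-1] / np.linalg.norm(b))  # tiny: fast convergence
print(gmres_residual_history(A_spread, b, 10)[-1] / np.linalg.norm(b))     # still O(1): slow progress
```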

A “Failure Case” for GMRES#

A classic example of worst-case behavior occurs when the eigenvalues are uniformly distributed on the unit circle.

Consider a simple permutation matrix \(A\) that represents a single cycle on \(n\) elements, for instance, \(\pi(i) = (i \mod n) + 1\). The eigenvalues of this matrix are the \(n\) roots of unity, which are perfectly spread around the unit circle.

Let’s try to solve \(Ax = e_1\). The solution \(x\) is the vector \(e_r\) such that \(\pi(r) = 1\), which in this case is \(e_n\). The Krylov subspace is:

\[ \mathcal{K}_k(A, e_1) = \text{span} \{e_1, A e_1, \dots, A^{k-1} e_1\} = \text{span} \{e_1, e_2, \dots, e_k\} \]

The subspace \(\mathcal{K}_k\) is simply the coordinate subspace spanned by the first \(k\) standard basis vectors.

The true solution is \(x = e_n\), which is orthogonal to the Krylov subspace \(\mathcal{K}_k\) for every \(k < n\). Moreover, \(A\) maps \(\mathcal{K}_k\) into \(\text{span}\{e_2, \dots, e_{k+1}\}\), which is orthogonal to \(b = e_1\), so the best GMRES can do is \(x^{(k)} = 0\). The residual norm remains completely stagnant at \(\|b\|_2\).

Only at iteration \(k=n\) does the subspace \(\mathcal{K}_n\) finally contain the solution. GMRES will find the exact solution in \(n\) steps (as it must, in exact arithmetic), but it shows zero convergence until the very last iteration. This is a catastrophic failure of “fast” convergence, and it stems directly from an eigenvalue distribution that makes the polynomial problem \(\min \max |p(\lambda)|\) impossible to solve well for \(k < n\).
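This stagnation is easy to reproduce. The sketch below builds the cyclic permutation matrix for a small \(n\), computes the minimal residual over each Krylov subspace by solving the least-squares problem directly (a naive stand-in for the full GMRES machinery), and shows the residual stuck at \(\|b\|_2 = 1\) until the very last step.

```python
import numpy as np

n = 12
# cyclic permutation matrix: A e_i = e_{i+1} (indices mod n), so A e_n = e_1
A = np.zeros((n, n))
for i in range(n):
    A[(i + 1) % n, i] = 1.0

b = np.zeros(n)
b[0] = 1.0                                    # b = e_1; the exact solution is e_n

# the Krylov basis is just e_1, e_2, ..., e_k, so no orthogonalization is needed
Q = np.zeros((n, n))
Q[:, 0] = b
for k in range(1, n + 1):
    if k > 1:
        Q[:, k - 1] = A @ Q[:, k - 2]         # next Krylov vector: e_k
    y, *_ = np.linalg.lstsq(A @ Q[:, :k], b, rcond=None)
    r = b - A @ Q[:, :k] @ y
    print(k, np.linalg.norm(r))               # 1.0 for every k < n, ~0 at k = n
```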