Preconditioning#
In our previous discussions on iterative methods like the Conjugate Gradient (CG) algorithm and GMRES, we observed a critical fact: their convergence speed is highly dependent on the properties of the system matrix \(A\). Specifically, convergence is slow when the condition number \(\kappa(A)\) is large, or when the eigenvalues are spread out. Convergence improves significantly if the matrix is “closer” to the identity, has clustered eigenvalues, and its eigenbasis is well-conditioned.
For many practical problems, especially those arising from the discretization of partial differential equations, \(\kappa(A)\) can be very large, rendering our standard iterative methods impractically slow. This motivates the central topic of this section: preconditioning.
The Goal of a Preconditioner#
The core idea of preconditioning is simple: instead of solving the original, difficult system \(Ax=b\), we solve a related, “easier” system that has the same solution. We do this by applying a preconditioner \(M\), which is a matrix that approximates \(A\) but is much easier to invert.
A good preconditioner \(M\) must balance two competing goals:
Improve Convergence: The preconditioned system (e.g., \(M^{-1}A\)) must be “nicer” than the original \(A\). Ideally, its eigenvalues are clustered, and its condition number \(\kappa(M^{-1}A)\) is much smaller than \(\kappa(A)\). The best possible (but impractical) choice is \(M=A\), which gives \(M^{-1}A = I\) and \(\kappa(M^{-1}A)=1\), allowing convergence in one iteration.
Be Easy to Apply: The preconditioner is only useful if it is computationally cheap. We must be able to solve linear systems of the form \(Mz=r\) (i.e., compute \(M^{-1}r\)) very quickly in each iteration of our method (e.g., CG or GMRES).
The art of preconditioning lies in finding a matrix \(M\) that strikes the optimal trade-off between these two goals.
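To make this trade-off concrete, here is a small numerical experiment (the test matrix is made up for illustration) comparing \(\kappa(A)\) with \(\kappa(M^{-1}A)\) when \(M\) is simply the diagonal of \(A\), the Jacobi preconditioner discussed later in this section:

```python
# Illustrative experiment: how much does a diagonal preconditioner reduce the
# condition number of a matrix whose diagonal entries vary wildly in scale?
import numpy as np

rng = np.random.default_rng(0)
n = 200
D = np.diag(np.logspace(0, 6, n))          # diagonal spanning six orders of magnitude
R = rng.standard_normal((n, n))
A = D + 1e-2 * (R + R.T)                   # SPD: the perturbation is small relative to D

M_inv = np.diag(1.0 / np.diag(A))          # applying M^{-1} is just a cheap division

print("kappa(A)        :", np.linalg.cond(A))
print("kappa(M^{-1} A) :", np.linalg.cond(M_inv @ A))
```

For a matrix like this, diagonal scaling removes almost all of the ill-conditioning; for harder problems (e.g., discretized PDEs) the improvement is far more modest, which motivates the stronger preconditioners surveyed below.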
Three Types of Preconditioning#
There are three main ways to apply a preconditioner \(M\) (or its components \(M_1, M_2\)) to the system \(Ax=b\).
Left Preconditioning: We multiply from the left by an approximation \(M_1 \approx A\). We solve the modified system:
\[(M_1^{-1} A) x = M_1^{-1} b\]

This is a common choice for non-symmetric solvers like GMRES, as it acts directly on the system matrix to improve its properties.
Right Preconditioning: We use an approximation \(M_2 \approx A\) and introduce a variable substitution \(x = M_2^{-1} z\). This leads to the system:
\[(A M_2^{-1}) z = b, \quad \text{followed by} \quad x = M_2^{-1} z\]

This is also common for GMRES. A key advantage of right preconditioning is that the residual of the original system, \(r = b - Ax\), can be computed directly, which is often desirable.
Symmetric (Split) Preconditioning: We use two matrices, \(M_1\) and \(M_2\), to “split” the preconditioning:
\[(M_1^{-1} A M_2^{-1}) z = M_1^{-1} b, \quad \text{followed by} \quad x = M_2^{-1} z\]
This third case is essential for methods like CG and MINRES. The Conjugate Gradient algorithm requires the system matrix to be Symmetric Positive Definite (SPD). If \(A\) is SPD, the left-preconditioned matrix \(M_1^{-1} A\) is generally not symmetric, so we cannot apply CG.
To preserve symmetry, we must construct the preconditioned system in a symmetric way. For example, if we have a preconditioner \(M\) that is itself SPD, we can find its Cholesky factor \(C\) such that \(M = C C^T\). We can then set \(M_1 = C\) and \(M_2 = C^T\). This transforms \(Ax=b\) into the equivalent system:

\[(C^{-1} A C^{-T}) z = C^{-1} b, \quad \text{followed by} \quad x = C^{-T} z\]

The new system matrix \(A' = C^{-1} A C^{-T}\) is SPD, and we can now safely apply the standard CG algorithm to \(A'z = b'\), where \(b' = C^{-1} b\).
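A minimal numerical sanity check of this construction (assuming a small SPD test matrix and the simple choice \(M = \operatorname{diag}(A)\)):

```python
# Sketch: symmetric (split) preconditioning with an SPD preconditioner M = C C^T.
# The transformed matrix C^{-1} A C^{-T} stays SPD, and x = C^{-T} z recovers
# the solution of the original system.
import numpy as np

rng = np.random.default_rng(1)
n = 100
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                # SPD system matrix
b = rng.standard_normal(n)

M = np.diag(np.diag(A))                    # a simple SPD preconditioner (Jacobi)
C = np.linalg.cholesky(M)                  # M = C C^T

Ci = np.linalg.inv(C)
A_hat = Ci @ A @ Ci.T                      # C^{-1} A C^{-T}
b_hat = Ci @ b                             # C^{-1} b

z = np.linalg.solve(A_hat, b_hat)          # stand-in for running CG on A_hat z = b_hat
x = np.linalg.solve(C.T, z)                # x = C^{-T} z

print("A_hat symmetric:", np.allclose(A_hat, A_hat.T))
print("original system solved:", np.allclose(A @ x, b))
```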
Examples of Preconditioners#
Choosing a preconditioner is arguably the most critical and problem-dependent step in successfully using iterative solvers. There is no single “best” preconditioner; the choice is always a trade-off between effectiveness (how much it reduces the iteration count), computational cost (how expensive it is to apply \(M^{-1}r\)), and storage cost.
We will survey the main families, progressing from the simplest “classical” methods to more advanced, specialized, and powerful techniques.
1. Classical Iteration Preconditioners#
These simple preconditioners are derived directly from the classical iterative methods (Jacobi, Gauss-Seidel, SOR) we studied earlier. The preconditioner \(M\) is chosen to be the part of the matrix \(A\) that is “inverted” in a single step of that method. Let \(A = D - L - U\), where \(D\) is the diagonal, \(-L\) is the strict lower triangle, and \(-U\) is the strict upper triangle.
Jacobi: \(M = D\). The preconditioner is just the diagonal of \(A\).
Gauss-Seidel: \(M = D - L\). The preconditioner is the lower-triangular part of \(A\).
Successive Over-Relaxation (SOR): \(M = \frac{1}{\omega}(D - \omega L)\).
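As a concrete sketch (SciPy-based, with an illustrative tridiagonal test matrix), these classical choices can be wrapped as operators whose action is \(z = M^{-1}r\) and handed to an iterative solver:

```python
# Classical preconditioners exposed to SciPy's Krylov solvers as LinearOperators.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 200
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Jacobi: M = D, so applying M^{-1} is a component-wise division.
d = A.diagonal()
M_jacobi = spla.LinearOperator(A.shape, matvec=lambda r: r / d)

# Gauss-Seidel: M = D - L, the lower triangle of A, applied by a sparse
# forward solve.  (Not symmetric, so pair it with GMRES rather than CG.)
L_gs = sp.tril(A, format="csr")
M_gs = spla.LinearOperator(
    A.shape, matvec=lambda r: spla.spsolve_triangular(L_gs, r, lower=True)
)

x, info = spla.cg(A, b, M=M_jacobi)        # Jacobi-preconditioned CG
print("CG converged:", info == 0)
```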
| Pros | Cons |
|---|---|
| ✅ Very cheap: Trivial to construct and apply. | ❌ Often weak: Convergence improvement is minimal for ill-conditioned systems. |
| ✅ Low memory: \(M\) is (at most) a triangle or diagonal of \(A\). | ❌ Sequential (Gauss-Seidel/SOR): The forward solve \(Mz=r\) is sequential and does not parallelize well. |
| ✅ Parallel (Jacobi): The Jacobi preconditioner is “embarrassingly parallel,” as applying \(M^{-1} = D^{-1}\) is just a component-wise division. | ❌ Limited applicability: May not be SPD, and can fail if a diagonal entry is zero. |
Applicability: While rarely effective as standalone preconditioners for difficult problems, they are crucial components of other methods. For instance, they are often used as “smoothers” within the Multigrid algorithm.
2. Incomplete Factorization Preconditioners (ILU/IC)#
This is the most common and “workhorse” family of general-purpose preconditioners. The idea is to perform an LU or Cholesky factorization of \(A\), but to discard some or all of the “fill-in” (new non-zeros) that is created during the factorization.
Incomplete Cholesky (IC): Used for SPD matrices (e.g., with CG). We compute an approximate Cholesky factor \(C\) such that \(A \approx C C^T = M\). We then use \(M = C C^T\) as the preconditioner.
Incomplete LU (ILU): Used for non-symmetric matrices (e.g., with GMRES). We compute \(L\) and \(U\) such that \(A \approx L U = M\).
The level of fill-in is controlled by different strategies:
ILU(0) / IC(0): The “zero-fill” variants. We only keep entries in \(L\) and \(U\) where \(A\) originally had a non-zero. The sparsity pattern of \(L+U\) is identical to that of \(A\).
ILU(k) / IC(k): “Level-based” fill. We allow fill-in up to a “level” \(k\). \(k=0\) is ILU(0). Higher \(k\) means more fill, a more accurate \(M\), but more cost.
ILUT: “Threshold-based” ILU. We discard any fill-in entries that are smaller than a given drop tolerance \(\tau\).
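A minimal workflow sketch (SciPy-based, with an illustrative non-symmetric matrix): SciPy's `spilu` computes an incomplete LU factorization controlled by a drop tolerance and a fill factor, and its triangular solves are wrapped as the preconditioner for GMRES:

```python
# ILU-preconditioned GMRES with SciPy.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000
# Non-symmetric tridiagonal test matrix (convection-diffusion-like stencil).
A = sp.diags([-1.0, 2.0, -0.5], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# Incomplete factorization A ~ L U = M; drop_tol and fill_factor control fill-in.
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)

# Applying M^{-1} is one forward and one backward triangular solve.
M = spla.LinearOperator(A.shape, matvec=ilu.solve)

x, info = spla.gmres(A, b, M=M)
print("GMRES converged:", info == 0)
```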
| Pros | Cons |
|---|---|
| ✅ Very effective: Often provides a massive reduction in iterations. A huge step up from classical methods. | ❌ Sequential solves: Applying \(M^{-1}\) requires a forward and a backward triangular solve, which are difficult to parallelize. |
| ✅ General purpose: Often the first-choice preconditioner for unstructured sparse systems. | ❌ Breakdown: The factorization can fail (e.g., encounter a zero pivot), especially for ILU(0). IC(0) is only guaranteed to exist for M-matrices. |
| ✅ Tunable: The fill level \(k\) or drop tolerance \(\tau\) lets us trade the accuracy of \(M\) against its cost. | ❌ Construction cost: Can be expensive to compute, though this is usually an “offline” (one-time) cost. |
Applicability: ILU/IC are the default preconditioners for unstructured sparse linear systems on single-core or shared-memory (CPU) architectures. IC is essential for preconditioning CG, and ILU is a staple for GMRES.
3. Sparse Approximate Inverse (SPAI) Preconditioners#
This family operates on a completely different philosophy. Instead of finding \(M \approx A\) and applying \(M^{-1}\) with a solve, we try to directly compute a sparse matrix \(M\) that approximates \(A^{-1}\).
The preconditioning step is then just a matrix-vector product: \(z = M r\).
This is typically set up as an optimization problem: for a given sparsity pattern, find \(M\) that minimizes \(\| I - MA \|_F\) (for left preconditioning) or \(\| I - AM \|_F\) (for right preconditioning) in the Frobenius norm.
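To make the idea concrete, here is a toy sketch (not a production SPAI code): the sparsity pattern of \(M\) is fixed to that of \(A\), and each column of \(M\) is obtained from an independent small least-squares problem, since \(\| I - AM \|_F^2 = \sum_j \| e_j - A m_j \|_2^2\) decouples over columns:

```python
# Toy sparse approximate inverse: fix the pattern of M to that of A and solve
# min ||A m_j - e_j||_2 over the allowed nonzeros of each column m_j.
# (Real SPAI codes also restrict the rows considered and adapt the pattern.)
import numpy as np
import scipy.sparse as sp

def spai(A: sp.csc_matrix) -> sp.csc_matrix:
    n = A.shape[0]
    cols = []
    for j in range(n):
        J = A.indices[A.indptr[j]:A.indptr[j + 1]]   # allowed nonzeros of m_j
        e = np.zeros(n)
        e[j] = 1.0
        m, *_ = np.linalg.lstsq(A[:, J].toarray(), e, rcond=None)
        col = np.zeros(n)
        col[J] = m
        cols.append(col)
    return sp.csc_matrix(np.column_stack(cols))

A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(50, 50), format="csc")
M = spai(A)
print("||I - A M||_F =", np.linalg.norm(np.eye(50) - (A @ M).toarray()))
```

Because each column's least-squares problem is independent, the construction itself (not just the apply step) parallelizes naturally.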
| Pros | Cons |
|---|---|
| ✅ Fully parallel: The “apply” step is a sparse matrix-vector product, which is ideal for parallel hardware like GPUs. | ❌ High construction cost: Finding \(M\) can be very expensive, often more so than an ILU factorization. |
| ✅ Overcomes the triangular-solve bottleneck: This is the main motivation for using SPAI. | ❌ Effectiveness: Often less effective at reducing iterations than a comparable ILU. The “best” sparse approximation of \(A^{-1}\) is often not sparse at all. |
Applicability: These are specialized for high-performance, massively parallel systems where the sequential nature of ILU/IC becomes the primary performance bottleneck.
4. Advanced & Specialized Preconditioners#
This group includes highly powerful methods that are often tailored to specific problem structures, such as those from PDEs.
Algebraic Multigrid (AMG): This is a “black-box” version of geometric multigrid. It works from the matrix \(A\) alone, automatically building a hierarchy of “coarse” problems. It then uses simple smoothers (like Jacobi or Gauss-Seidel) on each level to eliminate error components at different frequencies. (A short usage sketch follows this item.)
Pros: Optimal complexity. For many elliptic PDEs, this is an \(O(N)\) method. The number of iterations is independent of the problem size \(N\).
Cons: Very complex to implement. Makes assumptions about the matrix (e.g., near M-matrix) to build its coarse grids, and can fail if these are not met.
Applicability: The standard, go-to solver/preconditioner for large sparse systems from elliptic PDEs (e.g., finite element methods).
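A minimal usage sketch, assuming the third-party PyAMG package is installed (the matrix is the standard 2-D Poisson problem from PyAMG's gallery); the hierarchy is built from \(A\) alone and then handed to CG as a preconditioner:

```python
# AMG used as a preconditioner for CG (requires the pyamg package).
import numpy as np
import pyamg
from scipy.sparse.linalg import cg

A = pyamg.gallery.poisson((100, 100), format="csr")   # 2-D Poisson test matrix
b = np.ones(A.shape[0])

ml = pyamg.ruge_stuben_solver(A)         # build the coarse-grid hierarchy from A alone
M = ml.aspreconditioner(cycle="V")       # each apply of M^{-1} is one V-cycle

x, info = cg(A, b, M=M)
print("CG converged:", info == 0)
```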
Domain Decomposition (DD): This is a family of “divide-and-conquer” methods designed for parallel computing. The problem domain is split into subdomains (like the pieces of a puzzle), and each subdomain contributes a local solve. (A toy example is sketched after this item.)
Pros: Inherently parallel. Designed from the ground up to run on distributed-memory supercomputers.
Cons: Requires significant setup, and its performance depends on correctly handling the “interface” problem between subdomains, often requiring a “coarse-grid solve.”
Applicability: The standard for large-scale, parallel PDE simulations (CFD, structural mechanics).
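The sketch below is a toy, one-level, non-overlapping variant (essentially block Jacobi over contiguous index blocks standing in for subdomains); real domain-decomposition preconditioners add overlap between subdomains and a coarse-grid correction:

```python
# Toy one-level "domain decomposition": factor each local block once, then apply
# M^{-1} r by independent (hence parallelizable) subdomain solves.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n, n_sub = 1024, 8
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

blocks = np.array_split(np.arange(n), n_sub)               # "subdomains"
local_lu = [spla.splu(sp.csc_matrix(A[blk, :][:, blk])) for blk in blocks]

def apply_Minv(r):
    z = np.zeros_like(r)
    for blk, lu in zip(blocks, local_lu):
        z[blk] = lu.solve(r[blk])                           # independent local solves
    return z

M = spla.LinearOperator(A.shape, matvec=apply_Minv)
x, info = spla.cg(A, b, M=M)
print("CG converged:", info == 0)
```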
Block / Physics-Based Preconditioners: For systems of coupled PDEs (e.g., Stokes flow, Maxwell’s equations), the matrix \(A\) has a natural block structure. (A minimal sketch follows this item.)
Pros: By building a preconditioner \(M\) that respects this block structure, we can achieve far superior results than a “monolithic” preconditioner (like ILU) that ignores it.
Cons: Highly specialized. You must use knowledge of the physics to design the preconditioner.
Applicability: Essential for any multi-physics problem (e.g., fluid-structure interaction, electromagnetics).
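As a sketch of the idea (the coupled two-field system below is made up, not a real discretization): a block-diagonal preconditioner keeps one solver per field and simply ignores the coupling blocks:

```python
# Block-diagonal preconditioner for a 2x2-block coupled system.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 500
A11 = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")  # field 1
A22 = 4.0 * sp.identity(n, format="csr")                                   # field 2
C = 0.1 * sp.random(n, n, density=0.002, random_state=0, format="csr")     # weak coupling

A = sp.bmat([[A11, C], [C.T, A22]], format="csr")
b = np.ones(2 * n)

# M = diag(A11, A22): one factorization per field, coupling blocks dropped.
lu1 = spla.splu(sp.csc_matrix(A11))
lu2 = spla.splu(sp.csc_matrix(A22))

def apply_Minv(r):
    return np.concatenate([lu1.solve(r[:n]), lu2.solve(r[n:])])

M = spla.LinearOperator(A.shape, matvec=apply_Minv)
x, info = spla.gmres(A, b, M=M)
print("GMRES converged:", info == 0)
```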
Summary#
As this survey shows, there is no “best” preconditioner. Jacobi is cheap but weak. ILU is a good default but not parallel. SPAI is parallel but expensive to build. AMG is optimal for its target class but complex and not universal. Domain Decomposition is for supercomputers.
The choice is a key part of the “art” of numerical linear algebra, requiring an understanding of the problem’s origin (PDEs, data science), its structure (SPD, non-symmetric, block), and the computational hardware (CPU, GPU, cluster).