The method of normal equations consists in solving

$$A^T A \, x = A^T b.$$

The solution is:

$$x = (A^T A)^{-1} A^T b.$$

The matrix $A^T A$ is SPD when $A$ has full column rank, so the system can be solved using Cholesky.
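The procedure can be sketched in a few lines of plain Python. This is only a minimal illustration, not a production solver: the function name `normal_equations_solve`, the list-of-rows matrix layout, and the small line-fitting data set are all assumptions made here for the example.

```python
import math

def normal_equations_solve(A, b):
    """Solve min ||Ax - b||_2 via A^T A x = A^T b and a Cholesky factorization."""
    m, n = len(A), len(A[0])
    # Form the Gram matrix G = A^T A (n x n) and the right-hand side c = A^T b.
    G = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]
    c = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    # Cholesky factorization G = L L^T, with L lower triangular.
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = G[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("A^T A is not numerically positive definite")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            L[i][j] = (G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    # Forward substitution: L y = c.
    y = [0.0] * n
    for i in range(n):
        y[i] = (c[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# Fit y = c0 + c1 * t to four points that lie exactly on y = 1 + 2 t.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 3.0, 5.0, 7.0]
x = normal_equations_solve(A, b)
print(x)  # approximately [1.0, 2.0]
```

The explicit inverse is never formed: the factorization is followed by two triangular solves, which is both cheaper and more accurate than inverting the Gram matrix.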

This method works best when $A \in \mathbb{R}^{m \times n}$ is very tall and skinny ($m \gg n$).

One of the main drawbacks is that the condition number grows very quickly. Indeed, we can prove that

$$\kappa(A^T A) = \kappa(A)^2.$$

So the condition number of $A^T A$ grows much faster than $\kappa(A)$.
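One way to see why the condition number is squared, assuming the 2-norm condition number and a full-column-rank $A$, is through the thin SVD. The symbols $U$, $\Sigma$, $V$ and $\sigma_1 \ge \dots \ge \sigma_n > 0$ are notation introduced here for the sketch:

$$
\begin{aligned}
A &= U \Sigma V^T, \\
A^T A &= V \Sigma U^T U \Sigma V^T = V \Sigma^2 V^T, \\
\kappa_2(A^T A) &= \frac{\sigma_1^2}{\sigma_n^2} = \left( \frac{\sigma_1}{\sigma_n} \right)^2 = \kappa_2(A)^2.
\end{aligned}
$$

For instance, a matrix with $\kappa_2(A) = 10^4$ leads to a Gram matrix with $\kappa_2(A^T A) = 10^8$.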

This method requires $A^T A$ to be non-singular. This is equivalent to saying that $A$ should have full column rank.
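In floating-point arithmetic, the computed Gram matrix can be exactly singular even though $A$ has full column rank. The sketch below illustrates this with a small example often attributed to Läuchli; the value of `eps` and the variable names are choices made here for illustration, and the behavior assumes IEEE double precision.

```python
eps = 1e-8  # eps**2 is below double-precision round-off relative to 1.0

# A has two linearly independent columns, so it has full column rank.
A = [[1.0, 1.0],
     [eps, 0.0],
     [0.0, eps]]

# Form the 2 x 2 Gram matrix G = A^T A in floating point.
G = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
     for i in range(2)]

# In exact arithmetic the diagonal entries are 1 + eps**2, but they round to 1.0.
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
print(G)    # [[1.0, 1.0], [1.0, 1.0]]
print(det)  # 0.0
```

A Cholesky factorization of this `G` would fail, because the second pivot is zero, even though the original least-squares problem has a unique solution.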

The computational cost is $O(mn^2)$ for an $m \times n$ matrix $A$ with $m \ge n$.

Intuitive explanation

  • $A^T A$: This product represents the “correlation” of A’s columns with each other. It captures how the columns of A interact and overlap.
  • $A^T b$: This term represents the “correlation” of A’s columns with the target vector b. It tells us how much each column of A contributes to explaining b.
  • $(A^T A)^{-1}$: Inverting $A^T A$ is like “decorrelating” the columns of A. It accounts for any redundancy or overlap in A’s columns.
  • Final multiplication: combines the decorrelated version of A with its correlation to b, giving us the optimal coefficients x.

Least-squares problems, Symmetric Positive Definite Matrices, Cholesky factorization, Conditioning of a linear system, Stability of the Cholesky factorization