The method of normal equations consists in solving

$$A^T A \, x = A^T b$$

The solution is:

$$x = (A^T A)^{-1} A^T b$$
The matrix $A^T A$ is SPD (provided $A$ has full column rank), so the system can be solved using a Cholesky factorization.
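As an illustration, here is a minimal NumPy sketch of this approach. The function name `solve_normal_equations` and the random test data are assumptions made for the example. The two triangular systems are passed to the generic `np.linalg.solve` for brevity; a dedicated triangular solver would be the natural choice in practice.

```python
import numpy as np

def solve_normal_equations(A, b):
    """Least-squares solution of min ||Ax - b||_2 via A^T A x = A^T b."""
    G = A.T @ A                  # n x n Gram matrix, SPD if A has full column rank
    c = A.T @ b                  # right-hand side of the normal equations
    L = np.linalg.cholesky(G)    # G = L L^T with L lower triangular
    y = np.linalg.solve(L, c)    # solve L y = c (forward substitution step)
    x = np.linalg.solve(L.T, y)  # solve L^T x = y (backward substitution step)
    return x

# Hypothetical tall-skinny test problem (random, well conditioned)
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
b = rng.standard_normal(200)

x = solve_normal_equations(A, b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]  # reference least-squares solution
print(np.allclose(x, x_ref))
```

On a well-conditioned problem like this one, the result should agree with the reference solver up to rounding, and the residual $Ax - b$ should be orthogonal to the columns of $A$.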
This method is best for very tall and skinny $A$ (many more rows than columns).
One of the main drawbacks is that the condition number grows very quickly! Indeed, we can prove that, in the 2-norm,

$$\kappa(A^T A) = \kappa(A)^2$$

So the condition number of $A^T A$ grows much faster than $\kappa(A)$.
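The squaring of the condition number can be checked numerically. The sketch below builds a matrix with prescribed singular values (the sizes and values are arbitrary choices for the example) and compares the 2-norm condition number of $A$ with that of $A^T A$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 4

# Build A = U diag(s) V^T with known singular values, so kappa(A) = s_max / s_min
U, _ = np.linalg.qr(rng.standard_normal((m, n)))  # m x n, orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((n, n)))  # n x n orthogonal
s = np.array([1.0, 1e-1, 1e-2, 1e-3])
A = U @ np.diag(s) @ V.T

kappa_A = np.linalg.cond(A)        # 2-norm condition number of A
kappa_G = np.linalg.cond(A.T @ A)  # 2-norm condition number of A^T A

print(kappa_A, kappa_G, kappa_A**2)
```

With these singular values, $\kappa(A)$ is about $10^3$ while $\kappa(A^T A)$ is about $10^6$, so roughly half of the available digits of accuracy are lost just by forming $A^T A$.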
This method requires $A^T A$ to be non-singular. This is equivalent to saying that $A$ should have full column rank.
The computational cost is $O(mn^2)$, where $A$ is $m \times n$ with $m \ge n$.
Intuitive explanation
- $A^T A$: This product represents the “correlation” of A’s columns with each other. It captures how the columns of A interact and overlap.
- $A^T b$: This term represents the “correlation” of A’s columns with the target vector b. It tells us how much each column of A contributes to explaining b.
- $(A^T A)^{-1}$: Inverting $A^T A$ is like “decorrelating” the columns of A. It accounts for any redundancy or overlap in A’s columns.
- Final multiplication: $(A^T A)^{-1} A^T b$ combines the decorrelated version of A with its correlation to b, giving us the optimal coefficients x.
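One way to see the “decorrelation” picture is the special case where the columns are already orthonormal. Then $A^T A = I$, there is nothing to decorrelate, and the solution reduces to the correlations $A^T b$ alone. The sketch below checks this on a hypothetical random example; the matrix `Q` is obtained from a QR factorization purely to get orthonormal columns.

```python
import numpy as np

rng = np.random.default_rng(2)

# Q has orthonormal columns: its columns do not overlap at all
Q, _ = np.linalg.qr(rng.standard_normal((100, 3)))
b = rng.standard_normal(100)

gram = Q.T @ Q   # "correlation" of the columns with each other
x_simple = Q.T @ b   # "correlation" of the columns with b

# Reference least-squares solution for comparison
x_ref = np.linalg.lstsq(Q, b, rcond=None)[0]

print(np.allclose(gram, np.eye(3)), np.allclose(x_simple, x_ref))
```

When the columns do overlap, $A^T A$ is no longer the identity, and the factor $(A^T A)^{-1}$ is what corrects $A^T b$ for that overlap.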
Least-squares problems, Symmetric Positive Definite Matrices, Cholesky factorization, Conditioning of a linear system, Stability of the Cholesky factorization