Operator and Matrix Norms#
A matrix norm is a function that assigns a non-negative real number to a matrix, providing a measure of its “size” or “magnitude”; it is zero only for the zero matrix. Norms are fundamental tools in numerical analysis for understanding the behavior of matrix operations and for quantifying errors in computations.
The Operator Norm (Induced p-Norm)#
The most common family of matrix norms is that of operator norms, also known as induced norms. These norms are derived directly from vector p-norms and are defined by how much a matrix can “stretch” a vector.
The operator norm is defined as the maximum of the ratio \(\|Ax\|_p / \|x\|_p\) over all non-zero vectors \(x\). Equivalently, it is the maximum length \(\|Ax\|_p\) over all vectors \(x\) of unit length.
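In symbols:
\[\|A\|_p = \max_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|x\|_p = 1} \|Ax\|_p\]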
Geometrically, if you apply the matrix \(A\) to all the vectors on the unit circle (or unit sphere in higher dimensions), you’ll get an ellipse (or ellipsoid), and \(\|A\|_2\) is the length of the longest vector from the origin to a point on that resulting shape. For a general \(p\), the same picture holds with the unit ball of the \(p\)-norm in place of the circle, although the image is then no longer an ellipse.
The three most widely used operator norms are:
1-Norm: The maximum absolute column sum. It is computed as \(\|A\|_1 = \max_j \sum_i |a_{ij}|\).
\(\infty\)-Norm: The maximum absolute row sum. It is computed as \(\|A\|_\infty = \max_i \sum_j |a_{ij}|\).
2-Norm (Spectral Norm): This norm equals the largest singular value of the matrix, denoted \(\sigma_{\max}(A)\). While it corresponds to the true geometric “maximum stretch” in Euclidean length, it is generally more expensive to compute than the 1-norm or \(\infty\)-norm, since evaluating it amounts to a singular value computation rather than simple row or column sums.
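As a quick sanity check, here is a minimal NumPy sketch (the matrix `A` is an arbitrary example) that computes all three operator norms and verifies the 1-norm and \(\infty\)-norm against their column-sum and row-sum formulas:

```python
import numpy as np

# Arbitrary example matrix.
A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

one_norm = np.linalg.norm(A, 1)       # max absolute column sum
inf_norm = np.linalg.norm(A, np.inf)  # max absolute row sum
two_norm = np.linalg.norm(A, 2)       # largest singular value

# Cross-check the 1-norm and inf-norm against their closed forms.
assert np.isclose(one_norm, np.abs(A).sum(axis=0).max())
assert np.isclose(inf_norm, np.abs(A).sum(axis=1).max())

# The 2-norm agrees with the largest singular value from the SVD.
assert np.isclose(two_norm, np.linalg.svd(A, compute_uv=False)[0])
```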
The Frobenius Norm#
Another useful and easy-to-compute norm is the Frobenius norm. Unlike the operator norms, it is not induced by a vector norm. Instead, it treats the matrix as a single long vector and calculates its standard Euclidean norm, the square root of the sum of the squares of all its elements:
\[\|A\|_F = \left( \sum_{i} \sum_{j} |a_{ij}|^2 \right)^{1/2}\]
The Frobenius norm is very convenient for analyzing matrices that can be decomposed into blocks, because its square is additive across blocks: if \(A\) is partitioned into blocks \(A_{ij}\), then \(\|A\|_F^2 = \sum_{i,j} \|A_{ij}\|_F^2\).
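A short NumPy sketch illustrating both points, using a random matrix and an arbitrary 2×2 block partition:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Frobenius norm: Euclidean norm of the matrix flattened to a vector.
fro = np.linalg.norm(A, 'fro')
assert np.isclose(fro, np.sqrt((A ** 2).sum()))

# Block additivity: ||A||_F^2 equals the sum of ||A_ij||_F^2
# over any partition of A into blocks (here a 2x2 partition).
blocks = [A[:2, :2], A[:2, 2:], A[2:, :2], A[2:, 2:]]
assert np.isclose(fro ** 2,
                  sum(np.linalg.norm(B, 'fro') ** 2 for B in blocks))
```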
The two most fundamental inequalities for matrix norms are the sub-multiplicative inequality, \(\|AB\| \le \|A\| \|B\|\), and the consistency inequality, \(\|Ax\| \le \|A\| \|x\|\); both are developed below.
Key Inequalities for Operator Norms#
While many inequalities exist, they primarily fall into a few key categories that define how matrix norms behave with respect to multiplication and each other.
Sub-multiplicative Property#
This is the most important property for analyzing products of matrices. For any two matrices \(A\) and \(B\) whose product is defined, the operator and Frobenius norms satisfy:
\[\|AB\| \le \|A\| \, \|B\|\]
This inequality is crucial for bounding errors in matrix computations and for proving the convergence of iterative methods, such as in solving systems of linear equations.
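A minimal NumPy check of sub-multiplicativity for random matrices, across the 1-, 2-, \(\infty\)-, and Frobenius norms:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# ||AB|| <= ||A|| ||B|| for the induced norms and the Frobenius norm.
for order in (1, 2, np.inf, 'fro'):
    lhs = np.linalg.norm(A @ B, order)
    rhs = np.linalg.norm(A, order) * np.linalg.norm(B, order)
    assert lhs <= rhs
```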
Consistency with Vector Norms#
The consistency property connects the operator norm back to the vector norm that induces it. For any matrix \(A\) and vector \(x\):
\[\|Ax\|_p \le \|A\|_p \, \|x\|_p\]
This inequality says that no vector is stretched by more than a factor of \(\|A\|_p\); by the definition of the operator norm, the bound is attained by some vector.
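A corresponding NumPy check of consistency for the three common induced norms, using a random matrix and vector:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

# ||Ax||_p <= ||A||_p ||x||_p for each induced norm.
for p in (1, 2, np.inf):
    assert np.linalg.norm(A @ x, p) <= np.linalg.norm(A, p) * np.linalg.norm(x, p)
```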
Equivalence Between Norms#
Just like vector norms, all matrix norms on a finite-dimensional space are equivalent. This means that for any two matrix norms \(\|\cdot\|_a\) and \(\|\cdot\|_b\), there exist positive constants \(c_1\) and \(c_2\) such that, for every matrix \(A\):
\[c_1 \|A\|_b \le \|A\|_a \le c_2 \|A\|_b\]
The most useful specific inequalities relate the common operator norms (\(1\)-norm, \(2\)-norm, \(\infty\)-norm) and the Frobenius norm (\(F\)) for any \(n \times n\) matrix \(A\). 📐
\(\|A\|_2 \le \|A\|_F \le \sqrt{n}\|A\|_2\)
\(\frac{1}{\sqrt{n}}\|A\|_\infty \le \|A\|_2 \le \sqrt{n}\|A\|_\infty\)
\(\frac{1}{\sqrt{n}}\|A\|_1 \le \|A\|_2 \le \sqrt{n}\|A\|_1\)
\(\frac{1}{n}\|A\|_\infty \le \|A\|_1 \le n\|A\|_\infty\)
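The following NumPy sketch verifies all four equivalence inequalities on a random \(5 \times 5\) matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))

n1, n2 = np.linalg.norm(A, 1), np.linalg.norm(A, 2)
ninf, nf = np.linalg.norm(A, np.inf), np.linalg.norm(A, 'fro')
s = np.sqrt(n)

assert n2 <= nf <= s * n2           # 2-norm vs Frobenius
assert ninf / s <= n2 <= s * ninf   # 2-norm vs inf-norm
assert n1 / s <= n2 <= s * n1       # 2-norm vs 1-norm
assert ninf / n <= n1 <= n * ninf   # 1-norm vs inf-norm
```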
Other Fundamental Properties#
All matrix norms must also satisfy the standard properties of a norm:
Triangle Inequality: \(\|A+B\| \le \|A\| + \|B\|\)
Absolute Homogeneity: \(\|\alpha A\| = |\alpha| \|A\|\) for any scalar \(\alpha\).
Positive Definiteness: \(\|A\| \ge 0\), and \(\|A\|=0\) if and only if \(A=0\).
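These axioms are easy to spot-check numerically; the sketch below does so for random matrices across the common norms:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
alpha = -2.5

for order in (1, 2, np.inf, 'fro'):
    nA, nB = np.linalg.norm(A, order), np.linalg.norm(B, order)
    assert np.linalg.norm(A + B, order) <= nA + nB               # triangle inequality
    assert np.isclose(np.linalg.norm(alpha * A, order),
                      abs(alpha) * nA)                           # absolute homogeneity
    assert nA > 0 and np.linalg.norm(np.zeros((3, 3)), order) == 0  # positive definiteness
```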
Other Matrix Norms#
While the operator and Frobenius norms are the most common, other specialized norms are valuable in various applications. We introduce two such examples: the simple but non-sub-multiplicative max norm, and the more general family of Schatten norms, which are based on a matrix’s singular values.
Max Norm#
The max norm is the largest element in the matrix by magnitude:
\[\|A\|_{\max} = \max_{i,j} |a_{ij}|\]
Note that this norm is not sub-multiplicative: \(\|AB\|_{\max} \le \|A\|_{\max}\|B\|_{\max}\) can fail, as the example below shows.
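A concrete counterexample in NumPy (the `max_norm` helper is ours, not a library function): for the all-ones \(2 \times 2\) matrix, \(\|AA\|_{\max} = 2\) while \(\|A\|_{\max}\|A\|_{\max} = 1\).

```python
import numpy as np

def max_norm(M):
    """Largest entry in absolute value."""
    return np.abs(M).max()

A = np.ones((2, 2))          # max norm is 1
lhs = max_norm(A @ A)        # A @ A is all 2s, so this is 2
rhs = max_norm(A) ** 2       # 1 * 1 = 1
assert lhs > rhs             # sub-multiplicativity fails
```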
Schatten Norms (based on singular values)#
The Schatten \(p\)-norm is defined by applying the vector \(p\)-norm to the vector of the matrix’s singular values, \(\sigma_i\):
\[\|A\|_{S_p} = \left( \sum_{i=1}^{\min(m,n)} \sigma_i^p \right)^{1/p}\]
This family generalizes several important norms:
Schatten 1-Norm (Nuclear Norm or Trace Norm): The sum of the singular values. It’s widely used in machine learning for matrix completion and rank minimization problems.
\[\|A\|_{S_1} = \|A\|_* = \sum_{i=1}^{\min(m,n)} \sigma_i\]
Schatten 2-Norm: The square root of the sum of the squares of the singular values. This is exactly the Frobenius norm.
\[\|A\|_{S_2} = \left( \sum_{i=1}^{\min(m,n)} \sigma_i^2 \right)^{1/2} = \|A\|_F\]
Schatten \(\infty\)-Norm: The limit as \(p \to \infty\), which is simply the largest singular value. This is exactly the spectral norm (operator 2-norm).
\[\|A\|_{S_\infty} = \max_i \sigma_i = \|A\|_2\]
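A NumPy sketch computing Schatten norms directly from the singular values (the `schatten` helper is ours, not a library function) and confirming the three special cases above:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))

def schatten(M, p):
    """Schatten p-norm: vector p-norm of the singular values."""
    return np.linalg.norm(np.linalg.svd(M, compute_uv=False), p)

sigma = np.linalg.svd(A, compute_uv=False)

assert np.isclose(schatten(A, 1), sigma.sum())               # nuclear norm
assert np.isclose(schatten(A, 1), np.linalg.norm(A, 'nuc'))  # NumPy's built-in nuclear norm
assert np.isclose(schatten(A, 2), np.linalg.norm(A, 'fro'))  # Frobenius norm
assert np.isclose(schatten(A, np.inf), np.linalg.norm(A, 2)) # spectral norm
```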