Chapter 15 Norms and Perturbation Theory
Prerequisites: Ch14 spectral radius and norms
Chapter arc: Vector norms → Matrix norms (operator/Frobenius) → Condition number quantifies ill-conditioning → Bauer--Fike/Weyl quantify eigenvalue sensitivity
Further connections: Condition numbers determine result reliability in numerical weather prediction, finite element analysis, and GPS positioning; eigenvalue perturbation theory (Weyl, Bauer--Fike) underpins operator perturbation analysis in random matrix theory and quantum information
Norms provide a measure of "size" for vector spaces and matrix spaces, and are the cornerstone of matrix analysis and numerical linear algebra. Perturbation theory studies how eigenvalues, singular values, and solutions of linear systems change when a matrix undergoes small perturbations. These theories are essential for understanding the stability and accuracy of numerical computations. This chapter systematically develops the theory of vector norms and matrix norms, introduces the concept of condition number, and discusses perturbation bounds for eigenvalues and singular values in depth.
15.1 Vector norms
Chapter arc: \(\ell_p\) norm family (\(p=1,2,\infty\)) → Hölder inequality (\(p,q\) conjugate) → finite-dimensional norm equivalence (unique topology)
Definition 15.1 (Vector norm)
A vector norm on \(\mathbb{C}^n\) is a function \(\|\cdot\| : \mathbb{C}^n \to \mathbb{R}\) satisfying for all \(\mathbf{x}, \mathbf{y} \in \mathbb{C}^n\) and \(\alpha \in \mathbb{C}\):
(1) Non-negativity: \(\|\mathbf{x}\| \geq 0\), with equality if and only if \(\mathbf{x} = \mathbf{0}\);
(2) Homogeneity: \(\|\alpha\mathbf{x}\| = |\alpha|\|\mathbf{x}\|\);
(3) Triangle inequality: \(\|\mathbf{x} + \mathbf{y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\|\).
Definition 15.2 (\(\ell_p\) norm)
For \(1 \leq p < \infty\), the \(\ell_p\) norm on \(\mathbb{C}^n\) is defined as
\[
\|\mathbf{x}\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p},
\]
and for \(p = \infty\), \(\|\mathbf{x}\|_\infty = \max_{1 \leq i \leq n} |x_i|\).
Of particular importance are:
- \(\|\mathbf{x}\|_1 = \sum_{i=1}^{n}|x_i|\) (Manhattan norm);
- \(\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^{n}|x_i|^2} = \sqrt{\mathbf{x}^*\mathbf{x}}\) (Euclidean norm);
- \(\|\mathbf{x}\|_\infty = \max_i |x_i|\) (Chebyshev norm).
Theorem 15.1 (Hölder inequality)
Let \(p, q \geq 1\) satisfy \(\frac{1}{p} + \frac{1}{q} = 1\) (conjugate exponents). Then for any \(\mathbf{x}, \mathbf{y} \in \mathbb{C}^n\),
\[
|\mathbf{x}^*\mathbf{y}| \leq \sum_{i=1}^{n} |x_i||y_i| \leq \|\mathbf{x}\|_p\|\mathbf{y}\|_q.
\]
In particular, \(p = q = 2\) gives the Cauchy--Schwarz inequality.
Proof
The case \(\|\mathbf{x}\|_p = 0\) or \(\|\mathbf{y}\|_q = 0\) is trivial. Otherwise assume \(\|\mathbf{x}\|_p = \|\mathbf{y}\|_q = 1\). By Young's inequality \(ab \leq \frac{a^p}{p} + \frac{b^q}{q}\) (\(a, b \geq 0\)), setting \(a = |x_i|\), \(b = |y_i|\) and summing gives
\[
\sum_{i=1}^{n} |x_i||y_i| \leq \frac{1}{p}\sum_{i=1}^{n} |x_i|^p + \frac{1}{q}\sum_{i=1}^{n} |y_i|^q = \frac{1}{p} + \frac{1}{q} = 1.
\]
Therefore \(|\mathbf{x}^*\mathbf{y}| \leq \sum|x_i||y_i| \leq 1 = \|\mathbf{x}\|_p\|\mathbf{y}\|_q\). The general case follows by applying this to \(\mathbf{x}/\|\mathbf{x}\|_p\) and \(\mathbf{y}/\|\mathbf{y}\|_q\). \(\blacksquare\)
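A quick numerical spot-check of the inequality (a sketch assuming NumPy; the random vectors and the tolerance `1e-12` are illustrative):

```python
import numpy as np

# Spot-check Hölder: |x* y| <= ||x||_p ||y||_q for conjugate exponents.
rng = np.random.default_rng(0)
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
y = rng.standard_normal(5) + 1j * rng.standard_normal(5)

for p in (1.5, 2.0, 3.0):
    q = p / (p - 1.0)                  # conjugate exponent: 1/p + 1/q = 1
    lhs = abs(np.vdot(x, y))           # np.vdot conjugates x, giving x* y
    rhs = np.linalg.norm(x, p) * np.linalg.norm(y, q)
    print(f"p = {p}: {lhs:.4f} <= {rhs:.4f}", lhs <= rhs + 1e-12)
```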
Theorem 15.2 (Norm equivalence theorem)
Any two norms \(\|\cdot\|_\alpha\) and \(\|\cdot\|_\beta\) on \(\mathbb{C}^n\) are equivalent, i.e., there exist positive constants \(c_1, c_2 > 0\) such that for all \(\mathbf{x} \in \mathbb{C}^n\),
\[
c_1\|\mathbf{x}\|_\beta \leq \|\mathbf{x}\|_\alpha \leq c_2\|\mathbf{x}\|_\beta.
\]
Proof
It suffices to show that any norm \(\|\cdot\|\) is equivalent to \(\|\cdot\|_2\). Let \(\mathbf{e}_1, \ldots, \mathbf{e}_n\) be the standard basis. For \(\mathbf{x} = \sum x_i\mathbf{e}_i\),
\[
\|\mathbf{x}\| \leq \sum_{i=1}^{n} |x_i|\|\mathbf{e}_i\| \leq \Big(\max_i\|\mathbf{e}_i\|\Big)\|\mathbf{x}\|_1 \leq \sqrt{n}\Big(\max_i\|\mathbf{e}_i\|\Big)\|\mathbf{x}\|_2,
\]
using \(\|\mathbf{x}\|_1 \leq \sqrt{n}\|\mathbf{x}\|_2\) (Cauchy--Schwarz) in the last step.
Set \(c_2 = \sqrt{n}\max_i\|\mathbf{e}_i\|\); then \(\|\mathbf{x}\| \leq c_2\|\mathbf{x}\|_2\). This shows \(\|\cdot\|\) is Lipschitz continuous with respect to \(\|\cdot\|_2\).
Consider the compact set \(S = \{\mathbf{x} : \|\mathbf{x}\|_2 = 1\}\). The continuous function \(\|\cdot\|\) attains its minimum \(c_1 > 0\) on \(S\) (since \(\|\mathbf{x}\| > 0\) for \(\mathbf{x} \in S\)). Therefore for \(\mathbf{x} \neq \mathbf{0}\), \(\|\mathbf{x}/\|\mathbf{x}\|_2\| \geq c_1\), i.e., \(\|\mathbf{x}\| \geq c_1\|\mathbf{x}\|_2\). \(\blacksquare\)
Proposition 15.1 (Relations between \(\ell_p\) norms)
For any \(\mathbf{x} \in \mathbb{C}^n\) and \(1 \leq p \leq q \leq \infty\),
\[
\|\mathbf{x}\|_q \leq \|\mathbf{x}\|_p \leq n^{\frac{1}{p} - \frac{1}{q}}\|\mathbf{x}\|_q.
\]
In particular:
- \(\|\mathbf{x}\|_2 \leq \|\mathbf{x}\|_1 \leq \sqrt{n}\|\mathbf{x}\|_2\);
- \(\|\mathbf{x}\|_\infty \leq \|\mathbf{x}\|_2 \leq \sqrt{n}\|\mathbf{x}\|_\infty\);
- \(\|\mathbf{x}\|_\infty \leq \|\mathbf{x}\|_1 \leq n\|\mathbf{x}\|_\infty\).
Example 15.1
Let \(\mathbf{x} = (3, -4, 0, 1)^T\). Compute the \(\ell_p\) norms.
- \(\|\mathbf{x}\|_1 = |3| + |-4| + |0| + |1| = 8\);
- \(\|\mathbf{x}\|_2 = \sqrt{9 + 16 + 0 + 1} = \sqrt{26} \approx 5.10\);
- \(\|\mathbf{x}\|_\infty = \max\{3, 4, 0, 1\} = 4\).
Verification: \(\|\mathbf{x}\|_\infty = 4 \leq 5.10 = \|\mathbf{x}\|_2 \leq 8 = \|\mathbf{x}\|_1\).
Also \(\|\mathbf{x}\|_1 = 8 \leq \sqrt{n}\|\mathbf{x}\|_2 = \sqrt{4} \times 5.10 = 10.2\), as Proposition 15.1 requires.
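The same computation in NumPy (a minimal sketch; `numpy.linalg.norm` accepts the order \(p\) directly):

```python
import numpy as np

# Example 15.1: the three classical norms of x = (3, -4, 0, 1)^T.
x = np.array([3.0, -4.0, 0.0, 1.0])
n1   = np.linalg.norm(x, 1)       # 8.0
n2   = np.linalg.norm(x, 2)       # sqrt(26) ≈ 5.099
ninf = np.linalg.norm(x, np.inf)  # 4.0

# Chain from Proposition 15.1: ||x||_inf <= ||x||_2 <= ||x||_1 <= sqrt(n) ||x||_2.
assert ninf <= n2 <= n1 <= np.sqrt(x.size) * n2
print(n1, n2, ninf)
```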
Example 15.2
Unit balls of \(\ell_p\) norms. The unit ball \(B_p = \{\mathbf{x} \in \mathbb{R}^2 : \|\mathbf{x}\|_p \leq 1\}\) changes shape with \(p\):
- \(p = 1\): diamond (square rotated \(45°\)), vertices at \((\pm 1, 0)\) and \((0, \pm 1)\);
- \(p = 2\): disc;
- \(p = \infty\): square \([-1, 1]^2\);
- \(p \to 0^+\) (\(\|\cdot\|_p\) is no longer a norm for \(p < 1\)): the unit ball degenerates to a cross of line segments along the coordinate axes.
As \(p\) increases, the unit ball expands from a diamond to a circle to a square, reflecting the relationship \(\|\mathbf{x}\|_q \leq \|\mathbf{x}\|_p\) (\(p \leq q\)).
15.2 Matrix norms
Chapter arc: Matrix norm + submultiplicativity \(\|AB\|\leq\|A\|\|B\|\) → Frobenius norm \(=\sqrt{\sum\sigma_i^2}\) is a unitarily invariant "entrywise" norm
Definition 15.3 (Matrix norm)
A matrix norm on \(\mathbb{C}^{m \times n}\) is a function \(\|\cdot\| : \mathbb{C}^{m \times n} \to \mathbb{R}\) satisfying for all \(A, B\) and \(\alpha \in \mathbb{C}\):
(1) Non-negativity: \(\|A\| \geq 0\), with equality if and only if \(A = O\);
(2) Homogeneity: \(\|\alpha A\| = |\alpha|\|A\|\);
(3) Triangle inequality: \(\|A + B\| \leq \|A\| + \|B\|\).
If it additionally satisfies (4) submultiplicativity: \(\|AB\| \leq \|A\|\|B\|\) (when the product is defined), it is called a submultiplicative norm.
Definition 15.4 (Frobenius norm)
Let \(A = (a_{ij}) \in \mathbb{C}^{m \times n}\). The Frobenius norm is defined as
\[
\|A\|_F = \left(\sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2\right)^{1/2} = \sqrt{\operatorname{tr}(A^*A)}.
\]
Theorem 15.3 (Properties of the Frobenius norm)
The Frobenius norm satisfies:
(1) It is a matrix norm and is submultiplicative;
(2) \(\|A\|_F = \|A^*\|_F\);
(3) For unitary (orthogonal) matrices \(U, V\), \(\|UAV\|_F = \|A\|_F\) (unitary invariance);
(4) \(\|A\|_F^2 = \sum_{i=1}^{r}\sigma_i^2\), where \(\sigma_i\) are the singular values of \(A\);
(5) \(\|A\|_2 \leq \|A\|_F \leq \sqrt{r}\|A\|_2\), where \(r = \operatorname{rank}(A)\).
Proof
(1) Submultiplicativity: \(\|AB\|_F^2 = \sum_{i,k}\left|\sum_j a_{ij}b_{jk}\right|^2 \leq \sum_{i,k}\left(\sum_j|a_{ij}|^2\right)\left(\sum_j|b_{jk}|^2\right)\) (by Cauchy--Schwarz) \(= \sum_i\left(\sum_j|a_{ij}|^2\right)\sum_k\left(\sum_j|b_{jk}|^2\right) = \|A\|_F^2\|B\|_F^2\).
(3) \(\|UAV\|_F^2 = \operatorname{tr}(V^*A^*U^*UAV) = \operatorname{tr}(V^*A^*AV) = \operatorname{tr}(A^*AVV^*) = \operatorname{tr}(A^*A) = \|A\|_F^2\).
(4) By SVD, \(A = U\Sigma V^*\), \(\|A\|_F^2 = \|\Sigma\|_F^2 = \sum\sigma_i^2\).
(5) \(\|A\|_F^2 = \sum\sigma_i^2 \geq \sigma_1^2 = \|A\|_2^2\), and \(\sum\sigma_i^2 \leq r\sigma_1^2\). \(\blacksquare\)
Example 15.3
Let \(A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\).
\(\|A\|_F = \sqrt{1 + 4 + 9 + 16} = \sqrt{30} \approx 5.48\).
\(A^*A = \begin{pmatrix} 10 & 14 \\ 14 & 20 \end{pmatrix}\), \(\operatorname{tr} = 30\), \(\det = 200 - 196 = 4\). Eigenvalues \(\mu = 15 \pm \sqrt{225 - 4} = 15 \pm \sqrt{221}\). \(\sigma_1 = \sqrt{15 + \sqrt{221}} \approx \sqrt{29.87} \approx 5.47\), \(\sigma_2 = \sqrt{15 - \sqrt{221}} \approx \sqrt{0.13} \approx 0.37\).
Verification: \(\|A\|_F = \sqrt{\sigma_1^2 + \sigma_2^2} = \sqrt{30} \approx 5.48\).
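Property (4) is easy to confirm numerically (a sketch assuming NumPy):

```python
import numpy as np

# Example 15.3: ||A||_F equals the root of the sum of squared singular values.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
fro   = np.linalg.norm(A, 'fro')            # sqrt(30) ≈ 5.4772
sigma = np.linalg.svd(A, compute_uv=False)  # ≈ [5.465, 0.366]

print(sigma)
print(fro, np.sqrt(np.sum(sigma**2)))       # both ≈ 5.4772
```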
15.3 Operator norms (induced norms)
Intuition: \(\|A\|_1\)=max column sum, \(\|A\|_\infty\)=max row sum, \(\|A\|_2=\sigma_1(A)\) · Operator norms are automatically submultiplicative and satisfy \(\rho(A)\leq\|A\|\)
Definition 15.5 (Operator norm)
Let \(\|\cdot\|_\alpha\) and \(\|\cdot\|_\beta\) be vector norms on \(\mathbb{C}^n\) and \(\mathbb{C}^m\) respectively. The operator norm induced by them is defined as
\[
\|A\|_{\alpha,\beta} = \max_{\mathbf{x} \neq \mathbf{0}} \frac{\|A\mathbf{x}\|_\beta}{\|\mathbf{x}\|_\alpha} = \max_{\|\mathbf{x}\|_\alpha = 1} \|A\mathbf{x}\|_\beta.
\]
When \(\alpha = \beta = p\), we write \(\|A\|_p\).
Theorem 15.4 (Computation formulas for operator norms)
Let \(A = (a_{ij}) \in \mathbb{C}^{m \times n}\). Then:
(1) \(\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^{m} |a_{ij}|\) (maximum column sum);
(2) \(\|A\|_\infty = \max_{1 \leq i \leq m} \sum_{j=1}^{n} |a_{ij}|\) (maximum row sum);
(3) \(\|A\|_2 = \sigma_1(A)\) (largest singular value).
Proof
(1) Let \(\mathbf{x}\) satisfy \(\|\mathbf{x}\|_1 = 1\). Then
\[
\|A\mathbf{x}\|_1 = \sum_{i=1}^{m}\Big|\sum_{j=1}^{n} a_{ij}x_j\Big| \leq \sum_{j=1}^{n} |x_j| \sum_{i=1}^{m} |a_{ij}| \leq \max_{1 \leq j \leq n} \sum_{i=1}^{m} |a_{ij}|.
\]
Equality is attained at \(\mathbf{x} = \mathbf{e}_{j^*}\) (the standard basis vector corresponding to the maximum column sum).
(2) Similarly, \(\|A\mathbf{x}\|_\infty = \max_i\left|\sum_j a_{ij}x_j\right| \leq \max_i\sum_j|a_{ij}|\,\|\mathbf{x}\|_\infty\). Equality is attained by choosing \(x_j = \overline{a_{i^*j}}/|a_{i^*j}|\) (with \(x_j = 1\) where \(a_{i^*j} = 0\)), where \(i^*\) is a row of maximal absolute sum.
(3) \(\|A\mathbf{x}\|_2^2 = \mathbf{x}^*A^*A\mathbf{x}\). \(\max_{\|\mathbf{x}\|_2=1}\mathbf{x}^*A^*A\mathbf{x} = \lambda_{\max}(A^*A) = \sigma_1^2(A)\) (by the Rayleigh quotient). Hence \(\|A\|_2 = \sigma_1(A)\). \(\blacksquare\)
Theorem 15.5 (Properties of operator norms)
Operator norms satisfy:
(1) \(\|I\| = 1\);
(2) Submultiplicativity: \(\|AB\| \leq \|A\|\|B\|\);
(3) Consistency: \(\|A\mathbf{x}\| \leq \|A\|\|\mathbf{x}\|\) (for the corresponding vector norm);
(4) \(\rho(A) \leq \|A\|\).
Proof
(2) \(\|AB\mathbf{x}\| \leq \|A\|\|B\mathbf{x}\| \leq \|A\|\|B\|\|\mathbf{x}\|\); taking the supremum over all \(\|\mathbf{x}\| = 1\) gives \(\|AB\| \leq \|A\|\|B\|\).
(3) Follows directly from the definition.
(4) Already proved in Theorem 14.6. \(\blacksquare\)
Example 15.4
Let \(A = \begin{pmatrix} 1 & -2 & 3 \\ 4 & 0 & -1 \end{pmatrix}\).
- \(\|A\|_1 = \max\{|1|+|4|,\; |-2|+|0|,\; |3|+|-1|\} = \max\{5, 2, 4\} = 5\);
- \(\|A\|_\infty = \max\{|1|+|-2|+|3|,\; |4|+|0|+|-1|\} = \max\{6, 5\} = 6\);
- \(\|A\|_2 = \sigma_1(A)\), which requires computing the largest eigenvalue of \(A^TA\); see the sketch below.
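The largest singular value, and the other two operator norms, can be obtained directly (a sketch; `numpy.linalg.norm` implements the formulas of Theorem 15.4):

```python
import numpy as np

# Example 15.4: the three induced norms of A.
A = np.array([[1.0, -2.0,  3.0],
              [4.0,  0.0, -1.0]])

print(np.linalg.norm(A, 1))       # max column sum = 5
print(np.linalg.norm(A, np.inf))  # max row sum = 6
print(np.linalg.norm(A, 2))       # sigma_1(A), the largest singular value

# Equivalently, ||A||_2 via the largest eigenvalue of A^T A:
print(np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A))))
```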
Example 15.5
For normal matrices, \(\|A\|_2 = \rho(A)\).
Let \(A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\) (rotation by \(90°\)). \(A\) is a normal matrix (in fact orthogonal), with eigenvalues \(\pm i\), \(\rho(A) = 1\).
\(A^TA = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I\), \(\sigma_1 = 1\). Therefore \(\|A\|_2 = 1 = \rho(A)\).
15.4 Relations between norms
Key inequalities: \(\|A\|_2\leq\|A\|_F\leq\sqrt{r}\|A\|_2\) · \(\|A\|_2\leq\sqrt{\|A\|_1\|A\|_\infty}\) connects three operator norms
Theorem 15.6 (Inequalities between matrix norms)
Let \(A \in \mathbb{C}^{m \times n}\). Then:
(1) \(\|A\|_2 \leq \|A\|_F \leq \sqrt{n}\|A\|_2\);
(2) \(\frac{1}{\sqrt{n}}\|A\|_\infty \leq \|A\|_2 \leq \sqrt{m}\|A\|_\infty\);
(3) \(\frac{1}{\sqrt{m}}\|A\|_1 \leq \|A\|_2 \leq \sqrt{n}\|A\|_1\);
(4) \(\|A\|_2 \leq \sqrt{\|A\|_1\|A\|_\infty}\);
(5) \(\frac{1}{\sqrt{mn}}\|A\|_F \leq \|A\|_\infty\), \(\frac{1}{\sqrt{mn}}\|A\|_F \leq \|A\|_1\).
Proof
(1) Follows from Theorem 15.3 (5), since \(r = \operatorname{rank}(A) \leq n\).
(4) \(\|A\|_2^2 = \rho(A^*A) \leq \|A^*A\|_\infty \leq \|A^*\|_\infty\|A\|_\infty = \|A\|_1\|A\|_\infty\). \(\blacksquare\)
Example 15.6
For \(A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\):
- \(\|A\|_1 = \max\{4, 6\} = 6\);
- \(\|A\|_\infty = \max\{3, 7\} = 7\);
- \(\|A\|_F = \sqrt{30} \approx 5.48\);
- \(\|A\|_2 \approx 5.46\) (largest singular value).
Verification of inequality (4): \(\|A\|_2^2 \approx 29.87 \leq 42 = 6 \times 7 = \|A\|_1\|A\|_\infty\).
15.5 Condition number
Core: \(\kappa(A)=\|A\|\|A^{-1}\|=\sigma_1/\sigma_n\) · The condition number is a "perturbation amplifier": \(\frac{\|\delta\mathbf{x}\|}{\|\mathbf{x}\|}\leq\kappa(A)\frac{\|\delta\mathbf{b}\|}{\|\mathbf{b}\|}\) · For normal matrices \(\kappa_2=\rho(A)/|\lambda_{\min}|\)
The condition number measures the inherent sensitivity of a linear system to input perturbations, and is one of the most important concepts in numerical linear algebra.
Definition 15.6 (Condition number)
Let \(A \in \mathbb{C}^{n \times n}\) be invertible and \(\|\cdot\|\) a matrix norm. The condition number of \(A\) with respect to \(\|\cdot\|\) is defined as
\[
\kappa(A) = \|A\|\,\|A^{-1}\|.
\]
For the \(\ell_2\) norm, \(\kappa_2(A) = \|A\|_2\|A^{-1}\|_2 = \frac{\sigma_1}{\sigma_n}\) (ratio of the largest to the smallest singular value).
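Numerically, \(\kappa_2\) can be computed from the singular values or with `numpy.linalg.cond` (a sketch; the test matrix is the one from Example 15.8 below):

```python
import numpy as np

# kappa_2(A) = sigma_1 / sigma_n, computed two equivalent ways.
A = np.array([[1.0, 1.0],
              [1.0, 1.001]])
sigma = np.linalg.svd(A, compute_uv=False)

print(sigma[0] / sigma[-1])    # ratio of extreme singular values ≈ 4002
print(np.linalg.cond(A, 2))    # same value via numpy.linalg.cond
```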
Theorem 15.7 (Basic properties of the condition number)
Let \(A\) be invertible. Then:
(1) \(\kappa(A) \geq 1\);
(2) \(\kappa(\alpha A) = \kappa(A)\) for any \(\alpha \neq 0\);
(3) For any invertible \(B\) of the same size, \(\kappa(AB) \leq \kappa(A)\kappa(B)\);
(4) If \(U\) is unitary, then \(\kappa_2(U) = 1\);
(5) \(\kappa_2(A) = \kappa_2(A^*)\).
Proof
(1) \(\kappa(A) = \|A\|\|A^{-1}\| \geq \|AA^{-1}\| = \|I\| = 1\).
(2) \(\kappa(\alpha A) = \|\alpha A\|\|(\alpha A)^{-1}\| = |\alpha|\|A\|\frac{1}{|\alpha|}\|A^{-1}\| = \kappa(A)\).
(4) \(\|U\|_2 = 1\) (all singular values of a unitary matrix are \(1\)), \(\|U^{-1}\|_2 = \|U^*\|_2 = 1\).
(5) \(\kappa_2(A^*) = \sigma_1(A^*)/\sigma_n(A^*) = \sigma_1(A)/\sigma_n(A) = \kappa_2(A)\). \(\blacksquare\)
Theorem 15.8 (Perturbation theorem for linear systems)
Let \(A\) be invertible, \(A\mathbf{x} = \mathbf{b}\) (\(\mathbf{b} \neq \mathbf{0}\)). If \(\mathbf{b}\) is perturbed to \(\mathbf{b} + \delta\mathbf{b}\), the corresponding solution becomes \(\mathbf{x} + \delta\mathbf{x}\); then
\[
\frac{\|\delta\mathbf{x}\|}{\|\mathbf{x}\|} \leq \kappa(A)\frac{\|\delta\mathbf{b}\|}{\|\mathbf{b}\|}.
\]
If \(A\) is perturbed to \(A + \delta A\) (\(\|\delta A\| < \|A^{-1}\|^{-1}\)), the solution becomes \(\mathbf{x} + \delta\mathbf{x}\); then
\[
\frac{\|\delta\mathbf{x}\|}{\|\mathbf{x} + \delta\mathbf{x}\|} \leq \kappa(A)\frac{\|\delta A\|}{\|A\|}.
\]
Proof
Right-hand side perturbation: \(A(\mathbf{x} + \delta\mathbf{x}) = \mathbf{b} + \delta\mathbf{b}\), so \(A\delta\mathbf{x} = \delta\mathbf{b}\), \(\delta\mathbf{x} = A^{-1}\delta\mathbf{b}\).
From \(\|\mathbf{b}\| = \|A\mathbf{x}\| \leq \|A\|\|\mathbf{x}\|\) we get \(\frac{1}{\|\mathbf{x}\|} \leq \frac{\|A\|}{\|\mathbf{b}\|}\). Therefore
\[
\frac{\|\delta\mathbf{x}\|}{\|\mathbf{x}\|} \leq \frac{\|A^{-1}\|\|\delta\mathbf{b}\|\,\|A\|}{\|\mathbf{b}\|} = \kappa(A)\frac{\|\delta\mathbf{b}\|}{\|\mathbf{b}\|}.
\]
Coefficient matrix perturbation: \((A + \delta A)(\mathbf{x} + \delta\mathbf{x}) = \mathbf{b} = A\mathbf{x}\). Expanding gives \(A\delta\mathbf{x} + \delta A(\mathbf{x} + \delta\mathbf{x}) = \mathbf{0}\), i.e., \(\delta\mathbf{x} = -A^{-1}\delta A(\mathbf{x} + \delta\mathbf{x})\). Therefore
\[
\|\delta\mathbf{x}\| \leq \|A^{-1}\|\|\delta A\|\|\mathbf{x} + \delta\mathbf{x}\|, \quad\text{i.e.,}\quad \frac{\|\delta\mathbf{x}\|}{\|\mathbf{x} + \delta\mathbf{x}\|} \leq \kappa(A)\frac{\|\delta A\|}{\|A\|}. \qquad \blacksquare
\]
Example 15.7
The famous Hilbert matrix \(H_n = \left(\frac{1}{i+j-1}\right)_{i,j=1}^{n}\) is a classic example of an ill-conditioned matrix.
- \(H_3 = \begin{pmatrix} 1 & 1/2 & 1/3 \\ 1/2 & 1/3 & 1/4 \\ 1/3 & 1/4 & 1/5 \end{pmatrix}\), \(\kappa_2(H_3) \approx 524\);
- \(\kappa_2(H_5) \approx 4.77 \times 10^5\);
- \(\kappa_2(H_{10}) \approx 1.60 \times 10^{13}\).
The condition number grows exponentially with \(n\), meaning that small perturbations in the input data lead to huge changes in the solution when solving linear systems with Hilbert matrices.
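These condition numbers can be reproduced in a few lines (a sketch assuming NumPy and SciPy are available):

```python
import numpy as np
from scipy.linalg import hilbert

# Growth of kappa_2(H_n) for the Hilbert matrices H_3, H_5, H_10.
for n in (3, 5, 10):
    print(n, np.linalg.cond(hilbert(n), 2))
# Expected output: roughly 5.24e+02, 4.77e+05, 1.60e+13.
```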
Example 15.8
Let \(A = \begin{pmatrix} 1 & 1 \\ 1 & 1.001 \end{pmatrix}\), \(\mathbf{b} = \begin{pmatrix} 2 \\ 2.001 \end{pmatrix}\), with solution \(\mathbf{x} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}\).
Perturbing \(\mathbf{b}' = \begin{pmatrix} 2 \\ 2.002 \end{pmatrix}\) (relative perturbation about \(0.05\%\)), the new solution is \(\mathbf{x}' = \begin{pmatrix} 0 \\ 2 \end{pmatrix}\).
Relative change in the solution: \(\frac{\|\mathbf{x}'-\mathbf{x}\|_2}{\|\mathbf{x}\|_2} = \frac{\sqrt{2}}{\sqrt{2}} = 1 = 100\%\).
\(\kappa_2(A) \approx 4002\), far greater than \(1\), indicating that this problem is extremely ill-conditioned.
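The whole example can be replayed in NumPy (a minimal sketch):

```python
import numpy as np

# Example 15.8: a 0.05% perturbation of b changes the solution by 100%.
A  = np.array([[1.0, 1.0],
               [1.0, 1.001]])
b  = np.array([2.0, 2.001])
b2 = np.array([2.0, 2.002])   # perturbed right-hand side

x  = np.linalg.solve(A, b)    # ≈ (1, 1)
x2 = np.linalg.solve(A, b2)   # ≈ (0, 2)

print(np.linalg.norm(x2 - x) / np.linalg.norm(x))  # ≈ 1.0, i.e. 100%
print(np.linalg.cond(A, 2))                        # ≈ 4002
```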
15.6 Perturbation of eigenvalues
Chapter arc: Bauer--Fike (\(\kappa(X)\|E\|\) bound) → Weyl (Hermitian matrices: \(|\Delta\lambda_i|\leq\|E\|_2\)) → Wielandt--Hoffman (Frobenius norm bound) · Normal matrices are most stable (\(\kappa=1\))
Theorem 15.9 (Bauer--Fike theorem)
Let \(A \in \mathbb{C}^{n \times n}\) be diagonalizable, \(A = X\Lambda X^{-1}\) (\(\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)\)). Let \(\mu\) be any eigenvalue of \(A + E\). Then there exists an eigenvalue \(\lambda_j\) of \(A\) such that
\[
|\mu - \lambda_j| \leq \kappa_p(X)\|E\|_p,
\]
where \(\kappa_p(X) = \|X\|_p\|X^{-1}\|_p\) is the condition number of the eigenvector matrix.
Proof
If \(\mu = \lambda_j\) for some \(j\), the inequality is trivial. Otherwise \(\mu\) is not an eigenvalue of \(A\), so \(\mu I - A\) is invertible. From \((A+E)\mathbf{v} = \mu\mathbf{v}\) (\(\mathbf{v} \neq \mathbf{0}\)), we get \((\mu I - A)\mathbf{v} = E\mathbf{v}\), i.e.,
\[
\mathbf{v} = (\mu I - A)^{-1}E\mathbf{v}.
\]
Taking norms and dividing by \(\|\mathbf{v}\|_p > 0\) gives \(1 \leq \|(\mu I - A)^{-1}E\|_p \leq \|(\mu I - A)^{-1}\|_p\|E\|_p\).
From \(A = X\Lambda X^{-1}\), \((\mu I - A)^{-1} = X(\mu I - \Lambda)^{-1}X^{-1}\). Hence
\[
\|(\mu I - A)^{-1}\|_p \leq \|X\|_p\,\|(\mu I - \Lambda)^{-1}\|_p\,\|X^{-1}\|_p = \kappa_p(X)\|(\mu I - \Lambda)^{-1}\|_p.
\]
\((\mu I - \Lambda)^{-1} = \operatorname{diag}\left(\frac{1}{\mu-\lambda_1}, \ldots, \frac{1}{\mu-\lambda_n}\right)\), whose norm is \(\max_j\frac{1}{|\mu-\lambda_j|} = \frac{1}{\min_j|\mu-\lambda_j|}\).
Therefore \(1 \leq \frac{\kappa_p(X)}{\min_j|\mu-\lambda_j|}\|E\|_p\), i.e., \(\min_j|\mu-\lambda_j| \leq \kappa_p(X)\|E\|_p\). \(\blacksquare\)
Note
The Bauer--Fike theorem shows that the sensitivity of eigenvalues to perturbations is controlled by the condition number \(\kappa(X)\) of the eigenvector matrix. For normal matrices, \(X\) can be taken to be unitary, so \(\kappa_2(X) = 1\), and eigenvalues are least sensitive to perturbations.
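A numerical illustration of the bound (a sketch; the non-normal test matrix and the size of \(E\) are arbitrary choices):

```python
import numpy as np

# Bauer-Fike check (p = 2) for a diagonalizable, non-normal matrix.
A = np.array([[2.0, 1.0],
              [0.0, 1.0]])
E = 1e-3 * np.array([[0.0, 0.0],
                     [1.0, 0.0]])

lam, X = np.linalg.eig(A)             # A = X diag(lam) X^{-1}
bound = np.linalg.cond(X, 2) * np.linalg.norm(E, 2)   # kappa_2(X) ||E||_2

for mu in np.linalg.eigvals(A + E):
    dist = np.min(np.abs(mu - lam))   # distance to the nearest lambda_j
    print(f"min_j |mu - lambda_j| = {dist:.2e} <= {bound:.2e}", dist <= bound)
```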
Theorem 15.10 (Weyl inequality -- Hermitian matrices)
Let \(A, B\) be \(n \times n\) Hermitian matrices with eigenvalues in decreasing order \(\lambda_1(A) \geq \cdots \geq \lambda_n(A)\) and \(\lambda_1(B) \geq \cdots \geq \lambda_n(B)\). Let the eigenvalues of \(A + B\) be \(\lambda_1(A+B) \geq \cdots \geq \lambda_n(A+B)\). Then for all \(i + j - 1 \leq n\),
\[
\lambda_{i+j-1}(A+B) \leq \lambda_i(A) + \lambda_j(B).
\]
In particular, taking \(j = 1\): \(\lambda_i(A+B) \leq \lambda_i(A) + \lambda_1(B)\).
Taking \(i = 1\): \(\lambda_j(A+B) \leq \lambda_1(A) + \lambda_j(B)\).
Proof
Using the minimax principle (Courant--Fischer theorem),
\[
\lambda_i(A) = \min_{\dim V = n-i+1}\ \max_{\mathbf{x} \in V,\, \|\mathbf{x}\| = 1} \mathbf{x}^*A\mathbf{x}.
\]
Let \(U\) be the \((n-i+1)\)-dimensional subspace achieving \(\lambda_i(A)\), and \(W\) the \((n-j+1)\)-dimensional subspace achieving \(\lambda_j(B)\). Set \(V = U \cap W\); its dimension is \(\geq (n-i+1) + (n-j+1) - n = n-i-j+2\), so \(\dim V \geq n-(i+j-1)+1\).
For \(\mathbf{x} \in V\), \(\|\mathbf{x}\| = 1\):
\[
\mathbf{x}^*(A+B)\mathbf{x} = \mathbf{x}^*A\mathbf{x} + \mathbf{x}^*B\mathbf{x} \leq \lambda_i(A) + \lambda_j(B).
\]
By the minimax principle, \(\lambda_{i+j-1}(A+B) \leq \max_{\mathbf{x} \in V, \|\mathbf{x}\|=1}\mathbf{x}^*(A+B)\mathbf{x} \leq \lambda_i(A) + \lambda_j(B)\). \(\blacksquare\)
Theorem 15.11 (Wielandt--Hoffman theorem)
Let \(A, B\) be \(n \times n\) Hermitian matrices with eigenvalues \(\lambda_1(A) \geq \cdots \geq \lambda_n(A)\) and \(\lambda_1(B) \geq \cdots \geq \lambda_n(B)\). Then
\[
\sum_{i=1}^{n}\big(\lambda_i(A) - \lambda_i(B)\big)^2 \leq \|A - B\|_F^2.
\]
Proof
Let \(A = U\Lambda_A U^*\), \(B = V\Lambda_B V^*\) (spectral decompositions). Set \(W = U^*V\); then \(W\) is unitary.
By the von Neumann trace inequality, \(\operatorname{Re}\operatorname{tr}(\Lambda_A W\Lambda_B W^*) \leq \sum_i\lambda_i(A)\lambda_i(B)\) (equality when \(W\) is a permutation matrix). Since \(\operatorname{tr}(AB) = \operatorname{tr}(\Lambda_A W\Lambda_B W^*)\), we therefore have
\[
\|A - B\|_F^2 = \operatorname{tr}(A^2) + \operatorname{tr}(B^2) - 2\operatorname{Re}\operatorname{tr}(AB) \geq \sum_i \lambda_i(A)^2 + \sum_i \lambda_i(B)^2 - 2\sum_i \lambda_i(A)\lambda_i(B) = \sum_{i=1}^{n}\big(\lambda_i(A) - \lambda_i(B)\big)^2. \qquad \blacksquare
\]
Example 15.9
Let \(A = \begin{pmatrix} 5 & 1 \\ 1 & 3 \end{pmatrix}\), \(E = \begin{pmatrix} 0.1 & 0 \\ 0 & -0.1 \end{pmatrix}\).
Eigenvalues of \(A\): \(\lambda = 4 \pm \sqrt{2}\), i.e., \(\lambda_1 \approx 5.41\), \(\lambda_2 \approx 2.59\).
\(A + E = \begin{pmatrix} 5.1 & 1 \\ 1 & 2.9 \end{pmatrix}\), eigenvalues \(\lambda = 4 \pm \sqrt{2.21}\), i.e., \(\lambda_1' \approx 5.49\), \(\lambda_2' \approx 2.51\).
\(\max_i|\lambda_i' - \lambda_i| = \sqrt{2.21} - \sqrt{2} \approx 0.07 \leq \|E\|_2 = 0.1\), consistent with the Weyl inequality (for Hermitian matrices the eigenvector matrix is unitary, so the Bauer--Fike factor is \(\kappa_2 = 1\)).
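Checking the example with `numpy.linalg.eigvalsh` (a sketch):

```python
import numpy as np

# Example 15.9: eigenvalue perturbation of a Hermitian (real symmetric) matrix.
A = np.array([[5.0, 1.0],
              [1.0, 3.0]])
E = np.array([[0.1,  0.0],
              [0.0, -0.1]])

lam  = np.linalg.eigvalsh(A)[::-1]      # eigenvalues in decreasing order
lam2 = np.linalg.eigvalsh(A + E)[::-1]

print(np.max(np.abs(lam2 - lam)))       # ≈ 0.072
print(np.linalg.norm(E, 2))             # 0.1, the Weyl bound
```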
15.7 Perturbation of singular values
Intuition: Construct the Hermitian dilation \(\hat{A}=\begin{pmatrix}0&A\\A^*&0\end{pmatrix}\) to reduce singular value perturbation to eigenvalue perturbation → Ch18 unified inequality framework
Theorem 15.12 (Weyl inequality for singular values)
Let \(A, B \in \mathbb{C}^{m \times n}\) with singular values in decreasing order \(\sigma_1(A) \geq \cdots \geq \sigma_p(A)\) and \(\sigma_1(B) \geq \cdots \geq \sigma_p(B)\) (\(p = \min(m,n)\)). Then for each \(i\),
\[
|\sigma_i(A+B) - \sigma_i(A)| \leq \sigma_1(B) = \|B\|_2.
\]
More strongly,
\[
\sum_{i=1}^{p}\big(\sigma_i(A+B) - \sigma_i(A)\big)^2 \leq \|B\|_F^2.
\]
Proof
Construct the \((m+n) \times (m+n)\) Hermitian matrices
\[
\hat{A} = \begin{pmatrix} O & A \\ A^* & O \end{pmatrix}, \qquad \hat{B} = \begin{pmatrix} O & B \\ B^* & O \end{pmatrix}.
\]
The eigenvalues of \(\hat{A}\) are \(\pm\sigma_1(A), \ldots, \pm\sigma_p(A)\) and \(|m-n|\) zeros (if \(m \neq n\)). Similarly for \(\hat{B}\).
Applying the Weyl inequality for Hermitian matrices to \(\hat{A}\) and \(\hat{B}\):
\[
|\lambda_i(\hat{A} + \hat{B}) - \lambda_i(\hat{A})| \leq \|\hat{B}\|_2 = \sigma_1(B);
\]
since the nonnegative eigenvalues of \(\hat{A}\) and \(\hat{A} + \hat{B}\) are exactly the singular values of \(A\) and \(A + B\), this gives \(|\sigma_i(A+B) - \sigma_i(A)| \leq \sigma_1(B)\).
The Frobenius norm version follows similarly from the Wielandt--Hoffman theorem. \(\blacksquare\)
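The dilation trick itself is easy to verify numerically (a sketch; the \(3 \times 2\) test matrix is arbitrary):

```python
import numpy as np

# Eigenvalues of the Hermitian dilation [[0, A], [A^*, 0]] are +/- sigma_i(A),
# padded with |m - n| zeros.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))
m, n = A.shape

H = np.block([[np.zeros((m, m)), A],
              [A.T,              np.zeros((n, n))]])

print(np.linalg.eigvalsh(H))               # ascending: -s1, -s2, 0, s2, s1
print(np.linalg.svd(A, compute_uv=False))  # s1, s2
```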
Proposition 15.2 (Subadditivity of singular values)
Let \(A, B \in \mathbb{C}^{m \times n}\). Then for each \(i + j - 1 \leq \min(m,n)\),
\[
\sigma_{i+j-1}(A+B) \leq \sigma_i(A) + \sigma_j(B).
\]
Example 15.10
Let \(A = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}\), \(E = \begin{pmatrix} 0 & 0.2 \\ 0.2 & 0 \end{pmatrix}\).
\(\sigma_1(A) = 3\), \(\sigma_2(A) = 1\). \(B = A + E = \begin{pmatrix} 3 & 0.2 \\ 0.2 & 1 \end{pmatrix}\).
\(B^TB = \begin{pmatrix} 9.04 & 0.8 \\ 0.8 & 1.04 \end{pmatrix}\), eigenvalues \(\mu = 5.04 \pm \sqrt{16.64}\). \(\mu_1 \approx 9.12\), \(\mu_2 \approx 0.96\). \(\sigma_1(B) \approx 3.02\), \(\sigma_2(B) \approx 0.98\).
\(|\sigma_1(B) - \sigma_1(A)| \approx 0.02 \leq 0.2 = \|E\|_2\). \(|\sigma_2(B) - \sigma_2(A)| \approx 0.02 \leq 0.2\).
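The same check in code (a sketch):

```python
import numpy as np

# Example 15.10: Weyl's inequality for singular values.
A = np.array([[3.0, 0.0],
              [0.0, 1.0]])
E = np.array([[0.0, 0.2],
              [0.2, 0.0]])

sA = np.linalg.svd(A,     compute_uv=False)
sB = np.linalg.svd(A + E, compute_uv=False)

print(np.abs(sB - sA))        # each entry ≈ 0.02
print(np.linalg.norm(E, 2))   # 0.2, the bound from Theorem 15.12
```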
15.8 Perturbation analysis of linear systems
Chapter arc: Small residual \(\not\Rightarrow\) small error (condition number amplification) · \(\frac{1}{\kappa}\cdot\frac{\|\mathbf{r}\|}{\|\mathbf{b}\|}\leq\frac{\|\delta\mathbf{x}\|}{\|\mathbf{x}\|}\leq\kappa\cdot\frac{\|\mathbf{r}\|}{\|\mathbf{b}\|}\) → two-sided bound
Definition 15.7 (Forward error and backward error)
Let \(\hat{\mathbf{x}}\) be an approximate solution of the linear system \(A\mathbf{x} = \mathbf{b}\).
- Forward error: \(\|\hat{\mathbf{x}} - \mathbf{x}\|\) (error in the solution);
- Backward error: \(\|\mathbf{r}\| = \|\mathbf{b} - A\hat{\mathbf{x}}\|\) (norm of the residual);
- Relative forward error: \(\frac{\|\hat{\mathbf{x}} - \mathbf{x}\|}{\|\mathbf{x}\|}\);
- Relative backward error: \(\frac{\|\mathbf{r}\|}{\|\mathbf{b}\|}\).
Theorem 15.13 (Relation between forward and backward errors)
Let \(A\) be invertible and \(\mathbf{r} = \mathbf{b} - A\hat{\mathbf{x}}\) be the residual. Then
\[
\frac{1}{\kappa(A)}\,\frac{\|\mathbf{r}\|}{\|\mathbf{b}\|} \leq \frac{\|\hat{\mathbf{x}} - \mathbf{x}\|}{\|\mathbf{x}\|} \leq \kappa(A)\,\frac{\|\mathbf{r}\|}{\|\mathbf{b}\|}.
\]
Proof
From \(A(\hat{\mathbf{x}} - \mathbf{x}) = A\hat{\mathbf{x}} - \mathbf{b} = -\mathbf{r}\), i.e., \(\hat{\mathbf{x}} - \mathbf{x} = -A^{-1}\mathbf{r}\).
Upper bound: \(\|\hat{\mathbf{x}}-\mathbf{x}\| \leq \|A^{-1}\|\|\mathbf{r}\|\) and \(\|\mathbf{b}\| \leq \|A\|\|\mathbf{x}\|\), therefore
\[
\frac{\|\hat{\mathbf{x}}-\mathbf{x}\|}{\|\mathbf{x}\|} \leq \frac{\|A^{-1}\|\|\mathbf{r}\|\,\|A\|}{\|\mathbf{b}\|} = \kappa(A)\frac{\|\mathbf{r}\|}{\|\mathbf{b}\|}.
\]
Lower bound: \(\|\mathbf{r}\| = \|A(\hat{\mathbf{x}}-\mathbf{x})\| \leq \|A\|\|\hat{\mathbf{x}}-\mathbf{x}\|\) and \(\|\mathbf{x}\| \leq \|A^{-1}\|\|\mathbf{b}\|\), therefore
\[
\frac{\|\mathbf{r}\|}{\|\mathbf{b}\|} \leq \frac{\|A\|\|\hat{\mathbf{x}}-\mathbf{x}\|\,\|A^{-1}\|}{\|\mathbf{x}\|} = \kappa(A)\frac{\|\hat{\mathbf{x}}-\mathbf{x}\|}{\|\mathbf{x}\|}.
\]
That is, \(\frac{1}{\kappa(A)}\frac{\|\mathbf{r}\|}{\|\mathbf{b}\|} \leq \frac{\|\hat{\mathbf{x}}-\mathbf{x}\|}{\|\mathbf{x}\|}\). \(\blacksquare\)
Theorem 15.14 (Simultaneous perturbation of \(A\) and \(\mathbf{b}\))
Let \((A + \delta A)(\mathbf{x} + \delta\mathbf{x}) = \mathbf{b} + \delta\mathbf{b}\), and \(\kappa(A)\frac{\|\delta A\|}{\|A\|} < 1\). Then
\[
\frac{\|\delta\mathbf{x}\|}{\|\mathbf{x}\|} \leq \frac{\kappa(A)}{1 - \kappa(A)\frac{\|\delta A\|}{\|A\|}}\left(\frac{\|\delta A\|}{\|A\|} + \frac{\|\delta\mathbf{b}\|}{\|\mathbf{b}\|}\right).
\]
Example 15.11
Solving \(A\mathbf{x} = \mathbf{b}\) in floating-point arithmetic, where \(A = \begin{pmatrix} 1 & 1 \\ 1 & 1.0001 \end{pmatrix}\), \(\mathbf{b} = \begin{pmatrix} 2 \\ 2.0001 \end{pmatrix}\).
Exact solution \(\mathbf{x} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}\).
Suppose the computed solution is \(\hat{\mathbf{x}} = \begin{pmatrix} 1.5 \\ 0.5 \end{pmatrix}\). Residual \(\mathbf{r} = \mathbf{b} - A\hat{\mathbf{x}} = \begin{pmatrix} 0 \\ 0.00005 \end{pmatrix}\).
Relative residual \(\frac{\|\mathbf{r}\|_\infty}{\|\mathbf{b}\|_\infty} = \frac{0.00005}{2.0001} \approx 2.5 \times 10^{-5}\) (very small).
But the relative forward error \(\frac{\|\hat{\mathbf{x}}-\mathbf{x}\|_\infty}{\|\mathbf{x}\|_\infty} = \frac{0.5}{1} = 0.5 = 50\%\) (very large).
This illustrates the amplification effect of the condition number. \(\kappa_\infty(A) \approx 4 \times 10^4\), causing a small residual to correspond to a large error.
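The residual/error gap is easy to reproduce (a sketch; the approximate solution \(\hat{\mathbf{x}}\) is the hypothetical one from the example):

```python
import numpy as np

# Example 15.11: tiny relative residual, huge relative forward error.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])
x_exact = np.array([1.0, 1.0])
x_hat   = np.array([1.5, 0.5])   # hypothetical computed solution

r = b - A @ x_hat
rel_residual = np.linalg.norm(r, np.inf) / np.linalg.norm(b, np.inf)
rel_error    = np.linalg.norm(x_hat - x_exact, np.inf) / np.linalg.norm(x_exact, np.inf)
kappa        = np.linalg.cond(A, np.inf)

print(rel_residual)          # ≈ 2.5e-05
print(rel_error)             # 0.5
print(kappa * rel_residual)  # Theorem 15.13 upper bound ≈ 1.0, covering 50%
```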
Example 15.12
Comparing well-conditioned and ill-conditioned systems.
Well-conditioned system: \(A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}\), \(\kappa_2(A) = 2\). A perturbation \(\|\delta\mathbf{b}\|/\|\mathbf{b}\| = 10^{-6}\) leads to \(\|\delta\mathbf{x}\|/\|\mathbf{x}\| \leq 2 \times 10^{-6}\).
Ill-conditioned system: \(B = \begin{pmatrix} 1 & 1 \\ 1 & 1+10^{-10} \end{pmatrix}\), \(\kappa_2(B) \approx 4 \times 10^{10}\). For the same perturbation the bound only guarantees \(\|\delta\mathbf{x}\|/\|\mathbf{x}\| \leq 4 \times 10^{4}\), which is vacuous: the solution may be completely unreliable.