Chapter 9 Quadratic Forms
Prerequisites: Ch8 Spectral theorem for symmetric matrices
Chapter arc: \(\mathbf{x}^TA\mathbf{x}\) → Completing the square / Orthogonal diagonalization to canonical form → Law of inertia (signature invariance) → Positive definiteness criteria → Geometry (ellipsoids/hyperboloids)
Further connections: Quadratic forms pervade optimization (quadratic programming), statistics (\(\chi^2\) distribution), and differential geometry (Riemannian metric); the antisymmetric case of bilinear forms yields symplectic forms — symplectic geometry is the mathematical language of classical mechanics (Hamilton's equations) and quantum field theory; Hermitian forms provide the mathematical structure for observables in quantum mechanics
A quadratic form is the algebraic theory of homogeneous quadratic polynomials, which has deep connections with symmetric matrices and inner product spaces. The study of quadratic forms is not only an important part of linear algebra but also has broad applications in differential geometry, optimization theory, statistics, and physics. This chapter systematically studies the definition, simplification methods, the law of inertia, and positive definiteness criteria for quadratic forms.
9.1 Definition of Quadratic Forms
Symmetric matrix \(A\) \(\leftrightarrow\) quadratic form \(Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}\) is a one-to-one correspondence; cross terms \(x_ix_j\) are split as \(a_{ij} = a_{ji}\)
Definition 9.1 (Quadratic form)
Let \(\mathbb{F} = \mathbb{R}\) (or \(\mathbb{C}\)). A quadratic form in \(n\) variables \(x_1, x_2, \ldots, x_n\) is a homogeneous quadratic polynomial of the form
\[ Q(x_1, \ldots, x_n) = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j, \]
where \(a_{ij} \in \mathbb{F}\). In vector notation, let \(\mathbf{x} = (x_1, \ldots, x_n)^T\); then
\[ Q(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}, \]
where \(A = (a_{ij})_{n \times n}\).
Definition 9.2 (Matrix of a quadratic form)
For the quadratic form \(Q(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}\), we can always assume the matrix \(A\) is symmetric. Indeed, for any matrix \(A\), \(\mathbf{x}^T A \mathbf{x} = \mathbf{x}^T \left(\frac{A + A^T}{2}\right) \mathbf{x}\), and \(\frac{A + A^T}{2}\) is symmetric.
The symmetric matrix \(A\) is called the matrix of the quadratic form \(Q\), and \(\operatorname{rank}(A)\) is called the rank of \(Q\).
Theorem 9.1 (One-to-one correspondence between quadratic forms and symmetric matrices)
There is a one-to-one correspondence between real quadratic forms \(Q(\mathbf{x})\) in \(n\) variables and \(n \times n\) real symmetric matrices \(A\): \(Q(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}\).
Proof
Given a symmetric matrix \(A = (a_{ij})\), \(\mathbf{x}^T A \mathbf{x} = \sum_{i,j} a_{ij}x_ix_j\) is a quadratic form.
Conversely, given a quadratic form \(Q(\mathbf{x}) = \sum_{i \leq j} c_{ij}x_ix_j\) (where \(c_{ii}\) is the coefficient of \(x_i^2\) and \(c_{ij}\) (\(i < j\)) is the coefficient of \(x_ix_j\)), define the symmetric matrix \(A\) with entries \(a_{ii} = c_{ii}\), \(a_{ij} = a_{ji} = c_{ij}/2\) (\(i < j\)). Then \(Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}\).
Uniqueness: If \(\mathbf{x}^TA\mathbf{x} = \mathbf{x}^TB\mathbf{x}\) for all \(\mathbf{x}\), with \(A, B\) both symmetric, then \(\mathbf{x}^T(A-B)\mathbf{x} = 0\) for all \(\mathbf{x}\). Taking \(\mathbf{x} = \mathbf{e}_i\) gives \(a_{ii} = b_{ii}\); taking \(\mathbf{x} = \mathbf{e}_i + \mathbf{e}_j\) gives \(a_{ij} + a_{ji} = b_{ij} + b_{ji}\), and by symmetry \(a_{ij} = b_{ij}\). \(\blacksquare\)
Example 9.1
The symmetric matrix of the quadratic form \(Q(x_1, x_2, x_3) = 2x_1^2 + 3x_2^2 - x_3^2 + 4x_1x_2 - 6x_1x_3 + 2x_2x_3\) is
\[ A = \begin{pmatrix} 2 & 2 & -3 \\ 2 & 3 & 1 \\ -3 & 1 & -1 \end{pmatrix}. \]
Note that the cross term \(4x_1x_2\) is split as \(a_{12} = a_{21} = 2\), and similarly \(a_{13} = a_{31} = -3\), \(a_{23} = a_{32} = 1\).
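As a quick numerical check, the following NumPy sketch (an illustration, not part of the derivation) builds \(A\) and confirms that \(\mathbf{x}^TA\mathbf{x}\) reproduces the original polynomial at a sample point:

```python
import numpy as np

# Symmetric matrix of Q = 2x1^2 + 3x2^2 - x3^2 + 4x1x2 - 6x1x3 + 2x2x3
# (each cross-term coefficient is split in half across the two symmetric entries)
A = np.array([[ 2.0,  2.0, -3.0],
              [ 2.0,  3.0,  1.0],
              [-3.0,  1.0, -1.0]])

def Q(x):
    """Evaluate the quadratic form x^T A x."""
    return x @ A @ x

def Q_poly(x1, x2, x3):
    """Evaluate the original polynomial directly."""
    return 2*x1**2 + 3*x2**2 - x3**2 + 4*x1*x2 - 6*x1*x3 + 2*x2*x3

x = np.array([1.0, -2.0, 3.0])
assert np.isclose(Q(x), Q_poly(*x))   # both expressions agree
print(Q(x))
```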
Example 9.2
The quadratic form \(Q(x_1, x_2) = x_1^2 + x_2^2\) on \(\mathbb{R}^2\) corresponds to the matrix \(A = I_2\), geometrically representing circles centered at the origin. The quadratic form \(Q(x_1, x_2) = x_1^2 - x_2^2\) corresponds to the matrix \(A = \operatorname{diag}(1, -1)\), geometrically representing a hyperbola.
9.2 Canonical Form of Quadratic Forms
Eliminate cross terms → Only \(d_i y_i^2\) remains → Completing the square (Lagrange) is a constructive tool; any quadratic form can be reduced to canonical form
Definition 9.3 (Canonical form)
If a quadratic form \(Q(\mathbf{x})\) contains only squared terms (no cross terms), i.e.,
\[ Q(\mathbf{x}) = d_1 x_1^2 + d_2 x_2^2 + \cdots + d_n x_n^2, \]
then \(Q\) is said to be in canonical form (or diagonal form), with matrix \(\operatorname{diag}(d_1, \ldots, d_n)\).
Definition 9.4 (Nonsingular linear substitution)
Let \(\mathbf{x} = C\mathbf{y}\), where \(C\) is an \(n \times n\) invertible matrix. This is called a nonsingular linear substitution. Under this substitution,
\[ Q(\mathbf{x}) = (C\mathbf{y})^T A (C\mathbf{y}) = \mathbf{y}^T (C^TAC)\, \mathbf{y}. \]
The matrix of the new quadratic form is \(B = C^TAC\).
Completing the Square
Theorem 9.2 (Reduction to canonical form by completing the square)
Any real quadratic form \(Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}\) (\(A\) symmetric) can be reduced to canonical form by a nonsingular linear substitution.
Proof
By completing the square (Lagrange's method). Two cases:
Case 1: If some \(a_{ii} \neq 0\) (say \(a_{11} \neq 0\)), complete the square in all terms containing \(x_1\):
\[ Q(\mathbf{x}) = a_{11}\left(x_1 + \frac{a_{12}}{a_{11}}x_2 + \cdots + \frac{a_{1n}}{a_{11}}x_n\right)^2 + Q_1(x_2, \ldots, x_n), \]
where \(Q_1\) is a quadratic form in \(x_2, \ldots, x_n\).
Let \(y_1 = x_1 + \frac{a_{12}}{a_{11}}x_2 + \cdots + \frac{a_{1n}}{a_{11}}x_n\), \(y_i = x_i\) (\(i \geq 2\)); this is a nonsingular linear substitution. Apply the method recursively to \(Q_1\).
Case 2: If all \(a_{ii} = 0\) but some \(a_{ij} \neq 0\) (\(i \neq j\)). Let \(x_i = y_i + y_j\), \(x_j = y_i - y_j\), and \(x_k = y_k\) for all other \(k\). Then \(2a_{ij}x_ix_j = 2a_{ij}(y_i^2 - y_j^2)\), producing squared terms, reducing to Case 1.
By induction, after finitely many steps, \(Q\) is reduced to canonical form. \(\blacksquare\)
Example 9.3
Reduce to canonical form by completing the square: \(Q(x_1, x_2, x_3) = 2x_1x_2 + 2x_1x_3 - 6x_2x_3\).
All squared-term coefficients are zero (Case 2). Let \(x_1 = y_1 + y_2\), \(x_2 = y_1 - y_2\), \(x_3 = y_3\):
\[ Q = 2(y_1^2 - y_2^2) + 2(y_1 + y_2)y_3 - 6(y_1 - y_2)y_3 = 2y_1^2 - 2y_2^2 - 4y_1y_3 + 8y_2y_3. \]
Completing the square in \(y_1\): \(2(y_1^2 - 2y_1y_3) = 2(y_1 - y_3)^2 - 2y_3^2\).
Completing the square in \(y_2\): \(-2(y_2^2 - 4y_2y_3) = -2(y_2 - 2y_3)^2 + 8y_3^2\).
Let \(z_1 = y_1 - y_3\), \(z_2 = y_2 - 2y_3\), \(z_3 = y_3\), yielding the canonical form \(Q = 2z_1^2 - 2z_2^2 + 6z_3^2\).
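To keep the bookkeeping explicit, the following NumPy sketch (an illustration of the computation above) composes the two substitutions and checks that the total change of variables diagonalizes \(A\):

```python
import numpy as np

# Matrix of Q = 2x1x2 + 2x1x3 - 6x2x3
A = np.array([[ 0.0,  1.0,  1.0],
              [ 1.0,  0.0, -3.0],
              [ 1.0, -3.0,  0.0]])

# Step 1: x = C1 y  (x1 = y1 + y2, x2 = y1 - y2, x3 = y3)
C1 = np.array([[1.0,  1.0, 0.0],
               [1.0, -1.0, 0.0],
               [0.0,  0.0, 1.0]])
# Step 2: y = C2 z  (y1 = z1 + z3, y2 = z2 + 2*z3, y3 = z3)
C2 = np.array([[1.0, 0.0, 1.0],
               [0.0, 1.0, 2.0],
               [0.0, 0.0, 1.0]])

C = C1 @ C2                         # total substitution x = C z
print(np.round(C.T @ A @ C, 10))    # diag(2, -2, 6): the canonical form
```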
9.3 Orthogonal Diagonalization to Canonical Form
The substitution matrix from completing the square is not unique → Use the Ch8 spectral theorem \(A = Q\Lambda Q^T\) for an orthogonal substitution → Canonical form coefficients = eigenvalues, transformation = isometry
The canonical form obtained by completing the square depends on the order of completion, and the transformation matrix is not unique. The orthogonal diagonalization method (using the spectral theorem) provides the most "natural" approach to canonical form reduction.
Theorem 9.3 (Orthogonal diagonalization method)
Let \(Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}\), where \(A\) is an \(n \times n\) real symmetric matrix. By the spectral theorem, there exists an orthogonal matrix \(P\) such that
\[ P^TAP = \Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n), \]
where \(\lambda_1, \ldots, \lambda_n\) are the eigenvalues of \(A\). Let \(\mathbf{x} = P\mathbf{y}\) (orthogonal substitution); then
\[ Q(\mathbf{x}) = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2. \]
Proof
By the spectral theorem for real symmetric matrices (Theorem 8.15), there exists an orthogonal matrix \(P\) (whose columns are orthonormal eigenvectors of \(A\)) such that \(P^TAP = \Lambda\). Under the substitution \(\mathbf{x} = P\mathbf{y}\):
\[ Q(\mathbf{x}) = (P\mathbf{y})^T A (P\mathbf{y}) = \mathbf{y}^T(P^TAP)\mathbf{y} = \mathbf{y}^T\Lambda\mathbf{y} = \sum_{i=1}^n \lambda_i y_i^2. \]
\(\blacksquare\)
Note
The advantage of orthogonal substitution is that it preserves vector lengths and angles (it is an isometric transformation), making it particularly meaningful in geometric applications. The coefficients in the canonical form obtained by orthogonal diagonalization are exactly the eigenvalues, which is very natural from a theoretical standpoint.
Example 9.4
Reduce to canonical form by orthogonal substitution: \(Q(x_1, x_2) = 5x_1^2 + 4x_1x_2 + 8x_2^2\).
Symmetric matrix \(A = \begin{pmatrix} 5 & 2 \\ 2 & 8 \end{pmatrix}\).
Characteristic polynomial: \(\det(A - \lambda I) = (5-\lambda)(8-\lambda) - 4 = \lambda^2 - 13\lambda + 36 = (\lambda - 4)(\lambda - 9)\).
\(\lambda_1 = 4\): \((A - 4I)\mathbf{x} = \mathbf{0}\) \(\Rightarrow\) \(\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}\mathbf{x} = \mathbf{0}\), \(\mathbf{v}_1 = \frac{1}{\sqrt{5}}(-2, 1)^T\).
\(\lambda_2 = 9\): \((A - 9I)\mathbf{x} = \mathbf{0}\) \(\Rightarrow\) \(\begin{pmatrix} -4 & 2 \\ 2 & -1 \end{pmatrix}\mathbf{x} = \mathbf{0}\), \(\mathbf{v}_2 = \frac{1}{\sqrt{5}}(1, 2)^T\).
Orthogonal matrix \(P = \frac{1}{\sqrt{5}}\begin{pmatrix} -2 & 1 \\ 1 & 2 \end{pmatrix}\), let \(\mathbf{x} = P\mathbf{y}\), giving \(Q = 4y_1^2 + 9y_2^2\).
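The same diagonalization can be reproduced numerically; a NumPy sketch (note that numpy.linalg.eigh may return eigenvector columns whose signs differ from the hand computation):

```python
import numpy as np

A = np.array([[5.0, 2.0],
              [2.0, 8.0]])

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors
# as the columns of P
eigvals, P = np.linalg.eigh(A)
print(eigvals)                       # [4. 9.]
print(np.round(P.T @ A @ P, 10))     # diag(4, 9): the canonical form
print(np.round(P.T @ P, 10))         # identity: P is orthogonal
```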
9.4 Law of Inertia
The coefficients in the canonical form may vary, but the number of positive coefficients \(p\) and negative coefficients \(q\) are invariant — Sylvester's law of inertia is the core invariant of quadratic form theory
Definition 9.5 (Index of inertia)
Suppose the real quadratic form \(Q(\mathbf{x})\) is reduced to the canonical form
\[ Q = d_1 y_1^2 + d_2 y_2^2 + \cdots + d_r y_r^2 \]
by a nonsingular linear substitution, where \(d_i \neq 0\) (\(i = 1, \ldots, r\)) and \(r = \operatorname{rank}(A)\). Let \(p\) be the number of positive coefficients and \(q = r - p\) the number of negative coefficients. Then
- \(p\) is called the positive index of inertia;
- \(q\) is called the negative index of inertia;
- \((p, q)\) is called the signature of the quadratic form.
Insight: The core of the proof is a dimension argument — \(V_1 \cap V_2 \neq \{0\}\) (since \(\dim V_1 + \dim V_2 > n\)) leads to a contradiction; this technique reappears in the Eckart-Young theorem in Ch11
Theorem 9.4 (Sylvester's law of inertia)
The numbers of positive coefficients \(p\) and negative coefficients \(q\) in the canonical form of a real quadratic form are invariant, independent of the nonsingular linear substitution used. That is, \(p\) and \(q\) are determined solely by the quadratic form itself.
Proof
Suppose the quadratic form \(Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}\) is reduced to canonical form by two different nonsingular linear substitutions \(\mathbf{x} = C_1\mathbf{y}\) and \(\mathbf{x} = C_2\mathbf{z}\):
\[ Q = y_1^2 + \cdots + y_p^2 - y_{p+1}^2 - \cdots - y_r^2 = z_1^2 + \cdots + z_s^2 - z_{s+1}^2 - \cdots - z_r^2. \]
(Without loss of generality, the coefficients have been normalized to \(\pm 1\).)
By contradiction, assume \(p \neq s\), say \(p > s\).
Let \(\mathbf{y} = C_1^{-1}\mathbf{x}\), \(\mathbf{z} = C_2^{-1}\mathbf{x}\). Consider the two subspaces:
- \(V_1 = \{\mathbf{x} \in \mathbb{R}^n : y_{p+1} = \cdots = y_n = 0\}\), \(\dim V_1 = p\);
- \(V_2 = \{\mathbf{x} \in \mathbb{R}^n : z_1 = \cdots = z_s = 0\}\), \(\dim V_2 = n - s\).
Since \(\dim V_1 + \dim V_2 = p + (n-s) > n\) (because \(p > s\)), we have \(V_1 \cap V_2 \neq \{\mathbf{0}\}\).
Let \(\mathbf{0} \neq \mathbf{x}_0 \in V_1 \cap V_2\).
- In \(V_1\): \(Q(\mathbf{x}_0) = y_1^2 + \cdots + y_p^2 > 0\) (since \(\mathbf{x}_0 \neq \mathbf{0}\) implies at least one \(y_i \neq 0\), \(i \leq p\));
- In \(V_2\): \(Q(\mathbf{x}_0) = -z_{s+1}^2 - \cdots - z_r^2 \leq 0\).
Contradiction! Therefore \(p = s\). \(\blacksquare\)
Corollary 9.1
Two real quadratic forms are equivalent (i.e., can be transformed into each other by a nonsingular linear substitution) if and only if they have the same rank and the same positive index of inertia (or equivalently, the same signature).
Proposition 9.1
The positive index of inertia of a real symmetric matrix \(A\) equals the number of positive eigenvalues (counting multiplicity), and the negative index of inertia equals the number of negative eigenvalues (counting multiplicity).
Proof
By orthogonal diagonalization, \(Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}\) can be reduced to \(\lambda_1 y_1^2 + \cdots + \lambda_n y_n^2\) by an orthogonal substitution, where \(\lambda_i\) are the eigenvalues of \(A\). By the law of inertia, the positive index of inertia equals the number of positive eigenvalues, and the negative index of inertia equals the number of negative eigenvalues. \(\blacksquare\)
Example 9.5
The matrix of the quadratic form \(Q = x_1^2 + 4x_1x_2 + 4x_2^2 + 2x_3^2\) is
\[ A = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 4 & 0 \\ 0 & 0 & 2 \end{pmatrix}. \]
Characteristic polynomial: \(\det(A - \lambda I) = (2-\lambda)[(1-\lambda)(4-\lambda) - 4] = (2-\lambda)(\lambda^2 - 5\lambda) = \lambda(2-\lambda)(\lambda - 5)\).
Eigenvalues: \(\lambda_1 = 0, \lambda_2 = 2, \lambda_3 = 5\). Positive index of inertia \(p = 2\), negative index of inertia \(q = 0\), rank \(r = 2\).
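Following Proposition 9.1, the signature can be read off from the signs of the eigenvalues; a NumPy sketch for this example (the tolerance tol is an arbitrary numerical cutoff):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 0.0],
              [0.0, 0.0, 2.0]])

eigvals = np.linalg.eigvalsh(A)      # real eigenvalues of a symmetric matrix
tol = 1e-10
p = int(np.sum(eigvals >  tol))      # positive index of inertia
q = int(np.sum(eigvals < -tol))      # negative index of inertia
r = p + q                            # rank
print(eigvals, (p, q), r)            # approximately [0, 2, 5], (2, 0), 2
```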
9.5 Congruence Transformations and Congruent Matrices
Similarity \(P^{-1}AP\) (preserves eigenvalues) vs Congruence \(C^TAC\) (preserves signature) — Congruence is the natural equivalence relation for quadratic forms; symmetry and rank are preserved but eigenvalues may change
Definition 9.6 (Congruence)
Let \(A, B\) be \(n \times n\) real matrices. If there exists an invertible matrix \(C\) such that
\[ B = C^TAC, \]
then \(A\) and \(B\) are said to be congruent, denoted \(A \simeq B\). The map \(\mathbf{x} \mapsto C\mathbf{x}\) is called a congruence transformation.
Proposition 9.2 (Congruence is an equivalence relation)
Matrix congruence is an equivalence relation:
- Reflexivity: \(A \simeq A\) (take \(C = I\));
- Symmetry: If \(A \simeq B\), then \(B \simeq A\);
- Transitivity: If \(A \simeq B\) and \(B \simeq D\), then \(A \simeq D\).
Proof
(2) If \(B = C^TAC\), then \(A = (C^{-1})^T B C^{-1}\), so \(B \simeq A\).
(3) If \(B = C_1^TAC_1\) and \(D = C_2^TBC_2\), then \(D = C_2^T(C_1^TAC_1)C_2 = (C_1C_2)^T A (C_1C_2)\). \(\blacksquare\)
Theorem 9.5 (Congruence canonical form)
Any \(n \times n\) real symmetric matrix \(A\) (of rank \(r\)) is congruent to
\[ \begin{pmatrix} I_p & & \\ & -I_q & \\ & & O \end{pmatrix} = \operatorname{diag}(\underbrace{1, \ldots, 1}_{p}, \underbrace{-1, \ldots, -1}_{q}, 0, \ldots, 0), \]
where \(p\) is the positive index of inertia and \(q = r - p\) is the negative index of inertia. This canonical form is unique by the law of inertia.
Proof
By completing the square (Theorem 9.2), there exists an invertible matrix \(C_1\) such that \(C_1^TAC_1 = \operatorname{diag}(d_1, \ldots, d_r, 0, \ldots, 0)\), where \(d_i \neq 0\). Without loss of generality, assume \(d_1, \ldots, d_p > 0\) and \(d_{p+1}, \ldots, d_r < 0\). Let
\[ C_2 = \operatorname{diag}\left(\frac{1}{\sqrt{d_1}}, \ldots, \frac{1}{\sqrt{d_p}}, \frac{1}{\sqrt{-d_{p+1}}}, \ldots, \frac{1}{\sqrt{-d_r}}, 1, \ldots, 1\right). \]
Then \(C_2^T(\operatorname{diag}(d_1, \ldots, d_r, 0, \ldots, 0))C_2 = \operatorname{diag}(1, \ldots, 1, -1, \ldots, -1, 0, \ldots, 0)\).
Taking \(C = C_1C_2\) gives the result. \(\blacksquare\)
Note
Congruence preserves symmetry and rank: if \(A\) is symmetric and \(B = C^TAC\), then \(B\) is also symmetric and \(\operatorname{rank}(B) = \operatorname{rank}(A)\). However, congruence does not preserve eigenvalues. By contrast, similarity preserves eigenvalues but does not necessarily preserve symmetry.
Example 9.6
The matrix \(A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}\) has eigenvalues \(3\) and \(-1\), so the positive index of inertia is \(p = 1\) and the negative index of inertia is \(q = 1\). \(A\) is congruent to \(\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\).
Verification: Completing the square gives \((x_1+2x_2)^2 - 3x_2^2\), i.e., the substitution \(x_1 = y_1 - 2y_2\), \(x_2 = y_2\) with matrix \(C_1 = \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix}\) yields \(C_1^TAC_1 = \operatorname{diag}(1, -3)\); rescaling the second variable by \(1/\sqrt{3}\), i.e., \(C = C_1\operatorname{diag}(1, 1/\sqrt{3})\), gives \(C^TAC = \operatorname{diag}(1, -1)\).
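A numerical check of this congruence (a NumPy sketch using the substitution above):

```python
import numpy as np

A  = np.array([[1.0, 2.0],
               [2.0, 1.0]])
C1 = np.array([[1.0, -2.0],           # x1 = y1 - 2*y2, x2 = y2
               [0.0,  1.0]])
print(C1.T @ A @ C1)                  # diag(1, -3)

D = np.diag([1.0, 1.0 / np.sqrt(3)])  # rescale the second variable
C = C1 @ D
print(np.round(C.T @ A @ C, 10))      # diag(1, -1): the congruence canonical form
```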
9.6 Positive Definite Quadratic Forms and Positive Definite Matrices
Signature \((n,0)\) \(\leftrightarrow\) all eigenvalues \(> 0\) \(\leftrightarrow\) \(A = C^TC\) \(\leftrightarrow\) all leading principal minors positive → Positive definite matrices lead directly to Ch10 Cholesky decomposition \(A = LL^T\)
Positive definiteness is one of the most important properties of quadratic forms and symmetric matrices, playing a central role in optimization, statistics, differential equations, and other fields.
Definition 9.7 (Definiteness classification)
Let \(Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}\) be a real quadratic form with \(A\) an \(n \times n\) real symmetric matrix. We say \(Q\) (or \(A\)) is:
- Positive definite: if \(Q(\mathbf{x}) > 0\) for all \(\mathbf{x} \neq \mathbf{0}\);
- Positive semidefinite: if \(Q(\mathbf{x}) \geq 0\) for all \(\mathbf{x}\);
- Negative definite: if \(Q(\mathbf{x}) < 0\) for all \(\mathbf{x} \neq \mathbf{0}\);
- Negative semidefinite: if \(Q(\mathbf{x}) \leq 0\) for all \(\mathbf{x}\);
- Indefinite: if \(Q\) takes both positive and negative values.
Insight: The five equivalent conditions unify the algebraic (eigenvalues), geometric (\(A = C^TC\)), and combinatorial (leading principal minors) perspectives
Theorem 9.6 (Equivalent conditions for positive definiteness)
Let \(A\) be an \(n \times n\) real symmetric matrix. The following conditions are equivalent:
- \(A\) is positive definite;
- All eigenvalues \(\lambda_1, \ldots, \lambda_n\) of \(A\) are positive;
- The positive index of inertia is \(p = n\) (i.e., the canonical form of \(Q\) is \(y_1^2 + \cdots + y_n^2\));
- There exists an invertible matrix \(C\) such that \(A = C^TC\);
- All leading principal minors of \(A\) are positive.
Proof
(1)\(\Leftrightarrow\)(2): If \(A\) is positive definite and \(A\mathbf{v} = \lambda\mathbf{v}\), \(\mathbf{v} \neq \mathbf{0}\), then \(\lambda\|\mathbf{v}\|^2 = \mathbf{v}^TA\mathbf{v} > 0\), so \(\lambda > 0\). Conversely, if all eigenvalues are positive, \(Q(\mathbf{x}) = \mathbf{y}^T\Lambda\mathbf{y} = \sum \lambda_i y_i^2 > 0\) (\(\mathbf{x} \neq \mathbf{0}\)).
(2)\(\Leftrightarrow\)(3): The positive index of inertia equals the number of positive eigenvalues (Proposition 9.1).
(1)\(\Rightarrow\)(4): By orthogonal diagonalization \(A = P\Lambda P^T\), \(\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)\), \(\lambda_i > 0\). Let \(C = \Lambda^{1/2}P^T\) (where \(\Lambda^{1/2} = \operatorname{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n})\)), then \(C^TC = P\Lambda^{1/2}\Lambda^{1/2}P^T = P\Lambda P^T = A\).
(4)\(\Rightarrow\)(1): \(\mathbf{x}^TA\mathbf{x} = \mathbf{x}^TC^TC\mathbf{x} = \|C\mathbf{x}\|^2 \geq 0\), with equality if and only if \(C\mathbf{x} = \mathbf{0}\), i.e., \(\mathbf{x} = \mathbf{0}\) (since \(C\) is invertible).
(1)\(\Leftrightarrow\)(5): This is the Sylvester criterion. Let \(A_k\) be the \(k\)-th leading principal submatrix of \(A\) and \(\Delta_k = \det(A_k)\).
If \(A\) is positive definite, then \(A_k\) is also positive definite (for \(\mathbf{x} = (x_1, \ldots, x_k, 0, \ldots, 0)^T\), \(\mathbf{x}^TA\mathbf{x} = \mathbf{y}^TA_k\mathbf{y}\) where \(\mathbf{y} = (x_1, \ldots, x_k)^T\)). \(A_k\) positive definite \(\Rightarrow\) all eigenvalues positive \(\Rightarrow\) \(\Delta_k = \prod \lambda_i^{(k)} > 0\).
Conversely, prove by induction on \(n\). For \(n=1\), \(\Delta_1 = a_{11} > 0\) means \(Q = a_{11}x_1^2\) is positive definite. Assume the result holds for \(n-1\). Since \(\Delta_1, \ldots, \Delta_{n-1} > 0\), \(A_{n-1}\) is positive definite. Write \(A = \begin{pmatrix} A_{n-1} & \mathbf{b} \\ \mathbf{b}^T & a_{nn} \end{pmatrix}\); then \(\det(A) = \det(A_{n-1})\,(a_{nn} - \mathbf{b}^TA_{n-1}^{-1}\mathbf{b})\), so the Schur complement satisfies \(a_{nn} - \mathbf{b}^TA_{n-1}^{-1}\mathbf{b} = \Delta_n/\Delta_{n-1} > 0\). The block decomposition shows \(A\) is congruent to \(\operatorname{diag}(A_{n-1}, \Delta_n/\Delta_{n-1})\), hence positive definite. \(\blacksquare\)
Theorem 9.7 (Equivalent conditions for positive semidefiniteness)
Let \(A\) be an \(n \times n\) real symmetric matrix. The following conditions are equivalent:
- \(A\) is positive semidefinite;
- All eigenvalues \(\lambda_i \geq 0\);
- There exists a matrix \(B\) (not necessarily invertible) such that \(A = B^TB\);
- All principal minors (not just leading principal minors) of \(A\) are nonnegative.
Proof
The proofs of (1)\(\Leftrightarrow\)(2) and (1)\(\Leftrightarrow\)(3) are similar to the positive definite case.
(4) Necessity: \(A\) positive semidefinite \(\Rightarrow\) every principal submatrix is also positive semidefinite \(\Rightarrow\) principal minors (= products of eigenvalues) \(\geq 0\).
Sufficiency: if all principal minors of \(A\) are nonnegative, then for any \(\varepsilon > 0\) each leading principal minor of \(A + \varepsilon I\) expands as a polynomial in \(\varepsilon\) whose coefficients are sums of principal minors of \(A\) and whose leading term is \(\varepsilon^k\); hence it is positive. By Sylvester's criterion, \(A + \varepsilon I\) is positive definite, and letting \(\varepsilon \to 0\) gives \(\mathbf{x}^TA\mathbf{x} \geq 0\). \(\blacksquare\)
Note
To test for negative definiteness: \(A\) is negative definite if and only if \(-A\) is positive definite, equivalently \((-1)^k\Delta_k > 0\) (\(k = 1, \ldots, n\)), i.e., odd-order leading principal minors are negative and even-order ones are positive.
Example 9.7
Determine the definiteness of \(A = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}\).
\(\Delta_1 = 2 > 0\), \(\Delta_2 = \det\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} = 3 > 0\), \(\Delta_3 = \det(A) = 2(4-1) - (-1)(-2) = 6 - 2 = 4 > 0\).
All leading principal minors are positive, so \(A\) is positive definite.
Example 9.8
Determine the definiteness of \(A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}\).
\(\Delta_1 = 1 > 0\), \(\Delta_2 = 1 - 4 = -3 < 0\).
\(A\) is not positive definite. In fact, the eigenvalues of \(A\) are \(3\) and \(-1\), so \(A\) is indefinite.
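Sylvester's criterion from Theorem 9.6 can be checked mechanically; a short NumPy sketch applied to Examples 9.7 and 9.8 (the helper name leading_principal_minors is illustrative):

```python
import numpy as np

def leading_principal_minors(A):
    """Determinants of the k-by-k upper-left blocks, k = 1..n."""
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

A1 = np.array([[ 2.0, -1.0,  0.0],
               [-1.0,  2.0, -1.0],
               [ 0.0, -1.0,  2.0]])
A2 = np.array([[1.0, 2.0],
               [2.0, 1.0]])

print(leading_principal_minors(A1))   # approx [2, 3, 4]  -> positive definite
print(leading_principal_minors(A2))   # approx [1, -3]    -> not positive definite
print(np.linalg.eigvalsh(A2))         # approx [-1, 3]    -> indefinite
```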
9.7 Geometric Meaning of Quadratic Forms
\(\mathbf{x}^TA\mathbf{x} = c\) defines a quadric surface → Orthogonal substitution along eigenvector directions (principal axes) eliminates cross terms → Surface type is determined by the signature \((p,q)\)
Quadratic forms geometrically describe quadratic curves and surfaces.
Definition 9.8 (Quadric surface)
The set defined by the equation \(Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} = c\) (\(c\) a constant) in \(\mathbb{R}^n\) is called a quadric surface. In \(\mathbb{R}^2\) it is a conic section.
Theorem 9.8 (Principal axis theorem)
Let \(Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}\) be a real quadratic form with \(A\) an \(n \times n\) real symmetric matrix. Through the orthogonal substitution \(\mathbf{x} = P\mathbf{y}\) (\(P\) has columns that are orthonormal eigenvectors of \(A\)), the quadratic form reduces to canonical form
\[ Q = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2. \]
The directions of the new coordinate axes \(y_1, y_2, \ldots, y_n\) (i.e., the columns of \(P\)) are called the principal axes of the quadric surface.
Proof
This follows directly from the spectral theorem and orthogonal substitution. The orthogonal substitution amounts to rotating the coordinate system so that the new axes align with the eigenvector directions. In the new coordinates, the quadratic form has no cross terms, giving the simplest form of the quadric surface equation. \(\blacksquare\)
Example 9.9
Classify the conic \(5x_1^2 + 4x_1x_2 + 8x_2^2 = 36\) in \(\mathbb{R}^2\).
From Example 9.4, after orthogonal substitution we get \(4y_1^2 + 9y_2^2 = 36\), i.e., \(\dfrac{y_1^2}{9} + \dfrac{y_2^2}{4} = 1\). This is an ellipse with principal axes along the \(y_1\) and \(y_2\) axes, semi-major axis \(a = 3\) and semi-minor axis \(b = 2\).
The principal axis directions are the eigenvectors of \(A\): \(\mathbf{v}_1 = \frac{1}{\sqrt{5}}(-2, 1)^T\) (for \(\lambda = 4\)) and \(\mathbf{v}_2 = \frac{1}{\sqrt{5}}(1, 2)^T\) (for \(\lambda = 9\)).
Example 9.10
Standard classification of quadric surfaces in \(\mathbb{R}^3\) (with \(Q = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \lambda_3 y_3^2 = 1\)):
- Ellipsoid: \(\lambda_1, \lambda_2, \lambda_3 > 0\), e.g., \(\frac{y_1^2}{a^2} + \frac{y_2^2}{b^2} + \frac{y_3^2}{c^2} = 1\);
- Hyperboloid of one sheet: Two positive, one negative, e.g., \(\frac{y_1^2}{a^2} + \frac{y_2^2}{b^2} - \frac{y_3^2}{c^2} = 1\);
- Hyperboloid of two sheets: One positive, two negative, e.g., \(\frac{y_1^2}{a^2} - \frac{y_2^2}{b^2} - \frac{y_3^2}{c^2} = 1\);
- If \(Q = 0\) (homogeneous case), it is a quadric cone.
The type of quadric surface is determined by the signature \((p, q)\) of the quadratic form.
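A small NumPy sketch of this classification (the function name and the test matrix below are illustrative, not from the text); it reads off the signature of \(A\) and reports the type of the surface \(\mathbf{x}^TA\mathbf{x} = 1\):

```python
import numpy as np

def classify_quadric(A, tol=1e-10):
    """Classify the surface x^T A x = 1 in R^3 by the signature of A (a sketch)."""
    eigvals = np.linalg.eigvalsh(A)
    p = int(np.sum(eigvals >  tol))
    q = int(np.sum(eigvals < -tol))
    return {(3, 0): "ellipsoid",
            (2, 1): "hyperboloid of one sheet",
            (1, 2): "hyperboloid of two sheets"}.get((p, q), "degenerate/other")

A = np.array([[1.0, 2.0, 0.0],    # hypothetical test matrix, not from the text
              [2.0, 1.0, 0.0],
              [0.0, 0.0, 3.0]])
print(np.linalg.eigvalsh(A), classify_quadric(A))   # eigenvalues -1, 3, 3 -> one sheet
```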
9.8 Bilinear Forms
Quadratic form \(Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}\) is a "one-variable function" → Bilinear form \(f(\mathbf{x}, \mathbf{y}) = \mathbf{x}^TA\mathbf{y}\) is a "two-variable function" → Through the polarization identity a quadratic form recovers the bilinear form → Symmetric/antisymmetric bilinear forms lead to orthogonal and symplectic groups
A quadratic form is the "diagonal" value \(Q(\mathbf{x}) = f(\mathbf{x}, \mathbf{x})\). To fully understand quadratic forms, one must first understand the more general bilinear forms. Bilinear forms are the most fundamental "scalar functions of two vectors" in linear algebra — inner products, determinants, and area forms are all special cases.
Definition 9.9 (Bilinear form)
Let \(V\) be an \(n\)-dimensional vector space over a field \(\mathbb{F}\). A map \(f: V \times V \to \mathbb{F}\) is called a bilinear form on \(V\) if \(f\) is linear in each variable:
- In the first variable: \(f(\alpha \mathbf{x}_1 + \beta \mathbf{x}_2, \mathbf{y}) = \alpha f(\mathbf{x}_1, \mathbf{y}) + \beta f(\mathbf{x}_2, \mathbf{y})\)
- In the second variable: \(f(\mathbf{x}, \alpha \mathbf{y}_1 + \beta \mathbf{y}_2) = \alpha f(\mathbf{x}, \mathbf{y}_1) + \beta f(\mathbf{x}, \mathbf{y}_2)\)
for all \(\mathbf{x}, \mathbf{x}_1, \mathbf{x}_2, \mathbf{y}, \mathbf{y}_1, \mathbf{y}_2 \in V\) and \(\alpha, \beta \in \mathbb{F}\).
Definition 9.10 (Matrix of a bilinear form)
Let \(\mathcal{B} = \{\mathbf{e}_1, \ldots, \mathbf{e}_n\}\) be a basis of \(V\). The matrix (also called the Gram matrix) of the bilinear form \(f\) with respect to \(\mathcal{B}\) is defined as
\[ A = (a_{ij})_{n \times n}, \qquad a_{ij} = f(\mathbf{e}_i, \mathbf{e}_j). \]
If \(\mathbf{x} = \sum x_i \mathbf{e}_i\) and \(\mathbf{y} = \sum y_j \mathbf{e}_j\), then
\[ f(\mathbf{x}, \mathbf{y}) = \sum_{i,j} x_i y_j\, f(\mathbf{e}_i, \mathbf{e}_j) = \mathbf{x}^T A \mathbf{y}, \]
where on the right \(\mathbf{x}, \mathbf{y}\) denote the coordinate column vectors.
Theorem 9.9 (Change of basis for bilinear forms)
Let \(f\) have matrix \(A\) with respect to basis \(\mathcal{B}\) and matrix \(A'\) with respect to basis \(\mathcal{B}'\). If \(C\) is the change-of-basis matrix from \(\mathcal{B}\) to \(\mathcal{B}'\), then
\[ A' = C^TAC. \]
Proof
Let \(\mathcal{B}' = \{\mathbf{e}_1', \ldots, \mathbf{e}_n'\}\) with \((\mathbf{e}_1', \ldots, \mathbf{e}_n') = (\mathbf{e}_1, \ldots, \mathbf{e}_n)C\). Then
\[ a_{ij}' = f(\mathbf{e}_i', \mathbf{e}_j') = f\Big(\sum_k c_{ki}\mathbf{e}_k, \sum_l c_{lj}\mathbf{e}_l\Big) = \sum_{k,l} c_{ki}\, a_{kl}\, c_{lj}. \]
In matrix form, this is \(A' = C^TAC\). \(\blacksquare\)
Definition 9.11 (Rank and nondegeneracy)
The rank of a bilinear form \(f\) is defined as the rank of its Gram matrix (by Theorem 9.9, this is independent of the choice of basis).
If \(\operatorname{rank}(f) = n\) (i.e., the Gram matrix is invertible), then \(f\) is called nondegenerate. Equivalently, \(f\) is nondegenerate if and only if: whenever \(f(\mathbf{x}, \mathbf{y}) = 0\) for all \(\mathbf{y} \in V\), it follows that \(\mathbf{x} = \mathbf{0}\).
Theorem 9.10 (Polarization identity)
Suppose \(\operatorname{char}(\mathbb{F}) \neq 2\). A symmetric bilinear form \(f\) and its associated quadratic form \(Q(\mathbf{x}) = f(\mathbf{x}, \mathbf{x})\) mutually determine each other through the polarization identity:
\[ f(\mathbf{x}, \mathbf{y}) = \tfrac{1}{2}\big[Q(\mathbf{x}+\mathbf{y}) - Q(\mathbf{x}) - Q(\mathbf{y})\big]. \]
Proof
Expand \(Q(\mathbf{x}+\mathbf{y}) = f(\mathbf{x}+\mathbf{y}, \mathbf{x}+\mathbf{y})\):
\[ Q(\mathbf{x}+\mathbf{y}) = f(\mathbf{x}, \mathbf{x}) + f(\mathbf{x}, \mathbf{y}) + f(\mathbf{y}, \mathbf{x}) + f(\mathbf{y}, \mathbf{y}). \]
By symmetry \(f(\mathbf{x}, \mathbf{y}) = f(\mathbf{y}, \mathbf{x})\), so \(Q(\mathbf{x}+\mathbf{y}) = Q(\mathbf{x}) + 2f(\mathbf{x}, \mathbf{y}) + Q(\mathbf{y})\). Solving for \(f(\mathbf{x}, \mathbf{y})\) gives the result. \(\blacksquare\)
Definition 9.12 (Symmetric and antisymmetric bilinear forms)
A bilinear form \(f\) is called
- symmetric: if \(f(\mathbf{x}, \mathbf{y}) = f(\mathbf{y}, \mathbf{x})\) for all \(\mathbf{x}, \mathbf{y}\);
- antisymmetric (skew-symmetric): if \(f(\mathbf{x}, \mathbf{y}) = -f(\mathbf{y}, \mathbf{x})\) for all \(\mathbf{x}, \mathbf{y}\).
The matrix of a symmetric bilinear form satisfies \(A = A^T\); the matrix of an antisymmetric bilinear form satisfies \(A = -A^T\) (with zero diagonal entries).
Theorem 9.11 (Decomposition of bilinear forms)
Suppose \(\operatorname{char}(\mathbb{F}) \neq 2\). Any bilinear form \(f\) can be uniquely decomposed into the sum of a symmetric part and an antisymmetric part:
\[ f = f_s + f_a, \qquad f_s(\mathbf{x}, \mathbf{y}) = \tfrac{1}{2}\big[f(\mathbf{x}, \mathbf{y}) + f(\mathbf{y}, \mathbf{x})\big], \quad f_a(\mathbf{x}, \mathbf{y}) = \tfrac{1}{2}\big[f(\mathbf{x}, \mathbf{y}) - f(\mathbf{y}, \mathbf{x})\big]. \]
Proof
Direct verification shows \(f_s\) is symmetric, \(f_a\) is antisymmetric, and \(f = f_s + f_a\). Uniqueness: if \(f = g_s + g_a\) (\(g_s\) symmetric, \(g_a\) antisymmetric), then \(f(\mathbf{x},\mathbf{y}) + f(\mathbf{y},\mathbf{x}) = 2g_s(\mathbf{x},\mathbf{y})\), so \(g_s = f_s\), and thus \(g_a = f_a\). \(\blacksquare\)
Theorem 9.12 (Canonical form of symmetric bilinear forms)
Suppose \(\operatorname{char}(\mathbb{F}) \neq 2\) and \(f\) is a symmetric bilinear form on \(V\). Then there exists a basis \(\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}\) of \(V\) such that
\[ f(\mathbf{e}_i, \mathbf{e}_j) = 0 \quad \text{for all } i \neq j. \]
That is, the matrix of \(f\) with respect to this basis is diagonal.
Proof
By induction on \(n\). The case \(n = 1\) is trivial. Assume the result holds for \((n-1)\)-dimensional spaces.
If \(f \equiv 0\), any basis works. Otherwise, there exists \(\mathbf{v}\) with \(f(\mathbf{v}, \mathbf{v}) \neq 0\) (if \(f(\mathbf{v}, \mathbf{v}) = 0\) for all \(\mathbf{v}\), the polarization identity gives \(f \equiv 0\), a contradiction). Let \(\mathbf{e}_1 = \mathbf{v}\) and \(d = f(\mathbf{e}_1, \mathbf{e}_1) \neq 0\).
Let \(W = \{\mathbf{w} \in V : f(\mathbf{e}_1, \mathbf{w}) = 0\}\). Then \(V = \operatorname{span}\{\mathbf{e}_1\} \oplus W\) (for any \(\mathbf{x} \in V\), let \(\mathbf{w} = \mathbf{x} - \frac{f(\mathbf{e}_1, \mathbf{x})}{d}\mathbf{e}_1\), then \(f(\mathbf{e}_1, \mathbf{w}) = 0\)).
By the induction hypothesis, \(f|_W\) can be diagonalized with respect to some basis. Combining \(\mathbf{e}_1\) with the basis of \(W\) gives the result. \(\blacksquare\)
Definition 9.13 (Orthogonal complement)
Let \(f\) be a bilinear form on \(V\) and \(S \subseteq V\). The left orthogonal complement and right orthogonal complement of \(S\) with respect to \(f\) are
\[ S^{\perp_L} = \{\mathbf{x} \in V : f(\mathbf{x}, \mathbf{s}) = 0 \ \text{for all } \mathbf{s} \in S\}, \qquad S^{\perp_R} = \{\mathbf{y} \in V : f(\mathbf{s}, \mathbf{y}) = 0 \ \text{for all } \mathbf{s} \in S\}. \]
If \(f\) is symmetric, then \(S^{\perp_L} = S^{\perp_R}\), denoted simply \(S^\perp\). The radical of \(V\) is \(\operatorname{rad}(f) = V^\perp\). The form \(f\) is nondegenerate if and only if \(\operatorname{rad}(f) = \{\mathbf{0}\}\).
Example 9.11
The bilinear form \(f(\mathbf{x}, \mathbf{y}) = x_1y_1 + x_1y_2 + x_2y_1 + 3x_2y_2 - x_3y_3\) on \(\mathbb{R}^3\) has Gram matrix
\[ A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \]
\(\det(A) = (3-1)(-1) = -2 \neq 0\), so \(f\) is nondegenerate. Since \(A\) is symmetric, \(f\) is a symmetric bilinear form.
Example 9.12
The antisymmetric bilinear form \(f(\mathbf{x}, \mathbf{y}) = x_1y_2 - x_2y_1\) on \(\mathbb{R}^3\) has matrix
\[ A = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]
\(\operatorname{rank}(A) = 2\), \(\operatorname{rad}(f) = \operatorname{span}\{\mathbf{e}_3\}\), so \(f\) is degenerate.
Example 9.13
Verifying the polarization identity. Let \(Q(x_1,x_2) = 2x_1^2 + 3x_1x_2 + x_2^2\), with symmetric matrix \(A = \begin{pmatrix} 2 & 3/2 \\ 3/2 & 1 \end{pmatrix}\).
Take \(\mathbf{x} = (1,0)^T\), \(\mathbf{y} = (0,1)^T\):
\[ Q(\mathbf{x}+\mathbf{y}) = Q(1,1) = 6, \quad Q(\mathbf{x}) = 2, \quad Q(\mathbf{y}) = 1, \qquad f(\mathbf{x}, \mathbf{y}) = \tfrac{1}{2}(6 - 2 - 1) = \tfrac{3}{2} = a_{12}. \]
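A numerical check of this computation (a NumPy sketch):

```python
import numpy as np

A = np.array([[2.0, 1.5],
              [1.5, 1.0]])

def Q(v):
    return v @ A @ v

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

f_xy = 0.5 * (Q(x + y) - Q(x) - Q(y))   # polarization identity
print(f_xy, x @ A @ y)                  # both equal 1.5 = a_12
```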
Theorem 9.13 (Canonical form of antisymmetric bilinear forms)
Let \(f\) be an antisymmetric bilinear form on a finite-dimensional vector space \(V\). Then there exists a basis \(\{\mathbf{e}_1, \mathbf{f}_1, \ldots, \mathbf{e}_m, \mathbf{f}_m, \mathbf{g}_1, \ldots, \mathbf{g}_k\}\) such that
\[ f(\mathbf{e}_i, \mathbf{f}_j) = \delta_{ij}, \qquad f(\mathbf{e}_i, \mathbf{e}_j) = f(\mathbf{f}_i, \mathbf{f}_j) = 0, \]
and \(f(\mathbf{g}_s, \cdot) \equiv 0\). In particular, the rank of an antisymmetric bilinear form is always even, equal to \(2m\).
Proof
If \(f \equiv 0\), the result is trivial. Otherwise, there exist \(\mathbf{x}, \mathbf{y}\) with \(f(\mathbf{x}, \mathbf{y}) = c \neq 0\). Let \(\mathbf{e}_1 = \mathbf{x}\) and \(\mathbf{f}_1 = \mathbf{y}/c\), so that \(f(\mathbf{e}_1, \mathbf{f}_1) = 1\).
Let \(W = \{\mathbf{w} : f(\mathbf{e}_1, \mathbf{w}) = 0 \text{ and } f(\mathbf{f}_1, \mathbf{w}) = 0\}\). One can verify that \(V = \operatorname{span}\{\mathbf{e}_1, \mathbf{f}_1\} \oplus W\) (for any \(\mathbf{v}\), let \(\mathbf{w} = \mathbf{v} - f(\mathbf{v}, \mathbf{f}_1)\mathbf{e}_1 + f(\mathbf{v}, \mathbf{e}_1)\mathbf{f}_1\), and check \(\mathbf{w} \in W\)).
Apply the procedure recursively to \(f|_W\). \(\blacksquare\)
9.9 Symplectic Spaces
Nondegenerate antisymmetric bilinear form = symplectic form → Symplectic spaces are necessarily even-dimensional → Darboux theorem: all symplectic spaces of the same dimension are equivalent → Symplectic geometry is the mathematical language of Hamiltonian mechanics
A symplectic space is a vector space equipped with a nondegenerate antisymmetric bilinear form. It plays a foundational role in classical mechanics (Hamiltonian systems), quantum mechanics, and modern differential geometry.
Definition 9.14 (Symplectic form and symplectic space)
Let \(V\) be a finite-dimensional vector space over a field \(\mathbb{F}\) (\(\operatorname{char}(\mathbb{F}) \neq 2\)). A symplectic form on \(V\) is a nondegenerate antisymmetric bilinear form \(\omega: V \times V \to \mathbb{F}\). The pair \((V, \omega)\) is called a symplectic space.
Theorem 9.14 (Dimension of a symplectic space)
The dimension of a symplectic space is necessarily even.
Proof
Let \((V, \omega)\) be a symplectic space with \(\dim V = n\). The matrix \(A\) of \(\omega\) with respect to any basis satisfies \(A^T = -A\) (antisymmetric), so
\[ \det(A) = \det(A^T) = \det(-A) = (-1)^n\det(A). \]
Since \(\omega\) is nondegenerate, \(\det(A) \neq 0\), so \((-1)^n = 1\), meaning \(n\) is even. \(\blacksquare\)
Definition 9.15 (Symplectic basis)
Let \((V, \omega)\) be a \(2n\)-dimensional symplectic space. A basis \(\{\mathbf{e}_1, \ldots, \mathbf{e}_n, \mathbf{f}_1, \ldots, \mathbf{f}_n\}\) of \(V\) is called a symplectic basis (or Darboux basis) if
\[ \omega(\mathbf{e}_i, \mathbf{f}_j) = \delta_{ij}, \qquad \omega(\mathbf{e}_i, \mathbf{e}_j) = \omega(\mathbf{f}_i, \mathbf{f}_j) = 0. \]
With respect to a symplectic basis, the matrix of \(\omega\) is the standard symplectic matrix
\[ J = \begin{pmatrix} O & I_n \\ -I_n & O \end{pmatrix}. \]
Theorem 9.15 (Darboux theorem — canonical form of symplectic spaces)
Every symplectic space \((V, \omega)\) admits a symplectic basis. Consequently, all \(2n\)-dimensional symplectic spaces (over the same field) are isomorphic to each other.
Proof
This is a direct corollary of Theorem 9.13 in the nondegenerate case (\(k = 0\)). Since \(\omega\) is nondegenerate, the rank equals \(\dim V = 2m\), and Theorem 9.13 provides a basis \(\{\mathbf{e}_1, \mathbf{f}_1, \ldots, \mathbf{e}_m, \mathbf{f}_m\}\) satisfying \(\omega(\mathbf{e}_i, \mathbf{f}_j) = \delta_{ij}\) and \(\omega(\mathbf{e}_i, \mathbf{e}_j) = \omega(\mathbf{f}_i, \mathbf{f}_j) = 0\). This is precisely a symplectic basis. \(\blacksquare\)
Definition 9.16 (Symplectic matrix and symplectic group)
A \(2n \times 2n\) matrix \(M\) (over \(\mathbb{R}\), or more generally over \(\mathbb{F}\)) is called a symplectic matrix if
\[ M^TJM = J. \]
The set of all \(2n \times 2n\) symplectic matrices forms a group under matrix multiplication, called the symplectic group, denoted \(\operatorname{Sp}(2n, \mathbb{F})\).
Theorem 9.16 (Determinant of a symplectic matrix)
The determinant of a symplectic matrix equals \(1\).
Proof
From \(M^T J M = J\), taking determinants gives \(\det(M)^2 \det(J) = \det(J)\). Since \(\det(J) = 1\) (which can be verified directly from the block form \(J = \begin{pmatrix} O & I_n \\ -I_n & O \end{pmatrix}\)), we get \(\det(M)^2 = 1\), so \(\det(M) = \pm 1\).
To show \(\det(M) = 1\), note that the symplectic group \(\operatorname{Sp}(2n, \mathbb{R})\) is connected (one can continuously deform \(M\) to \(I_{2n}\)), and the determinant is a continuous function with \(\det(I_{2n}) = 1\), so \(\det(M) = 1\).
An alternative algebraic proof: write \(M\) in block form \(M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}\). From \(M^TJM = J\) one obtains \(A^TD - C^TB = I\), and the sign of \(\det(M)\) can be determined to be \(+1\) via a Pfaffian argument. \(\blacksquare\)
Definition 9.17 (Lagrangian subspace)
Let \((V, \omega)\) be a \(2n\)-dimensional symplectic space. A subspace \(L \subseteq V\) is called a Lagrangian subspace if \(L = L^\perp\) (with respect to \(\omega\)), i.e.,
\[ \omega(\mathbf{x}, \mathbf{y}) = 0 \quad \text{for all } \mathbf{x}, \mathbf{y} \in L, \]
and \(\dim L = n\) (the maximum dimension of an isotropic subspace).
Theorem 9.17 (Dimension bound for isotropic subspaces)
Let \((V, \omega)\) be a \(2n\)-dimensional symplectic space and \(L\) an isotropic subspace (i.e., \(\omega|_{L \times L} = 0\)). Then \(\dim L \leq n\), with equality if and only if \(L\) is a Lagrangian subspace.
Proof
Let \(L^\perp = \{\mathbf{v} \in V : \omega(\mathbf{v}, \mathbf{l}) = 0, \forall \mathbf{l} \in L\}\). Since \(\omega\) is nondegenerate, the map \(V \to V^*\), \(\mathbf{v} \mapsto \omega(\mathbf{v}, \cdot)\) is an isomorphism, so \(\dim L^\perp = 2n - \dim L\).
The isotropic condition means \(L \subseteq L^\perp\), so \(\dim L \leq \dim L^\perp = 2n - \dim L\), giving \(\dim L \leq n\).
Equality holds if and only if \(L = L^\perp\), i.e., \(L\) is a Lagrangian subspace. \(\blacksquare\)
Example 9.14
The standard symplectic space \((\mathbb{R}^4, \omega)\) with symplectic form defined by the matrix \(J_4 = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{pmatrix}\).
The standard symplectic basis is \(\mathbf{e}_1 = (1,0,0,0)^T\), \(\mathbf{e}_2 = (0,1,0,0)^T\), \(\mathbf{f}_1 = (0,0,1,0)^T\), \(\mathbf{f}_2 = (0,0,0,1)^T\).
\(L_1 = \operatorname{span}\{\mathbf{e}_1, \mathbf{e}_2\}\) is a Lagrangian subspace: \(\omega(\mathbf{e}_1, \mathbf{e}_2) = 0\) and \(\dim L_1 = 2 = n\).
\(L_2 = \operatorname{span}\{\mathbf{f}_1, \mathbf{f}_2\}\) is also a Lagrangian subspace.
Example 9.15
An example of a symplectic matrix. The matrix \(M = \begin{pmatrix} a & 0 & -b & 0 \\ 0 & c & 0 & -d \\ b & 0 & a & 0 \\ 0 & d & 0 & c \end{pmatrix}\) (where \(a^2 + b^2 = 1\) and \(c^2 + d^2 = 1\)) is a \(4 \times 4\) symplectic matrix: it rotates the symplectic plane \(\operatorname{span}\{\mathbf{e}_1, \mathbf{f}_1\}\) (coordinates \(x_1, x_3\)) and the symplectic plane \(\operatorname{span}\{\mathbf{e}_2, \mathbf{f}_2\}\) (coordinates \(x_2, x_4\)). The condition \(M^TJ_4M = J_4\) can be verified by direct block computation, and \(\det(M) = (a^2+b^2)(c^2+d^2) = 1\).
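A direct numerical verification of this example (a NumPy sketch; the angles below are arbitrary sample values):

```python
import numpy as np

n = 2
J = np.block([[np.zeros((n, n)), np.eye(n)],
              [-np.eye(n),       np.zeros((n, n))]])   # standard symplectic matrix J_4

# Rotation by theta1 in the (x1, x3) plane and by theta2 in the (x2, x4) plane
theta1, theta2 = 0.7, -1.2
a, b = np.cos(theta1), np.sin(theta1)
c, d = np.cos(theta2), np.sin(theta2)
M = np.array([[a, 0, -b,  0],
              [0, c,  0, -d],
              [b, 0,  a,  0],
              [0, d,  0,  c]])

print(np.allclose(M.T @ J @ M, J))   # True: M is symplectic
print(np.linalg.det(M))              # approximately 1.0
```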
Example 9.16
Application to Hamiltonian mechanics. The phase space \(\mathbb{R}^{2n}\) of classical mechanics uses generalized coordinates \((q_1, \ldots, q_n)\) and generalized momenta \((p_1, \ldots, p_n)\). The symplectic form is
\[ \omega(\mathbf{z}, \mathbf{z}') = \mathbf{z}^TJ\mathbf{z}' = \sum_{i=1}^n (q_i p_i' - p_i q_i'). \]
Hamilton's equations \(\dot{q}_i = \frac{\partial H}{\partial p_i}\), \(\dot{p}_i = -\frac{\partial H}{\partial q_i}\) can be compactly written as
\[ \dot{\mathbf{z}} = J\,\nabla H(\mathbf{z}), \]
where \(\mathbf{z} = (q_1, \ldots, q_n, p_1, \ldots, p_n)^T\). Canonical transformations (variable substitutions that preserve the form of Hamilton's equations) correspond precisely to symplectic matrices.
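As a minimal illustration of \(\dot{\mathbf{z}} = J\nabla H(\mathbf{z})\), take a single degree of freedom with \(H(q,p) = \tfrac{1}{2}(q^2 + p^2)\) (an assumed toy Hamiltonian, not from the text); the exact time-\(t\) flow is a rotation of the phase plane, and the NumPy sketch below checks that it is a symplectic (canonical) map:

```python
import numpy as np

# Harmonic oscillator H(q, p) = (q^2 + p^2)/2 on the phase plane (n = 1).
J = np.array([[ 0.0, 1.0],
              [-1.0, 0.0]])

def hamiltonian_flow(t):
    """Exact time-t flow of z' = J grad H(z); for this H it is a rotation."""
    return np.array([[ np.cos(t), np.sin(t)],
                     [-np.sin(t), np.cos(t)]])

M = hamiltonian_flow(0.9)
print(np.allclose(M.T @ J @ M, J))   # True: the flow matrix is symplectic
print(np.linalg.det(M))              # approximately 1.0
```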
9.10 Hermitian Forms
Real \(\to\) Complex: inner products change from bilinear to sesquilinear → Hermitian form \(H(\mathbf{x}) = \mathbf{x}^*A\mathbf{x}\) (\(A = A^*\) Hermitian matrix) → Law of inertia generalized to the complex field → Signature remains a complete invariant
When the base field extends from \(\mathbb{R}\) to \(\mathbb{C}\), the natural generalization of symmetric bilinear forms is the Hermitian form. Hermitian forms play a fundamental role in quantum mechanics (observables correspond to Hermitian operators) and unitary geometry.
Definition 9.18 (Sesquilinear form)
Let \(V\) be a complex vector space. A map \(f: V \times V \to \mathbb{C}\) is called a sesquilinear form if
- It is conjugate-linear in the first variable: \(f(\alpha\mathbf{x}_1 + \beta\mathbf{x}_2, \mathbf{y}) = \bar{\alpha}f(\mathbf{x}_1, \mathbf{y}) + \bar{\beta}f(\mathbf{x}_2, \mathbf{y})\)
- It is linear in the second variable: \(f(\mathbf{x}, \alpha\mathbf{y}_1 + \beta\mathbf{y}_2) = \alpha f(\mathbf{x}, \mathbf{y}_1) + \beta f(\mathbf{x}, \mathbf{y}_2)\)
(Here we follow the physics convention where the first variable is conjugated. Some mathematical references use the opposite convention.)
Definition 9.19 (Hermitian form)
A sesquilinear form \(f\) is called a Hermitian form if it satisfies conjugate symmetry:
\[ f(\mathbf{x}, \mathbf{y}) = \overline{f(\mathbf{y}, \mathbf{x})} \quad \text{for all } \mathbf{x}, \mathbf{y} \in V. \]
In particular, \(f(\mathbf{x}, \mathbf{x}) = \overline{f(\mathbf{x}, \mathbf{x})}\), so \(f(\mathbf{x}, \mathbf{x}) \in \mathbb{R}\).
Definition 9.20 (Hermitian matrix representation)
Let \(\mathcal{B} = \{\mathbf{e}_1, \ldots, \mathbf{e}_n\}\) be a basis of \(V\), and let \(A = (a_{ij})\) with \(a_{ij} = f(\mathbf{e}_i, \mathbf{e}_j)\) be the matrix of the Hermitian form \(f\). Then, in coordinates,
\[ f(\mathbf{x}, \mathbf{y}) = \sum_{i,j} \bar{x}_i\, a_{ij}\, y_j = \mathbf{x}^* A \mathbf{y}, \]
where \(\mathbf{x}^* = \bar{\mathbf{x}}^T\) is the conjugate transpose. The matrix \(A\) satisfies \(A^* = A\) (Hermitian matrix).
Theorem 9.18 (Change of basis for Hermitian forms)
Let the Hermitian form \(f\) have matrix \(A\) with respect to basis \(\mathcal{B}\) and matrix \(A'\) with respect to basis \(\mathcal{B}'\), with change-of-basis matrix \(C\). Then
\[ A' = C^*AC. \]
That is, \(A'\) and \(A\) are \(^*\)-congruent (conjugate congruent).
Proof
Let \(\mathcal{B}' = \{\mathbf{e}_1', \ldots, \mathbf{e}_n'\}\) with \((\mathbf{e}_1', \ldots, \mathbf{e}_n') = (\mathbf{e}_1, \ldots, \mathbf{e}_n)C\). Then
\[ a_{ij}' = f(\mathbf{e}_i', \mathbf{e}_j') = f\Big(\sum_k c_{ki}\mathbf{e}_k, \sum_l c_{lj}\mathbf{e}_l\Big) = \sum_{k,l} \overline{c_{ki}}\, a_{kl}\, c_{lj}. \]
That is, \(A' = C^*AC\). \(\blacksquare\)
Theorem 9.19 (Canonical form of Hermitian forms)
Let \(f\) be a Hermitian form on a complex vector space \(V\). There exists a basis of \(V\) with respect to which the matrix of \(f\) is a diagonal matrix \(\operatorname{diag}(d_1, \ldots, d_n)\) with \(d_i \in \mathbb{R}\).
Furthermore, after appropriate rescaling, this can be reduced to
\[ \operatorname{diag}(\underbrace{1, \ldots, 1}_{p}, \underbrace{-1, \ldots, -1}_{q}, 0, \ldots, 0). \]
Proof
The proof is analogous to the completing-the-square method for real symmetric bilinear forms. If there exists \(\mathbf{v}\) with \(f(\mathbf{v}, \mathbf{v}) \neq 0\) (this value is real), let \(\mathbf{e}_1 = \mathbf{v}\), take the orthogonal complement \(W = \{\mathbf{w} : f(\mathbf{e}_1, \mathbf{w}) = 0\}\), and diagonalize \(f|_W\) by induction.
If \(f(\mathbf{v}, \mathbf{v}) = 0\) for all \(\mathbf{v}\), the Hermitian version of the polarization identity (with the first variable conjugate-linear, as in Definition 9.18)
\[ f(\mathbf{x}, \mathbf{y}) = \tfrac{1}{4}\big[Q(\mathbf{x}+\mathbf{y}) - Q(\mathbf{x}-\mathbf{y}) - iQ(\mathbf{x}+i\mathbf{y}) + iQ(\mathbf{x}-i\mathbf{y})\big] \]
shows \(f \equiv 0\). Thus the induction proceeds.
Rescaling: if \(d_k > 0\), let \(\mathbf{e}_k' = \mathbf{e}_k/\sqrt{d_k}\); if \(d_k < 0\), let \(\mathbf{e}_k' = \mathbf{e}_k/\sqrt{|d_k|}\). \(\blacksquare\)
Theorem 9.20 (Law of inertia for Hermitian forms)
The numbers of positive terms \(p\) and negative terms \(q\) in the canonical form of a Hermitian form are invariants, independent of the choice of basis. The pair \((p, q)\) is called the signature of the Hermitian form.
Proof
The proof is entirely analogous to Sylvester's law of inertia for real quadratic forms (Theorem 9.4). Assuming two canonical forms have different numbers of positive terms \(p > s\), construct subspaces \(V_1\) (dimension \(p\), on which \(f > 0\)) and \(V_2\) (dimension \(n-s\), on which \(f \leq 0\)). The dimension argument \(p + (n-s) > n\) gives \(V_1 \cap V_2 \neq \{\mathbf{0}\}\), leading to a contradiction. \(\blacksquare\)
Theorem 9.21 (Spectral theorem for Hermitian matrices)
A Hermitian matrix \(A\) (\(A^* = A\)) has all real eigenvalues, and \(A\) can be unitarily diagonalized: there exists a unitary matrix \(U\) such that
\[ U^*AU = \operatorname{diag}(\lambda_1, \ldots, \lambda_n), \qquad \lambda_i \in \mathbb{R}. \]
Proof
Eigenvalues are real: If \(A\mathbf{v} = \lambda\mathbf{v}\), \(\mathbf{v} \neq \mathbf{0}\), then \(\lambda \mathbf{v}^*\mathbf{v} = \mathbf{v}^*A\mathbf{v} = (A\mathbf{v})^*\mathbf{v} = \bar{\lambda}\mathbf{v}^*\mathbf{v}\) (using \(A^* = A\)), so \(\lambda = \bar{\lambda}\), meaning \(\lambda \in \mathbb{R}\).
Unitary diagonalization: Eigenvectors for distinct eigenvalues are orthogonal (the proof is the same as in the real case). Apply Gram-Schmidt orthogonalization within each eigenspace and combine to form the unitary matrix \(U\). \(\blacksquare\)
Definition 9.21 (Positive definite Hermitian form)
A Hermitian form \(f\) is called positive definite if \(f(\mathbf{x}, \mathbf{x}) > 0\) for all \(\mathbf{x} \neq \mathbf{0}\). The equivalent conditions are analogous to the real case:
- All eigenvalues of the Hermitian matrix \(A\) are positive;
- There exists an invertible matrix \(C\) such that \(A = C^*C\);
- All leading principal minors of \(A\) are positive.
Example 9.17
The Hermitian form \(f(\mathbf{x}, \mathbf{y}) = 2\bar{x}_1y_1 + (1+i)\bar{x}_1y_2 + (1-i)\bar{x}_2y_1 + 3\bar{x}_2y_2\) has Hermitian matrix
\[ A = \begin{pmatrix} 2 & 1+i \\ 1-i & 3 \end{pmatrix}. \]
Verification that \(A^* = A\): \(\overline{(1+i)} = 1-i = a_{21}\) \(\checkmark\). \(\Delta_1 = 2 > 0\), \(\Delta_2 = 6 - |1+i|^2 = 6 - 2 = 4 > 0\), so \(A\) is positive definite.
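A quick numerical confirmation (a NumPy sketch) that this \(A\) is Hermitian and positive definite:

```python
import numpy as np

A = np.array([[2.0,        1.0 + 1.0j],
              [1.0 - 1.0j, 3.0       ]])

print(np.allclose(A, A.conj().T))      # True: A is Hermitian
print(np.linalg.eigvalsh(A))           # both eigenvalues positive (1 and 4)
print(np.linalg.det(A[:1, :1]).real,   # leading principal minors: 2 and 4
      np.linalg.det(A).real)
```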
Example 9.18
Find the eigenvalues and unitary diagonalization of the Hermitian matrix \(A = \begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix}\).
Characteristic polynomial: \((1-\lambda)^2 - i(-i) = (1-\lambda)^2 - 1 = \lambda^2 - 2\lambda = \lambda(\lambda - 2)\).
\(\lambda_1 = 0\): \(A\mathbf{x} = \mathbf{0}\) gives \(v_1 + iv_2 = 0\), so \(\mathbf{v}_1 = \frac{1}{\sqrt{2}}(-i, 1)^T\).
\(\lambda_2 = 2\): \((A - 2I)\mathbf{x} = \mathbf{0}\) gives \(-v_1 + iv_2 = 0\), so \(\mathbf{v}_2 = \frac{1}{\sqrt{2}}(i, 1)^T\).
Unitary matrix \(U = \frac{1}{\sqrt{2}}\begin{pmatrix} -i & i \\ 1 & 1 \end{pmatrix}\), \(U^*AU = \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}\).
Signature \((p, q) = (1, 0)\), rank \(r = 1\). \(A\) is positive semidefinite but not positive definite.
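The same diagonalization can be reproduced numerically (a NumPy sketch; numpy.linalg.eigh may return eigenvector columns whose phases differ from the hand computation by a unit scalar):

```python
import numpy as np

A = np.array([[ 1.0,  1.0j],
              [-1.0j, 1.0 ]])

eigvals, U = np.linalg.eigh(A)                   # U is unitary, eigenvalues ascending
print(eigvals)                                   # [0. 2.]
print(np.round(U.conj().T @ A @ U, 10))          # diag(0, 2)
print(np.allclose(U.conj().T @ U, np.eye(2)))    # True: U is unitary
```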
Example 9.19
Anti-Hermitian (skew-Hermitian) forms. If a sesquilinear form \(f\) satisfies \(f(\mathbf{x}, \mathbf{y}) = -\overline{f(\mathbf{y}, \mathbf{x})}\), then \(f(\mathbf{x}, \mathbf{x})\) is purely imaginary. The corresponding matrix \(A\) satisfies \(A^* = -A\) (anti-Hermitian matrix), and all its eigenvalues are purely imaginary.
For example, \(A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\) is an anti-Hermitian matrix (also a real antisymmetric matrix) with eigenvalues \(\pm i\).