14  Diagonalization and the Spectral Theorem

14.1 Diagonalizable Matrices: \(A = PDP^{-1}\)

Chapter 13 taught us to find eigenvalues and eigenvectors of a matrix \(A\). This chapter asks a more structural question: can we choose a basis of eigenvectors and, if so, what does \(A\) look like in that basis?

The answer is the eigendecomposition (also called diagonalization):

\[A = P D P^{-1},\]

where the columns of \(P\) are eigenvectors and \(D\) is the diagonal matrix of the corresponding eigenvalues. When this factorization exists, \(A\) is called diagonalizable.

Why does this matter? A diagonal matrix is trivial to work with — powers, functions, and the matrix exponential all reduce to scalar operations on the diagonal entries. Diagonalization converts \(A\) from an arbitrary matrix into something as easy as a collection of independent scalars.


14.1.1 Construction: Building \(P\) and \(D\)

Suppose \(A\) has \(n\) linearly independent eigenvectors \(\mathbf{v}_1, \ldots, \mathbf{v}_n\) with eigenvalues \(\lambda_1, \ldots, \lambda_n\). Stack the eigenvectors as columns:

\[P = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix}, \qquad D = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}.\]

Then:

\[AP = A\begin{bmatrix}\mathbf{v}_1 & \cdots & \mathbf{v}_n\end{bmatrix} = \begin{bmatrix}\lambda_1\mathbf{v}_1 & \cdots & \lambda_n\mathbf{v}_n\end{bmatrix} = PD.\]

Since the eigenvectors are linearly independent, \(P\) is invertible, so \(A = PDP^{-1}\).

import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])   # shape (2, 2)

vals, vecs = np.linalg.eig(A)   # vals shape (2,), vecs shape (2, 2)
P    = vecs                      # columns are eigenvectors
D    = np.diag(vals)             # shape (2, 2)
Pinv = np.linalg.inv(P)         # shape (2, 2)

A_reconstructed = P @ D @ Pinv   # shape (2, 2)

print("A =\n", A)
print("\nP (eigenvectors as columns) =\n", P.round(4))
print("\nD (eigenvalues on diagonal) =\n", D.round(4))
print("\nPDP⁻¹ =\n", A_reconstructed.round(4))
print("\nMax reconstruction error:", np.abs(A - A_reconstructed).max().round(12))
A =
 [[4. 1.]
 [2. 3.]]

P (eigenvectors as columns) =
 [[ 0.7071 -0.4472]
 [ 0.7071  0.8944]]

D (eigenvalues on diagonal) =
 [[5. 0.]
 [0. 2.]]

PDP⁻¹ =
 [[4. 1.]
 [2. 3.]]

Max reconstruction error: 0.0

14.1.2 When Is a Matrix Diagonalizable?

The key condition is the existence of \(n\) linearly independent eigenvectors.

Condition Diagonalizable?
\(n\) distinct eigenvalues Always yes
Repeated eigenvalue \(\lambda_k\) with \(k\) independent eigenvectors Yes
Repeated eigenvalue \(\lambda_k\) with fewer than \(k\) independent eigenvectors No (defective)
Symmetric matrix (\(A = A^T\)) Always yes (Spectral Theorem, §14.3)

The most common practical case: if all eigenvalues are distinct, the matrix is guaranteed to be diagonalizable.

import numpy as np

# Case 1: distinct eigenvalues — always diagonalizable
A1 = np.array([[3., 1.],
               [0., 2.]])   # shape (2, 2)

# Case 2: repeated eigenvalue, full eigenspace — diagonalizable (scalar multiple of I)
A2 = np.array([[5., 0.],
               [0., 5.]])   # shape (2, 2)

# Case 3: defective — NOT diagonalizable
A3 = np.array([[5., 1.],
               [0., 5.]])   # shape (2, 2)

for label, A in [("Distinct eigenvalues", A1),
                 ("Repeated, full eigenspace", A2),
                 ("Defective (Jordan block)", A3)]:
    vals, vecs = np.linalg.eig(A)
    rank = np.linalg.matrix_rank(vecs)
    n    = A.shape[0]
    tag  = "✓ diagonalizable" if rank == n else "✗ NOT diagonalizable"
    print(f"{label:35s}  eigenvalues={np.round(vals.real, 2)}  {tag}")
Distinct eigenvalues                 eigenvalues=[3. 2.]  ✓ diagonalizable
Repeated, full eigenspace            eigenvalues=[5. 5.]  ✓ diagonalizable
Defective (Jordan block)             eigenvalues=[5. 5.]  ✓ diagonalizable

14.1.3 A Geometric Interpretation

Diagonalization is a change of basis. In the eigenvector basis \(\mathcal{B} = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}\), the transformation \(A\) acts as pure scaling along coordinate axes:

\[[A]_{\mathcal{B}} = D = P^{-1} A P.\]

Think of it this way: \(P^{-1}\) re-expresses any vector in the eigenvector coordinate system, \(D\) scales each component by the corresponding eigenvalue, and \(P\) converts back to the standard basis. The round-trip is \(A\).

import numpy as np
import matplotlib.pyplot as plt

A = np.array([[3., 1.],
              [0., 2.]])   # shape (2, 2)

vals, vecs = np.linalg.eig(A)
P    = vecs
Pinv = np.linalg.inv(P)

# Unit circle
theta  = np.linspace(0, 2 * np.pi, 300)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # shape (2, 300)

image_A   = A @ circle                             # shape (2, 300)
image_PDP = P @ np.diag(vals) @ Pinv @ circle      # shape (2, 300)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

for ax, data, title in zip(
        axes,
        [circle, image_A, image_PDP],
        ['Input (unit circle)', r'$A\mathbf{x}$ (direct)', r'$PDP^{-1}\mathbf{x}$ (via diag.)']):
    ax.plot(data[0], data[1], color='#4a90d9', lw=1.5)
    ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.2)
    ax.set_title(title, fontsize=9)

# Show eigenvector directions in the output panel
colors = ['tomato', '#2ecc71']
for i, (lam, v) in enumerate(zip(vals.real, vecs.T)):
    axes[1].annotate('', xy=(v[0]*lam, v[1]*lam), xytext=(0, 0),
                     arrowprops=dict(arrowstyle='->', color=colors[i], lw=2))
    axes[1].text(v[0]*lam*1.1, v[1]*lam*1.1 + 0.1,
                 rf'$\lambda={lam:.0f}$', color=colors[i], fontsize=8)

plt.suptitle(r'$A = PDP^{-1}$: change to eigenbasis → scale → change back',
             fontsize=10)
plt.tight_layout()
plt.show()

The middle and right panels are identical — confirming that \(A\) and \(PDP^{-1}\) are the same transformation.


14.1.4 Diagonalizability and Similarity

Two matrices \(A\) and \(B\) are similar if \(B = P^{-1}AP\) for some invertible \(P\). Diagonalizable matrices are exactly those similar to a diagonal matrix. Similar matrices represent the same linear transformation in different bases, so they share:

  • The same eigenvalues (with the same algebraic multiplicities)
  • The same determinant
  • The same trace
  • The same rank
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((4, 4))   # shape (4, 4)
P = rng.standard_normal((4, 4))   # shape (4, 4) — change-of-basis matrix
B = np.linalg.inv(P) @ A @ P      # shape (4, 4) — similar to A

print("tr(A) = {:.4f},  tr(B) = {:.4f}".format(np.trace(A), np.trace(B)))
print("det(A) = {:.4f},  det(B) = {:.4f}".format(
    np.linalg.det(A), np.linalg.det(B)))

lam_A = np.sort(np.linalg.eigvals(A).real)
lam_B = np.sort(np.linalg.eigvals(B).real)
print("eigenvalues of A:", lam_A.round(4))
print("eigenvalues of B:", lam_B.round(4))
tr(A) = -0.9774,  tr(B) = -0.9774
det(A) = 0.5722,  det(B) = 0.5722
eigenvalues of A: [-2.1778 -0.5658  0.3214  1.4448]
eigenvalues of B: [-2.1778 -0.5658  0.3214  1.4448]

14.1.5 Exercises

14.1.1 Verify by hand that \(A = \begin{bmatrix}1 & 2 \\ 0 & 3\end{bmatrix}\) is diagonalizable. Write out \(P\), \(D\), and \(P^{-1}\) explicitly, then confirm \(PDP^{-1} = A\).

14.1.2 The matrix \(A = \begin{bmatrix}2 & 0 \\ 0 & 2\end{bmatrix}\) has a repeated eigenvalue. Is it diagonalizable? What is \(P\) in that case — is \(P\) unique?

14.1.3 For the defective matrix \(A = \begin{bmatrix}5 & 1 \\ 0 & 5\end{bmatrix}\), how many linearly independent eigenvectors does it have? Attempt to compute the reconstruction error \(\|A - PDP^{-1}\|\) with numpy and explain why it is nonzero.

14.1.4 If \(A = PDP^{-1}\), show that \(A^{-1} = PD^{-1}P^{-1}\), provided all eigenvalues are nonzero. What are the eigenvalues of \(A^{-1}\)?

14.1.5 Show that if \(A = PDP^{-1}\) and \(B = PEP^{-1}\) (same \(P\)), then \(AB = P(DE)P^{-1}\). Conclude that two matrices sharing the same eigenvector matrix commute: \(AB = BA\).

14.2 Matrix Powers and the Matrix Exponential

Diagonalization is not just an algebraic curiosity — it unlocks two operations that are essential for dynamics and control: matrix powers \(A^k\) and the matrix exponential \(e^{At}\).

Both become trivial once \(A = PDP^{-1}\).


14.2.1 Matrix Powers: \(A^k = PD^kP^{-1}\)

Compute \(A^2\):

\[A^2 = (PDP^{-1})(PDP^{-1}) = PD(P^{-1}P)DP^{-1} = PD^2P^{-1}.\]

By induction, for any positive integer \(k\):

\[\boxed{A^k = P D^k P^{-1},}\]

where \(D^k = \operatorname{diag}(\lambda_1^k, \ldots, \lambda_n^k)\) — just raise each diagonal entry to the \(k\)-th power. This reduces an \(n\times n\) matrix exponentiation to \(n\) scalar exponentiations.

Why is this useful? Discrete-time dynamical systems evolve as \(\mathbf{x}_k = A^k \mathbf{x}_0\). The state after 1000 steps is trivially \(P D^{1000} P^{-1} \mathbf{x}_0\) — no repeated matrix multiplication required.

import numpy as np

A = np.array([[2., 1.],
              [0., 3.]])   # shape (2, 2) — upper triangular, eigenvalues 2 and 3

vals, vecs = np.linalg.eig(A)
P    = vecs                # shape (2, 2)
Pinv = np.linalg.inv(P)   # shape (2, 2)

k = 10
# Via diagonalization: fast
Ak_diag = P @ np.diag(vals**k) @ Pinv   # shape (2, 2)

# Via repeated multiplication: reference
Ak_ref = np.linalg.matrix_power(A, k)   # shape (2, 2)

print(f"A^{k} via PD^kP⁻¹ =\n", Ak_diag.round(2))
print(f"A^{k} via np.linalg.matrix_power =\n", Ak_ref.round(2))
print("Max error:", np.abs(Ak_diag - Ak_ref).max().round(8))
A^10 via PD^kP⁻¹ =
 [[ 1024. 58025.]
 [    0. 59049.]]
A^10 via np.linalg.matrix_power =
 [[ 1024. 58025.]
 [    0. 59049.]]
Max error: 0.0

14.2.2 Long-Term Behavior: Which Eigenvalue Dominates?

Write the initial state \(\mathbf{x}_0\) in the eigenvector basis: \(\mathbf{x}_0 = P\mathbf{c}\), so \(\mathbf{c} = P^{-1}\mathbf{x}_0\). Then:

\[A^k \mathbf{x}_0 = P D^k \mathbf{c} = \sum_{i=1}^n c_i \lambda_i^k \mathbf{v}_i.\]

For large \(k\) the term with the largest \(|\lambda_i|\) dominates — the dominant eigenvalue governs the long-term direction and growth.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(11)
A = np.array([[0.9,  0.4],
              [0.1,  0.7]])   # shape (2, 2)

vals, vecs = np.linalg.eig(A)
P    = vecs
Pinv = np.linalg.inv(P)

x0 = np.array([1.0, 0.5])   # shape (2,)
steps = 40

trajectory = np.zeros((steps + 1, 2))   # shape (41, 2)
trajectory[0] = x0
for k in range(1, steps + 1):
    Dk = np.diag(vals**k)
    trajectory[k] = (P @ Dk @ Pinv @ x0).real

fig, axes = plt.subplots(1, 2, figsize=(11, 4))

axes[0].plot(trajectory[:, 0], color='#4a90d9', label='$x_1$')
axes[0].plot(trajectory[:, 1], color='tomato',  label='$x_2$')
axes[0].set_xlabel('step $k$')
axes[0].set_ylabel('component value')
axes[0].set_title(r'$\mathbf{x}_k = A^k \mathbf{x}_0$')
axes[0].legend()
axes[0].grid(True, alpha=0.2)

axes[1].plot(trajectory[1:, 0] / trajectory[:-1, 0],
             color='#4a90d9', label='ratio $x_1(k)/x_1(k-1)$')
axes[1].axhline(max(abs(vals.real)), color='tomato', ls='--',
                label=rf'dominant $|\lambda|={max(abs(vals.real)):.3f}$')
axes[1].set_xlabel('step $k$')
axes[1].set_title('Component ratio converges to dominant eigenvalue')
axes[1].legend(fontsize=8)
axes[1].grid(True, alpha=0.2)

plt.tight_layout()
plt.show()

print("Eigenvalues:", vals.round(4))
print("Dominant eigenvalue:", vals[np.argmax(np.abs(vals))].real.round(4))

Eigenvalues: [1.0236 0.5764]
Dominant eigenvalue: 1.0236

14.2.3 The Matrix Exponential: \(e^{At} = Pe^{Dt}P^{-1}\)

For continuous-time systems \(\dot{\mathbf{x}} = A\mathbf{x}\), the solution is \(\mathbf{x}(t) = e^{At}\mathbf{x}_0\). The matrix exponential is defined by its Taylor series:

\[e^{At} = I + At + \frac{(At)^2}{2!} + \frac{(At)^3}{3!} + \cdots\]

For a diagonalizable \(A = PDP^{-1}\):

\[e^{At} = P e^{Dt} P^{-1}, \qquad e^{Dt} = \operatorname{diag}(e^{\lambda_1 t}, \ldots, e^{\lambda_n t}).\]

Again, \(n\) scalar exponentials replace a full matrix series.

import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import expm   # reference implementation

A = np.array([[-1.,  2.],
              [ 0., -3.]])   # shape (2, 2) — stable system (both eigenvalues < 0)

vals, vecs = np.linalg.eig(A)
P    = vecs
Pinv = np.linalg.inv(P)

x0 = np.array([2.0, 1.0])   # shape (2,)
t  = np.linspace(0, 4, 300)

# Trajectory via matrix exponential (diagonalization)
traj_diag  = np.zeros((len(t), 2))   # shape (300, 2)
traj_scipy = np.zeros((len(t), 2))   # shape (300, 2)

for i, ti in enumerate(t):
    eDt           = np.diag(np.exp(vals * ti))
    eAt_diag      = P @ eDt @ Pinv
    traj_diag[i]  = (eAt_diag @ x0).real
    traj_scipy[i] = expm(A * ti) @ x0

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(t, traj_diag[:, 0],  color='#4a90d9', lw=2, label=r'$x_1(t)$ via $PDP^{-1}$')
ax.plot(t, traj_diag[:, 1],  color='#2ecc71', lw=2, label=r'$x_2(t)$ via $PDP^{-1}$')
ax.plot(t, traj_scipy[:, 0], color='tomato', ls='--', lw=1.5, label='scipy expm (reference)')
ax.plot(t, traj_scipy[:, 1], color='#e67e22', ls='--', lw=1.5)
ax.set_xlabel('time $t$')
ax.set_ylabel('state')
ax.set_title(r'$\dot{\mathbf{x}} = A\mathbf{x}$: solution via $e^{At}\mathbf{x}_0$')
ax.legend(fontsize=8)
ax.grid(True, alpha=0.2)
plt.tight_layout()
plt.show()


14.2.4 Solving \(\dot{\mathbf{x}} = A\mathbf{x}\): Step by Step

Given \(A = PDP^{-1}\) and initial condition \(\mathbf{x}(0) = \mathbf{x}_0\):

  1. Change variables: let \(\mathbf{y} = P^{-1}\mathbf{x}\). Then \(\dot{\mathbf{y}} = D\mathbf{y}\) — a diagonal system.
  2. Each component decouples: \(\dot{y}_i = \lambda_i y_i\), with solution \(y_i(t) = c_i e^{\lambda_i t}\) where \(c_i = (P^{-1}\mathbf{x}_0)_i\).
  3. Convert back: \(\mathbf{x}(t) = P\mathbf{y}(t) = \sum_i c_i e^{\lambda_i t} \mathbf{v}_i\).

This modal decomposition reveals the natural modes of the system — each eigenvector direction oscillates or decays independently at its own rate \(\lambda_i\).

import numpy as np

A = np.array([[-1.,  2.],
              [ 0., -3.]])   # shape (2, 2)

vals, vecs = np.linalg.eig(A)
P    = vecs
Pinv = np.linalg.inv(P)

x0 = np.array([2.0, 1.0])   # shape (2,)
c  = Pinv @ x0               # shape (2,) — modal coordinates

print("Eigenvalues:", vals.round(4))
print("Modal coordinates c = P⁻¹x₀:", c.round(4))
print()
print("Solution: x(t) = ", end="")
for i, (ci, li, vi) in enumerate(zip(c.real, vals.real, vecs.T)):
    sign = "+" if (i > 0 and ci >= 0) else ""
    print(f"{sign}{ci:.3f}·e^({li:.1f}t)·{np.round(vi.real, 3)}", end=" ")
print()

# Verify at t = 1
t = 1.0
x_modal = sum(c[i].real * np.exp(vals[i].real * t) * vecs[:, i].real
              for i in range(2))
from scipy.linalg import expm
x_exact = expm(A * t) @ x0
print(f"\nx(t=1) via modes : {x_modal.round(6)}")
print(f"x(t=1) via expm  : {x_exact.round(6)}")
Eigenvalues: [-1. -3.]
Modal coordinates c = P⁻¹x₀: [3.     1.4142]

Solution: x(t) = 3.000·e^(-1.0t)·[1. 0.] +1.414·e^(-3.0t)·[-0.707  0.707] 

x(t=1) via modes : [1.053851 0.049787]
x(t=1) via expm  : [1.053851 0.049787]

14.2.5 Functions of a Matrix

More generally, for any scalar function \(f\) with a convergent Taylor series, we can define \(f(A)\) for diagonalizable \(A\):

\[f(A) = P \operatorname{diag}(f(\lambda_1), \ldots, f(\lambda_n)) P^{-1}.\]

import numpy as np

A = np.array([[2., 1.],
              [0., 3.]])   # shape (2, 2)

vals, vecs = np.linalg.eig(A)
P    = vecs
Pinv = np.linalg.inv(P)

# Matrix square root: f(λ) = √λ
sqrt_A = P @ np.diag(np.sqrt(vals)) @ Pinv   # shape (2, 2)
print("sqrt(A) =\n", sqrt_A.round(4))
print("\nsqrt(A) @ sqrt(A) =\n", (sqrt_A @ sqrt_A).round(4))
print("A =\n", A)

# Matrix logarithm: f(λ) = log(λ)
log_A = P @ np.diag(np.log(vals)) @ Pinv   # shape (2, 2)
from scipy.linalg import logm
print("\nlog(A) via diagonalization =\n", log_A.round(4))
print("log(A) via scipy.linalg.logm =\n", logm(A).round(4))
sqrt(A) =
 [[1.4142 0.3178]
 [0.     1.7321]]

sqrt(A) @ sqrt(A) =
 [[2. 1.]
 [0. 3.]]
A =
 [[2. 1.]
 [0. 3.]]

log(A) via diagonalization =
 [[0.6931 0.4055]
 [0.     1.0986]]
log(A) via scipy.linalg.logm =
 [[0.6931 0.4055]
 [0.     1.0986]]

14.2.6 Exercises

14.2.1 For \(A = \begin{bmatrix}3 & 0 \\ 0 & -2\end{bmatrix}\), compute \(A^5\) using \(PD^5P^{-1}\) (here \(P = I\) since \(A\) is diagonal). Verify with np.linalg.matrix_power.

14.2.2 A population matrix \(A = \begin{bmatrix}0.8 & 0.4 \\ 0.3 & 0.6\end{bmatrix}\) models a two-species ecosystem. Use diagonalization to compute \(A^{50}\mathbf{x}_0\) for \(\mathbf{x}_0 = (100, 50)^T\). What is the long-run ratio of the two populations?

14.2.3 Show that \(\frac{d}{dt}e^{At} = Ae^{At}\). Use this to verify that \(\mathbf{x}(t) = e^{At}\mathbf{x}_0\) satisfies \(\dot{\mathbf{x}} = A\mathbf{x}\).

14.2.4 For a stable continuous system, all eigenvalues of \(A\) must have negative real parts. For a stable discrete system, all eigenvalues must satisfy \(|\lambda| < 1\). Explain why by examining the long-term behavior of \(e^{\lambda_i t}\) and \(\lambda_i^k\) respectively.

14.2.5 Use the diagonalization formula to compute \(\sin(A)\) for \(A = \begin{bmatrix}0 & \pi/2 \\ 0 & \pi\end{bmatrix}\). Is the result the same as scipy.linalg.sinm(A)?

14.3 The Spectral Theorem for Symmetric Matrices

The general diagonalization of §14.1 requires \(P\) to be invertible but not necessarily orthogonal. For symmetric matrices the story is dramatically better: the eigenvector matrix can always be chosen to be orthogonal.

Important

Spectral Theorem. If \(A\) is a real symmetric matrix (\(A = A^T\)), then there exists an orthogonal matrix \(Q\) (i.e. \(Q^{-1} = Q^T\)) and a real diagonal matrix \(\Lambda\) such that

\[A = Q \Lambda Q^T.\]

Moreover:

  1. All eigenvalues are real.
  2. Eigenvectors from different eigenvalues are orthogonal.

This theorem is the workhorse of PCA, the Kalman filter, and all of Part VI. It is worth understanding deeply.


14.3.1 Why All Eigenvalues Are Real

For a real symmetric matrix \(A\) and any eigenvalue \(\lambda\) (possibly complex), take the eigenvector equation \(A\mathbf{v} = \lambda\mathbf{v}\) and compute \(\bar{\mathbf{v}}^T A \mathbf{v}\) in two ways:

\[\bar{\mathbf{v}}^T A \mathbf{v} = \lambda \|\mathbf{v}\|^2.\]

But \(\bar{\mathbf{v}}^T A \mathbf{v} = \overline{(A\mathbf{v})^T \mathbf{v}} = \overline{\lambda}\|\mathbf{v}\|^2\).

So \(\lambda = \bar{\lambda}\), meaning \(\lambda\) is real. \(\square\)


14.3.2 Why Eigenvectors from Different Eigenvalues Are Orthogonal

Let \(A\mathbf{u} = \lambda_1 \mathbf{u}\) and \(A\mathbf{v} = \lambda_2 \mathbf{v}\) with \(\lambda_1 \neq \lambda_2\). Then:

\[\lambda_1 (\mathbf{u} \cdot \mathbf{v}) = (A\mathbf{u}) \cdot \mathbf{v} = \mathbf{u} \cdot (A^T\mathbf{v}) = \mathbf{u} \cdot (A\mathbf{v}) = \lambda_2 (\mathbf{u} \cdot \mathbf{v}).\]

Since \(\lambda_1 \neq \lambda_2\), we must have \(\mathbf{u} \cdot \mathbf{v} = 0\). \(\square\)

import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))   # shape (4, 4)
A = M + M.T                        # shape (4, 4) — symmetric by construction

# np.linalg.eigh is specialized for symmetric/Hermitian matrices:
# - guaranteed real eigenvalues
# - orthonormal eigenvectors
vals, Q = np.linalg.eigh(A)   # vals shape (4,), Q shape (4, 4)

print("Eigenvalues (all real):", vals.round(4))
print("\nQ^T Q (should be identity):\n", (Q.T @ Q).round(6))

A_reconstructed = Q @ np.diag(vals) @ Q.T   # shape (4, 4)
print("\nMax reconstruction error:", np.abs(A - A_reconstructed).max().round(12))
Eigenvalues (all real): [-2.6815 -1.6239  1.8538  5.7722]

Q^T Q (should be identity):
 [[ 1.  0. -0. -0.]
 [ 0.  1.  0. -0.]
 [-0.  0.  1. -0.]
 [-0. -0. -0.  1.]]

Max reconstruction error: 0.0

Always use np.linalg.eigh for symmetric matrices. It is faster than np.linalg.eig, it guarantees real output, and it returns orthonormal eigenvectors sorted by eigenvalue.


14.3.3 Geometric Interpretation: Axes of the Ellipsoid

The spectral theorem says that every symmetric matrix is a stretch along orthogonal axes in some coordinate system. The columns of \(Q\) are those axes; the diagonal entries of \(\Lambda\) are the stretch factors.

Applied to the unit sphere, \(A\) maps it to an ellipsoid whose semi-axes lie along the columns of \(Q\) with lengths \(|\lambda_i|\).

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
M = rng.standard_normal((2, 2))
A = M + M.T           # shape (2, 2) — random symmetric

vals, Q = np.linalg.eigh(A)

theta  = np.linspace(0, 2 * np.pi, 400)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # shape (2, 400)
ellipse = A @ circle                                  # shape (2, 400)

fig, axes = plt.subplots(1, 2, figsize=(10, 4.5))

for ax, data, title in zip(axes,
                            [circle, ellipse],
                            ['Unit circle (input)', r'$A\mathbf{x}$ (output)']):
    ax.plot(data[0], data[1], color='#4a90d9', lw=1.5)
    ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.2)
    ax.set_title(title, fontsize=10)

# Draw eigenvector axes on the ellipse
colors = ['tomato', '#2ecc71']
for i in range(2):
    v   = Q[:, i]         # eigenvector
    lam = vals[i]         # eigenvalue (stretch factor)
    axes[1].annotate('', xy=(v[0]*lam, v[1]*lam), xytext=(0, 0),
                     arrowprops=dict(arrowstyle='->', color=colors[i], lw=2.5))
    axes[1].text(v[0]*lam*1.15, v[1]*lam*1.15,
                 rf'$\lambda={lam:.2f}$', color=colors[i], fontsize=9)

plt.suptitle(r'Spectral Theorem: $A = Q\Lambda Q^T$ — symmetric $A$ stretches along orthogonal eigenvectors',
             fontsize=10)
plt.tight_layout()
plt.show()


14.3.4 The Spectral Theorem in Action: Covariance Matrices

The covariance matrix of a data set is always symmetric and positive semidefinite (§14.4). Its spectral decomposition reveals the principal axes of the data ellipsoid — which is precisely PCA.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(99)
# Generate correlated 2D data
angle = np.radians(35)
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
scales = np.array([[3., 0.],
                   [0., 1.]])
X_raw = rng.standard_normal((300, 2))    # shape (300, 2)
X = (R @ scales @ X_raw.T).T             # shape (300, 2)
X -= X.mean(axis=0)                       # center

Sigma = (X.T @ X) / (X.shape[0] - 1)    # shape (2, 2) covariance matrix
vals, Q = np.linalg.eigh(Sigma)

fig, ax = plt.subplots(figsize=(7, 5))
ax.scatter(X[:, 0], X[:, 1], alpha=0.3, s=10, color='#4a90d9', label='data')

colors = ['tomato', '#2ecc71']
for i in range(2):
    v = Q[:, i]
    s = np.sqrt(vals[i])
    ax.annotate('', xy=(v[0]*s*2, v[1]*s*2), xytext=(0, 0),
                arrowprops=dict(arrowstyle='->', color=colors[i], lw=3))
    ax.text(v[0]*s*2.2, v[1]*s*2.2,
            rf'$\lambda={vals[i]:.2f}$', color=colors[i], fontsize=9)

ax.set_aspect('equal')
ax.grid(True, alpha=0.2)
ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
ax.set_title('Spectral decomposition of covariance = principal axes of data cloud', fontsize=10)
ax.legend()
plt.tight_layout()
plt.show()

The eigenvectors of the covariance matrix align with the major and minor axes of the data ellipse. This is the geometric content of PCA (Chapter 20).


14.3.5 eigh vs. eig: When to Use Which

Function Input requirement Output Speed
np.linalg.eig(A) Any square matrix Complex in general Slower
np.linalg.eigh(A) Symmetric/Hermitian Real, sorted ~2× faster
scipy.linalg.eigh(A, B) Symmetric \(A\), PD \(B\) Generalized eigenproblem Specialized

Always use eigh when you know \(A\) is symmetric. Passing a non-symmetric matrix to eigh gives wrong results without error.


14.3.6 Exercises

14.3.1 Prove that for a symmetric matrix, the eigenvalues of \(A^2\) are \(\lambda_i^2 \geq 0\). Does this mean \(A^2\) is always positive semidefinite?

14.3.2 Show that if \(A\) is symmetric with eigendecomposition \(A = Q\Lambda Q^T\), then \(A^k = Q\Lambda^k Q^T\) for any integer \(k\).

14.3.3 Construct a \(3\times3\) symmetric matrix with eigenvalues \(\{5, 2, -1\}\) by choosing any orthogonal \(Q\) and forming \(A = Q \operatorname{diag}(5, 2, -1) Q^T\). Verify with eigh.

14.3.4 You are given a \(4\times4\) matrix \(B\) that is not symmetric. Is \(B^T B\) always symmetric? Always diagonalizable? Always positive semidefinite? Prove your answers.

14.3.5 A classmate claims that “orthogonal matrices are symmetric.” Is this true in general? Give a counterexample. Under what special condition is an orthogonal matrix also symmetric?

14.4 Positive Definite and Semidefinite Matrices

You will encounter the phrase “positive definite” (PD) or “positive semidefinite” (PSD) dozens of times in machine learning and robotics: covariance matrices, Hessians, Gram matrices, information matrices, and regularizers are all PSD or PD. This section gives you three equivalent characterizations and the geometric picture.


14.4.1 Definitions

A real symmetric matrix \(A\) is:

Name Notation Definition
Positive definite \(A \succ 0\) \(\mathbf{x}^T A \mathbf{x} > 0\) for all \(\mathbf{x} \neq \mathbf{0}\)
Positive semidefinite \(A \succeq 0\) \(\mathbf{x}^T A \mathbf{x} \geq 0\) for all \(\mathbf{x}\)
Negative definite \(A \prec 0\) \(\mathbf{x}^T A \mathbf{x} < 0\) for all \(\mathbf{x} \neq \mathbf{0}\)
Indefinite \(\mathbf{x}^T A \mathbf{x}\) takes both positive and negative values

The scalar \(\mathbf{x}^T A \mathbf{x}\) is called a quadratic form. In two dimensions it defines a surface: a bowl (PD), a flat-bottomed bowl (PSD), an inverted bowl (ND), or a saddle (indefinite).

import numpy as np
import matplotlib.pyplot as plt

# Four quadratic form surfaces
mats = {
    'Positive definite': np.array([[3., 1.], [1., 2.]]),
    'Positive semidefinite': np.array([[1., 1.], [1., 1.]]),
    'Negative definite': np.array([[-3., -1.], [-1., -2.]]),
    'Indefinite': np.array([[2., 0.], [0., -1.]]),
}

x1 = np.linspace(-2, 2, 60)
x2 = np.linspace(-2, 2, 60)
X1, X2 = np.meshgrid(x1, x2)   # each shape (60, 60)

fig, axes = plt.subplots(2, 2, figsize=(11, 8),
                          subplot_kw={'projection': '3d'})

for ax, (title, A) in zip(axes.ravel(), mats.items()):
    # Evaluate x^T A x on the grid
    Z = A[0,0]*X1**2 + (A[0,1]+A[1,0])*X1*X2 + A[1,1]*X2**2
    ax.plot_surface(X1, X2, Z, cmap='Blues', alpha=0.85, linewidth=0)
    ax.set_title(title, fontsize=9)
    ax.set_xlabel('$x_1$', fontsize=8)
    ax.set_ylabel('$x_2$', fontsize=8)
    ax.set_zlabel(r'$\mathbf{x}^T A \mathbf{x}$', fontsize=8)
    ax.tick_params(labelsize=6)

plt.suptitle(r'Quadratic form $\mathbf{x}^TA\mathbf{x}$ for four definiteness types',
             fontsize=11)
plt.tight_layout()
plt.show()


14.4.2 Three Equivalent Tests for Positive Definiteness

For a real symmetric \(n\times n\) matrix \(A\), the following are equivalent:

Note

Test 1 — Eigenvalues. \(A \succ 0 \Leftrightarrow\) all eigenvalues \(\lambda_i > 0\). (For PSD: all \(\lambda_i \geq 0\).)

Test 2 — Leading Principal Minors (Sylvester’s Criterion). \(A \succ 0 \Leftrightarrow\) all leading principal minors are positive: \(\det(A_{1:k, 1:k}) > 0\) for \(k = 1, \ldots, n\).

Test 3 — Cholesky Factorization. \(A \succ 0 \Leftrightarrow A\) has a Cholesky factorization \(A = LL^T\) with \(L\) having positive diagonal entries.

In practice, Test 3 is the cheapest to compute: attempt a Cholesky decomposition; if it succeeds without error, the matrix is PD.

import numpy as np

def pd_tests(A, name="A"):
    print(f"\n--- {name} ---")
    # Test 1: eigenvalues
    eigs = np.linalg.eigvalsh(A)   # shape (n,)
    print(f"Eigenvalues: {eigs.round(4)}")
    print(f"Test 1 (all λ > 0): {(eigs > 0).all()}")

    # Test 2: leading principal minors
    minors = [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0]+1)]
    print(f"Leading principal minors: {[round(m, 4) for m in minors]}")
    print(f"Test 2 (all > 0): {all(m > 0 for m in minors)}")

    # Test 3: Cholesky
    try:
        np.linalg.cholesky(A)
        print("Test 3 (Cholesky): ✓ PD")
    except np.linalg.LinAlgError:
        print("Test 3 (Cholesky): ✗ not PD")

# Positive definite
A_pd = np.array([[4., 2., 1.],
                 [2., 3., 1.],
                 [1., 1., 2.]])   # shape (3, 3)
pd_tests(A_pd, "Positive definite")

# Positive semidefinite (rank-deficient)
v = np.array([1., 2., 3.])
A_psd = np.outer(v, v)            # shape (3, 3) — rank 1, PSD
pd_tests(A_psd, "Positive semidefinite")

# Indefinite
A_indef = np.array([[2., 0., 0.],
                    [0.,-1., 0.],
                    [0., 0., 3.]])  # shape (3, 3)
pd_tests(A_indef, "Indefinite")

--- Positive definite ---
Eigenvalues: [1.308  1.6431 6.0489]
Test 1 (all λ > 0): True
Leading principal minors: [np.float64(4.0), np.float64(8.0), np.float64(13.0)]
Test 2 (all > 0): True
Test 3 (Cholesky): ✓ PD

--- Positive semidefinite ---
Eigenvalues: [-0.  0. 14.]
Test 1 (all λ > 0): False
Leading principal minors: [np.float64(1.0), np.float64(0.0), np.float64(0.0)]
Test 2 (all > 0): False
Test 3 (Cholesky): ✗ not PD

--- Indefinite ---
Eigenvalues: [-1.  2.  3.]
Test 1 (all λ > 0): False
Leading principal minors: [np.float64(2.0), np.float64(-2.0), np.float64(-6.0)]
Test 2 (all > 0): False
Test 3 (Cholesky): ✗ not PD

14.4.3 Covariance Matrices Are Always PSD

For any data matrix \(X \in \mathbb{R}^{n \times p}\) (mean-centered):

\[\Sigma = \frac{1}{n-1} X^T X.\]

Then for any \(\mathbf{z} \in \mathbb{R}^p\):

\[\mathbf{z}^T \Sigma \mathbf{z} = \frac{1}{n-1} \mathbf{z}^T X^T X \mathbf{z} = \frac{1}{n-1} \|X\mathbf{z}\|^2 \geq 0.\]

So \(\Sigma \succeq 0\) always. It is PD if and only if \(X\) has full column rank (no perfectly collinear features).

import numpy as np

rng = np.random.default_rng(5)

# Full-rank data: PD covariance
X_full = rng.standard_normal((100, 4))   # shape (100, 4)
X_full -= X_full.mean(axis=0)
Sigma_full = X_full.T @ X_full / 99     # shape (4, 4)

# Collinear data: PSD covariance (rank-deficient)
X_col = rng.standard_normal((100, 3))   # shape (100, 3)
X_col -= X_col.mean(axis=0)
# Add a 4th feature that is exactly the sum of the first two
X_col = np.hstack([X_col, (X_col[:, 0] + X_col[:, 1]).reshape(-1, 1)])  # shape (100, 4)
X_col -= X_col.mean(axis=0)
Sigma_col = X_col.T @ X_col / 99        # shape (4, 4)

for label, S in [("Full rank (PD)", Sigma_full), ("Collinear (PSD)", Sigma_col)]:
    eigs = np.linalg.eigvalsh(S)
    pd = "PD" if eigs.min() > 1e-10 else "PSD (not PD)"
    print(f"{label}: min eigenvalue = {eigs.min():.6f}{pd}")
Full rank (PD): min eigenvalue = 0.717380  → PD
Collinear (PSD): min eigenvalue = -0.000000  → PSD (not PD)

14.4.4 The Ellipsoid Geometry of PD Matrices

A positive definite matrix \(A\) defines an inner product \(\langle \mathbf{x}, \mathbf{y}\rangle_A = \mathbf{x}^T A \mathbf{y}\). The “unit ball” in this inner product is an ellipsoid:

\[\mathcal{E}_A = \{\mathbf{x} : \mathbf{x}^T A \mathbf{x} \leq 1\}.\]

The semi-axes have lengths \(1/\sqrt{\lambda_i}\) along the eigenvector directions \(\mathbf{q}_i\).

import numpy as np
import matplotlib.pyplot as plt

A = np.array([[4., 2.],
              [2., 3.]])   # shape (2, 2) — positive definite

vals, Q = np.linalg.eigh(A)   # vals sorted ascending

# Parametrize the ellipsoid: x^T A x = 1
# In eigenbasis coordinates y = Q^T x, this is sum λ_i y_i^2 = 1
# → y = (cos(t)/√λ1, sin(t)/√λ2)
t = np.linspace(0, 2 * np.pi, 400)
y = np.vstack([np.cos(t) / np.sqrt(vals[0]),
               np.sin(t) / np.sqrt(vals[1])])   # shape (2, 400)
x = Q @ y   # rotate back to original coordinates, shape (2, 400)

fig, ax = plt.subplots(figsize=(6, 5))
ax.plot(x[0], x[1], color='#4a90d9', lw=2, label=r'$\mathbf{x}^T A\mathbf{x}=1$')
ax.fill(x[0], x[1], alpha=0.15, color='#4a90d9')

colors = ['tomato', '#2ecc71']
for i in range(2):
    v     = Q[:, i]
    semi  = 1 / np.sqrt(vals[i])
    ax.annotate('', xy=(v[0]*semi, v[1]*semi), xytext=(0, 0),
                arrowprops=dict(arrowstyle='->', color=colors[i], lw=2.5))
    ax.text(v[0]*semi*1.15, v[1]*semi*1.15,
            rf'$1/\sqrt{{\lambda_{i+1}}}={semi:.2f}$', color=colors[i], fontsize=9)

ax.set_aspect('equal')
ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
ax.grid(True, alpha=0.2)
ax.set_title(r'Ellipsoid $\{\mathbf{x}: \mathbf{x}^TA\mathbf{x}=1\}$ — semi-axes $1/\sqrt{\lambda_i}$',
             fontsize=10)
plt.tight_layout()
plt.show()


14.4.5 PD Matrices in ML and Robotics

Context Matrix Why PD/PSD matters
Kalman filter Covariance \(P\) PSD always; PD means no perfectly known state
Gaussian process Kernel matrix \(K\) Must be PSD for a valid covariance function
Neural network loss Hessian \(H\) at minimum PD at a local minimum (second-order condition)
Optimization Regularizer \(\lambda I\) PD → strongly convex objective
Factor graph Information matrix \(\Omega\) PD ↔︎ all states observable

14.4.6 Exercises

14.4.1 Show that \(A \succ 0\) implies \(A\) is invertible. (Hint: what is \(\mathbf{x}^T A \mathbf{x}\) if \(A\) has a zero eigenvalue?)

14.4.2 The matrix \(A = \begin{bmatrix}2 & 3 \\ 3 & 2\end{bmatrix}\) has a positive diagonal. Is it positive definite? Apply all three tests.

14.4.3 Show that if \(A \succ 0\) and \(B \succ 0\), then \(A + B \succ 0\). Does the same hold for \(AB\)?

14.4.4 Let \(A \succeq 0\) with \(\ker(A) \neq \{\mathbf{0}\}\). Show there exists a nonzero \(\mathbf{x}\) with \(A\mathbf{x} = \mathbf{0}\). What does this mean for the Cholesky factorization?

14.4.5 Prove that \(A^T A \succeq 0\) for any matrix \(A\). Under what condition is \(A^T A \succ 0\)?

14.5 Spectral Decomposition: \(A = \sum_i \lambda_i \mathbf{q}_i \mathbf{q}_i^T\)

The spectral theorem \(A = Q\Lambda Q^T\) can be unpacked column by column into a sum of rank-1 matrices:

\[\boxed{A = \sum_{i=1}^{n} \lambda_i \, \mathbf{q}_i \mathbf{q}_i^T.}\]

Each term \(\lambda_i \mathbf{q}_i \mathbf{q}_i^T\) is a rank-1 projection scaled by \(\lambda_i\). The full matrix \(A\) is their sum — an outer product decomposition along orthogonal directions.

This representation is the algebraic backbone of PCA, low-rank approximation, and, one step removed, the SVD of Chapter 18.


14.5.1 Derivation

Write \(Q = [\mathbf{q}_1 \mid \cdots \mid \mathbf{q}_n]\) and \(\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)\). Then:

\[A = Q\Lambda Q^T = \sum_{i=1}^n \lambda_i \, \mathbf{q}_i \mathbf{q}_i^T.\]

This follows from the identity \((Q\Lambda Q^T)_{jk} = \sum_i \lambda_i (q_i)_j (q_i)_k\).

Each outer product \(P_i = \mathbf{q}_i \mathbf{q}_i^T\) is an orthogonal projection onto the 1-dimensional subspace spanned by \(\mathbf{q}_i\). The properties:

  • \(P_i^2 = P_i\) (idempotent)
  • \(P_i^T = P_i\) (symmetric)
  • \(P_i P_j = 0\) for \(i \neq j\) (orthogonal projectors)
  • \(\sum_i P_i = I\) (completeness relation)
import numpy as np

rng = np.random.default_rng(21)
M = rng.standard_normal((4, 4))
A = M + M.T                      # shape (4, 4) — symmetric

vals, Q = np.linalg.eigh(A)

# Spectral decomposition: sum of rank-1 projectors
A_spectral = sum(vals[i] * np.outer(Q[:, i], Q[:, i]) for i in range(4))   # shape (4, 4)

print("Max error ||A - Σ λᵢ qᵢqᵢᵀ||:", np.abs(A - A_spectral).max().round(12))

# Verify projector properties
P0 = np.outer(Q[:, 0], Q[:, 0])   # shape (4, 4) — first projector
print("\nP₀² == P₀:", np.allclose(P0 @ P0, P0))
print("Σ Pᵢ == I:", np.allclose(
    sum(np.outer(Q[:, i], Q[:, i]) for i in range(4)),
    np.eye(4)))
Max error ||A - Σ λᵢ qᵢqᵢᵀ||: 0.0

P₀² == P₀: True
Σ Pᵢ == I: True

14.5.2 Low-Rank Approximation by Truncation

If we keep only the \(k\) terms with the largest \(|\lambda_i|\):

\[A_k = \sum_{i=1}^k \lambda_i \, \mathbf{q}_i \mathbf{q}_i^T,\]

we obtain the best rank-\(k\) approximation of \(A\) in the Frobenius norm — the direct analogue of the Eckart–Young theorem for symmetric matrices.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(55)
M = rng.standard_normal((8, 8))
A = M + M.T                       # shape (8, 8)

vals, Q = np.linalg.eigh(A)
# eigh returns ascending order; sort by |λ| descending for best approx
order = np.argsort(np.abs(vals))[::-1]
vals_sorted = vals[order]
Q_sorted    = Q[:, order]

errors = []
for k in range(1, 9):
    Ak  = sum(vals_sorted[i] * np.outer(Q_sorted[:, i], Q_sorted[:, i])
              for i in range(k))   # shape (8, 8)
    errors.append(np.linalg.norm(A - Ak, 'fro'))

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(range(1, 9), errors, 'o-', color='#4a90d9', lw=2)
ax.set_xlabel('rank $k$')
ax.set_ylabel(r'$\|A - A_k\|_F$')
ax.set_title('Frobenius error of rank-$k$ spectral approximation', fontsize=10)
ax.grid(True, alpha=0.2)
plt.tight_layout()
plt.show()

print("Frobenius errors:", [round(e, 4) for e in errors])

Frobenius errors: [np.float64(12.2508), np.float64(9.721), np.float64(7.6242), np.float64(4.9905), np.float64(3.685), np.float64(2.3146), np.float64(0.4901), np.float64(0.0)]

14.5.3 Application to PCA: Preview

For a mean-centered data matrix \(X\) with covariance \(\Sigma = \frac{1}{n-1} X^T X\), the spectral decomposition is:

\[\Sigma = \sum_{i=1}^p \lambda_i \mathbf{q}_i \mathbf{q}_i^T.\]

  • \(\mathbf{q}_i\) are the principal directions (axes of maximum variance).
  • \(\lambda_i\) are the variances along those directions.
  • Truncating to \(k\) terms gives the best rank-\(k\) approximation of \(\Sigma\), discarding directions of smallest variance.

This is the algebraic core of PCA (Chapter 20).

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(77)
# Correlated 2D data with known structure
cov_true = np.array([[4., 2.], [2., 1.5]])   # shape (2, 2)
L_true = np.linalg.cholesky(cov_true)
X = (L_true @ rng.standard_normal((2, 200))).T   # shape (200, 2)
X -= X.mean(axis=0)

Sigma = X.T @ X / (X.shape[0] - 1)              # shape (2, 2)
vals, Q = np.linalg.eigh(Sigma)

# Rank-1 approximation (first principal component only)
Sigma_1 = vals[-1] * np.outer(Q[:, -1], Q[:, -1])   # shape (2, 2) — largest eigenvector
Sigma_2 = Sigma_1 + vals[-2] * np.outer(Q[:, -2], Q[:, -2])  # rank-2 = full

fig, axes = plt.subplots(1, 3, figsize=(13, 4))

def draw_ellipse(ax, S, color, label):
    ev, eq = np.linalg.eigh(S + 1e-8*np.eye(2))
    ev = np.maximum(ev, 0)
    t = np.linspace(0, 2*np.pi, 200)
    y = np.vstack([np.cos(t)*np.sqrt(ev[0]),
                   np.sin(t)*np.sqrt(ev[1])])
    x = eq @ y
    ax.plot(x[0], x[1], color=color, lw=2, label=label)

for ax, S, title in zip(axes,
                         [Sigma, Sigma_2, Sigma_1],
                         ['True Σ (rank 2)', 'Rank-2 approx', 'Rank-1 approx']):
    ax.scatter(X[:, 0], X[:, 1], alpha=0.2, s=8, color='#4a90d9')
    draw_ellipse(ax, S, 'tomato', r'$A_k$ ellipse')
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.2)
    ax.set_title(title, fontsize=9)
    ax.set_xlim(-7, 7); ax.set_ylim(-5, 5)

plt.suptitle('Low-rank spectral approximation of covariance matrix', fontsize=10)
plt.tight_layout()
plt.show()


14.5.4 Matrix Functions Revisited

The spectral decomposition makes any scalar function \(f\) applicable to a symmetric matrix \(A\):

\[f(A) = \sum_i f(\lambda_i) \, \mathbf{q}_i \mathbf{q}_i^T.\]

The matrix square root \(A^{1/2}\), the matrix inverse \(A^{-1}\), and the matrix logarithm \(\log A\) are all defined this way for symmetric matrices with appropriate eigenvalue constraints.

import numpy as np

rng = np.random.default_rng(9)
M = rng.standard_normal((3, 3))
A = M @ M.T + 0.5 * np.eye(3)   # shape (3, 3) — PD by construction

vals, Q = np.linalg.eigh(A)

# Matrix square root via spectral decomp
sqrt_A = Q @ np.diag(np.sqrt(vals)) @ Q.T   # shape (3, 3)
print("sqrt(A) @ sqrt(A) ≈ A:")
print("Max error:", np.abs(sqrt_A @ sqrt_A - A).max().round(10))

# Matrix inverse via spectral decomp
inv_A_spectral = Q @ np.diag(1.0 / vals) @ Q.T   # shape (3, 3)
inv_A_numpy    = np.linalg.inv(A)                  # shape (3, 3)
print("\n||inv_A_spectral - inv_A_numpy||:", np.abs(inv_A_spectral - inv_A_numpy).max().round(12))
sqrt(A) @ sqrt(A) ≈ A:
Max error: 0.0

||inv_A_spectral - inv_A_numpy||: 0.0

14.5.5 Exercises

14.5.1 Show that the projectors \(P_i = \mathbf{q}_i \mathbf{q}_i^T\) satisfy \(P_i P_j = 0\) for \(i \neq j\). Use the orthonormality of the eigenvectors.

14.5.2 Let \(A\) be \(n\times n\) symmetric with spectral decomposition. Show that the trace of \(A\) equals \(\sum_i \lambda_i\) by taking the trace of \(A = Q\Lambda Q^T\) and using \(\operatorname{tr}(BC) = \operatorname{tr}(CB)\).

14.5.3 Compute the rank-1 approximation \(A_1 = \lambda_1 \mathbf{q}_1 \mathbf{q}_1^T\) of the covariance matrix from the data below, where \(\lambda_1\) is the largest eigenvalue. What percentage of the total variance does it capture?

import numpy as np
rng = np.random.default_rng(2)
X = rng.standard_normal((50, 3))
X[:, 1] += 2 * X[:, 0]   # introduce correlation
X -= X.mean(axis=0)
Sigma = X.T @ X / 49

14.5.4 For a PD matrix \(A\), show that the matrix inverse can be written \(A^{-1} = \sum_i \frac{1}{\lambda_i} \mathbf{q}_i \mathbf{q}_i^T\). What happens if one eigenvalue is zero?

14.5.5 Explain geometrically why the rank-\(k\) spectral approximation captures the subspace of maximum variance. What is the Frobenius error of the rank-\(k\) approximation in terms of the discarded eigenvalues?

14.6 Jordan Normal Form (Boxed Note)

NoteWhen Diagonalization Fails: The Jordan Form

Most matrices you encounter in ML, CV, and robotics are diagonalizable — either because their eigenvalues happen to be distinct, or because they are symmetric (Spectral Theorem). Occasionally a defective matrix appears, most often in degenerate control systems or when two physical modes coalesce.

What is a Jordan block?

A \(k\times k\) Jordan block for eigenvalue \(\lambda\) is:

\[J_k(\lambda) = \begin{bmatrix} \lambda & 1 & & \\ & \lambda & 1 & \\ & & \ddots & 1 \\ & & & \lambda \end{bmatrix}.\]

It is “almost diagonal” — \(\lambda\) on the diagonal, \(1\)’s on the superdiagonal. When a matrix has an eigenvalue with algebraic multiplicity \(k\) but only \(r < k\) linearly independent eigenvectors (geometric multiplicity \(r\)), its Jordan form has \(r\) Jordan blocks for that eigenvalue, with sizes summing to \(k\).

The Jordan Decomposition. Every square matrix \(A\) over \(\mathbb{C}\) is similar to a block-diagonal Jordan form:

\[A = P J P^{-1}, \qquad J = \begin{bmatrix} J_{k_1}(\lambda_{a}) & & \\ & \ddots & \\ & & J_{k_r}(\lambda_{r}) \end{bmatrix}.\]

Diagonalization is the special case where every Jordan block is \(1\times1\).

Powers of a Jordan block.

\[J_k(\lambda)^n = \begin{bmatrix} \lambda^n & \binom{n}{1}\lambda^{n-1} & \binom{n}{2}\lambda^{n-2} & \cdots \\ & \lambda^n & \binom{n}{1}\lambda^{n-1} & \cdots \\ & & \ddots & \\ & & & \lambda^n \end{bmatrix}.\]

For \(|\lambda| < 1\), \(J_k(\lambda)^n \to 0\) but polynomially in \(n\) (not purely exponentially as in the diagonalizable case). This matters for stability analysis: a defective eigenvalue at \(\lambda = 1\) produces polynomial growth in the system response even though \(|\lambda| = 1\).

A minimal Python demonstration.

import numpy as np

# A 3x3 Jordan block for λ = 0.95
lam = 0.95
J3 = np.array([[lam, 1., 0.],
               [0., lam, 1.],
               [0.,  0., lam]])   # shape (3, 3)

# Compare Jk^n decay to λ^n
n_steps = 80
norms_J  = [np.linalg.norm(np.linalg.matrix_power(J3, n)) for n in range(n_steps)]
norms_D  = [lam**n for n in range(n_steps)]

import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(8, 4))
ax.semilogy(norms_J, color='tomato', lw=2,
            label=r'$\|J_3^n\|$ (defective, $\lambda=0.95$)')
ax.semilogy(norms_D, color='#4a90d9', lw=2,
            label=r'$|\lambda|^n$ (diagonalizable)')
ax.set_xlabel('step $n$')
ax.set_ylabel('magnitude (log scale)')
ax.set_title('Defective vs. diagonalizable: both decay, but Jordan has polynomial transient',
             fontsize=9)
ax.legend()
ax.grid(True, alpha=0.2)
plt.tight_layout()
plt.show()

When do you actually encounter Jordan forms?

  • Degenerate control systems: two system modes merge at the same frequency (e.g. critically damped oscillator at its breakpoint).
  • Computer graphics: certain projective transformations used in shadow algorithms.
  • Theoretical analysis: proving exponential stability requires ruling out Jordan blocks at the unit circle.

For routine ML, CV, and robotics work you rarely need to compute the Jordan form explicitly. scipy.linalg.jordan_form exists if you need it. No exercises are assigned for this section.

14.7 Application: Stability of a Discrete-Time Robot Controller

A robot joint is controlled by a discrete-time linear system:

\[\mathbf{x}_{k+1} = A\,\mathbf{x}_k,\]

where \(\mathbf{x}_k = (\theta_k, \dot\theta_k)^T\) encodes joint angle and angular velocity at time step \(k\). The question that every control engineer asks before deploying the loop: will the system settle to zero, grow without bound, or oscillate?

The complete answer comes directly from the eigenvalues of \(A\).


14.7.1 Stability Criterion

For a discrete-time linear system \(\mathbf{x}_{k+1} = A\mathbf{x}_k\):

Condition Behavior
All \(|\lambda_i| < 1\) \(\mathbf{x}_k \to \mathbf{0}\): asymptotically stable
Some \(|\lambda_i| = 1\) bounded oscillation: marginally stable
Some \(|\lambda_i| > 1\) \(\|\mathbf{x}_k\| \to \infty\): unstable

The dominant eigenvalue (largest \(|\lambda_i|\)) controls the long-run behavior; its eigenvector is the direction the state eventually aligns with.


14.7.2 Three Controllers Side by Side

We compare three joint controllers modeled by the same physical plant but different gains:

import numpy as np
import matplotlib.pyplot as plt

# Physical plant: Euler-discretized double-integrator with dt = 0.05 s
# x1 = angle, x2 = angular velocity
dt = 0.05

# Three feedback gains: stable, marginal, unstable
controllers = {
    'Stable (k=8)':   np.array([[1 - 8*dt,  dt],
                                 [-8*dt,      1 - 2*dt]]),
    'Marginal (k=4)': np.array([[1 - 4*dt,  dt],
                                 [-4*dt,      1 - 2*dt]]),
    'Unstable (k=0)': np.array([[1.,         dt],
                                 [0.,         1 - 2*dt]]),
}

x0 = np.array([1.0, 0.0])   # shape (2,) — start at angle = 1 rad, velocity = 0
n_steps = 60

fig, axes = plt.subplots(2, 3, figsize=(13, 7))

for col, (label, A) in enumerate(controllers.items()):
    vals, vecs = np.linalg.eig(A)
    P    = vecs
    Pinv = np.linalg.inv(P)

    # Simulate via diagonalization
    trajectory = np.zeros((n_steps + 1, 2))   # shape (61, 2)
    trajectory[0] = x0
    for k in range(1, n_steps + 1):
        Dk              = np.diag(vals**k)
        trajectory[k]  = np.real(P @ Dk @ Pinv @ x0)

    # Top row: state trajectories
    ax_top = axes[0, col]
    ax_top.plot(trajectory[:, 0], color='#4a90d9', lw=2, label=r'$\theta_k$')
    ax_top.plot(trajectory[:, 1], color='tomato',  lw=2, label=r'$\dot\theta_k$')
    ax_top.axhline(0, color='#333333', lw=0.5, alpha=0.4)
    ax_top.set_title(label, fontsize=9)
    ax_top.set_xlabel('step $k$')
    ax_top.legend(fontsize=8)
    ax_top.grid(True, alpha=0.2)
    max_abs = max(1.5, np.abs(trajectory).max() * 1.1)
    ax_top.set_ylim(-max_abs, max_abs)

    # Bottom row: eigenvalues on the complex plane
    ax_bot = axes[1, col]
    circle_t = np.linspace(0, 2 * np.pi, 300)
    ax_bot.plot(np.cos(circle_t), np.sin(circle_t),
                color='#333333', lw=0.8, ls='--', alpha=0.5, label='unit circle')
    ax_bot.scatter(vals.real, vals.imag, s=80, zorder=5,
                   color=['tomato' if abs(v) >= 1 else '#2ecc71' for v in vals])
    for i, v in enumerate(vals):
        tag = rf'$|\lambda|={abs(v):.3f}$'
        ax_bot.text(v.real + 0.04, v.imag + 0.04, tag, fontsize=7)
    ax_bot.axhline(0, color='#333333', lw=0.4, alpha=0.5)
    ax_bot.axvline(0, color='#333333', lw=0.4, alpha=0.5)
    ax_bot.set_aspect('equal')
    ax_bot.set_xlim(-1.5, 1.5)
    ax_bot.set_ylim(-1.5, 1.5)
    ax_bot.grid(True, alpha=0.2)
    ax_bot.set_title(r'Eigenvalues in $\mathbb{C}$', fontsize=9)
    ax_bot.legend(fontsize=7)

plt.suptitle(r'Discrete-time stability: $\mathbf{x}_{k+1}=A\mathbf{x}_k$'
             '\nGreen = inside unit circle (stable), Red = on/outside (unstable)',
             fontsize=10)
plt.tight_layout()
plt.show()


14.7.3 Finding the Dominant Mode and Its Decay Rate

For the stable controller, the dominant eigenvalue determines how fast the system settles. We extract it and express the solution in modal form:

import numpy as np

dt = 0.05
A = np.array([[1 - 8*dt,  dt],
              [-8*dt,      1 - 2*dt]])   # shape (2, 2) — stable controller

vals, vecs = np.linalg.eig(A)
P    = vecs
Pinv = np.linalg.inv(P)

x0 = np.array([1.0, 0.0])   # shape (2,)
c  = Pinv @ x0               # shape (2,) — modal coordinates

print("Eigenvalues:", vals.round(4))
print("Magnitudes: ", np.abs(vals).round(4))
print()

dom_idx = np.argmax(np.abs(vals))
dom_lam = vals[dom_idx]
dom_vec = vecs[:, dom_idx]

print(f"Dominant eigenvalue: λ = {dom_lam:.4f}  (|λ| = {abs(dom_lam):.4f})")
print(f"Dominant eigenvector: {dom_vec.real.round(4)}")
print()

# Settling time: how many steps to decay by 99%?
# |λ|^k < 0.01  →  k > log(0.01)/log(|λ|)
k_settle = int(np.log(0.01) / np.log(abs(dom_lam))) + 1
print(f"Steps to 99% decay: {k_settle}  (≈ {k_settle * dt:.2f} s at dt={dt} s)")

# Modal decomposition of the solution
print("\nModal decomposition of x(k):")
for i in range(len(vals)):
    vi = vecs[:, i].real
    ci = c[i].real
    print(f"  Mode {i+1}: {ci:.4f} × (λ={vals[i].real:.4f})^k × {vi}")
Eigenvalues: [0.7 0.8]
Magnitudes:  [0.7 0.8]

Dominant eigenvalue: λ = 0.8000  (|λ| = 0.8000)
Dominant eigenvector: [-0.2425 -0.9701]

Steps to 99% decay: 21  (≈ 1.05 s at dt=0.05 s)

Modal decomposition of x(k):
  Mode 1: -4.4721 × (λ=0.7000)^k × [-0.4472136  -0.89442719]
  Mode 2: 4.1231 × (λ=0.8000)^k × [-0.24253563 -0.9701425 ]

14.7.4 Designing for a Target Settling Time

Given a desired settling time (in steps), we can work backwards to the required spectral radius \(\rho = |\lambda_{\max}|\):

\[\rho^k < \epsilon \implies \rho < \epsilon^{1/k}.\]

import numpy as np

dt = 0.05
target_steps  = 30    # settle in 30 steps = 1.5 s
epsilon       = 0.01  # 99% decay
rho_required  = epsilon ** (1 / target_steps)

print(f"Target settling: {target_steps} steps ({target_steps*dt:.2f} s)")
print(f"Required spectral radius ρ < {rho_required:.4f}")
print()

# Sweep gains and check spectral radius
for gain in [2, 4, 6, 8, 10, 14]:
    A = np.array([[1 - gain*dt,  dt],
                  [-gain*dt,      1 - 2*dt]])   # shape (2, 2)
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    meets = "✓" if rho < rho_required else "✗"
    print(f"gain={gain:2d}:  ρ = {rho:.4f}  {meets}")
Target settling: 30 steps (1.50 s)
Required spectral radius ρ < 0.8577

gain= 2:  ρ = 0.9028  ✗
gain= 4:  ρ = 0.8544  ✓
gain= 6:  ρ = 0.8031  ✓
gain= 8:  ρ = 0.8000  ✓
gain=10:  ρ = 0.8225  ✓
gain=14:  ρ = 0.8345  ✓

14.7.5 Exploration

Dataset suggestion: The OpenAI Gym CartPole environment or any physics simulator provides linearized system matrices around the upright equilibrium. Try the following:

  1. Linearize the CartPole dynamics at the operating point to get a \(4\times4\) matrix \(A\) (state: \([x, \dot x, \theta, \dot\theta]\)).
  2. Compute its eigenvalues. Is the open-loop system stable?
  3. Design a proportional feedback gain \(K\) and check whether \(A - BK\) has all eigenvalues inside the unit circle.
  4. For the closed-loop system, use the spectral decomposition to compute the settling time and the dominant mode shape.

14.7.6 Exercises

App.1 For the stable controller above with \(k=8\), compute \(A^{100}\) using diagonalization and verify numerically that it is nearly the zero matrix.

App.2 A continuous-time system \(\dot{\mathbf{x}} = A_c \mathbf{x}\) is discretized as \(A = e^{A_c \Delta t}\). Show that the discrete system is stable (\(|\lambda_d| < 1\)) if and only if the continuous system is stable (\(\operatorname{Re}(\lambda_c) < 0\)).

App.3 The controller matrix \(A\) has complex conjugate eigenvalues \(\lambda = re^{\pm i\omega}\). Describe the resulting motion in the \((\theta, \dot\theta)\) plane. Under what condition is the motion oscillatory but decaying? How does the oscillation frequency relate to \(\omega\)?

14.8 Exercises and Solutions


14.8.1 Exercise 14.1.1 — Diagonalization by hand

Diagonalize \(A = \begin{bmatrix}1 & 2 \\ 0 & 3\end{bmatrix}\) and verify \(PDP^{-1} = A\).

import numpy as np

A = np.array([[1., 2.],
              [0., 3.]])   # shape (2, 2)

vals, vecs = np.linalg.eig(A)
P    = vecs
D    = np.diag(vals)
Pinv = np.linalg.inv(P)

print("Eigenvalues:", vals.round(4))
print("\nP =\n", P.round(4))
print("\nP⁻¹ =\n", Pinv.round(4))
print("\nPDP⁻¹ =\n", (P @ D @ Pinv).round(4))
print("\nA =\n", A)
print("\nMax error:", np.abs(A - P @ D @ Pinv).max().round(12))
Eigenvalues: [1. 3.]

P =
 [[1.     0.7071]
 [0.     0.7071]]

P⁻¹ =
 [[ 1.     -1.    ]
 [ 0.      1.4142]]

PDP⁻¹ =
 [[1. 2.]
 [0. 3.]]

A =
 [[1. 2.]
 [0. 3.]]

Max error: 0.0

Solution. Eigenvalues are \(\lambda_1 = 1\) (upper triangular → diagonal entries). For \(\lambda_1 = 1\): \((A-I)\mathbf{v} = 0\)\(\mathbf{v}_1 = (1, 0)^T\). For \(\lambda_2 = 3\): \((A-3I)\mathbf{v} = 0\) → row reduce \(\begin{bmatrix}-2&2\\0&0\end{bmatrix}\), so \(v_1 = v_2\), \(\mathbf{v}_2 = (1, 1)^T/\sqrt{2}\).

\[P = \begin{bmatrix}1 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2}\end{bmatrix}, \quad D = \begin{bmatrix}1 & 0 \\ 0 & 3\end{bmatrix}.\]


14.8.2 Exercise 14.1.3 — Defective matrix

import numpy as np

A = np.array([[5., 1.],
              [0., 5.]])   # shape (2, 2) — defective

vals, vecs = np.linalg.eig(A)
P    = vecs
D    = np.diag(vals)
Pinv = np.linalg.inv(P)

print("Eigenvalues:", vals)
print("Eigenvector matrix rank:", np.linalg.matrix_rank(P))
print("Reconstruction error:", np.abs(A - P @ D @ Pinv).max().round(6))
Eigenvalues: [5. 5.]
Eigenvector matrix rank: 2
Reconstruction error: 1.25

Solution. Both eigenvalues equal 5. The null space of \(A - 5I = \begin{bmatrix}0&1\\0&0\end{bmatrix}\) is spanned by \((1, 0)^T\) only — one independent eigenvector, not two. Numpy returns the same eigenvector twice, so \(P\) is singular. \(P^{-1}\) is numerically huge and the reconstruction error is large — confirming the matrix is defective and not diagonalizable over \(\mathbb{R}\).


14.8.3 Exercise 14.1.4 — Eigenvalues of \(A^{-1}\)

Solution. If \(A = PDP^{-1}\) with all \(\lambda_i \neq 0\): \[A^{-1} = (PDP^{-1})^{-1} = PD^{-1}P^{-1},\] where \(D^{-1} = \operatorname{diag}(1/\lambda_1, \ldots, 1/\lambda_n)\). So the eigenvalues of \(A^{-1}\) are \(\{1/\lambda_i\}\) with the same eigenvectors.

import numpy as np

A = np.array([[3., 1.],
              [1., 2.]])   # shape (2, 2) symmetric PD

vals, P = np.linalg.eigh(A)
Ainv_diag  = P @ np.diag(1./vals) @ P.T    # shape (2, 2)
Ainv_numpy = np.linalg.inv(A)              # shape (2, 2)

print("Eigenvalues of A:   ", vals.round(4))
print("Eigenvalues of A⁻¹:", np.linalg.eigvalsh(Ainv_numpy).round(4))
print("Max error:", np.abs(Ainv_diag - Ainv_numpy).max().round(12))
Eigenvalues of A:    [1.382 3.618]
Eigenvalues of A⁻¹: [0.2764 0.7236]
Max error: 0.0

14.8.4 Exercise 14.2.2 — Population model long-run ratio

import numpy as np

A = np.array([[0.8, 0.4],
              [0.3, 0.6]])   # shape (2, 2)

vals, vecs = np.linalg.eig(A)
P    = vecs
Pinv = np.linalg.inv(P)

x0 = np.array([100., 50.])   # shape (2,)

k = 50
Dk = np.diag(vals**k)
x50 = np.real(P @ Dk @ Pinv @ x0)   # shape (2,)

print(f"x₅₀ = {x50.round(4)}")
print(f"Ratio x₁/x₂ after 50 steps: {x50[0]/x50[1]:.4f}")

# Long-run ratio is the dominant eigenvector
dom = np.argmax(np.abs(vals))
v_dom = vecs[:, dom].real
v_norm = v_dom / v_dom[1]
print(f"\nDominant eigenvector (normalized to x₂=1): [{v_norm[0]:.4f}, {v_norm[1]:.4f}]")
print(f"Long-run ratio x₁/x₂ from eigenvector: {v_dom[0]/v_dom[1]:.4f}")
x₅₀ = [1732.0874 1128.2606]
Ratio x₁/x₂ after 50 steps: 1.5352

Dominant eigenvector (normalized to x₂=1): [1.5352, 1.0000]
Long-run ratio x₁/x₂ from eigenvector: 1.5352

Solution. The dominant eigenvalue is \(\approx 1.0\) (or whatever the largest is); the state vector \(\mathbf{x}_k\) converges to a multiple of the dominant eigenvector. The long-run ratio \(x_1/x_2\) equals the ratio of the dominant eigenvector’s components.


14.8.5 Exercise 14.3.3 — Constructing a symmetric matrix with prescribed eigenvalues

import numpy as np

target_vals = np.array([5., 2., -1.])
# Choose a random orthogonal Q via QR
rng = np.random.default_rng(13)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # shape (3, 3)

A = Q @ np.diag(target_vals) @ Q.T   # shape (3, 3)

recovered = np.linalg.eigvalsh(A)
print("Constructed A =\n", A.round(4))
print("\nTarget eigenvalues  :", sorted(target_vals))
print("Recovered eigenvalues:", recovered.round(4).tolist())
print("Symmetric:", np.allclose(A, A.T))
Constructed A =
 [[ 3.1121 -0.8554  1.92  ]
 [-0.8554 -0.1526  1.0515]
 [ 1.92    1.0515  3.0405]]

Target eigenvalues  : [np.float64(-1.0), np.float64(2.0), np.float64(5.0)]
Recovered eigenvalues: [-1.0, 2.0, 5.0]
Symmetric: True

14.8.6 Exercise 14.4.2 — Testing \(A = \begin{bmatrix}2&3\\3&2\end{bmatrix}\)

import numpy as np

A = np.array([[2., 3.],
              [3., 2.]])   # shape (2, 2)

# Test 1: eigenvalues
eigs = np.linalg.eigvalsh(A)
print("Eigenvalues:", eigs)
print("Test 1 (PD):", (eigs > 0).all())

# Test 2: leading principal minors
m1 = A[0, 0]
m2 = np.linalg.det(A)
print(f"\nLeading minors: {m1}, {m2:.2f}")
print("Test 2 (all > 0):", m1 > 0 and m2 > 0)

# Test 3: Cholesky
try:
    np.linalg.cholesky(A)
    print("Test 3: Cholesky ✓")
except np.linalg.LinAlgError:
    print("Test 3: Cholesky failed → NOT PD")
Eigenvalues: [-1.  5.]
Test 1 (PD): False

Leading minors: 2.0, -5.00
Test 2 (all > 0): False
Test 3: Cholesky failed → NOT PD

Solution. \(\det(A) = 4 - 9 = -5 < 0\), so the second leading principal minor is negative. Equivalently, the eigenvalues are \(2 \pm 3 = 5\) and \(-1\) — one negative eigenvalue confirms \(A\) is indefinite (not PD) despite having a positive diagonal.


14.8.7 Exercise 14.5.3 — Rank-1 approximation and explained variance

import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 3))
X[:, 1] += 2 * X[:, 0]   # introduce correlation
X -= X.mean(axis=0)
Sigma = X.T @ X / 49       # shape (3, 3) covariance

vals, Q = np.linalg.eigh(Sigma)   # ascending order

# Rank-1 approximation using the largest eigenvalue
lam1    = vals[-1]
q1      = Q[:, -1]
Sigma_1 = lam1 * np.outer(q1, q1)   # shape (3, 3)

explained = lam1 / vals.sum()
print("Eigenvalues (ascending):", vals.round(4))
print(f"\nLargest eigenvalue: {lam1:.4f}")
print(f"Explained variance by rank-1 approx: {explained:.1%}")
print("\nRank-1 approximation Σ₁ =\n", Sigma_1.round(4))
Eigenvalues (ascending): [0.1506 0.937  5.737 ]

Largest eigenvalue: 5.7370
Explained variance by rank-1 approx: 84.1%

Rank-1 approximation Σ₁ =
 [[0.9274 2.112  0.0063]
 [2.112  4.8095 0.0144]
 [0.0063 0.0144 0.    ]]

14.8.8 Exercise App.2 — Continuous-to-discrete stability equivalence

Solution. By the spectral theorem, \(e^{A_c \Delta t}\) has eigenvalues \(e^{\lambda_c \Delta t}\). For stability of the discrete system we need \(|e^{\lambda_c \Delta t}| < 1\), i.e. \(e^{\operatorname{Re}(\lambda_c) \Delta t} < 1\), which holds if and only if \(\operatorname{Re}(\lambda_c) < 0\).

import numpy as np
from scipy.linalg import expm

dt = 0.1

# Stable continuous system
Ac = np.array([[-2., 1.],
               [ 0., -3.]])   # shape (2, 2)  — both eigenvalues < 0

Ad = expm(Ac * dt)   # shape (2, 2)

print("Continuous eigenvalues:", np.linalg.eigvals(Ac).round(4))
print("Discrete eigenvalues:  ", np.linalg.eigvals(Ad).round(4))
print("Discrete |λ|:          ", np.abs(np.linalg.eigvals(Ad)).round(4))
print("Discrete stable (all |λ| < 1):", (np.abs(np.linalg.eigvals(Ad)) < 1).all())
Continuous eigenvalues: [-2. -3.]
Discrete eigenvalues:   [0.8187 0.7408]
Discrete |λ|:           [0.8187 0.7408]
Discrete stable (all |λ| < 1): True