27  Appendix B: Matrix Factorization Summary

27.1 Matrix Factorization Summary

Every major matrix factorization decomposes a matrix into structured factors that expose a different geometric or algebraic property. Knowing which factorization to reach for – and why – is as important as knowing the API.


27.1.1 At a Glance

Factorization Form Requirement Cost Primary uses
LU \(PA = LU\) Square \(O(n^3)\) Solve \(Ax = b\), compute det
Cholesky \(A = LL^T\) Symmetric PD \(\tfrac{1}{2}O(n^3)\) Kalman filter, multivariate sampling
QR \(A = QR\) Any \(O(mn^2)\) Least squares, eigenvalue algorithms
Eigendecomposition \(A = P\Lambda P^{-1}\) Square, diagonalizable \(O(n^3)\) Dynamics, stability, PageRank
Spectral \(A = Q\Lambda Q^T\) Symmetric \(O(n^3)\) PCA, quadratic forms, Hessians
SVD \(A = U\Sigma V^T\) Any \(O(mn^2)\) Low-rank approx, pseudoinverse, PCA
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import scipy.linalg

rng = np.random.default_rng(42)
n = 6

A_sq = rng.standard_normal((n, n))
A_sq = A_sq @ A_sq.T + n * np.eye(n)   # shape (n, n)  SPD

P_mat, L, U = scipy.linalg.lu(A_sq)             # PA = LU
L_cho = np.linalg.cholesky(A_sq)                # A = L L^T
Q_qr, R_qr = np.linalg.qr(A_sq)                # A = Q R
vals, P_eig = np.linalg.eigh(A_sq)              # symmetric -- spectral
Lam = np.diag(vals)                              # shape (n, n)
U_svd, s, Vt = np.linalg.svd(A_sq, full_matrices=False)  # shape (n,)

threshold = 1e-10
patterns = [
    (np.abs(L) > threshold,        'L\n(lower)'),
    (np.abs(U) > threshold,        'U\n(upper)'),
    (np.abs(L_cho) > threshold,    'L\n(Cholesky)'),
    (np.abs(Q_qr) > threshold,     'Q\n(orthogonal)'),
    (np.abs(R_qr) > threshold,     'R\n(upper)'),
    (np.abs(P_eig) > threshold,    'P\n(eigenvecs)'),
    (np.abs(Lam) > threshold,      'Lambda\n(diagonal)'),
    (np.abs(U_svd) > threshold,    'U\n(left sv)'),
    (np.abs(Vt) > threshold,       'V^T\n(right sv)'),
]

fig, axes = plt.subplots(1, 9, figsize=(13, 2.0))
for ax, (pat, label) in zip(axes, patterns):
    ax.imshow(pat, cmap='Blues', vmin=0, vmax=1, aspect='auto')
    ax.set_title(label, fontsize=7.5)
    ax.set_xticks([])
    ax.set_yticks([])

plt.suptitle('Factor sparsity patterns (n = 6)', fontsize=9, y=1.03)
plt.tight_layout()
plt.show()
Figure 27.1: Sparsity patterns of the factor matrices produced by each decomposition for a 6x6 example. Dark cells are non-zero entries; light cells are structural zeros. L (lower triangular), U (upper triangular), Q (orthogonal – dense), R (upper triangular), P (eigenvector matrix – dense), and U/V^T (singular vector matrices – dense).

27.2 LU Decomposition

Form: \(PA = LU\)

Gaussian elimination with partial pivoting factors a square matrix \(A\) into a permutation matrix \(P\), a unit lower-triangular matrix \(L\) (ones on diagonal), and an upper-triangular matrix \(U\).

27.2.1 Structure

Factor Shape Properties
\(P\) \((n, n)\) Permutation; \(P^T = P^{-1}\)
\(L\) \((n, n)\) Lower triangular; diagonal entries all 1
\(U\) \((n, n)\) Upper triangular; diagonal entries are pivots

The factorization always exists for any square matrix (no symmetry or positive-definiteness required). After factoring, each additional right-hand side costs \(O(n^2)\) via forward and back substitution.


27.2.2 API

Call Description
np.linalg.solve(A, b) Solve once; computes LU internally
scipy.linalg.lu_factor(A) Return packed (lu, piv)
scipy.linalg.lu_solve((lu, piv), b) Solve for each RHS using stored LU
scipy.linalg.lu(A) Return explicit (P, L, U) matrices
import numpy as np
import scipy.linalg

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))     # shape (4, 4)  general square
B = rng.standard_normal((n, 3))     # shape (4, 3)  multiple right-hand sides

# Factor once
lu, piv = scipy.linalg.lu_factor(A)        # packed representation

# Solve three right-hand sides cheaply
X = scipy.linalg.lu_solve((lu, piv), B)    # shape (4, 3)
print(f"||AX - B|| = {np.linalg.norm(A @ X - B):.2e}")

# Inspect factors explicitly
P, L, U = scipy.linalg.lu(A)              # P shape (4,4), L shape (4,4), U shape (4,4)
print(f"||PA - LU|| = {np.linalg.norm(P @ A - L @ U):.2e}")
print(f"L diagonal (all 1): {np.diag(L).round(4)}")
print(f"U diagonal (pivots): {np.diag(U).round(4)}")
||AX - B|| = 6.03e-16
||PA - LU|| = 1.16e-16
L diagonal (all 1): [1. 1. 1. 1.]
U diagonal (pivots): [-2.325  -1.1992  1.5065 -0.4487]

27.2.3 Determinant and Singular Systems

Once LU is available, the determinant is the product of the \(U\) diagonal (times the permutation sign), without a separate \(O(n^3)\) computation:

import numpy as np
import scipy.linalg

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))   # shape (5, 5)

lu, piv = scipy.linalg.lu_factor(A)

# Sign of permutation: count swaps in piv
n = A.shape[0]
sign = (-1) ** int(np.sum(piv != np.arange(n)))
det_fast = sign * np.prod(np.diag(lu))
det_ref  = np.linalg.det(A)
print(f"det (fast): {det_fast:.6f}")
print(f"det (ref) : {det_ref:.6f}")
det (fast): 1.658587
det (ref) : 1.658587

scipy.linalg.lu_factor returns piv as a 1-D array of pivot indices used by LAPACK’s dgetrf, not a full permutation matrix. Use scipy.linalg.lu(A) when you need \(P\), \(L\), \(U\) as separate arrays.


27.2.4 Numerical Properties

Property Value
Stability Backward stable with partial pivoting
Condition sensitivity \(\kappa(A) = \|A\| \cdot \|A^{-1}\|\) governs error
Memory In-place; \(L\) and \(U\) share the storage of lu
Fails Singular matrix (zero pivot encountered)

When the condition number \(\kappa(A)\) is large, small perturbations in \(b\) produce large changes in \(x\). Check np.linalg.cond(A) before trusting the solution.


27.3 Cholesky Decomposition

Form: \(A = LL^T\) (or equivalently \(A = U^T U\) with an upper triangular \(U\))

Cholesky factors a symmetric positive definite (SPD) matrix into a lower triangular matrix \(L\) whose diagonal entries are positive. It requires roughly half the floating-point operations of LU and is numerically very stable.

27.3.1 Structure

Factor Shape Properties
\(L\) \((n, n)\) Lower triangular; positive diagonal

Existence and uniqueness are guaranteed if and only if \(A\) is SPD. In practice, a failed Cholesky is the cheapest way to test whether a matrix is positive definite.


27.3.2 API

Call Description
np.linalg.cholesky(A) Return lower triangular \(L\); raises LinAlgError if not PD
scipy.linalg.cho_factor(A) Return packed (c, lower) for repeated solves
scipy.linalg.cho_solve((c, lower), b) Solve each RHS with stored Cholesky
import numpy as np
import scipy.linalg

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))      # shape (5, 5)
A = A @ A.T + n * np.eye(n)          # shape (5, 5)  SPD
b = rng.standard_normal((n,))        # shape (5,)

# Single solve
L = np.linalg.cholesky(A)            # shape (5, 5)  lower triangular
print(f"||LL^T - A|| = {np.linalg.norm(L @ L.T - A):.2e}")
print(f"L diagonal (positive): {np.diag(L).round(4)}")

# Pre-factor for multiple right-hand sides
c, lower = scipy.linalg.cho_factor(A)         # packed Cholesky
x = scipy.linalg.cho_solve((c, lower), b)     # shape (5,)
print(f"||Ax - b||   = {np.linalg.norm(A @ x - b):.2e}")
||LL^T - A|| = 2.88e-15
L diagonal (positive): [2.3961 3.1016 3.4962 2.5086 2.8318]
||Ax - b||   = 2.31e-16

27.3.3 Applications in the Book

Sampling from a multivariate Gaussian \(\mathcal{N}(\mu, \Sigma)\):

If \(\Sigma = LL^T\), then \(x = \mu + L z\) where \(z \sim \mathcal{N}(0, I)\) produces samples from \(\mathcal{N}(\mu, \Sigma)\).

import numpy as np

rng = np.random.default_rng(7)
n_dim, n_samples = 3, 10000

# Covariance matrix and mean
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.1],
                  [0.3, 0.1, 1.0]])    # shape (3, 3)  SPD
mu    = np.array([1.0, 2.0, 0.5])     # shape (3,)

L = np.linalg.cholesky(Sigma)         # shape (3, 3)

z = rng.standard_normal((n_dim, n_samples))    # shape (3, 10000)
samples = mu[:, np.newaxis] + L @ z            # shape (3, 10000)

# Verify: empirical covariance should match Sigma
Sigma_emp = np.cov(samples)            # shape (3, 3)
print(f"Max error in recovered covariance: {np.abs(Sigma_emp - Sigma).max():.4f}")
Max error in recovered covariance: 0.0404

Kalman filter covariance update uses Cholesky for numerically stable square-root forms. Sparse SLAM solves the normal equations via sparse Cholesky (via scikit-sparse or cholmod).


27.3.4 Numerical Properties

Property Value
Stability Highly stable; no pivoting needed
Cost \(\approx \frac{1}{3} n^3\) FLOPs (half of LU)
Failure mode Raises np.linalg.LinAlgError if matrix is not PD
Near-singular case Add small ridge A + eps * np.eye(n) before factoring

Diagonal check: np.all(np.diag(L) > 0) confirms positive definiteness after a successful Cholesky.


27.4 QR Decomposition

Form: \(A = QR\)

QR factors any \(m \times n\) matrix (\(m \ge n\)) into an orthogonal (or unitary) matrix \(Q\) and an upper-triangular matrix \(R\).

27.4.1 Thin vs Full QR

Mode \(Q\) shape \(R\) shape When to use
Thin (economy) \((m, n)\) \((n, n)\) Least squares, numerical linear algebra
Full \((m, m)\) \((m, n)\) Completing orthonormal bases, null space

In the thin form, \(Q\) has orthonormal columns (\(Q^T Q = I_n\)) but is not square, so \(Q Q^T \ne I_m\). In the full form, \(Q\) is square and orthogonal (\(Q^T Q = Q Q^T = I_m\)).


27.4.2 API

Call Description
np.linalg.qr(A) Thin QR by default; returns (Q, R)
np.linalg.qr(A, mode='complete') Full QR; \(Q\) is square \((m, m)\)
np.linalg.qr(A, mode='r') \(R\) only (faster when \(Q\) not needed)
scipy.linalg.qr(A, pivoting=True) Column-pivoted QR; reveals rank
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.standard_normal((m, n))      # shape (6, 4)

# Thin QR
Q, R = np.linalg.qr(A)              # Q shape (6, 4), R shape (4, 4)
print(f"Thin QR: Q {Q.shape}, R {R.shape}")
print(f"||A - QR||        = {np.linalg.norm(A - Q @ R):.2e}")
print(f"||Q^T Q - I||     = {np.linalg.norm(Q.T @ Q - np.eye(n)):.2e}")

# Full QR
Q_full, R_full = np.linalg.qr(A, mode='complete')  # Q shape (6, 6)
print(f"\nFull QR: Q {Q_full.shape}, R {R_full.shape}")
print(f"||Q Q^T - I||     = {np.linalg.norm(Q_full @ Q_full.T - np.eye(m)):.2e}")
Thin QR: Q (6, 4), R (4, 4)
||A - QR||        = 1.60e-15
||Q^T Q - I||     = 1.00e-15

Full QR: Q (6, 6), R (6, 4)
||Q Q^T - I||     = 1.15e-15

27.4.3 QR for Least Squares

For an overdetermined system \(Ax \approx b\) (\(m > n\), full column rank), thin QR gives the least-squares solution without forming the normal equations \(A^T A x = A^T b\):

import numpy as np

rng = np.random.default_rng(3)
m, n = 20, 4
A = rng.standard_normal((m, n))    # shape (20, 4)
b = rng.standard_normal((m,))      # shape (20,)

# QR least squares: A = QR => min||Ax-b|| => Rx = Q^T b
Q, R = np.linalg.qr(A)            # Q shape (20,4), R shape (4,4)
x_qr = np.linalg.solve(R, Q.T @ b)          # shape (4,)

# Reference via lstsq
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)   # shape (4,)

print(f"||x_qr - x_ls||  = {np.linalg.norm(x_qr - x_ls):.2e}")
print(f"||Ax_qr - b||    = {np.linalg.norm(A @ x_qr - b):.4f}")
||x_qr - x_ls||  = 2.84e-16
||Ax_qr - b||    = 3.9128

27.4.4 Column-Pivoted QR and Rank Revelation

scipy.linalg.qr(A, pivoting=True) returns \(AP = QR\) where \(P\) is a permutation that places the largest pivots first on the diagonal of \(R\). The numerical rank is the number of diagonal entries above a tolerance.

import numpy as np
import scipy.linalg

rng = np.random.default_rng(5)
# Rank-2 matrix embedded in R^4
U = rng.standard_normal((5, 2))    # shape (5, 2)
V = rng.standard_normal((2, 4))    # shape (2, 4)
A = U @ V                          # shape (5, 4)  rank 2

Q, R, piv = scipy.linalg.qr(A, pivoting=True)
diag_R = np.abs(np.diag(R))
tol    = diag_R[0] * max(A.shape) * np.finfo(float).eps
rank   = int(np.sum(diag_R > tol))
print(f"Diagonal of R: {diag_R.round(4)}")
print(f"Numerical rank: {rank}")   # 2
Diagonal of R: [5.6351 2.0388 0.     0.    ]
Numerical rank: 2

27.4.5 Numerical Properties

Property Value
Stability Backward stable (Householder reflections or Givens rotations)
Cost \(O(mn^2 - \frac{1}{3}n^3)\) for \(m \ge n\)
Orthogonality loss Accumulates slowly; re-orthogonalize long iteration sequences
Rank-deficient case Use column pivoting (scipy.linalg.qr(A, pivoting=True))

The thin \(Q\) from np.linalg.qr uses Householder reflections internally. It is more stable than the Gram-Schmidt orthogonalization you might write by hand.


27.5 Eigendecomposition and Spectral Decomposition

27.5.1 Eigendecomposition (General): \(A = P\Lambda P^{-1}\)

Any square, diagonalizable matrix \(A\) can be written as \(A = P \Lambda P^{-1}\) where the columns of \(P\) are the eigenvectors and \(\Lambda\) is diagonal with the eigenvalues.

Factor Shape Properties
\(P\) \((n, n)\) Columns are right eigenvectors; may be ill-conditioned
\(\Lambda\) \((n, n)\) Diagonal; entries may be complex for non-symmetric \(A\)

When to use: discrete-time dynamics (\(A^k = P \Lambda^k P^{-1}\)), continuous-time stability (sign of real parts of eigenvalues), PageRank, Markov chains.


27.5.2 Spectral Decomposition (Symmetric): \(A = Q\Lambda Q^T\)

When \(A\) is real symmetric (or complex Hermitian), the Spectral Theorem guarantees:

  • All eigenvalues are real.
  • Eigenvectors can be chosen orthonormal: \(Q^T Q = Q Q^T = I\).
  • The decomposition is \(A = Q \Lambda Q^T\) (no matrix inversion needed).

This is the numerically preferable form; always use eigh for symmetric matrices.

Factor Shape Properties
\(Q\) \((n, n)\) Orthogonal; columns are orthonormal eigenvectors
\(\Lambda\) \((n, n)\) Diagonal; real, ascending by default

27.5.3 API

Call Returns Notes
np.linalg.eig(A) (vals, vecs) General; eigenvalues may be complex
np.linalg.eigh(A) (vals, vecs) Symmetric/Hermitian; real, ascending order
np.linalg.eigvals(A) vals Eigenvalues only; faster than eig
scipy.linalg.eigh(A, B) (vals, vecs) Generalized problem \(Ax = \lambda Bx\)
scipy.sparse.linalg.eigsh(A, k) (vals, vecs) Top-\(k\) from large sparse symmetric \(A\)
import numpy as np
import scipy.linalg

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))     # shape (5, 5)
A = A @ A.T                         # shape (5, 5)  symmetric PSD

# Spectral decomposition
vals, Q = np.linalg.eigh(A)         # vals shape (5,) ascending; Q shape (5, 5)
print(f"Eigenvalues (ascending): {vals.round(4)}")

# Reconstruct A
Lam = np.diag(vals)                 # shape (5, 5)
A_rec = Q @ Lam @ Q.T              # shape (5, 5)
print(f"||A - Q Lam Q^T||     = {np.linalg.norm(A - A_rec):.2e}")
print(f"||Q^T Q - I||         = {np.linalg.norm(Q.T @ Q - np.eye(n)):.2e}")
Eigenvalues (ascending): [0.112  0.2429 3.3676 6.6151 8.0793]
||A - Q Lam Q^T||     = 1.22e-14
||Q^T Q - I||         = 1.80e-15

27.5.4 General (Non-Symmetric) Eigendecomposition

import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))    # shape (4, 4)  general (not symmetric)

vals, P = np.linalg.eig(A)         # vals may be complex; P shape (4, 4)
print(f"Eigenvalues: {vals.round(3)}")
print(f"Eigenvalues are real: {np.allclose(vals.imag, 0)}")

# Check A P = P Lambda (element-wise: A @ P[:,i] = vals[i] * P[:,i])
residuals = np.linalg.norm(A @ P - P @ np.diag(vals), axis=0)
print(f"Eigenvector residuals: {residuals.round(6)}")
Eigenvalues: [-0.128+1.256j -0.128-1.256j  1.838+0.j     0.631+0.j   ]
Eigenvalues are real: False
Eigenvector residuals: [0. 0. 0. 0.]

27.5.5 Power of a Matrix via Eigendecomposition

\(A^k = P \Lambda^k P^{-1}\) (spectral form: \(A^k = Q \Lambda^k Q^T\))

import numpy as np

rng = np.random.default_rng(9)
n = 3
A = rng.standard_normal((n, n))    # shape (3, 3)
A = A @ A.T                        # shape (3, 3)  symmetric PSD
k = 10

# Via spectral: A^k = Q Lambda^k Q^T
vals, Q = np.linalg.eigh(A)       # vals shape (3,), Q shape (3, 3)
Ak_spectral = Q @ np.diag(vals**k) @ Q.T   # shape (3, 3)

# Via repeated multiplication (reference)
Ak_ref = np.linalg.matrix_power(A, k)      # shape (3, 3)

print(f"||A^{k} via spectral - direct||: {np.linalg.norm(Ak_spectral - Ak_ref):.2e}")
||A^10 via spectral - direct||: 5.40e-10

27.5.6 Pitfalls

Pitfall Symptom Fix
Using eig on a symmetric matrix Spurious imaginary parts Use eigh instead
eigh returns ascending order First eigenvalue is smallest Flip: vals[::-1], Q[:, ::-1] for descending
Eigenvectors not unique Sign flip between runs Normalize signs; dot product with reference vector
Ill-conditioned \(P\) in eig Large inversion error Condition number of \(P\): np.linalg.cond(P)
eig eigenvalues complex for real \(A\) Complex conjugate pairs Appear when \(A\) has rotation component

np.linalg.eigh sorts eigenvalues in ascending order. To get descending (largest first, as in PCA), reverse both arrays: vals = vals[::-1]; Q = Q[:, ::-1].


27.6 Singular Value Decomposition

Form: \(A = U \Sigma V^T\)

SVD factors any \(m \times n\) matrix into two orthogonal matrices (\(U\) and \(V\)) and a diagonal matrix \(\Sigma\) of singular values \(\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k \ge 0\) where \(k = \min(m, n)\).

27.6.1 Thin vs Full SVD

Mode \(U\) shape \(\Sigma\) (as 1-D s) \(V^T\) shape When to use
Thin (economy) \((m, k)\) \((k,)\) \((k, n)\) PCA, least squares, low-rank approx
Full \((m, m)\) \((k,)\) \((n, n)\) Null space, complete bases

where \(k = \min(m, n)\). In the thin form, \(U\) and \(V\) have orthonormal columns; in the full form both are square orthogonal matrices.


27.6.2 API

Call Returns Notes
np.linalg.svd(A, full_matrices=False) (U, s, Vt) Thin; s is 1-D descending
np.linalg.svd(A, full_matrices=True) (U, s, Vt) Full; s is still 1-D
np.linalg.svd(A, compute_uv=False) s 1-D array directly – not a tuple
scipy.linalg.svd(A, full_matrices=False) (U, s, Vt) Adds lapack_driver option
scipy.sparse.linalg.svds(A, k) (U, s, Vt) Top-\(k\); returns ascending order
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.standard_normal((m, n))      # shape (6, 4)

# Thin SVD
U, s, Vt = np.linalg.svd(A, full_matrices=False)
# U shape (6, 4), s shape (4,) descending, Vt shape (4, 4)
print(f"U {U.shape}, s {s.shape}, Vt {Vt.shape}")
print(f"Singular values: {s.round(4)}")

# Reconstruct A from thin factors
A_rec = U @ np.diag(s) @ Vt         # shape (6, 4)
print(f"||A - U S V^T|| = {np.linalg.norm(A - A_rec):.2e}")

# Singular values only -- returns 1-D array directly, NOT a tuple
s_only = np.linalg.svd(A, compute_uv=False)   # shape (4,)
print(f"s_only type: {type(s_only)}, shape: {s_only.shape}")
U (6, 4), s (4,), Vt (4, 4)
Singular values: [3.0361 2.0335 1.8563 0.8954]
||A - U S V^T|| = 9.83e-15
s_only type: <class 'numpy.ndarray'>, shape: (4,)

27.6.3 Low-Rank Approximation

The best rank-\(r\) approximation to \(A\) in both the 2-norm and Frobenius norm is \(A_r = U_r \Sigma_r V_r^T\) (Eckart-Young theorem).

import numpy as np

rng = np.random.default_rng(4)
m, n = 50, 40
A = rng.standard_normal((m, n))    # shape (50, 40)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# U shape (50,40), s shape (40,), Vt shape (40,40)

for r in [1, 5, 10, 20]:
    A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]   # shape (50, 40)
    err = np.linalg.norm(A - A_r, ord='fro')
    frac = (s[:r]**2).sum() / (s**2).sum()
    print(f"rank-{r:2d}: ||A - A_r||_F = {err:.3f},  variance captured = {frac:.3f}")
rank- 1: ||A - A_r||_F = 42.357,  variance captured = 0.089
rank- 5: ||A - A_r||_F = 35.741,  variance captured = 0.351
rank-10: ||A - A_r||_F = 28.875,  variance captured = 0.577
rank-20: ||A - A_r||_F = 16.883,  variance captured = 0.855

27.6.4 Pseudoinverse

\(A^+ = V \Sigma^+ U^T\) where \(\Sigma^+\) inverts the non-zero singular values.

import numpy as np

rng = np.random.default_rng(6)
m, n = 8, 5
A = rng.standard_normal((m, n))    # shape (8, 5)  full column rank
b = rng.standard_normal((m,))      # shape (8,)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Pseudoinverse via SVD
tol  = s[0] * max(m, n) * np.finfo(float).eps
s_inv = np.where(s > tol, 1.0 / s, 0.0)   # shape (5,)
A_pinv = Vt.T @ np.diag(s_inv) @ U.T      # shape (5, 8)

x_svd   = A_pinv @ b             # shape (5,)
x_ref   = np.linalg.pinv(A) @ b  # shape (5,)
print(f"||x_svd - x_ref||       = {np.linalg.norm(x_svd - x_ref):.2e}")
print(f"least-squares residual  = {np.linalg.norm(A @ x_svd - b):.4f}")
||x_svd - x_ref||       = 3.04e-16
least-squares residual  = 1.2993

27.6.5 SVD and PCA

PCA of a zero-mean data matrix \(X\) of shape \((n, d)\) (\(n\) samples, \(d\) features) uses the SVD of \(X\) directly rather than forming the covariance matrix \(X^T X / (n-1)\):

import numpy as np

rng = np.random.default_rng(42)
n, d = 200, 10
X_raw = rng.standard_normal((n, d))    # shape (200, 10)
X     = X_raw - X_raw.mean(axis=0)     # shape (200, 10)  zero mean

U, s, Vt = np.linalg.svd(X, full_matrices=False)
# Vt rows are principal directions; s**2 / (n-1) = eigenvalues of covariance

# Verify: singular values of X relate to eigenvalues of X^T X
Cov = X.T @ X / (n - 1)               # shape (10, 10)
evals_cov = np.linalg.eigvalsh(Cov)   # shape (10,) ascending
evals_svd = (s**2) / (n - 1)          # shape (10,) descending

print(f"Covariance eigenvalues (desc): {evals_cov[::-1].round(3)}")
print(f"SVD-based eigenvalues  (desc): {evals_svd.round(3)}")
Covariance eigenvalues (desc): [1.45  1.233 1.169 1.092 1.021 0.925 0.851 0.801 0.794 0.715]
SVD-based eigenvalues  (desc): [1.45  1.233 1.169 1.092 1.021 0.925 0.851 0.801 0.794 0.715]

27.6.6 Numerical Properties

Property Value
Stability Among the most backward-stable algorithms
Cost \(O(mn^2)\) for \(m \ge n\); expensive for large dense matrices
Sparse / large Use scipy.sparse.linalg.svds(A, k) for top-\(k\)
svds order Ascending (opposite of np.linalg.svd) – always sort after
Condition number \(\kappa(A) = \sigma_{\max} / \sigma_{\min}\)
import numpy as np
import scipy.sparse.linalg

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 20))    # shape (30, 20)

U_sp, s_sp, Vt_sp = scipy.sparse.linalg.svds(A, k=5)
# svds returns ASCENDING order -- reverse to get largest first
idx = np.argsort(s_sp)[::-1]
s_sp = s_sp[idx]
U_sp = U_sp[:, idx]
Vt_sp = Vt_sp[idx]

s_ref = np.linalg.svd(A, compute_uv=False)[:5]   # descending, top 5
print(f"svds (sorted desc):  {s_sp.round(4)}")
print(f"svd  (top 5 desc):   {s_ref.round(4)}")
svds (sorted desc):  [9.2868 8.4279 8.2091 7.3689 6.6779]
svd  (top 5 desc):   [9.2868 8.4279 8.2091 7.3689 6.6779]

scipy.sparse.linalg.svds returns singular values in ascending order. Always sort by descending singular value after calling svds.


27.7 Decision Guide

27.7.1 Which Factorization to Use

Situation Factorization Function
Solve \(Ax = b\), square \(A\), once LU (internal) np.linalg.solve(A, b)
Solve \(Ax = b\), square \(A\), many RHS LU explicit scipy.linalg.lu_factor + lu_solve
Solve \(Ax = b\), \(A\) symmetric PD Cholesky scipy.linalg.cho_factor + cho_solve
Overdetermined \(Ax \approx b\) (least squares) QR or SVD np.linalg.lstsq or QR solve
All eigenvalues, symmetric dense \(A\) Spectral np.linalg.eigh
All eigenvalues, non-symmetric \(A\) Eigen np.linalg.eig
Top-\(k\) eigenvalues, large sparse symmetric Lanczos scipy.sparse.linalg.eigsh(A, k)
All singular values, dense \(A\) SVD np.linalg.svd(A, full_matrices=False)
Top-\(k\) singular values, large \(A\) Truncated SVD scipy.sparse.linalg.svds(A, k)
Low-rank approximation SVD (Eckart-Young) Keep top-\(r\) singular triplets
PCA of data matrix SVD of centered \(X\) np.linalg.svd(X, full_matrices=False)
Sample from \(\mathcal{N}(\mu, \Sigma)\) Cholesky of \(\Sigma\) np.linalg.cholesky
Compute pseudoinverse \(A^+\) SVD np.linalg.pinv(A) or manual
Reveal numerical rank Pivoted QR or SVD scipy.linalg.qr(A, pivoting=True)
Matrix exponential \(e^A\) Pade + scaling (SciPy) scipy.linalg.expm(A)

27.7.2 Relationship Between Factorizations

import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

fig, ax = plt.subplots(figsize=(9, 5))
ax.set_xlim(0, 10)
ax.set_ylim(0, 6)
ax.axis('off')

nodes = {
    'SVD':   (5.0, 5.0),
    'Spectral': (2.0, 3.2),
    'QR':    (5.0, 3.2),
    'LU':    (8.0, 3.2),
    'Eigen': (2.0, 1.2),
    'Cholesky': (8.0, 1.2),
}
colors = {
    'SVD': '#4a90d9',
    'Spectral': '#2ecc71',
    'QR': '#2ecc71',
    'LU': '#aaaaaa',
    'Eigen': '#f39c12',
    'Cholesky': '#f39c12',
}
descriptions = {
    'SVD':      'A = U S V^T\nany matrix',
    'Spectral': 'A = Q Lam Q^T\nsymmetric',
    'QR':       'A = QR\nany matrix',
    'LU':       'PA = LU\nsquare',
    'Eigen':    'A = P Lam P^-1\ndiagonalizable',
    'Cholesky': 'A = L L^T\nsymmetric PD',
}

for name, (x, y) in nodes.items():
    box = mpatches.FancyBboxPatch(
        (x - 1.1, y - 0.55), 2.2, 1.1,
        boxstyle='round,pad=0.1',
        facecolor=colors[name], edgecolor='#333333',
        linewidth=1.2, alpha=0.85, zorder=3,
    )
    ax.add_patch(box)
    ax.text(x, y + 0.12, name, ha='center', va='center',
            fontsize=10, fontweight='bold', color='white', zorder=4)
    ax.text(x, y - 0.22, descriptions[name], ha='center', va='center',
            fontsize=7, color='white', zorder=4)

edges = [
    ('SVD', 'Spectral', 'A symmetric\n=> U=V=Q'),
    ('SVD', 'QR',        'thin U gives\northogonal basis'),
    ('Spectral', 'Eigen', 'special case\nQ=P, P^-1=Q^T'),
    ('Spectral', 'Cholesky', 'A PD =>\nL from Q,Lam'),
]

for src, dst, label in edges:
    x0, y0 = nodes[src]
    x1, y1 = nodes[dst]
    ax.annotate('', xy=(x1, y1 + 0.55), xytext=(x0, y0 - 0.55),
                arrowprops=dict(arrowstyle='->', color='#555555', lw=1.4),
                zorder=2)
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    ax.text(mx + 0.15, my, label, ha='left', va='center',
            fontsize=6.5, color='#444444', zorder=4)

ax.set_title('Matrix factorization relationships', fontsize=11)
plt.tight_layout()
plt.show()
Figure 27.2: Relationships among the six major factorizations. SVD is the most general and subsumes several others as special cases. Arrows indicate ‘can be derived from’ or ‘is a special case of’.

27.7.3 Conditioning and Numerical Stability at a Glance

import numpy as np
import scipy.linalg

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
A = A @ A.T + n * np.eye(n)     # shape (6, 6)  SPD

kappa = np.linalg.cond(A)

P_mat, L, U = scipy.linalg.lu(A)
Q, R = np.linalg.qr(A)
vals, _ = np.linalg.eigh(A)
U_s, s, Vt = np.linalg.svd(A, full_matrices=False)

print(f"Condition number kappa(A)      : {kappa:.2f}")
print(f"kappa via singular values       : {s[0]/s[-1]:.2f}")
print(f"kappa via eigenvalues (eigh)    : {vals[-1]/vals[0]:.2f}")
print(f"kappa via LU (U diagonal ratio) : {np.abs(np.diag(U)[0]/np.diag(U)[-1]):.2f}")
Condition number kappa(A)      : 2.67
kappa via singular values       : 2.67
kappa via eigenvalues (eigh)    : 2.67
kappa via LU (U diagonal ratio) : 1.01

All three estimates of condition number should agree for a well-formed SPD matrix. Large condition numbers (\(\kappa \gg 10^6\)) mean the system is nearly singular and solutions will have \(\log_{10}(\kappa)\) digits of accuracy lost.