2 Systems of Linear Equations

2.1 Linear Equations as Hyperplane Intersections

A single linear equation in two unknowns,

\[a_1 x_1 + a_2 x_2 = b\]

describes a line in \(\mathbb{R}^2\). Every point on that line satisfies the equation. A system of two such equations asks: where do two lines meet?

That geometric picture generalises exactly. In \(\mathbb{R}^3\) one equation defines a plane. In \(\mathbb{R}^n\) it defines a hyperplane — a flat \((n-1)\)-dimensional slice through \(n\)-dimensional space. Solving a system of \(m\) equations in \(n\) unknowns means finding the intersection of \(m\) hyperplanes.

2.1.1 One Equation, One Hyperplane

The equation \(\mathbf{a}^T \mathbf{x} = b\) (row vector \(\mathbf{a}\), column vector \(\mathbf{x}\)) defines a hyperplane perpendicular to \(\mathbf{a}\) at signed distance \(b / \|\mathbf{a}\|\) from the origin.

import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 5))

x = np.linspace(-3, 3, 300)

# Equation 1:  2x1 + x2 = 3  →  x2 = 3 - 2*x1
ax.plot(x, 3 - 2*x, color='#4a90d9', lw=2, label=r'$2x_1 + x_2 = 3$')

# Equation 2: -x1 + 2x2 = 2  →  x2 = (2 + x1) / 2
ax.plot(x, (2 + x) / 2, color='#2ecc71', lw=2, label=r'$-x_1 + 2x_2 = 2$')

# Intersection
A = np.array([[2., 1.], [-1., 2.]])
b = np.array([3., 2.])
sol = np.linalg.solve(A, b)
ax.scatter(*sol, color='tomato', s=80, zorder=5,
           label=f'solution ({sol[0]:.2f}, {sol[1]:.2f})')

# Normal vectors at solution
for row, col in zip(A, ['#4a90d9', '#2ecc71']):
    ax.annotate('', xy=sol + row * 0.35, xytext=sol,
                arrowprops=dict(arrowstyle='->', color=col, lw=1.8))

ax.set_xlim(-3, 3); ax.set_ylim(-2, 4)
ax.axhline(0, color='#333333', lw=0.5, alpha=0.4)
ax.axvline(0, color='#333333', lw=0.5, alpha=0.4)
ax.set_xlabel('$x_1$'); ax.set_ylabel('$x_2$')
ax.set_title('Two lines intersect at the unique solution', pad=10)
ax.legend(fontsize=9)
ax.grid(True, alpha=0.2)
plt.tight_layout()
plt.show()

The arrows show the normal vectors \(\mathbf{a}_1 = [2, 1]^T\) and \(\mathbf{a}_2 = [-1, 2]^T\) — each perpendicular to its line. The solution is the unique point that lies on both hyperplanes simultaneously.

2.1.2 Extending to 3D

With three unknowns, each equation cuts out a plane in \(\mathbb{R}^3\). Two planes generically intersect in a line. Add a third plane — if it crosses that line, you get a point.

import numpy as np
import matplotlib.pyplot as plt

colors = ['#4a90d9', '#2ecc71', '#9b59b6']

# System with unique solution at (1, 1, 1)
A = np.array([[2., 1., 0.],
              [0., 3., 1.],
              [1., 0., 2.]])
b = A @ np.array([1., 1., 1.])

fig = plt.figure(figsize=(7, 6))
ax  = fig.add_subplot(111, projection='3d')

r = np.linspace(-0.5, 2.5, 40)
X, Y = np.meshgrid(r, r)

for row, rhs, col in zip(A, b, colors):
    if abs(row[2]) > 1e-9:
        Z = (rhs - row[0]*X - row[1]*Y) / row[2]
        mask = (Z > -0.5) & (Z < 2.5)
        Zp = np.where(mask, Z, np.nan)
        ax.plot_surface(X, Y, Zp, alpha=0.4, color=col)

sol = np.linalg.solve(A, b)
ax.scatter(*sol, color='black', s=80, zorder=10, depthshade=False, label='solution')

ax.set_xlabel('$x$', labelpad=3)
ax.set_ylabel('$y$', labelpad=3)
ax.set_zlabel('$z$', labelpad=3)
ax.set_title('Three planes — one intersection point', pad=8)
ax.tick_params(labelsize=7)
plt.tight_layout()
plt.show()

2.1.3 The Three Cases — Unique, No Solution, Infinitely Many

Two lines in \(\mathbb{R}^2\) can intersect at a point, be parallel (never meet), or coincide (overlap entirely). The three panels below show each case with concrete slope/intercept values.

import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(13, 4))
x = np.linspace(-4, 4, 300)
blue, green = '#4a90d9', '#2ecc71'

# Panel 1: Unique solution
ax = axes[0]
s1, i1 = 1.0,  1.0
s2, i2 = -0.5, -1.0
xi = (i2 - i1) / (s1 - s2)
yi = s1 * xi + i1
ax.plot(x, s1*x + i1, color=blue,  lw=2, label=f'y = {s1}x + {i1}')
ax.plot(x, s2*x + i2, color=green, lw=2, label=f'y = {s2}x + {i2}')
ax.scatter(xi, yi, color='tomato', s=80, zorder=5,
           label=f'({xi:.2f}, {yi:.2f})')
ax.set_title('Unique solution\n(lines intersect)', fontsize=10)
ax.legend(fontsize=8)

# Panel 2: Parallel lines — no solution
ax = axes[1]
s3, i3 = 1.5,  1.0
s4, i4 = 1.5, -2.0
ax.plot(x, s3*x + i3, color=blue,  lw=2, label=f'y = {s3}x + {i3}')
ax.plot(x, s4*x + i4, color=green, lw=2, label=f'y = {s4}x + {i4}')
ax.set_title('No solution\n(parallel lines, same slope)', fontsize=10)
ax.legend(fontsize=8)

# Panel 3: Coincident lines — infinitely many solutions
ax = axes[2]
ax.plot(x,  1.0*x + 1.0, color=blue,  lw=3,   label='y = x + 1')
ax.plot(x, (2.0*x + 2.0)/2, color=green, lw=1.5, ls='--',
        label='2y = 2x + 2  (same line)')
ax.set_title('Infinitely many solutions\n(coincident lines)', fontsize=10)
ax.legend(fontsize=8)

for ax in axes:
    ax.axhline(0, color='#333333', lw=0.4, alpha=0.4)
    ax.axvline(0, color='#333333', lw=0.4, alpha=0.4)
    ax.set_xlim(-4, 4); ax.set_ylim(-4, 4)
    ax.grid(True, alpha=0.2)
    ax.set_aspect('equal')
    ax.set_xlabel('$x$'); ax.set_ylabel('$y$')

fig.suptitle('Three cases: unique / no solution / infinitely many', fontsize=12)
fig.tight_layout()
plt.show()

Setting both slopes equal makes the lines parallel — no solution. Setting both slopes and intercepts equal makes them coincide — infinitely many solutions. These two degenerate cases are not numerical flukes; they are the subject of §2.2.

2.1.4 The Normal-Vector Viewpoint

Row \(i\) of \(A\) is the normal vector to the \(i\)-th hyperplane. The right-hand side \(b_i\) is the signed offset along that normal:

\[\mathbf{a}_i^T \mathbf{x} = b_i \quad \Leftrightarrow \quad \text{signed distance from origin} = \frac{b_i}{\|\mathbf{a}_i\|}\]

Changing \(\mathbf{a}_i\) tilts the hyperplane.
Changing \(b_i\) slides it without tilting.
The solution \(\mathbf{x}^*\) lies on every hyperplane simultaneously.

When you debug a failed linear solve, the first geometric question is: are any two of my hyperplanes nearly parallel? Nearly parallel normals mean nearly linearly dependent rows — rank deficiency. That is Chapter 3.

2.2 The Three Solution Cases

A system \(A\mathbf{x} = \mathbf{b}\) with \(A \in \mathbb{R}^{m \times n}\) has exactly one of three outcomes:

Case	Geometry	Physical example
No solution	Hyperplanes miss each other	Contradictory sensor readings
Unique solution	Hyperplanes meet at one point	Well-determined position fix
Infinitely many	Hyperplanes share a line or subspace	Underdetermined kinematics

There is no fourth option. Understanding which case you are in — before you call a solver — is what separates robust code from brittle code.

2.2.1 Case 1 — No Solution (Inconsistent System)

The system is inconsistent when \(\mathbf{b}\) is not in the column space of \(A\): no linear combination of \(A\)’s columns reaches \(\mathbf{b}\).

Physical example — GPS with a faulty satellite. Three range measurements from satellites should meet at one point. If one satellite has a clock error, the three spheres no longer intersect — the system is inconsistent.

import numpy as np

# Inconsistent 3×2 system: two lines that don't intersect in R^2
A = np.array([[1.,  2.],
              [2.,  4.],   # parallel to row 1 (same normal direction)
              [1., -1.]])
b = np.array([3., 7., 1.]) # 7 ≠ 2*3=6, so rows 1 & 2 are contradictory

# np.linalg.solve requires square A; use lstsq which always returns something
x, res, rank, sv = np.linalg.lstsq(A, b, rcond=None)

print(f"Rank of A:   {rank}  (out of {min(A.shape)} possible)")
print(f"Residual:    {np.linalg.norm(A @ x - b):.4f}   ← non-zero means no exact solution")
print(f"lstsq x:     {x.round(4)}   ← best-fit, not exact")

Rank of A:   2  (out of 2 possible)
Residual:    0.4472   ← non-zero means no exact solution
lstsq x:     [1.8 0.8]   ← best-fit, not exact

The non-zero residual is the diagnostic. A zero residual would mean the system was consistent; a large residual points to a measurement contradiction worth investigating before proceeding.

2.2.2 Case 2 — Unique Solution

The system has a unique solution when \(A\) has full column rank and \(m \geq n\) (at least as many equations as unknowns, independent ones).

Physical example — trilateration from three WiFi access points. Three known-position routers each provide a range estimate. Three linear constraints, three unknowns — generically one solution.

import numpy as np

# 3×3 system, full rank
A = np.array([[1., 0., 1.],
              [0., 2., 1.],
              [3., 1., 0.]])
b = np.array([4., 5., 7.])

x = np.linalg.solve(A, b)
print(f"Solution:    {x.round(6)}")
print(f"Verify Ax:   {(A @ x).round(6)}   (should equal b = {b})")
print(f"Condition:   {np.linalg.cond(A):.2f}")

Solution:    [1.857143 1.428571 2.142857]
Verify Ax:   [4. 5. 7.]   (should equal b = [4. 5. 7.])
Condition:   3.47

The condition number is what distinguishes a numerically unique solution from a practically unreliable one. A condition number of 1000 does not mean no solution exists — it means the solution is sensitive to errors in \(\mathbf{b}\).

2.2.3 Case 3 — Infinitely Many Solutions

When \(A\) has more unknowns than independent equations, the solution is not a point but a flat (an affine subspace). The general solution is:

\[\mathbf{x} = \mathbf{x}_p + \mathbf{x}_h\]

where \(\mathbf{x}_p\) is any particular solution satisfying \(A\mathbf{x}_p = \mathbf{b}\), and \(\mathbf{x}_h\) is any vector in the null space of \(A\) (satisfying \(A\mathbf{x}_h = \mathbf{0}\)).

Physical example — redundant robot arm. A 7-DoF arm positioning its end-effector in 6-DoF space has one extra degree of freedom. For any target pose, there is a 1-parameter family of joint configurations that reaches it.

import numpy as np

# Underdetermined system: 2 equations, 3 unknowns
A = np.array([[1., 2., 3.],
              [0., 1., 2.]])
b = np.array([6., 3.])

# Minimum-norm particular solution via lstsq
x_p, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(f"Rank:        {rank}  (< {A.shape[1]} unknowns → underdetermined)")
print(f"x_p (min-norm): {x_p.round(4)}")
print(f"Residual:    {np.linalg.norm(A @ x_p - b):.2e}   ← exact solution")

# Null space: any vector orthogonal to both rows of A
# The null space of a 2×3 rank-2 matrix is 1-dimensional
_, _, Vt = np.linalg.svd(A)
null_vec = Vt[-1]   # last right singular vector
print(f"\nNull vector: {null_vec.round(4)}")
print(f"A @ null_vec: {(A @ null_vec).round(10)}   ← should be [0, 0]")

# Any scalar multiple of null_vec can be added to x_p
alpha = 2.5
x_other = x_p + alpha * null_vec
print(f"\nalpha={alpha}: x = {x_other.round(4)},  A @ x = {(A @ x_other).round(6)}")

Rank:        2  (< 3 unknowns → underdetermined)
x_p (min-norm): [1. 1. 1.]
Residual:    2.70e-15   ← exact solution

Null vector: [ 0.4082 -0.8165  0.4082]
A @ null_vec: [-0. -0.]   ← should be [0, 0]

alpha=2.5: x = [ 2.0206 -1.0412  2.0206],  A @ x = [6. 3.]

The null space is the set of all “free motions” — directions you can move without changing the output \(A\mathbf{x}\). Chapter 3 derives the null space systematically from row reduction.

2.2.4 Detecting the Case Before Solving

In practice you should diagnose the system before committing to a solver:

import numpy as np

def diagnose_system(A, b):
    """Print which of the three cases applies."""
    m, n     = A.shape
    rank_A   = np.linalg.matrix_rank(A)
    Ab       = np.column_stack([A, b])
    rank_Ab  = np.linalg.matrix_rank(Ab)

    print(f"Shape:        {m} equations, {n} unknowns")
    print(f"rank(A) = {rank_A},   rank([A|b]) = {rank_Ab}")

    if rank_Ab > rank_A:
        print("→ INCONSISTENT: no solution")
    elif rank_A == n:
        print("→ UNIQUE solution")
    else:
        print(f"→ INFINITELY MANY: null space dimension = {n - rank_A}")

# Test all three cases
print("=== Inconsistent ===")
diagnose_system(np.array([[1.,2.],[2.,4.]]),
                np.array([3., 7.]))

print("\n=== Unique ===")
diagnose_system(np.array([[1.,0.],[0.,1.]]),
                np.array([3., 2.]))

print("\n=== Underdetermined ===")
diagnose_system(np.array([[1., 2., 3.]]),
                np.array([6.]))

=== Inconsistent ===
Shape:        2 equations, 2 unknowns
rank(A) = 1,   rank([A|b]) = 2
→ INCONSISTENT: no solution

=== Unique ===
Shape:        2 equations, 2 unknowns
rank(A) = 2,   rank([A|b]) = 2
→ UNIQUE solution

=== Underdetermined ===
Shape:        1 equations, 3 unknowns
rank(A) = 1,   rank([A|b]) = 1
→ INFINITELY MANY: null space dimension = 2

The key rule: \(\text{rank}([A|\mathbf{b}]) > \text{rank}(A)\) means inconsistent. \(\text{rank}(A) = n\) means unique. Otherwise infinitely many.

2.2.5 Why This Matters for Each Field

Machine learning. Under-parameterised models (many data, few weights) are generically overdetermined and inconsistent — we use least squares. Over- parameterised models (few data, many weights) are underdetermined — we need regularization to pick one solution from infinitely many. Both cases appear constantly; recognising which you are in shapes every modelling decision.

Computer vision. Homography estimation from point correspondences is a \(2n \times 9\) system for \(n\) point pairs. With \(n = 4\) it is exactly determined (if no point is degenerate); with \(n > 4\) it is overdetermined and we use \(\text{DLT}\) (the linear least-squares method). Fewer than 4 pairs — infinitely many homographies fit.

Robotics. Forward kinematics is unique (joint angles → end-effector pose). Inverse kinematics is generically underdetermined for redundant arms: the null space of the Jacobian encodes the self-motion. Chapter 22 derives this in detail.

2.3 Augmented Matrices and Gaussian Elimination

Gaussian elimination is the oldest algorithm in numerical linear algebra — and the one every production solver ultimately traces back to. Understanding it means understanding why solvers fail, not just how to call them.

2.3.1 The Augmented Matrix

Writing the system \(A\mathbf{x} = \mathbf{b}\) as an augmented matrix \([A \mid \mathbf{b}]\) groups all the information into one object:

\[\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \mathbf{x} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} \quad\longrightarrow\quad \left(\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ a_{31} & a_{32} & a_{33} & b_3 \end{array}\right)\]

Row operations on \([A \mid \mathbf{b}]\) transform the system but do not change its solution set. The three legal operations are:

Operation	Notation	Why it preserves solutions
Swap rows \(i\) and \(j\)	\(R_i \leftrightarrow R_j\)	Re-orders equations — same constraints
Scale row \(i\) by \(c \neq 0\)	\(R_i \leftarrow c R_i\)	Multiplying both sides by \(c\)
Add \(c\) times row \(j\) to row \(i\)	\(R_i \leftarrow R_i + c R_j\)	Adding a multiple of one equation to another

2.3.2 Forward Elimination — Step by Step

The goal is to introduce zeros below each pivot (the leading nonzero in each row), converting \([A \mid \mathbf{b}]\) to row echelon form (REF).

Example system:

\[\begin{pmatrix} 2 & 1 & -1 \\ -3 & -1 & 2 \\ -2 & 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ -11 \\ -3 \end{pmatrix}\]

import numpy as np

def forward_eliminate(aug):
    """Gaussian elimination with partial pivoting. Returns row echelon form."""
    M = aug.astype(float).copy()
    m, n = M.shape          # m rows, n columns (last is augmented)
    pivot_row = 0

    for col in range(n - 1):           # for each variable column
        # Partial pivoting: swap in the largest magnitude in this column
        best = np.argmax(np.abs(M[pivot_row:, col])) + pivot_row
        M[[pivot_row, best]] = M[[best, pivot_row]]

        if abs(M[pivot_row, col]) < 1e-12:
            continue                    # skip: column is zero below pivot

        for row in range(pivot_row + 1, m):
            factor = M[row, col] / M[pivot_row, col]
            M[row] -= factor * M[pivot_row]

        pivot_row += 1

    return M

A = np.array([[ 2.,  1., -1.],
              [-3., -1.,  2.],
              [-2.,  1.,  2.]])
b = np.array([8., -11., -3.])
aug = np.column_stack([A, b])

print("Original augmented matrix:")
print(aug)

ref = forward_eliminate(aug)
print("\nRow echelon form:")
print(ref.round(6))

Original augmented matrix:
[[  2.   1.  -1.   8.]
 [ -3.  -1.   2. -11.]
 [ -2.   1.   2.  -3.]]

Row echelon form:
[[ -3.        -1.         2.       -11.      ]
 [  0.         1.666667   0.666667   4.333333]
 [  0.         0.         0.2       -0.2     ]]

After forward elimination the matrix is upper triangular. The solution is recovered by back substitution — solve for the last variable, substitute upward.

def back_substitute(ref):
    """Back substitution on row echelon form. Returns solution vector."""
    m, n = ref.shape
    nvars = n - 1
    x = np.zeros(nvars)

    for i in range(nvars - 1, -1, -1):
        if abs(ref[i, i]) < 1e-12:
            raise ValueError(f"Zero pivot at row {i}: system is singular or underdetermined")
        x[i] = (ref[i, -1] - ref[i, i+1:nvars] @ x[i+1:nvars]) / ref[i, i]

    return x

x = back_substitute(ref)
print(f"Solution: {x}")
print(f"Verify:   A @ x = {A @ x}  (should equal {b})")

Solution: [ 2.  3. -1.]
Verify:   A @ x = [  8. -11.  -3.]  (should equal [  8. -11.  -3.])

Expected output: \(\mathbf{x} = [2, 3, -1]^T\), and \(A\mathbf{x} = [8, -11, -3]^T\).

2.3.3 Partial Pivoting — Why It Matters

Naive elimination without pivoting can divide by near-zero entries, amplifying floating-point errors catastrophically.

import numpy as np

# Without pivoting: tiny pivot causes large error
A_bad = np.array([[1e-16, 1.],
                  [1.,    1.]])
b_bad = np.array([1., 2.])

# "Manual" elimination without pivoting
aug = np.column_stack([A_bad, b_bad]).astype(float)
factor = aug[1, 0] / aug[0, 0]        # divides by 1e-16  ← catastrophic
aug[1] -= factor * aug[0]
print("Without pivoting, row 1 after elimination:")
print(aug[1])                          # coefficient may be 0 due to cancellation

# With pivoting (numpy.linalg.solve uses partial pivoting internally)
x_good = np.linalg.solve(A_bad, b_bad)
print(f"\nnp.linalg.solve (with pivoting): {x_good}")
print(f"Residual: {np.linalg.norm(A_bad @ x_good - b_bad):.2e}")

Without pivoting, row 1 after elimination:
[ 0.e+00 -1.e+16 -1.e+16]

np.linalg.solve (with pivoting): [1. 1.]
Residual: 0.00e+00

numpy.linalg.solve uses LAPACK’s dgesv, which applies partial pivoting at every step. This is why you should never roll your own elimination for production use — but you should understand why pivoting exists.

2.3.4 Counting Operations

Gaussian elimination on an \(n \times n\) system costs roughly \(\frac{2}{3}n^3\) floating-point operations (flops). This cubic scaling is why:

A \(100 \times 100\) system solves in microseconds.
A \(10000 \times 10000\) system takes about a second.
A \(10^6 \times 10^6\) dense system is infeasible — sparse methods (Ch 12) are required.

import numpy as np
import time

for n in [100, 500, 1000, 2000]:
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    t0 = time.perf_counter()
    x = np.linalg.solve(A, b)
    dt = time.perf_counter() - t0
    print(f"n={n:5d}:  {dt*1000:.2f} ms   "
          f"residual={np.linalg.norm(A@x-b):.1e}")

n=  100:  17.14 ms   residual=5.1e-13
n=  500:  73.37 ms   residual=1.0e-11
n= 1000:  29.25 ms   residual=9.7e-12
n= 2000:  58.21 ms   residual=3.7e-10

The residual stays near machine epsilon (\(\approx 10^{-15}\)) for all sizes as long as \(A\) is well-conditioned — a testament to how well-implemented partial pivoting is.

2.3.5 What Elimination Reveals

Beyond the solution, elimination reveals the structure of the system:

The number of pivots equals \(\text{rank}(A)\) — the number of independent equations.
Zero rows in the augmented form with a nonzero right-hand side signal inconsistency.
Zero rows with zero right-hand side signal free variables (infinitely many solutions).

Chapter 3 formalises this into the machinery of reduced row echelon form, rank, and the four fundamental subspaces.

2.4 Solving Systems in Code

NumPy exposes three functions for solving \(A\mathbf{x} = \mathbf{b}\). Choosing the wrong one — or using the right one without checking its output — is a recurring source of silent bugs.

2.4.1 The Three Functions and When to Use Them

Function	Use when	Notes
`np.linalg.solve(A, b)`	\(A\) is square and you believe it is nonsingular	Fastest; raises `LinAlgError` if exactly singular
`np.linalg.lstsq(A, b)`	\(A\) is rectangular, overdetermined, or possibly rank-deficient	Always returns something; check residuals
`scipy.linalg.solve(A, b)`	Square system, want more control (pivoting options, symmetric flag)	More options than NumPy version

import numpy as np
import scipy.linalg

A = np.array([[3., 1., 0.],
              [1., 4., 1.],
              [0., 1., 3.]])
b = np.array([1., 2., 3.])

# Method 1: numpy solve (square, nonsingular)
x1 = np.linalg.solve(A, b)

# Method 2: lstsq (works for any shape)
x2, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)

# Method 3: scipy solve (identical result, but more options available)
x3 = scipy.linalg.solve(A, b)

print(f"np.solve:   {x1.round(6)}")
print(f"np.lstsq:   {x2.round(6)}")
print(f"scipy:      {x3.round(6)}")
print(f"All agree:  {np.allclose(x1, x2) and np.allclose(x1, x3)}")

np.solve:   [0.266667 0.2      0.933333]
np.lstsq:   [0.266667 0.2      0.933333]
scipy:      [0.266667 0.2      0.933333]
All agree:  True

2.4.2 What lstsq Returns — and What to Check

np.linalg.lstsq returns a 4-tuple. Each element tells you something about the system:

import numpy as np

# Overdetermined system: 5 equations, 2 unknowns
rng = np.random.default_rng(42)
A = rng.standard_normal((5, 2))
b_true = np.array([1.0, 2.0])
b = A @ b_true + 0.05 * rng.standard_normal(5)   # add small noise

x, residuals, rank, singular_values = np.linalg.lstsq(A, b, rcond=None)

print(f"Solution x:          {x.round(4)}")
print(f"Residuals:           {residuals}")   # sum of squared residuals
print(f"Rank:                {rank}")
print(f"Singular values:     {singular_values.round(4)}")
print(f"Condition number:    {singular_values[0]/singular_values[-1]:.2f}")
print(f"\nManual residual:     {np.linalg.norm(A @ x - b)**2:.6f}")

Solution x:          [1.0292 1.9697]
Residuals:           [0.00437096]
Rank:                2
Singular values:     [2.7181 1.2574]
Condition number:    2.16

Manual residual:     0.004371

Warning: residuals is empty (shape (0,)) when \(A\) does not have full column rank, or when \(m \leq n\). Always compute the residual manually with np.linalg.norm(A @ x - b) if you are not certain the system is overdetermined and full-rank.

2.4.3 The Silent Wrong Answer

np.linalg.solve does not warn you when \(A\) is ill-conditioned. It will return a number — a plausible-looking one — that can be completely wrong.

import numpy as np

# Hilbert matrix: famously ill-conditioned
def hilbert(n):
    i, j = np.meshgrid(np.arange(1, n+1), np.arange(1, n+1))
    return 1.0 / (i + j - 1)

for n in [4, 8, 12]:
    H  = hilbert(n)
    x_true = np.ones(n)
    b  = H @ x_true            # exact right-hand side for solution = all-ones

    x_computed = np.linalg.solve(H, b)
    kappa      = np.linalg.cond(H)
    error      = np.linalg.norm(x_computed - x_true)

    print(f"n={n:2d}  κ(H)={kappa:.2e}  ||x_comp - x_true||={error:.2e}")

n= 4  κ(H)=1.55e+04  ||x_comp - x_true||=8.27e-14
n= 8  κ(H)=1.53e+10  ||x_comp - x_true||=1.73e-07
n=12  κ(H)=1.76e+16  ||x_comp - x_true||=1.13e+00

For \(n = 12\), the condition number exceeds \(10^{16}\) — beyond machine precision. The computed solution has no significant digits. Yet np.linalg.solve returns without error.

Defensive pattern:

import numpy as np

def safe_solve(A, b, cond_threshold=1e12):
    """Solve A x = b with condition number guard."""
    kappa = np.linalg.cond(A)
    if kappa > cond_threshold:
        import warnings
        warnings.warn(f"Ill-conditioned system: κ = {kappa:.2e}. "
                      f"Solution may be unreliable.", stacklevel=2)
    x = np.linalg.solve(A, b)
    residual = np.linalg.norm(A @ x - b)
    return x, kappa, residual

x, kappa, res = safe_solve(hilbert(10), np.ones(10))
print(f"κ = {kappa:.2e},  residual = {res:.2e}")

κ = 1.60e+13,  residual = 2.44e-10

C:\Users\james.choi\AppData\Local\Temp\ipykernel_15824\1476246802.py:14: UserWarning: Ill-conditioned system: κ = 1.60e+13. Solution may be unreliable.
  x, kappa, res = safe_solve(hilbert(10), np.ones(10))

2.4.4 Solving Multiple Right-Hand Sides Efficiently

When you need \(A\mathbf{x}_j = \mathbf{b}_j\) for many vectors \(\mathbf{b}_j\) with the same \(A\), factorising \(A\) once and reusing is far faster than calling solve repeatedly.

import numpy as np
import scipy.linalg
import time

n  = 500
rhs_count = 200
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, rhs_count))   # 200 right-hand sides

# Naive: solve 200 times
t0 = time.perf_counter()
X_naive = np.column_stack([np.linalg.solve(A, B[:, j]) for j in range(rhs_count)])
t_naive = time.perf_counter() - t0

# Smart: LU factorisation once, then forward/back substitution 200 times
t0 = time.perf_counter()
lu, piv = scipy.linalg.lu_factor(A)
X_smart = scipy.linalg.lu_solve((lu, piv), B)
t_smart = time.perf_counter() - t0

print(f"Naive (200× solve):  {t_naive*1000:.1f} ms")
print(f"LU once + solve:     {t_smart*1000:.1f} ms")
print(f"Speed-up:            {t_naive/t_smart:.1f}×")
print(f"Results match:       {np.allclose(X_naive, X_smart)}")

Naive (200× solve):  3117.0 ms
LU once + solve:     16.7 ms
Speed-up:            186.5×
Results match:       True

scipy.linalg.lu_factor computes the LU decomposition once; lu_solve applies it cheaply. Chapter 12 covers LU decomposition in depth.

2.4.5 Solving Sparse Systems

When \(A\) has mostly zeros (common in FEM, graph problems, and SLAM), dense solvers waste memory and time. Use scipy.sparse:

import numpy as np
import scipy.sparse
import scipy.sparse.linalg

# Build a sparse tridiagonal matrix (common in 1D PDEs)
n = 1000
diag_main  = 4.0 * np.ones(n)
diag_off   = -1.0 * np.ones(n - 1)

A_sparse = scipy.sparse.diags(
    [diag_off, diag_main, diag_off], [-1, 0, 1],
    format='csr'
)
b = np.ones(n)

x = scipy.sparse.linalg.spsolve(A_sparse, b)

# Verify
print(f"Shape:    {A_sparse.shape}")
print(f"NNZ:      {A_sparse.nnz} (vs {n*n} dense entries)")
print(f"Residual: {np.linalg.norm(A_sparse @ x - b):.2e}")

Shape:    (1000, 1000)
NNZ:      2998 (vs 1000000 dense entries)
Residual: 9.61e-16

Sparse solvers are introduced for context here; Part IV and the robotics application chapters use them in earnest.

2.4.6 Decision Tree: Which Solver?

Is A square?
├─ YES → Is κ(A) well-conditioned (< 1e10)?
│        ├─ YES → np.linalg.solve (fastest)
│        └─ NO  → investigate the physics; consider regularization (Ch 17)
└─ NO  → Use np.linalg.lstsq
         ├─ m > n (overdetermined): minimises ||Ax - b||
         ├─ m < n (underdetermined): returns minimum-norm solution
         └─ Check rank; if rank < n, null space is non-trivial (Ch 3)

Is A sparse (>90% zeros)?
└─ YES → scipy.sparse.linalg.spsolve (or iterative solvers, Ch 12)

Do you have many right-hand sides with same A?
└─ YES → scipy.linalg.lu_factor once, then lu_solve (faster)

2.5 Application: Multi-Sensor Position Fusion

A mobile robot carries three ultrasonic range sensors mounted at known positions on a wall. Each sensor reports its distance to the robot. The task: fuse the three range measurements into a single position estimate.

This is a canonical instance of \(A\mathbf{x} = \mathbf{b}\) — and it immediately raises the overdetermined question when you add a fourth sensor for redundancy.

2.5.1 Problem Setup

Three sensors are at fixed positions \((s_x^{(i)}, s_y^{(i)})\) along the wall. Each reports range \(r_i\) to the robot at unknown position \((p_x, p_y)\).

The range equation is:

\[(p_x - s_x^{(i)})^2 + (p_y - s_y^{(i)})^2 = r_i^2\]

Expanding and rearranging yields a linear system by subtracting the equation for sensor \(i\) from sensor 1:

\[2(s_x^{(1)} - s_x^{(i)}) p_x + 2(s_y^{(1)} - s_y^{(i)}) p_y = r_i^2 - r_1^2 + \|\mathbf{s}_1\|^2 - \|\mathbf{s}_i\|^2\]

This linearisation is the standard trick in time-of-arrival (ToA) positioning.

import numpy as np

# Sensor positions (metres)
sensors = np.array([
    [0.0,  0.0],   # sensor 1 at origin
    [4.0,  0.0],   # sensor 2 to the right
    [2.0,  3.0],   # sensor 3 above
])

# True robot position (unknown to the solver)
p_true = np.array([2.5, 1.2])

# True ranges (+ small measurement noise)
rng = np.random.default_rng(7)
ranges_true = np.linalg.norm(sensors - p_true, axis=1)
noise_std   = 0.02   # 2 cm noise
ranges      = ranges_true + rng.normal(0, noise_std, size=3)

print("Sensor positions:"); print(sensors)
print(f"\nTrue position:    {p_true}")
print(f"True ranges:      {ranges_true.round(4)}")
print(f"Measured ranges:  {ranges.round(4)}")

Sensor positions:
[[0. 0.]
 [4. 0.]
 [2. 3.]]

True position:    [2.5 1.2]
True ranges:      [2.7731 1.9209 1.8682]
Measured ranges:  [2.7731 1.9269 1.8627]

2.5.2 Building and Solving the Linear System

import numpy as np

def build_linear_system(sensors, ranges):
    """
    Linearise range equations by differencing against sensor 0.
    Returns (A, b) for the 2D position solve.
    """
    s0, r0 = sensors[0], ranges[0]
    rows_A, rows_b = [], []

    for si, ri in zip(sensors[1:], ranges[1:]):
        # 2*(s0 - si) @ p = r_i^2 - r_0^2 + ||s0||^2 - ||si||^2
        row_a = 2 * (s0 - si)
        row_b = ri**2 - r0**2 + np.dot(s0, s0) - np.dot(si, si)
        rows_A.append(row_a)
        rows_b.append(row_b)

    return np.array(rows_A), np.array(rows_b)

A, b = build_linear_system(sensors, ranges)
print("A (coefficient matrix):")
print(A)
print("\nb (right-hand side):")
print(b.round(4))

# Three sensors give 2 difference equations, 2 unknowns → unique solve
p_est = np.linalg.solve(A, b)
error = np.linalg.norm(p_est - p_true)

print(f"\nEstimated position:  {p_est.round(4)}")
print(f"True position:       {p_true}")
print(f"Position error:      {error*100:.2f} cm")

A (coefficient matrix):
[[-8.  0.]
 [-4. -6.]]

b (right-hand side):
[-19.9771 -17.2206]

Estimated position:  [2.4971 1.2053]
True position:       [2.5 1.2]
Position error:      0.61 cm

2.5.3 Adding a Fourth Sensor — the Overdetermined Case

With four sensors we have three difference equations for two unknowns. The system is overdetermined — no exact solution exists because of measurement noise. We switch to least squares.

import numpy as np

# Add a fourth sensor
sensors4 = np.vstack([sensors, [0.0, 3.0]])   # sensor 4 at (0, 3)
ranges4_true = np.linalg.norm(sensors4 - p_true, axis=1)
ranges4      = ranges4_true + rng.normal(0, noise_std, size=4)

A4, b4 = build_linear_system(sensors4, ranges4)
print(f"System shape: {A4.shape}  (3 equations, 2 unknowns → overdetermined)")

# Least-squares solve
p_ls, residuals, rank, sv = np.linalg.lstsq(A4, b4, rcond=None)
error_ls = np.linalg.norm(p_ls - p_true)

print(f"\nLeast-squares estimate: {p_ls.round(4)}")
print(f"True position:          {p_true}")
print(f"Error (4 sensors):      {error_ls*100:.2f} cm")
print(f"Error (3 sensors):      {error*100:.2f} cm")
print(f"Rank of A4:             {rank}")

System shape: (3, 2)  (3 equations, 2 unknowns → overdetermined)

Least-squares estimate: [2.4952 1.1907]
True position:          [2.5 1.2]
Error (4 sensors):      1.05 cm
Error (3 sensors):      0.61 cm
Rank of A4:             2

The least-squares estimate is more accurate than the exact 2-sensor solve because it averages out measurement noise across more sensors.

2.5.4 Visualising the Fusion

import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(7, 6))

theta  = np.linspace(0, 2*np.pi, 300)
colors = ['#4a90d9', '#2ecc71', '#9b59b6', '#e67e22']

# Draw range circles for all 4 sensors
for i, (s, r, col) in enumerate(zip(sensors4, ranges4, colors)):
    ax.plot(s[0] + r*np.cos(theta), s[1] + r*np.sin(theta),
            color=col, lw=1.2, alpha=0.7, label=f'Sensor {i+1} range')
    ax.scatter(*s, color=col, s=80, zorder=5, marker='s')
    ax.text(s[0]+0.1, s[1]+0.1, f'S{i+1}', color=col, fontsize=9)

# True and estimated positions
ax.scatter(*p_true, color='black', s=120, zorder=10, marker='*',
           label='True position')
ax.scatter(*p_ls, color='tomato', s=80, zorder=10, marker='x',
           linewidths=2, label=f'LS estimate (err={error_ls*100:.1f} cm)')

ax.set_xlim(-1, 5); ax.set_ylim(-1, 4.5)
ax.set_aspect('equal')
ax.set_xlabel('x (m)'); ax.set_ylabel('y (m)')
ax.set_title('Multi-sensor position fusion — overdetermined least squares', pad=10)
ax.legend(fontsize=8)
ax.grid(True, alpha=0.2)
plt.tight_layout()
plt.show()

2.5.5 Noise Sensitivity — Fewer vs. More Sensors

The panels below show how the position estimate degrades as noise increases, and how adding more sensors mitigates that degradation.

import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(11, 9))
theta  = np.linspace(0, 2*np.pi, 300)
colors = ['#4a90d9', '#2ecc71', '#9b59b6', '#e67e22']

configs = [
    (2, 0.02,  axes[0, 0]),   # 2 sensors, low noise
    (2, 0.10,  axes[0, 1]),   # 2 sensors, high noise
    (4, 0.02,  axes[1, 0]),   # 4 sensors, low noise
    (4, 0.10,  axes[1, 1]),   # 4 sensors, high noise
]

for n_sens, noise, ax in configs:
    rng_w = np.random.default_rng(0)
    sens  = sensors4[:n_sens]
    r_t   = np.linalg.norm(sens - p_true, axis=1)
    r_m   = r_t + rng_w.normal(0, noise, size=n_sens)

    A_w, b_w = build_linear_system(sens, r_m)
    p_w, _, _, _ = np.linalg.lstsq(A_w, b_w, rcond=None)
    err = np.linalg.norm(p_w - p_true) * 100   # cm

    for i, (s, r) in enumerate(zip(sens, r_m)):
        ax.plot(s[0] + r*np.cos(theta), s[1] + r*np.sin(theta),
                color=colors[i], lw=1, alpha=0.6)
        ax.scatter(*s, color=colors[i], s=50, marker='s', zorder=5)

    ax.scatter(*p_true, color='black',  s=100, marker='*', zorder=6,
               label='true')
    ax.scatter(*p_w,    color='tomato', s=80,  marker='x', linewidths=2,
               zorder=6, label=f'est. (err={err:.1f} cm)')

    ax.set_xlim(-1, 5); ax.set_ylim(-1, 4.5)
    ax.set_aspect('equal')
    ax.set_title(f'{n_sens} sensors,  noise = {noise*100:.0f} cm → err = {err:.1f} cm',
                 fontsize=10)
    ax.legend(fontsize=8)
    ax.grid(True, alpha=0.2)

fig.suptitle('Effect of sensor count and noise level on position accuracy',
             fontsize=12)
fig.tight_layout()
plt.show()

2.5.6 Key Takeaways

Range-only positioning linearises to \(A\mathbf{x} = \mathbf{b}\) by differencing sensor equations — a standard trick in ToA, UWB, and GPS.
With exactly as many constraints as unknowns, np.linalg.solve gives the exact (noise-free) answer.
Adding more sensors creates an overdetermined system; np.linalg.lstsq automatically finds the best-fit position — more sensors means less noise sensitivity.
The condition number of \(A\) depends on the geometry of the sensors: sensors clustered together have nearly parallel rows and a high condition number. Spread-out sensors give a well-conditioned problem.

2.6 Exercises

2.1 Consider the system: \[\begin{pmatrix} 2 & 6 \\ 1 & 3 \end{pmatrix} \mathbf{x} = \begin{pmatrix} 8 \\ 4 \end{pmatrix}\]

Without calling any solver: (a) What are the normal vectors to each hyperplane? (b) Are the hyperplanes parallel? (c) What does this tell you about the number of solutions?

2.2 You call np.linalg.solve(A, b) on a \(3 \times 3\) system and get a result. You compute np.linalg.cond(A) and get 3.2e14. Should you trust the result? What would you do instead?

2.3 Write down the augmented matrix for the following system, perform one step of forward elimination by hand (eliminate \(x_1\) from rows 2 and 3), and identify the new pivot:

\[\begin{aligned} x_1 + 2x_2 + x_3 &= 5 \\ 2x_1 + 4x_2 + 3x_3 &= 11 \\ x_1 + x_2 + 2x_3 &= 6 \end{aligned}\]

2.4 A robotics student writes this code to fuse four range sensors:

A, b = build_linear_system(sensors4, ranges4)
x = np.linalg.solve(A, b)

This raises LinAlgError: Last 2 dimensions of the array must be square. Why? What should they use instead?

2.5 You are estimating a 2D position from three sensors. After solving with lstsq you get a residual of 2.4 metres, but your sensors are accurate to 5 cm. List at least three possible causes for the large residual and how you would diagnose each.

2.6 (Harder) Prove that the solution set of \(A\mathbf{x} = \mathbf{b}\), when non-empty, is an affine subspace (a flat). Specifically, show that if \(\mathbf{x}_1\) and \(\mathbf{x}_2\) are both solutions, then so is \(\mathbf{x}_1 + t(\mathbf{x}_2 - \mathbf{x}_1)\) for any scalar \(t\).

2.7 You have a \(10 \times 10\) Hilbert matrix \(H\) and solve \(H\mathbf{x} = \mathbf{b}\) with np.linalg.solve. The residual ||H @ x - b|| is \(10^{-11}\), but the error ||x - x_true|| is \(10^{3}\). How is this possible? What does it imply for interpreting residuals alone?

2.7 Solutions

Solution 2.1

Normal vectors are the rows of \(A\): \(\mathbf{a}_1 = [2, 6]^T\) and \(\mathbf{a}_2 = [1, 3]^T\).
Check whether one is a scalar multiple of the other: \([2, 6] = 2 \cdot [1, 3]\) — yes, they are parallel (proportional normal vectors).
Parallel hyperplanes either coincide (infinitely many solutions) or are distinct and non-intersecting (no solution). Checking: row 1 is \(2x_1 + 6x_2 = 8\), row 2 is \(x_1 + 3x_2 = 4\), which is exactly half of row 1 (\(8/2 = 4\)). So both equations describe the same line — infinitely many solutions.

import numpy as np
A = np.array([[2., 6.], [1., 3.]])
b = np.array([8., 4.])
Ab = np.column_stack([A, b])
print(f"rank(A)  = {np.linalg.matrix_rank(A)}")
print(f"rank(Ab) = {np.linalg.matrix_rank(Ab)}")
# rank(A) == rank(Ab) == 1  →  infinitely many solutions

rank(A)  = 1
rank(Ab) = 1

Solution 2.2

No, you should not trust the result. A condition number of \(3.2 \times 10^{14}\) means roughly 14–15 digits of precision are consumed by the conditioning — but double-precision arithmetic only carries about 15–16 significant digits. The computed solution may have zero correct digits.

What to do:

Understand the physics. A near-singular matrix means your constraints are nearly linearly dependent — some measurements are almost redundant. Fix the problem setup if possible.
Regularize. Add Tikhonov regularization: solve \((A^T A + \lambda I)\mathbf{x} = A^T \mathbf{b}\) for a small \(\lambda\). Chapter 17 covers this.
Use SVD. np.linalg.lstsq with rcond set to a threshold drops near-zero singular values and returns a truncated solution. Chapter 18 covers the SVD approach in depth.
At minimum, report the condition number to the caller so they know the result is unreliable.

import numpy as np

def hilbert(n):
    i, j = np.meshgrid(np.arange(1, n+1), np.arange(1, n+1))
    return 1.0 / (i + j - 1)

H = hilbert(10)
b = H @ np.ones(10)
x = np.linalg.solve(H, b)
kappa = np.linalg.cond(H)
print(f"κ = {kappa:.2e}")
print(f"Residual: {np.linalg.norm(H @ x - b):.2e}  ← looks fine")
print(f"Error:    {np.linalg.norm(x - np.ones(10)):.2e}  ← actually terrible")

κ = 1.60e+13
Residual: 5.66e-16  ← looks fine
Error:    2.74e-04  ← actually terrible

This is the “silent wrong answer” pattern: the residual looks fine, but the solution is garbage. High condition number is the only warning.

Solution 2.3

Augmented matrix:

\[\left(\begin{array}{ccc|c} 1 & 2 & 1 & 5 \\ 2 & 4 & 3 & 11 \\ 1 & 1 & 2 & 6 \end{array}\right)\]

Pivot is \(a_{11} = 1\).

\(R_2 \leftarrow R_2 - 2R_1\): \([2-2, 4-4, 3-2, 11-10] = [0, 0, 1, 1]\)
\(R_3 \leftarrow R_3 - 1 \cdot R_1\): \([1-1, 1-2, 2-1, 6-5] = [0, -1, 1, 1]\)

After one step:

\[\left(\begin{array}{ccc|c} 1 & 2 & 1 & 5 \\ 0 & 0 & 1 & 1 \\ 0 & -1 & 1 & 1 \end{array}\right)\]

The new pivot should be in column 2. Row 2 has a zero in column 2, so we swap rows 2 and 3:

\[\left(\begin{array}{ccc|c} 1 & 2 & 1 & 5 \\ 0 & -1 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{array}\right)\]

New pivot: \(a_{22} = -1\). Note that column 2 of the original row 2 was zero — this is a zero pivot, triggering a row swap. Partial pivoting exists precisely to handle this.

import numpy as np

aug = np.array([[1., 2., 1., 5.],
                [2., 4., 3., 11.],
                [1., 1., 2., 6.]])

aug[1] -= 2 * aug[0]
aug[2] -= 1 * aug[0]
print("After step 1:")
print(aug)
# Swap rows 1 and 2 (0-indexed)
aug[[1, 2]] = aug[[2, 1]]
print("\nAfter pivot swap:")
print(aug)

After step 1:
[[ 1.  2.  1.  5.]
 [ 0.  0.  1.  1.]
 [ 0. -1.  1.  1.]]

After pivot swap:
[[ 1.  2.  1.  5.]
 [ 0. -1.  1.  1.]
 [ 0.  0.  1.  1.]]

Solution 2.4

build_linear_system differences each sensor against sensor 0, producing \(n_\text{sensors} - 1\) equations. With 4 sensors the output is a \(3 \times 2\) matrix — not square. np.linalg.solve requires a square matrix.

The correct call is:

import numpy as np

A4, b4 = build_linear_system(sensors4, ranges4)
print(f"A4 shape: {A4.shape}")   # (3, 2) — overdetermined

# Use lstsq, not solve
p_est, residuals, rank, sv = np.linalg.lstsq(A4, b4, rcond=None)
print(f"Estimated position: {p_est.round(4)}")

A4 shape: (3, 2)
Estimated position: [2.4952 1.1907]

General rule: if \(A\) is not square, always use lstsq. If \(A\) is square but you’re not sure it’s full-rank, check the condition number first.

Solution 2.5

A 2.4 m residual from 5 cm sensors implies the model is wrong, not just noisy. Possible causes:

Wrong sensor positions. If sensors contains stale calibration values and a sensor has shifted, the geometry is inconsistent. Diagnose: physically remeasure sensor positions and compare.
Wrong range model. The linearisation assumes range = Euclidean distance. If sensors report travel time and there is a reflective surface creating multipath, ranges will be systematically biased. Diagnose: plot residuals vs. range — a systematic trend suggests model mismatch.
Unit mismatch. Ranges in metres vs. sensor positions in centimetres, or vice versa. Diagnose: check that ranges and sensors use the same unit.
Non-linear regime. The linearisation is exact for ToA-derived ranges but approximate if the sensors report a non-Euclidean quantity. Diagnose: compare the linearised \(A\mathbf{x}\) prediction to a non-linear residual.
Wrong sensor index. The subtraction in build_linear_system differences sensor \(i\) against sensor 0. If sensors are stored in a different order than assumed, the geometry is scrambled. Diagnose: print sensors and ranges side by side and verify they correspond.

Solution 2.6

Let \(\mathbf{x}_1\) and \(\mathbf{x}_2\) both satisfy \(A\mathbf{x} = \mathbf{b}\). Define \(\mathbf{x}(t) = \mathbf{x}_1 + t(\mathbf{x}_2 - \mathbf{x}_1) = (1-t)\mathbf{x}_1 + t\mathbf{x}_2\).

Apply \(A\):

\[A\mathbf{x}(t) = (1-t)A\mathbf{x}_1 + t A\mathbf{x}_2 = (1-t)\mathbf{b} + t\mathbf{b} = \mathbf{b}\]

So \(\mathbf{x}(t)\) satisfies \(A\mathbf{x} = \mathbf{b}\) for every scalar \(t\). This means the solution set is closed under the operation “form a line through any two points” — the definition of an affine subspace.

The direction of this affine subspace is the null space of \(A\): the difference \(\mathbf{x}_2 - \mathbf{x}_1\) satisfies \(A(\mathbf{x}_2 - \mathbf{x}_1) = \mathbf{b} - \mathbf{b} = \mathbf{0}\), so it lies in \(\text{Null}(A)\).

Solution 2.7

This is the condition number effect in its starkest form. The residual \(\|H\mathbf{x} - \mathbf{b}\|\) measures how well \(\mathbf{x}\) satisfies the equations. The error \(\|\mathbf{x} - \mathbf{x}_\text{true}\|\) measures how close \(\mathbf{x}\) is to the true answer.

For an ill-conditioned system they can be wildly different. The relationship is:

\[\frac{\|\mathbf{x} - \mathbf{x}_\text{true}\|}{\|\mathbf{x}_\text{true}\|} \leq \kappa(H) \cdot \frac{\|H\mathbf{x} - \mathbf{b}\|}{\|\mathbf{b}\|}\]

With \(\kappa(H) \approx 10^{13}\) for the \(10 \times 10\) Hilbert matrix:

\[\text{relative error} \lesssim 10^{13} \times \frac{10^{-11}}{\|\mathbf{b}\|} \approx 10^{13} \times 10^{-11} = 100\]

A relative error of 100 means the solution could be off by 100× — which matches the \(10^3\) error observed when \(\mathbf{x}_\text{true} \sim 1\).

The implication: a small residual is necessary but not sufficient for a good solution. With a high condition number you can have a tiny residual and a useless solution simultaneously. Always check the condition number, not just the residual.