6  Linear Transformations and Their Matrices

6.1 Definition and Consequences of Linearity

A linear transformation is a function \(T : \mathbb{R}^n \to \mathbb{R}^m\) that respects two operations — scaling and addition:

\[T(c\,\mathbf{u} + \mathbf{v}) = c\,T(\mathbf{u}) + T(\mathbf{v}) \quad \text{for all } \mathbf{u}, \mathbf{v} \in \mathbb{R}^n,\ c \in \mathbb{R}\]

This single rule (often stated as two separate axioms) encodes everything a linear map can do. And it completely excludes things a linear map cannot do — translate, curve, or rescale differently in different places.


6.1.1 Why This Definition Matters

The linearity condition is not a technicality — it is the reason matrices exist.

Claim: Every linear map \(T : \mathbb{R}^n \to \mathbb{R}^m\) is completely determined by where it sends the \(n\) standard basis vectors \(\mathbf{e}_1, \ldots, \mathbf{e}_n\).

Proof sketch: Any vector \(\mathbf{x} = x_1\mathbf{e}_1 + \cdots + x_n\mathbf{e}_n\), so by linearity:

\[T(\mathbf{x}) = x_1\,T(\mathbf{e}_1) + \cdots + x_n\,T(\mathbf{e}_n)\]

Stack the output vectors \(T(\mathbf{e}_j)\) as columns of a matrix \(A\), and this becomes exactly \(A\mathbf{x}\). The matrix is the transformation.


6.1.2 The Two Axioms (Unpacked)

Setting \(c = 1\) gives additivity: \(T(\mathbf{u}+\mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})\).

Setting \(\mathbf{v} = \mathbf{0}\) gives homogeneity: \(T(c\,\mathbf{u}) = c\,T(\mathbf{u})\).

Together they say: linearity commutes with linear combinations.

import numpy as np

# Numerical check: verify both axioms for T(x) = A @ x
A = np.array([[2., -1.],
              [1.,  3.],
              [0.,  2.]])   # shape (3, 2)

rng = np.random.default_rng(42)
u = rng.standard_normal(2)   # shape (2,)
v = rng.standard_normal(2)   # shape (2,)
c = 3.7

# Additivity: T(u + v) == T(u) + T(v)
lhs_add = A @ (u + v)
rhs_add = A @ u + A @ v
print("Additivity:   T(u+v) == T(u)+T(v) ?", np.allclose(lhs_add, rhs_add))

# Homogeneity: T(cu) == c T(u)
lhs_hom = A @ (c * u)
rhs_hom = c * (A @ u)
print("Homogeneity:  T(cu)  == c T(u)    ?", np.allclose(lhs_hom, rhs_hom))
Additivity:   T(u+v) == T(u)+T(v) ? True
Homogeneity:  T(cu)  == c T(u)    ? True

6.1.3 Immediate Consequences

The zero vector is fixed.

Setting \(c = 0\): \(T(\mathbf{0}) = 0 \cdot T(\mathbf{u}) = \mathbf{0}\).

Every linear map sends the origin to the origin. If a function moves the origin, it cannot be linear — that is the key distinction between linear and affine maps (Chapter 7).

Linear combinations are preserved.

\[T\!\left(\sum_{i=1}^k c_i \mathbf{v}_i\right) = \sum_{i=1}^k c_i\, T(\mathbf{v}_i)\]

This is what makes linear algebra so powerful for machine learning: if you know what \(T\) does to a basis, you know what it does to every vector.

import numpy as np
import matplotlib.pyplot as plt

A = np.array([[1.5, -0.5],
              [0.5,  1.0]])   # shape (2, 2)

# Show T(0) = 0
print("T(0) =", A @ np.zeros(2))

# Visualise: unit square corners before and after T
corners   = np.array([[0,0],[1,0],[1,1],[0,1],[0,0]], dtype=float)  # shape (5, 2)
corners_t = (A @ corners.T).T                                        # shape (5, 2)

fig, axes = plt.subplots(1, 2, figsize=(9, 4))

for ax, pts, title, col in zip(
        axes,
        [corners, corners_t],
        ['Input: unit square', r'Output: $T$(unit square)'],
        ['#4a90d9', 'tomato']):
    ax.fill(pts[:-1, 0], pts[:-1, 1], alpha=0.25, color=col)
    ax.plot(pts[:, 0], pts[:, 1], color=col, lw=2)
    ax.set_xlim(-0.5, 2.5); ax.set_ylim(-0.5, 2.5)
    ax.set_aspect('equal')
    ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.grid(True, alpha=0.2)
    ax.set_title(title, fontsize=10)
    ax.set_xlabel('$x_1$'); ax.set_ylabel('$x_2$')

fig.suptitle('A linear map deforms the unit square — origin stays fixed',
             fontsize=11)
fig.tight_layout()
plt.show()
T(0) = [0. 0.]


6.1.4 Is It Linear? A Quick Test

Before applying any of the machinery in this chapter, check the two conditions:

Function \(T(\mathbf{0}) = \mathbf{0}\)? Additive? Linear?
\(T(\mathbf{x}) = A\mathbf{x}\) Yes Yes Yes
\(T(\mathbf{x}) = A\mathbf{x} + \mathbf{b}\), \(\mathbf{b}\neq\mathbf{0}\) No No No
\(T(x) = x^2\) Yes No No
\(T(x) = 3x\) Yes Yes Yes

The second row is an affine map — the topic of Chapter 7.


6.1.5 Linearity in Machine Learning

Linear transformations appear everywhere in ML:

  • Fully connected layers: \(\mathbf{y} = W\mathbf{x}\) is a linear map; adding the bias \(\mathbf{b}\) makes it affine.
  • PCA: the projection onto principal components is linear (Chapter 20).
  • Kalman filter: the observation model \(\mathbf{z} = H\mathbf{x}\) is linear (Chapter 23).

The nonlinearity in neural networks comes entirely from the activation functions, not from the weight matrices themselves. Understanding what weight matrices can and cannot do is the foundation for understanding deep networks.

6.2 A Visual Zoo of 2D Transformations

Every \(2 \times 2\) matrix performs one of a handful of recognisable operations on the plane. Knowing this zoo by sight lets you look at a matrix and immediately know its geometric effect — an essential skill for robotics, computer vision, and understanding why algorithms behave the way they do.

We will watch each transformation deform the same object: a unit square and a set of equally-spaced grid lines.


6.2.1 The Transformations at a Glance

Name Matrix Effect
Identity \(I\) Nothing
Scaling \(\begin{bmatrix}s_x&0\\0&s_y\end{bmatrix}\) Stretch along axes
Rotation by \(\theta\) \(\begin{bmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{bmatrix}\) Rotate counter-clockwise
Reflection (x-axis) \(\begin{bmatrix}1&0\\0&-1\end{bmatrix}\) Flip vertically
Horizontal shear \(\begin{bmatrix}1&k\\0&1\end{bmatrix}\) Slide rows horizontally
Projection onto x-axis \(\begin{bmatrix}1&0\\0&0\end{bmatrix}\) Collapse to a line

6.2.2 Visualising All Six on One Figure

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

theta = np.pi / 4   # 45 degrees

transforms = {
    'Identity':       np.eye(2),
    'Scale (2×, 0.5×)': np.array([[2., 0.],[0., 0.5]]),
    'Rotation 45°':   np.array([[np.cos(theta), -np.sin(theta)],
                                 [np.sin(theta),  np.cos(theta)]]),
    'Reflection (x)': np.array([[1.,  0.],[0., -1.]]),
    'Horiz. shear':   np.array([[1., 1.5],[0.,  1.]]),
    'Projection→x':   np.array([[1.,  0.],[0.,  0.]]),
}

# Unit square vertices (closed polygon)
sq = np.array([[0,0],[1,0],[1,1],[0,1],[0,0]], dtype=float)  # shape (5, 2)

# Grid lines in input space
lines_h = [np.array([[0, y],[1, y]]) for y in np.linspace(0, 1, 6)]
lines_v = [np.array([[x, 0],[x, 1]]) for x in np.linspace(0, 1, 6)]
grid_lines = lines_h + lines_v

fig, axes = plt.subplots(2, 3, figsize=(12, 7))
axes = axes.flatten()

for ax, (name, A) in zip(axes, transforms.items()):
    # Transform grid
    for ln in grid_lines:
        lt = (A @ ln.T).T
        ax.plot(lt[:, 0], lt[:, 1], color='#4a90d9', lw=0.8, alpha=0.6)
    # Transform square outline
    sq_t = (A @ sq.T).T
    ax.fill(sq_t[:-1, 0], sq_t[:-1, 1], alpha=0.20, color='tomato')
    ax.plot(sq_t[:, 0], sq_t[:, 1], color='tomato', lw=2)
    # Original square (faint)
    ax.plot(sq[:, 0], sq[:, 1], color='#aaaaaa', lw=1, ls='--')

    ax.set_xlim(-1.8, 2.5); ax.set_ylim(-1.8, 2.5)
    ax.set_aspect('equal')
    ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.grid(True, alpha=0.2)
    ax.set_title(name, fontsize=10)
    ax.set_xlabel('$x_1$'); ax.set_ylabel('$x_2$')

fig.suptitle('Six fundamental 2D linear transformations', fontsize=13)
fig.tight_layout()
plt.show()


6.2.3 What to Notice

Scaling stretches independently along each axis. If \(s_x = s_y\) the square becomes a bigger or smaller square (uniform scale); if \(s_x \neq s_y\) it becomes a rectangle.

Rotation preserves distances and angles — it is the prototype of an isometry. The determinant is \(\cos^2\theta + \sin^2\theta = 1\).

Reflection also preserves distances, but it reverses orientation — the counter-clockwise order of vertices becomes clockwise. The determinant is \(-1\).

Shear keeps one axis fixed while sliding the other. A horizontal shear of \(k\) maps \(\mathbf{e}_2 \mapsto [k, 1]^T\). The square tilts into a parallelogram but area is preserved (\(\det = 1\)).

Projection destroys information: the whole plane collapses onto a line. The determinant is 0, the transformation is not invertible.


6.2.4 Determinant = Signed Area Scale Factor

import numpy as np

transforms = {
    'Identity':           np.eye(2),
    'Scale (2, 0.5)':     np.array([[2., 0.],[0., 0.5]]),
    'Rotation 45°':       np.array([[np.cos(np.pi/4), -np.sin(np.pi/4)],
                                     [np.sin(np.pi/4),  np.cos(np.pi/4)]]),
    'Reflection':         np.array([[1., 0.],[0., -1.]]),
    'Horiz. shear (1.5)': np.array([[1., 1.5],[0., 1.]]),
    'Projection→x':       np.array([[1., 0.],[0., 0.]]),
}

print(f"{'Transform':<25}  det")
print("-" * 35)
for name, A in transforms.items():
    d = np.linalg.det(A)
    print(f"{name:<25}  {d:+.4f}")
Transform                  det
-----------------------------------
Identity                   +1.0000
Scale (2, 0.5)             +1.0000
Rotation 45°               +1.0000
Reflection                 -1.0000
Horiz. shear (1.5)         +1.0000
Projection→x               +0.0000

The unit square has area 1. After transformation, the area becomes \(|\det A|\). When \(\det A < 0\), orientation is reversed (left-hand becomes right-hand). When \(\det A = 0\), the square collapses to a lower-dimensional object — volume is destroyed and the map is not invertible.


6.2.5 Interactive: Build Your Own Transformation

import numpy as np
import matplotlib.pyplot as plt

def show_transform(A, title='Custom'):
    """Show what matrix A does to the unit square and a smiley grid."""
    sq = np.array([[0,0],[1,0],[1,1],[0,1],[0,0]], dtype=float)

    # A denser grid to show distortion
    s = np.linspace(0, 1, 12)
    grid_pts = np.array([[u, v] for u in s for v in s])   # shape (144, 2)
    mapped   = (A @ grid_pts.T).T                          # shape (144, 2)

    sq_t = (A @ sq.T).T
    fig, axes = plt.subplots(1, 2, figsize=(9, 4))

    axes[0].scatter(grid_pts[:,0], grid_pts[:,1], s=8, color='#4a90d9', alpha=0.6)
    axes[0].plot(sq[:,0], sq[:,1], 'k-', lw=1.5)
    axes[0].set_title('Input', fontsize=10)

    axes[1].scatter(mapped[:,0], mapped[:,1], s=8, color='tomato', alpha=0.6)
    axes[1].plot(sq_t[:,0], sq_t[:,1], 'k-', lw=1.5)
    axes[1].set_title(f'Output: {title}  (det={np.linalg.det(A):.3f})', fontsize=10)

    for ax in axes:
        ax.set_aspect('equal')
        ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
        ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
        ax.grid(True, alpha=0.2)
        ax.set_xlim(-2.5, 2.5); ax.set_ylim(-2.5, 2.5)

    fig.tight_layout()
    plt.show()

# Try a combined scale + rotation
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S = np.array([[2., 0.], [0., 0.5]])
show_transform(R @ S, title='Scale then Rotate')

6.3 Reading the Standard Matrix from Column Images

Section 6.1 showed that every linear map is determined by where it sends the basis vectors. This section turns that observation into a recipe for writing down any linear map as a matrix — without having to solve any system of equations.

The recipe: to find the matrix of \(T : \mathbb{R}^n \to \mathbb{R}^m\), apply \(T\) to each standard basis vector \(\mathbf{e}_j\) and use the result as the \(j\)-th column of the matrix:

\[[T] = \begin{bmatrix} T(\mathbf{e}_1) & T(\mathbf{e}_2) & \cdots & T(\mathbf{e}_n) \end{bmatrix}\]

The \(j\)-th column is the image of the \(j\)-th basis vector.


6.3.1 Why This Works

Any \(\mathbf{x} \in \mathbb{R}^n\) decomposes as \(\mathbf{x} = x_1\mathbf{e}_1 + \cdots + x_n\mathbf{e}_n\). Linearity gives:

\[T(\mathbf{x}) = x_1 T(\mathbf{e}_1) + \cdots + x_n T(\mathbf{e}_n) = \underbrace{\begin{bmatrix} T(\mathbf{e}_1) \mid \cdots \mid T(\mathbf{e}_n)\end{bmatrix}}_{A}\mathbf{x}\]

The matrix-vector product is exactly this weighted combination of columns.


6.3.2 Example 1: Rotation by 90°

Where does a counter-clockwise 90° rotation send the basis vectors?

\[T(\mathbf{e}_1) = T\!\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}0\\1\end{pmatrix}, \qquad T(\mathbf{e}_2) = T\!\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}-1\\0\end{pmatrix}\]

So the rotation matrix is:

\[R_{90} = \begin{bmatrix}0 & -1 \\ 1 & 0\end{bmatrix}\]

import numpy as np

# 90-degree counter-clockwise rotation
theta = np.pi / 2
R90 = np.array([[np.cos(theta), -np.sin(theta)],   # shape (2, 2)
                [np.sin(theta),  np.cos(theta)]])
R90 = R90.round(10)   # clean up floating point noise

e1 = np.array([1., 0.])   # shape (2,)
e2 = np.array([0., 1.])   # shape (2,)

print("R90 ="); print(R90)
print(f"\nT(e1) = R90 @ e1 = {R90 @ e1}  (column 1)")
print(f"T(e2) = R90 @ e2 = {R90 @ e2}  (column 2)")
R90 =
[[ 0. -1.]
 [ 1.  0.]]

T(e1) = R90 @ e1 = [0. 1.]  (column 1)
T(e2) = R90 @ e2 = [-1.  0.]  (column 2)

6.3.3 Example 2: Projection onto the Line \(y = x\)

The projection of \(\mathbf{v}\) onto the line \(y = x\) has direction \(\hat{\mathbf{u}} = \frac{1}{\sqrt{2}}[1,1]^T\). Using the formula \(P = \hat{\mathbf{u}}\hat{\mathbf{u}}^T\):

\[P = \frac{1}{2}\begin{bmatrix}1\\1\end{bmatrix}\begin{bmatrix}1 & 1\end{bmatrix} = \frac{1}{2}\begin{bmatrix}1&1\\1&1\end{bmatrix}\]

Let us verify via the column-image recipe:

import numpy as np
import matplotlib.pyplot as plt

# Projection matrix onto y = x
u_hat = np.array([1., 1.]) / np.sqrt(2)   # shape (2,)
P = np.outer(u_hat, u_hat)                 # shape (2, 2)
print("P ="); print(P.round(4))

e1 = np.array([1., 0.])
e2 = np.array([0., 1.])
print(f"\nT(e1) = P @ e1 = {P @ e1}  (column 1 of P)")
print(f"T(e2) = P @ e2 = {P @ e2}  (column 2 of P)")

# Visualise
fig, ax = plt.subplots(figsize=(5, 5))
t = np.linspace(-1.5, 1.5, 100)
ax.plot(t, t, color='#4a90d9', lw=1.5, ls='--', label=r'Line $y=x$')

for v, col, label in [(e1, 'tomato', r'$\mathbf{e}_1$'),
                       (e2, '#2ecc71', r'$\mathbf{e}_2$')]:
    pv = P @ v
    ax.annotate('', xy=v, xytext=[0,0],
                arrowprops=dict(arrowstyle='->', color=col, lw=2))
    ax.annotate('', xy=pv, xytext=[0,0],
                arrowprops=dict(arrowstyle='->', color=col, lw=2, ls='dashed'))
    ax.text(v[0]+0.05, v[1]+0.05, label, color=col, fontsize=11)
    ax.text(pv[0]+0.05, pv[1]-0.15, r'$P$'+label, color=col, fontsize=9)

ax.set_xlim(-0.3, 1.4); ax.set_ylim(-0.3, 1.4)
ax.set_aspect('equal')
ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
ax.grid(True, alpha=0.2)
ax.set_xlabel('$x_1$'); ax.set_ylabel('$x_2$')
ax.set_title(r'Column images of the projection onto $y = x$')
ax.legend(fontsize=9)
plt.tight_layout()
plt.show()
P =
[[0.5 0.5]
 [0.5 0.5]]

T(e1) = P @ e1 = [0.5 0.5]  (column 1 of P)
T(e2) = P @ e2 = [0.5 0.5]  (column 2 of P)


6.3.4 Example 3: Deriving an Unknown Map

Suppose you observe that a linear map \(T : \mathbb{R}^2 \to \mathbb{R}^3\) satisfies \(T([1,0]^T) = [2, -1, 0]^T\) and \(T([0,1]^T) = [1, 3, -2]^T\). Write down its matrix immediately:

import numpy as np

# Observed basis images
T_e1 = np.array([2., -1.,  0.])   # shape (3,)
T_e2 = np.array([1.,  3., -2.])   # shape (3,)

# Standard matrix: stack images as columns
A = np.column_stack([T_e1, T_e2])   # shape (3, 2)
print("Standard matrix A ="); print(A)

# Verify: what does T do to an arbitrary vector x = [3, -1]?
x = np.array([3., -1.])   # shape (2,)
print(f"\nT({x}) = A @ x = {A @ x}")
print(f"Manual: 3*T(e1) + (-1)*T(e2) = {3*T_e1 + (-1)*T_e2}")
Standard matrix A =
[[ 2.  1.]
 [-1.  3.]
 [ 0. -2.]]

T([ 3. -1.]) = A @ x = [ 5. -6.  2.]
Manual: 3*T(e1) + (-1)*T(e2) = [ 5. -6.  2.]

6.3.5 The Recipe in Reverse: Reading a Matrix

Given a matrix \(A\), you can immediately state its action on each basis vector. For example:

\[A = \begin{bmatrix}3 & 0 \\ 1 & 2\end{bmatrix}\]

  • Column 1 = \([3,1]^T\): the x-axis gets mapped to the vector \((3,1)\).
  • Column 2 = \([0,2]^T\): the y-axis gets mapped straight up, scaled by 2.
import numpy as np
import matplotlib.pyplot as plt

A = np.array([[3., 0.],
              [1., 2.]])   # shape (2, 2)

e1 = np.array([1., 0.])
e2 = np.array([0., 1.])

fig, ax = plt.subplots(figsize=(5, 5))

vecs = {'$\\mathbf{e}_1$': (e1, '#4a90d9'),
        '$\\mathbf{e}_2$': (e2, '#2ecc71'),
        '$A\\mathbf{e}_1$': (A @ e1, 'tomato'),
        '$A\\mathbf{e}_2$': (A @ e2, '#e67e22')}

for label, (v, col) in vecs.items():
    ax.annotate('', xy=v, xytext=[0,0],
                arrowprops=dict(arrowstyle='->', color=col, lw=2.5))
    ax.text(v[0]+0.1, v[1]+0.1, label, color=col, fontsize=11)

ax.set_xlim(-0.5, 4); ax.set_ylim(-0.5, 3)
ax.set_aspect('equal')
ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
ax.grid(True, alpha=0.2)
ax.set_xlabel('$x_1$'); ax.set_ylabel('$x_2$')
ax.set_title('Basis vectors and their images under $A$')
plt.tight_layout()
plt.show()


6.3.6 Summary

Step What you do
1 Apply \(T\) to each \(\mathbf{e}_j\) to get \(T(\mathbf{e}_j)\)
2 Use \(T(\mathbf{e}_j)\) as the \(j\)-th column of \([T]\)
3 Check: \([T]\mathbf{x}\) should equal the direct computation \(T(\mathbf{x})\)

This is more than a trick — it is the definition of the standard matrix. Every \(m \times n\) matrix encodes a linear map from \(\mathbb{R}^n\) to \(\mathbb{R}^m\), and every such linear map has exactly one such matrix.

6.4 Kernel, Image, and the Dimension Theorem

Two subspaces are attached to every linear map \(T : \mathbb{R}^n \to \mathbb{R}^m\):

  • Kernel (null space): inputs that \(T\) maps to zero — \(\ker T = \{\mathbf{x} : T(\mathbf{x}) = \mathbf{0}\}\).
  • Image (column space): all possible outputs — \(\operatorname{im} T = \{T(\mathbf{x}) : \mathbf{x} \in \mathbb{R}^n\}\).

These two subspaces are linked by a beautifully simple equation.


6.4.1 The Rank-Nullity Theorem

\[\dim(\ker T) + \dim(\operatorname{im} T) = n\]

In plain language: the dimension of the input space equals the number of dimensions “killed” (kernel) plus the number “produced” (image). You can’t create more output dimensions than you have input dimensions, and any dimension lost to the kernel is exactly one dimension missing from the image.

For a matrix \(A\) of shape \((m, n)\):

\[\underbrace{\text{nullity}(A)}_{\dim\ker A} + \underbrace{\text{rank}(A)}_{\dim\operatorname{im}A} = n\]


6.4.2 Visualising the Theorem

import numpy as np

# A 3x4 matrix of rank 2
A = np.array([[1., 2., 3., 4.],
              [2., 4., 6., 8.],
              [0., 1., 1., 2.]])   # shape (3, 4)

rank    = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank

print(f"Shape:   {A.shape}  →  n = {A.shape[1]}")
print(f"rank(A)  = {rank}   (dim of image)")
print(f"nullity  = {nullity}  (dim of kernel)")
print(f"rank + nullity = {rank + nullity}  (= n = {A.shape[1]})")

# Compute null space via SVD
U, s, Vt = np.linalg.svd(A)
print(f"\nSingular values: {s.round(6)}")
null_basis = Vt[rank:]    # shape (nullity, 4)
print(f"\nNull space basis vectors (rows of Vt[rank:]):")
print(null_basis.round(6))
for i, v in enumerate(null_basis):
    print(f"  A @ null_vec_{i+1} = {(A @ v).round(10)}")
Shape:   (3, 4)  →  n = 4
rank(A)  = 2   (dim of image)
nullity  = 2  (dim of kernel)
rank + nullity = 4  (= n = 4)

Singular values: [12.475842  0.594445  0.      ]

Null space basis vectors (rows of Vt[rank:]):
[[ 0.674156  0.124619 -0.674156  0.274768]
 [-0.007697 -0.895908  0.007697  0.444105]]
  A @ null_vec_1 = [0. 0. 0.]
  A @ null_vec_2 = [0. 0. 0.]

6.4.3 The Kernel: What Gets Annihilated?

The kernel tells you when two different inputs give the same output.

If \(\mathbf{x}_p\) is any solution of \(A\mathbf{x} = \mathbf{b}\), then the complete solution is:

\[\mathbf{x} = \mathbf{x}_p + \mathbf{x}_h, \quad \mathbf{x}_h \in \ker A\]

Any vector in the kernel can be added “for free” without changing the output. This is why \(A\mathbf{x} = \mathbf{b}\) has infinitely many solutions when the kernel is non-trivial — you have free parameters.

import numpy as np
import matplotlib.pyplot as plt

# 2x3 matrix: maps R^3 -> R^2, rank 2, nullity 1
A = np.array([[1., 0., 1.],
              [0., 1., 1.]])   # shape (2, 3)

b = np.array([2., 3.])   # shape (2,)
xp, _, _, _ = np.linalg.lstsq(A, b, rcond=None)

# Null space vector (SVD)
_, s, Vt = np.linalg.svd(A)
null_vec = Vt[-1]   # shape (3,)

print(f"Particular solution: xp = {xp.round(4)}")
print(f"Null space vector:   n  = {null_vec.round(4)}")
print(f"A @ xp = {A @ xp}  (= b = {b})")
print(f"A @ null_vec = {(A @ null_vec).round(10)}  (= 0)")

# Show family of solutions
t_vals = np.linspace(-3, 3, 7)
print(f"\nFamily x = xp + t*null_vec, A @ x:")
for t in t_vals:
    x = xp + t * null_vec
    print(f"  t={t:+.1f}  →  x={x.round(3)},  A@x={np.round(A@x,3)}")
Particular solution: xp = [0.3333 1.3333 1.6667]
Null space vector:   n  = [-0.5774 -0.5774  0.5774]
A @ xp = [2. 3.]  (= b = [2. 3.])
A @ null_vec = [0. 0.]  (= 0)

Family x = xp + t*null_vec, A @ x:
  t=-3.0  →  x=[ 2.065  3.065 -0.065],  A@x=[2. 3.]
  t=-2.0  →  x=[1.488 2.488 0.512],  A@x=[2. 3.]
  t=-1.0  →  x=[0.911 1.911 1.089],  A@x=[2. 3.]
  t=+0.0  →  x=[0.333 1.333 1.667],  A@x=[2. 3.]
  t=+1.0  →  x=[-0.244  0.756  2.244],  A@x=[2. 3.]
  t=+2.0  →  x=[-0.821  0.179  2.821],  A@x=[2. 3.]
  t=+3.0  →  x=[-1.399 -0.399  3.399],  A@x=[2. 3.]

6.4.4 The Image: What Can Be Reached?

The image is the column space of \(A\). A vector \(\mathbf{b}\) can be reached by \(T\) if and only if \(\mathbf{b} \in \operatorname{im} T\).

import numpy as np

A = np.array([[1., 0., 1.],
              [0., 1., 1.]])   # shape (2, 3), rank 2

# The image is all of R^2 since rank = m = 2
# Every b in R^2 is reachable
b_test = np.array([5., -2.])   # shape (2,)
x, _, _, _ = np.linalg.lstsq(A, b_test, rcond=None)
residual = np.linalg.norm(A @ x - b_test)
print(f"b = {b_test}")
print(f"Residual after lstsq: {residual:.2e}  → b in im(A)? {'YES' if residual<1e-10 else 'NO'}")

# Now try a rank-deficient matrix: image is a proper subspace of R^3
B = np.array([[1., 2.],
              [2., 4.],
              [3., 6.]])   # shape (3, 2), rank 1

b_in  = np.array([2., 4., 6.])   # = 2 * col1, in im(B)
b_out = np.array([1., 1., 1.])   # not in im(B)

for label, bv in [("b_in ", b_in), ("b_out", b_out)]:
    xv, _, _, _ = np.linalg.lstsq(B, bv, rcond=None)
    res = np.linalg.norm(B @ xv - bv)
    print(f"\n{label} = {bv}:  residual = {res:.2e}  → in im(B)? {'YES' if res<1e-10 else 'NO'}")
b = [ 5. -2.]
Residual after lstsq: 4.44e-16  → b in im(A)? YES

b_in  = [2. 4. 6.]:  residual = 4.97e-16  → in im(B)? YES

b_out = [1. 1. 1.]:  residual = 6.55e-01  → in im(B)? NO

6.4.5 One-to-One, Onto, and Invertibility

Condition Algebraic Geometric
One-to-one (injective) \(\ker T = \{\mathbf{0}\}\) No two inputs share an output
Onto (surjective) \(\operatorname{im} T = \mathbb{R}^m\) Every output is reachable
Invertible (bijective) Both Unique input for every output

For a square matrix (\(n = m\)): - \(\ker T = \{\mathbf{0}\}\) \(\iff\) \(\operatorname{im} T = \mathbb{R}^n\) \(\iff\) \(\det A \neq 0\) \(\iff\) \(A\) is invertible.

These are all the same thing for square matrices.

import numpy as np

cases = {
    'Full rank (invertible)': np.array([[2., 1.],[1., 3.]]),
    'Rank 1 (singular)':      np.array([[1., 2.],[2., 4.]]),
    'Tall (3x2, injective)':  np.array([[1.,0.],[0.,1.],[1.,1.]]),
    'Wide (2x3, onto)':       np.array([[1.,0.,1.],[0.,1.,1.]]),
}

print(f"{'Matrix':<30}  shape   rank  nullity  det")
print("-"*60)
for name, A in cases.items():
    r = np.linalg.matrix_rank(A)
    nul = A.shape[1] - r
    det = np.linalg.det(A) if A.shape[0]==A.shape[1] else float('nan')
    print(f"{name:<30}  {str(A.shape):<8} {r}     {nul}        {det:+.2f}" if not np.isnan(det)
          else f"{name:<30}  {str(A.shape):<8} {r}     {nul}        n/a")
Matrix                          shape   rank  nullity  det
------------------------------------------------------------
Full rank (invertible)          (2, 2)   2     0        +5.00
Rank 1 (singular)               (2, 2)   1     1        +0.00
Tall (3x2, injective)           (3, 2)   2     0        n/a
Wide (2x3, onto)                (2, 3)   2     1        n/a

6.4.6 Why This Matters in Practice

Machine learning: if a weight matrix has a non-trivial null space, different weight vectors produce identical predictions — the model is over-parameterised.

Computer vision: the null space of the camera matrix reveals the camera centre (the 3D point that maps to every image point — Section 10.3).

Robotics: if a robot’s Jacobian has a non-trivial null space, there exist joint-velocity combinations that produce zero end-effector velocity — redundant degrees of freedom.

6.5 Composition = Matrix Multiplication

Applying two linear maps in sequence is another linear map. The matrix of the composed map is the product of the individual matrices. This is not just a notational convenience — it is why matrix multiplication is defined the way it is.

Theorem: If \(S : \mathbb{R}^n \to \mathbb{R}^m\) has matrix \(A\) and \(T : \mathbb{R}^m \to \mathbb{R}^p\) has matrix \(B\), then \(T \circ S : \mathbb{R}^n \to \mathbb{R}^p\) has matrix \(BA\).

Note the order: the rightmost matrix is applied first.


6.5.1 Why Multiplication Is Defined This Way

Apply \(S\) then \(T\) to a vector \(\mathbf{x}\):

\[T(S(\mathbf{x})) = B(A\mathbf{x}) = (BA)\mathbf{x}\]

Matrix multiplication is associativity of function composition. Every rule about matrix products follows from this.


6.5.2 Order Is Critical

\(BA \neq AB\) in general. “Rotate then shear” is a different transformation from “shear then rotate.”

import numpy as np
import matplotlib.pyplot as plt

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],   # shape (2, 2)
              [np.sin(theta),  np.cos(theta)]])
H = np.array([[1., 1.5],                          # shape (2, 2)
              [0., 1.]])     # horizontal shear

RS = R @ H   # shear first, then rotate   shape (2, 2)
SR = H @ R   # rotate first, then shear   shape (2, 2)

print("R @ H  (shear → rotate):\n", RS.round(4))
print("\nH @ R  (rotate → shear):\n", SR.round(4))
print("\nAre they equal?", np.allclose(RS, SR))

# Visualise the difference
sq = np.array([[0,0],[1,0],[1,1],[0,1],[0,0]], dtype=float)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

for ax, M, title, col in [
        (axes[0], np.eye(2), 'Original',           '#4a90d9'),
        (axes[1], RS,        'Shear → Rotate (RH)', 'tomato'),
        (axes[2], SR,        'Rotate → Shear (HR)', '#2ecc71')]:
    pts = (M @ sq.T).T
    ax.fill(pts[:-1,0], pts[:-1,1], alpha=0.3, color=col)
    ax.plot(pts[:,0], pts[:,1], color=col, lw=2)
    ax.set_xlim(-2, 2.5); ax.set_ylim(-1.5, 2.5)
    ax.set_aspect('equal')
    ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.grid(True, alpha=0.2)
    ax.set_title(title, fontsize=10)
    ax.set_xlabel('$x_1$'); ax.set_ylabel('$x_2$')

fig.suptitle('Order matters: $RH \\neq HR$', fontsize=12)
fig.tight_layout()
plt.show()
R @ H  (shear → rotate):
 [[0.7071 0.3536]
 [0.7071 1.7678]]

H @ R  (rotate → shear):
 [[1.7678 0.3536]
 [0.7071 0.7071]]

Are they equal? False


6.5.3 Building Transformations from Primitives

Complex transformations are composed from simple ones. For example, rotation about an arbitrary point \(\mathbf{p}\) in 2D requires three steps:

  1. Translate so \(\mathbf{p}\) moves to the origin (this requires homogeneous coordinates — Chapter 7, but we preview the idea here).
  2. Rotate by \(\theta\).
  3. Translate back.

Using only linear maps, we can build the rotation about the origin:

import numpy as np

def rotation(theta):
    """2x2 counter-clockwise rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])   # shape (2, 2)

def scaling(sx, sy):
    return np.array([[sx, 0.],
                     [0., sy]])  # shape (2, 2)

def shear_x(k):
    return np.array([[1., k],
                     [0., 1.]])  # shape (2, 2)

# Example: scale, then rotate, then shear
R  = rotation(np.pi / 3)
S  = scaling(2.0, 0.5)
Hx = shear_x(0.5)

# Composed: shear(rotate(scale(x)))
M = Hx @ R @ S   # shape (2, 2)
print("Composed M = Hx @ R @ S:")
print(M.round(4))
print(f"\ndet(Hx) * det(R) * det(S) = {np.linalg.det(Hx):.4f} * {np.linalg.det(R):.4f} * {np.linalg.det(S):.4f} = {np.linalg.det(M):.4f}")
print(f"det(M) directly = {np.linalg.det(M):.4f}")
Composed M = Hx @ R @ S:
[[ 1.866  -0.308 ]
 [ 1.7321  0.25  ]]

det(Hx) * det(R) * det(S) = 1.0000 * 1.0000 * 1.0000 = 1.0000
det(M) directly = 1.0000

The determinant of a product equals the product of determinants: \(\det(BA) = \det B \cdot \det A\).


6.5.4 Powers of a Transformation

Applying \(T\) to itself \(k\) times gives \(T^k\), represented by \(A^k\):

import numpy as np
import matplotlib.pyplot as plt

# A "spiral" transformation: slight rotation + slight shrink
theta = np.pi / 8
A = 0.85 * np.array([[np.cos(theta), -np.sin(theta)],   # shape (2, 2)
                      [np.sin(theta),  np.cos(theta)]])

# Track a point under repeated application
x0 = np.array([1., 0.])   # shape (2,)
trajectory = [x0.copy()]

x = x0.copy()
for _ in range(30):
    x = A @ x
    trajectory.append(x.copy())

traj = np.array(trajectory)   # shape (31, 2)

fig, ax = plt.subplots(figsize=(5, 5))
ax.plot(traj[:, 0], traj[:, 1], 'o-', color='#4a90d9', ms=4, lw=1.5)
ax.scatter(traj[0, 0], traj[0, 1], color='tomato', s=80, zorder=5, label='Start')
ax.scatter(traj[-1, 0], traj[-1, 1], color='#2ecc71', s=80, zorder=5, label='End')
ax.set_xlim(-1.3, 1.3); ax.set_ylim(-1.3, 1.3)
ax.set_aspect('equal')
ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
ax.grid(True, alpha=0.2)
ax.set_xlabel('$x_1$'); ax.set_ylabel('$x_2$')
ax.set_title('Repeated application: $A^k\\mathbf{x}_0$ spirals to origin')
ax.legend(fontsize=9)
plt.tight_layout()
plt.show()

The spiral converges to the origin because both eigenvalues of \(A\) have magnitude \(0.85 < 1\). Eigenvalues govern the long-run behaviour of iterated maps — the subject of Chapter 13.


6.5.5 Invertibility and the Inverse Map

If \(A\) is invertible, the inverse transformation \(T^{-1}\) has matrix \(A^{-1}\). Composing them gives the identity:

\[A^{-1}A = AA^{-1} = I\]

For composed maps: \((BA)^{-1} = A^{-1}B^{-1}\) — reverse the order.

import numpy as np

A = np.array([[3., 1.],
              [2., 4.]])   # shape (2, 2)
B = np.array([[1., 2.],
              [0., 1.]])   # shape (2, 2)

BA    = B @ A
BA_inv = np.linalg.solve(BA, np.eye(2))   # (BA)^{-1}
check  = np.linalg.solve(A, np.eye(2)) @ np.linalg.solve(B, np.eye(2))  # A^{-1} B^{-1}

print("(BA)^{-1}:")
print(BA_inv.round(6))
print("\nA^{-1} B^{-1}:")
print(check.round(6))
print("\nEqual?", np.allclose(BA_inv, check))
(BA)^{-1}:
[[ 0.4 -0.9]
 [-0.2  0.7]]

A^{-1} B^{-1}:
[[ 0.4 -0.9]
 [-0.2  0.7]]

Equal? True

6.6 Change of Basis: \([T]_{\mathcal{B}} = P^{-1}[T]_{\mathcal{E}}P\)

The same linear transformation can be represented by different matrices depending on which coordinate system you choose. Choosing a good coordinate system can turn a complicated matrix into a diagonal one — that is the entire idea behind diagonalisation and PCA.


6.6.1 The Setup

Let \(\mathcal{E}\) be the standard basis \(\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}\) and \(\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}\) be a new basis. Define the change-of-basis matrix:

\[P = \begin{bmatrix}\mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n\end{bmatrix}\]

The columns of \(P\) are the new basis vectors expressed in the old (standard) coordinates.

How coordinates transform: - \(\mathcal{B}\)-coords \(\to\) \(\mathcal{E}\)-coords: multiply by \(P\). - \(\mathcal{E}\)-coords \(\to\) \(\mathcal{B}\)-coords: multiply by \(P^{-1}\).


6.6.2 The Similarity Formula

If \(A = [T]_{\mathcal{E}}\) is the matrix of \(T\) in standard coordinates, then the matrix of the same transformation in basis \(\mathcal{B}\) is:

\[[T]_{\mathcal{B}} = P^{-1} A P\]

This is called a similarity transformation. The two matrices \(A\) and \(P^{-1}AP\) are similar — they represent the same geometric object viewed from different coordinate systems.

import numpy as np

# Transformation T: standard matrix in E-coords
A = np.array([[3., 1.],
              [1., 3.]])   # shape (2, 2)

# New basis B: columns are the new basis vectors
b1 = np.array([1., 1.]) / np.sqrt(2)   # shape (2,)
b2 = np.array([-1., 1.]) / np.sqrt(2)  # shape (2,)
P  = np.column_stack([b1, b2])          # shape (2, 2)

# Matrix of T in B-coords
T_B = np.linalg.solve(P, A @ P)   # = P^{-1} A P,  shape (2, 2)

print("A  = [T]_E:")
print(A)
print("\nP (new basis vectors as columns):")
print(P.round(6))
print("\n[T]_B = P^{-1} A P:")
print(T_B.round(6))
A  = [T]_E:
[[3. 1.]
 [1. 3.]]

P (new basis vectors as columns):
[[ 0.707107 -0.707107]
 [ 0.707107  0.707107]]

[T]_B = P^{-1} A P:
[[4. 0.]
 [0. 2.]]

The basis \(\mathcal{B}\) was chosen to be the eigenvectors of \(A\). In that basis, \(A\) becomes diagonal — the eigenvalues sit on the diagonal and the off-diagonal entries vanish. This is not a coincidence (Chapter 14).


6.6.3 Geometric Interpretation

import numpy as np
import matplotlib.pyplot as plt

# A = shear + scaling
A = np.array([[2., 1.],
              [0., 1.]])   # shape (2, 2)

# A custom basis (not eigenvectors here — just illustration)
P = np.array([[1., 0.5],
              [0., 1.]])   # shape (2, 2)

T_B = np.linalg.solve(P, A @ P)   # shape (2, 2)

# Grid in E-coords, then transform by A
s = np.linspace(-1.5, 1.5, 10)
grid_pts = np.array([[u, v] for u in s for v in s])   # shape (100, 2)

fig, axes = plt.subplots(1, 3, figsize=(13, 4))

def draw_grid(ax, pts, title, col):
    ax.scatter(pts[:,0], pts[:,1], s=10, color=col, alpha=0.5)
    ax.set_aspect('equal')
    ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.grid(True, alpha=0.2)
    ax.set_xlim(-4, 4); ax.set_ylim(-3, 3)
    ax.set_title(title, fontsize=9)

# Standard coords
draw_grid(axes[0], grid_pts, 'Input ($\\mathcal{E}$ coords)', '#4a90d9')

# Applied T in standard coords
out_E = (A @ grid_pts.T).T
draw_grid(axes[1], out_E, 'After $A$ (standard coords)', 'tomato')

# Transform to B-coords, apply T_B, transform back
P_inv = np.linalg.inv(P)
grid_B  = (P_inv @ grid_pts.T).T     # to B-coords
out_B   = (T_B @ grid_B.T).T         # apply T_B
out_back = (P @ out_B.T).T           # back to E-coords
draw_grid(axes[2], out_back, 'Same $T$ via $\\mathcal{B}$ coords', '#2ecc71')

axes[0].set_xlabel('$x_1$'); axes[0].set_ylabel('$x_2$')
for ax in axes[1:]:
    ax.set_xlabel('$x_1$'); ax.set_ylabel('$x_2$')

fig.suptitle(r'Same transformation $T$, described in two coordinate systems', fontsize=11)
fig.tight_layout()
plt.show()

All three representations describe exactly the same geometric action on the plane. The output of panel 2 and panel 3 are identical — only the path taken to compute it differs.


6.6.4 What Is Preserved Under Similarity?

Similar matrices share key invariants:

Invariant Why it’s preserved
Eigenvalues \(\det(P^{-1}AP - \lambda I) = \det(A - \lambda I)\)
Determinant \(\det(P^{-1}AP) = \det(A)\)
Trace \(\text{tr}(P^{-1}AP) = \text{tr}(A)\)
Rank Column space dimension is basis-independent
import numpy as np

A = np.array([[3., 1.],
              [1., 3.]])   # shape (2, 2)

rng = np.random.default_rng(7)
P   = rng.standard_normal((2, 2))   # shape (2, 2), arbitrary invertible matrix
T_B = np.linalg.solve(P, A @ P)

print(f"det(A)   = {np.linalg.det(A):.6f}")
print(f"det(T_B) = {np.linalg.det(T_B):.6f}")

print(f"\ntrace(A)   = {np.trace(A):.6f}")
print(f"trace(T_B) = {np.trace(T_B):.6f}")

eig_A  = np.sort(np.linalg.eigvals(A).real)
eig_TB = np.sort(np.linalg.eigvals(T_B).real)
print(f"\nEigenvalues of A:   {eig_A.round(6)}")
print(f"Eigenvalues of T_B: {eig_TB.round(6)}")
det(A)   = 8.000000
det(T_B) = 8.000000

trace(A)   = 6.000000
trace(T_B) = 6.000000

Eigenvalues of A:   [2. 4.]
Eigenvalues of T_B: [2. 4.]

6.6.5 Why Change of Basis Matters

Diagonalisation: if \(P\) is built from eigenvectors, \(P^{-1}AP\) is diagonal. Computing \(A^k\) then becomes trivial: just raise the diagonal entries to the \(k\)-th power (Chapter 14).

PCA: the principal components are a change of basis that aligns coordinates with directions of maximum variance (Chapter 20).

Graphics and robotics: every “frame” (body frame, world frame, camera frame) is a basis. Transforming between frames is a change of basis (Chapters 9–10).

6.7 Application: Image Warping

Every 2D linear transformation can be applied directly to a pixel grid to produce a warped image. This is the basis of image rectification in computer vision, texture mapping in graphics, and data augmentation in deep learning.

We will apply each transformation from §6.2 to a test image and visualise the result using scipy.ndimage (which handles the pixel interpolation).


6.7.1 Why Image Warping Is a Linear Transformation

An image is a function \(I : \mathbb{Z}^2 \to \mathbb{R}\) (or \(\mathbb{R}^3\) for colour) mapping pixel coordinates to intensity values.

To warp an image by a \(2\times2\) matrix \(A\):

  1. For each output pixel location \(\mathbf{q}\), compute where it came from: \(\mathbf{p} = A^{-1}\mathbf{q}\) (inverse mapping).
  2. Read the intensity at \(\mathbf{p}\) (with bilinear interpolation for sub-pixel locations).
  3. Assign that intensity to \(\mathbf{q}\).

We always invert and pull from the source — “push” approaches leave holes.


6.7.2 Building a Test Image

import numpy as np
import matplotlib.pyplot as plt

def make_test_image(size=200):
    """Checkerboard + circle test pattern."""
    rng = np.random.default_rng(0)
    img = np.zeros((size, size), dtype=float)

    # Checkerboard
    tile = 20
    for r in range(0, size, tile):
        for c in range(0, size, tile):
            if ((r // tile) + (c // tile)) % 2 == 0:
                img[r:r+tile, c:c+tile] = 0.8

    # Circle
    cy, cx = size // 2, size // 2
    y, x = np.ogrid[:size, :size]
    mask = (x - cx)**2 + (y - cy)**2 < (size // 4)**2
    img[mask] = 0.5

    # Cross hair
    img[cy-2:cy+2, :] = 1.0
    img[:, cx-2:cx+2] = 1.0

    return img

img = make_test_image()
fig, ax = plt.subplots(figsize=(4, 4))
ax.imshow(img, cmap='gray', vmin=0, vmax=1, origin='upper')
ax.set_title('Test image', fontsize=10)
ax.set_xticks([]); ax.set_yticks([])
plt.tight_layout()
plt.show()


6.7.3 Warping by a 2D Linear Transformation

import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import affine_transform

def make_test_image(size=200):
    img = np.zeros((size, size), dtype=float)
    tile = 20
    for r in range(0, size, tile):
        for c in range(0, size, tile):
            if ((r // tile) + (c // tile)) % 2 == 0:
                img[r:r+tile, c:c+tile] = 0.8
    cy, cx = size // 2, size // 2
    y, x = np.ogrid[:size, :size]
    mask = (x - cx)**2 + (y - cy)**2 < (size // 4)**2
    img[mask] = 0.5
    img[cy-2:cy+2, :] = 1.0
    img[:, cx-2:cx+2] = 1.0
    return img

img = make_test_image(200)
size = img.shape[0]
center = size / 2

theta = np.pi / 6

transforms = {
    'Original':         np.eye(2),
    'Rotation 30°':     np.array([[np.cos(theta), -np.sin(theta)],
                                   [np.sin(theta),  np.cos(theta)]]),
    'Scale (1.5, 0.7)': np.array([[1.5, 0.],  [0., 0.7]]),
    'Horiz. shear':     np.array([[1.,  0.5],  [0., 1.]]),
    'Reflection (x)':   np.array([[1.,  0.],   [0., -1.]]),
    'Proj. onto x':     np.array([[1.,  0.],   [0.,  0.]]),
}

fig, axes = plt.subplots(2, 3, figsize=(12, 8))
axes = axes.flatten()

for ax, (name, A) in zip(axes, transforms.items()):
    if name == 'Original':
        warped = img
    else:
        # affine_transform uses the matrix to map OUTPUT coords to INPUT coords
        # so we pass A^{-1} (or the inverse of the rotation/etc.)
        # scipy convention: input_coords = matrix @ output_coords + offset
        # We centre the transform at the image centre
        A_inv = np.linalg.lstsq(A, np.eye(2), rcond=None)[0]
        offset = center - A_inv @ np.array([center, center])
        warped = affine_transform(img, A_inv, offset=offset,
                                  output_shape=img.shape, cval=0.0)

    ax.imshow(warped, cmap='gray', vmin=0, vmax=1, origin='upper')
    ax.set_title(name, fontsize=10)
    ax.set_xticks([]); ax.set_yticks([])

fig.suptitle('2D linear transformations applied to a test image', fontsize=13)
fig.tight_layout()
plt.show()


6.7.4 Applying the Zoo to a Real Photo

Below we apply the transforms to a synthetic photo-like image.

import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import affine_transform

# Generate a synthetic 'photo-like' image using gradients and noise
rng = np.random.default_rng(12)
size = 200

# Gradient background
y_idx, x_idx = np.mgrid[:size, :size]
photo = (x_idx / size * 0.6 + y_idx / size * 0.4
         + rng.standard_normal((size, size)) * 0.05)
photo = np.clip(photo, 0, 1)

# Add some structure: boxes
photo[40:80, 40:80]   += 0.3
photo[100:160, 60:140] -= 0.2
photo = np.clip(photo, 0, 1)

center = size / 2
theta  = np.pi / 8
A      = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
A_inv  = A.T   # rotation inverse = transpose
offset = center - A_inv @ np.array([center, center])
warped = affine_transform(photo, A_inv, offset=offset,
                          output_shape=photo.shape, cval=0.0)

fig, axes = plt.subplots(1, 2, figsize=(9, 4))
axes[0].imshow(photo, cmap='gray', vmin=0, vmax=1)
axes[0].set_title('Original', fontsize=10)
axes[1].imshow(warped, cmap='gray', vmin=0, vmax=1)
axes[1].set_title('Rotated 22.5°', fontsize=10)
for ax in axes:
    ax.set_xticks([]); ax.set_yticks([])
fig.tight_layout()
plt.show()


6.7.5 Key Takeaways

  1. Inverse mapping is essential: map output pixels back to input pixels to avoid holes. Always use \(A^{-1}\), not \(A\), when warping.

  2. Projection matrices (\(\det = 0\)) cannot be inverted — they collapse the image onto a line. In scipy, the lstsq pseudo-inverse gives the least-squares pre-image, but information is permanently lost.

  3. The centre matters: to rotate about the image centre rather than the top-left corner, shift by \(\mathbf{c} - A^{-1}\mathbf{c}\) where \(\mathbf{c}\) is the centre pixel.

  4. Composition is multiplication: applying two warps \(A\) then \(B\) is the single warp \(BA\), which is one pass over the image rather than two.


6.7.6 Exploration

Kaggle dataset: Street View House Numbers (SVHN) or MNIST.

Try augmenting the training set by applying random combinations of: - Rotation (±15°) - Horizontal shear (±0.2) - Scaling (0.9–1.1 on each axis independently)

Measure how augmentation affects validation accuracy when training a simple logistic regression or small CNN. Does each transformation type help equally? Which hurts if applied too aggressively?

6.8 Exercises


6.1 Determine whether each function is a linear transformation. If not, identify which axiom it violates.

  1. \(T : \mathbb{R}^2 \to \mathbb{R}^2\), \(T([x,y]^T) = [x + 1,\; y]^T\)
  2. \(T : \mathbb{R}^2 \to \mathbb{R}^2\), \(T([x,y]^T) = [2x - y,\; x + 3y]^T\)
  3. \(T : \mathbb{R}^2 \to \mathbb{R}\), \(T([x,y]^T) = xy\)
  4. \(T : \mathbb{R}^3 \to \mathbb{R}^3\), \(T(\mathbf{x}) = \mathbf{0}\) (the zero map)

6.2 (Standard Matrix) Find the \(2 \times 2\) standard matrix for each transformation:

  1. \(T\) reflects vectors across the line \(y = -x\).
  2. \(T\) scales by factor 3 in the \(x\)-direction and contracts by factor \(\tfrac{1}{2}\) in the \(y\)-direction.
  3. \(T\) projects every vector onto the \(x\)-axis.

Verify each matrix by checking \(T(\mathbf{e}_1)\) and \(T(\mathbf{e}_2)\) explicitly.


6.3 (Kernel and Image) For the matrix \[A = \begin{bmatrix}1 & 2 & 0 \\ 0 & 0 & 1 \\ 2 & 4 & 0\end{bmatrix}\]

  1. Find the kernel of \(A\). What is its dimension?
  2. Find a basis for the image (column space) of \(A\).
  3. Verify the rank-nullity theorem.

6.4 (Piercing) A \(3 \times 5\) matrix \(A\) has rank 3.

  1. What is the dimension of \(\ker A\)?
  2. Is every \(\mathbf{b} \in \mathbb{R}^3\) in the image of \(A\)? Explain.
  3. Can \(A\mathbf{x} = \mathbf{b}\) have a unique solution? Explain.

6.5 (Composition) Let \[A = \begin{bmatrix}0 & -1 \\ 1 & 0\end{bmatrix} \quad \text{(rotation 90°)}, \qquad B = \begin{bmatrix}1 & 0 \\ 0 & -1\end{bmatrix} \quad \text{(reflection across } x\text{-axis)}\]

  1. Compute \(AB\) and \(BA\). Are they equal?
  2. What single geometric operation does \(AB\) represent?
  3. What does applying \(B\) twice do? Confirm with \((B^2 = I)\).

6.6 (Piercing) “A rotation matrix \(R_\theta\) and a scaling matrix \(S = sI\) always commute: \(R_\theta S = S R_\theta\).” Is this true? Does it hold for a non-uniform scaling \(S = \text{diag}(s_x, s_y)\) with \(s_x \neq s_y\)?


6.7 (Change of Basis) Let \[A = \begin{bmatrix}4 & 1 \\ 2 & 3\end{bmatrix}, \qquad P = \begin{bmatrix}1 & -1 \\ 1 & 2\end{bmatrix}\]

  1. Compute \(B = P^{-1}AP\).
  2. Verify that \(\det(A) = \det(B)\) and \(\text{tr}(A) = \text{tr}(B)\).
  3. Verify that \(A\) and \(B\) have the same eigenvalues.

6.8 (Harder) Prove that if \(T : \mathbb{R}^n \to \mathbb{R}^m\) is linear and \(\ker T = \{\mathbf{0}\}\), then \(T\) maps linearly independent sets to linearly independent sets.

Hint: suppose \(\{T(\mathbf{v}_1), \ldots, T(\mathbf{v}_k)\}\) is linearly dependent. What can you conclude about \(\{v_1, \ldots, v_k\}\)?


6.9 Solutions


Solution 6.1

(a) Not linear. \(T(\mathbf{0}) = [1, 0]^T \neq \mathbf{0}\). Fails the fixed-origin test.

(b) Linear.

import numpy as np

A = np.array([[2., -1.],
              [1.,  3.]])   # shape (2, 2)

rng = np.random.default_rng(1)
u   = rng.standard_normal(2)
v   = rng.standard_normal(2)
c   = 2.5

print("Additivity:", np.allclose(A@(u+v), A@u+A@v))
print("Homogeneity:", np.allclose(A@(c*u), c*(A@u)))
Additivity: True
Homogeneity: True

(c) Not linear. \(T([1,0]^T + [0,1]^T) = T([1,1]^T) = 1 \neq 0 + 0 = T([1,0]^T) + T([0,1]^T)\). Fails additivity.

(d) Linear. \(T(\mathbf{0}) = \mathbf{0}\) ✓, \(T(c\mathbf{u}+\mathbf{v}) = \mathbf{0} = c\mathbf{0}+\mathbf{0}\) ✓.


Solution 6.2

(a) Reflection across \(y = -x\): \(\mathbf{e}_1 = [1,0]^T \mapsto [0,-1]^T\); \(\mathbf{e}_2 = [0,1]^T \mapsto [-1,0]^T\). Matrix: \(\begin{bmatrix}0&-1\\-1&0\end{bmatrix}\).

(b) \(\mathbf{e}_1 \mapsto [3,0]^T\), \(\mathbf{e}_2 \mapsto [0,\tfrac{1}{2}]^T\). Matrix: \(\text{diag}(3, \tfrac{1}{2})\).

(c) \(\mathbf{e}_1 \mapsto [1,0]^T\), \(\mathbf{e}_2 \mapsto [0,0]^T\). Matrix: \(\begin{bmatrix}1&0\\0&0\end{bmatrix}\).

import numpy as np

matrices = {
    'Reflect y=-x':    np.array([[0., -1.], [-1., 0.]]),
    'Scale (3, 0.5)':  np.array([[3.,  0.], [ 0., 0.5]]),
    'Project to x':    np.array([[1.,  0.], [ 0., 0.]]),
}

e1 = np.array([1., 0.])
e2 = np.array([0., 1.])

for name, M in matrices.items():
    print(f"{name}:  T(e1)={M@e1},  T(e2)={M@e2}")
Reflect y=-x:  T(e1)=[ 0. -1.],  T(e2)=[-1.  0.]
Scale (3, 0.5):  T(e1)=[3. 0.],  T(e2)=[0.  0.5]
Project to x:  T(e1)=[1. 0.],  T(e2)=[0. 0.]

Solution 6.3

import numpy as np

A = np.array([[1., 2., 0.],
              [0., 0., 1.],
              [2., 4., 0.]])   # shape (3, 3)

rank    = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank
print(f"rank = {rank},  nullity = {nullity}  (rank + nullity = {rank+nullity} = n = {A.shape[1]})")

# Null space via SVD
U, s, Vt = np.linalg.svd(A)
print(f"Singular values: {s.round(6)}")
null_basis = Vt[rank:]   # shape (nullity, 3)
print(f"\n(a) Null space basis:\n{null_basis.round(6)}")
for v in null_basis:
    print(f"    A @ {v.round(4)} = {(A@v).round(10)}")

# Image basis: columns corresponding to pivot columns
print(f"\n(b) Image (column space) basis — pivot columns of A:")
print(f"  col1 = {A[:,0]},  col3 = {A[:,2]}")

print(f"\n(c) rank({rank}) + nullity({nullity}) = {rank+nullity} = n = 3  ✓")
rank = 2,  nullity = 1  (rank + nullity = 3 = n = 3)
Singular values: [5. 1. 0.]

(a) Null space basis:
[[ 0.894427 -0.447214 -0.      ]]
    A @ [ 0.8944 -0.4472 -0.    ] = [0. 0. 0.]

(b) Image (column space) basis — pivot columns of A:
  col1 = [1. 0. 2.],  col3 = [0. 1. 0.]

(c) rank(2) + nullity(1) = 3 = n = 3  ✓

Solution 6.4

(a) By rank-nullity: \(\dim(\ker A) = n - \text{rank}(A) = 5 - 3 = \mathbf{2}\).

(b) Yes. \(\dim(\operatorname{im}A) = \text{rank}(A) = 3 = m\), so the image is all of \(\mathbb{R}^3\). Every \(\mathbf{b}\) is reachable.

(c) No. The kernel is 2-dimensional, so whenever a particular solution \(\mathbf{x}_p\) exists, infinitely many solutions exist: \(\mathbf{x}_p + \ker A\). A unique solution requires \(\ker A = \{\mathbf{0}\}\), which requires nullity = 0, which requires rank = \(n = 5\). But rank ≤ \(m = 3 < 5\), so uniqueness is impossible.


Solution 6.5

import numpy as np

A = np.array([[ 0., -1.],
              [ 1.,  0.]])   # rotation 90°,     shape (2, 2)
B = np.array([[ 1.,  0.],
              [ 0., -1.]])   # reflection x-axis, shape (2, 2)

AB = A @ B
BA = B @ A

print("AB ="); print(AB)
print("\nBA ="); print(BA)
print("\nAB == BA?", np.allclose(AB, BA))

# What is AB?
print("\nAB applied to e1:", AB @ np.array([1.,0.]))
print("AB applied to e2:", AB @ np.array([0.,1.]))
# AB sends e1->[0,1] and e2->[1,0]: that's reflection across y=x

print("\nB^2 = I?", np.allclose(B @ B, np.eye(2)))
AB =
[[0. 1.]
 [1. 0.]]

BA =
[[ 0. -1.]
 [-1.  0.]]

AB == BA? False

AB applied to e1: [0. 1.]
AB applied to e2: [1. 0.]

B^2 = I? True

(a) \(AB \neq BA\).

(b) \(AB\): \(\mathbf{e}_1 \mapsto [0,1]^T\), \(\mathbf{e}_2 \mapsto [1,0]^T\)reflection across \(y = x\).

(c) \(B^2 = I\) ✓. Reflecting twice returns to the original.


Solution 6.6

import numpy as np

theta = np.pi / 5
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Uniform scaling: S = sI commutes with everything
s = 3.0
S_uniform = s * np.eye(2)
print("Uniform scale commutes:", np.allclose(R @ S_uniform, S_uniform @ R))

# Non-uniform scaling: S = diag(sx, sy), sx != sy
S_nonuniform = np.diag([2., 0.5])
print("Non-uniform scale commutes:", np.allclose(R @ S_nonuniform, S_nonuniform @ R))
print("\nR @ S_nonuniform:\n", (R @ S_nonuniform).round(4))
print("S_nonuniform @ R:\n", (S_nonuniform @ R).round(4))
Uniform scale commutes: True
Non-uniform scale commutes: False

R @ S_nonuniform:
 [[ 1.618  -0.2939]
 [ 1.1756  0.4045]]
S_nonuniform @ R:
 [[ 1.618  -1.1756]
 [ 0.2939  0.4045]]

True for uniform scaling (\(S = sI\)) since scalars commute with all matrices. False for non-uniform scaling — the non-uniform stretch distorts the rotated axes differently depending on order.


Solution 6.7

import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])   # shape (2, 2)
P = np.array([[1., -1.],
              [1.,  2.]])  # shape (2, 2)

B = np.linalg.solve(P, A @ P)   # P^{-1} A P

print("B = P^{-1} A P:"); print(B.round(6))

print(f"\n(b) det(A)={np.linalg.det(A):.4f},  det(B)={np.linalg.det(B):.4f}")
print(f"    tr(A) ={np.trace(A):.4f},   tr(B) ={np.trace(B):.4f}")

eA = np.sort(np.linalg.eigvals(A).real)
eB = np.sort(np.linalg.eigvals(B).real)
print(f"\n(c) eig(A)={eA.round(6)},  eig(B)={eB.round(6)}")
print(f"    Same? {np.allclose(eA, eB)}")
B = P^{-1} A P:
[[5. 0.]
 [0. 2.]]

(b) det(A)=10.0000,  det(B)=10.0000
    tr(A) =7.0000,   tr(B) =7.0000

(c) eig(A)=[2. 5.],  eig(B)=[2. 5.]
    Same? True

Solution 6.8

Proof. Suppose \(\{T(\mathbf{v}_1), \ldots, T(\mathbf{v}_k)\}\) is linearly dependent. Then there exist scalars \(c_1, \ldots, c_k\), not all zero, such that:

\[c_1 T(\mathbf{v}_1) + \cdots + c_k T(\mathbf{v}_k) = \mathbf{0}\]

By linearity:

\[T(c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k) = \mathbf{0}\]

So \(c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k \in \ker T\).

Since \(\ker T = \{\mathbf{0}\}\), we must have \(c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k = \mathbf{0}\).

But since not all \(c_i\) are zero, this means \(\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}\) is linearly dependent — contradicting our hypothesis.

Therefore \(\{T(\mathbf{v}_1), \ldots, T(\mathbf{v}_k)\}\) must be linearly independent. \(\blacksquare\)