7  Affine Transformations and Homogeneous Coordinates

7.1 Why Linear Maps Cannot Translate

Chapter 6 showed that every linear transformation fixes the origin: \(T(\mathbf{0}) = \mathbf{0}\). This is not a limitation of a particular matrix — it is a theorem. And it means linear maps are fundamentally incapable of representing one of the most basic operations in graphics and robotics: translation.


7.1.1 The Problem in One Line

Suppose you want a transformation that moves every point two units to the right: \(T([x, y]^T) = [x+2,\; y]^T\).

Where does the origin go? \(T([0,0]^T) = [2, 0]^T \neq [0,0]^T\).

This immediately disqualifies \(T\) from being linear.

import numpy as np

# For any 2x2 matrix A, A @ [0,0] = [0,0] always.
rng = np.random.default_rng(0)
A   = rng.standard_normal((2, 2))   # shape (2, 2) — arbitrary matrix

origin = np.array([0., 0.])   # shape (2,)
print(f"A @ origin = {A @ origin}  (always zero, for ANY 2x2 matrix)")
print("\nBut translation requires T(origin) = [2, 0], not [0, 0].")
print("Conclusion: translation CANNOT be represented by a 2x2 matrix.")
A @ origin = [0. 0.]  (always zero, for ANY 2x2 matrix)

But translation requires T(origin) = [2, 0], not [0, 0].
Conclusion: translation CANNOT be represented by a 2x2 matrix.

7.1.2 Why This Matters in Practice

Consider a robot arm: you want to describe the pose of the end-effector relative to the base. The pose has a rotation component (how the gripper is oriented) and a translation component (where in space the gripper sits).

If you could only use linear maps, you could rotate the gripper but never move it away from the origin. Every robot would be frozen at the world centre.

Similarly, in computer graphics, moving a rendered object from one location to another is a translation. Games, animations, and camera motion all require it.


7.1.3 What Goes Wrong Algebraically

Suppose we insist on a \(2 \times 2\) matrix \(A\) satisfying \(A\mathbf{x} = \mathbf{x} + \mathbf{t}\) for all \(\mathbf{x}\).

Evaluate at \(\mathbf{x} = \mathbf{0}\): \(A\mathbf{0} = \mathbf{0} + \mathbf{t} = \mathbf{t}\).

But \(A\mathbf{0} = \mathbf{0}\) for any matrix. So \(\mathbf{t} = \mathbf{0}\) — the only “translation” a linear map can perform is no translation at all.

import numpy as np

t = np.array([2., 1.])   # shape (2,) — desired translation

def translate(x):
    return x + t

u = np.array([1., 3.])   # shape (2,)
v = np.array([2., 0.])   # shape (2,)
c = 4.0

# Additivity test: T(u+v) should equal T(u)+T(v)
lhs = translate(u + v)
rhs = translate(u) + translate(v)
print(f"Additivity:  T(u+v)={lhs},  T(u)+T(v)={rhs},  equal={np.allclose(lhs,rhs)}")

# Homogeneity test: T(cu) should equal c*T(u)
lhs2 = translate(c * u)
rhs2 = c * translate(u)
print(f"Homogeneity: T(cu) ={lhs2},  c*T(u)   ={rhs2},  equal={np.allclose(lhs2,rhs2)}")
Additivity:  T(u+v)=[5. 4.],  T(u)+T(v)=[7. 5.],  equal=False
Homogeneity: T(cu) =[ 6. 13.],  c*T(u)   =[12. 16.],  equal=False

Translation breaks both axioms. It is a fundamentally different kind of operation.


7.1.4 The Three Broken Properties

Property Linear requires Translation gives
Fixed origin \(T(\mathbf{0}) = \mathbf{0}\) \(T(\mathbf{0}) = \mathbf{t} \neq \mathbf{0}\)
Additivity \(T(\mathbf{u}+\mathbf{v}) = T(\mathbf{u})+T(\mathbf{v})\) \(\mathbf{u}+\mathbf{v}+\mathbf{t}\) vs \(\mathbf{u}+\mathbf{v}+2\mathbf{t}\)
Homogeneity \(T(c\mathbf{u}) = cT(\mathbf{u})\) \(c\mathbf{u}+\mathbf{t}\) vs \(c\mathbf{u}+c\mathbf{t}\)

7.1.5 The Solution: Enlarge the Space

The fix is elegant: instead of enlarging the matrix entries, we enlarge the space. By embedding \(\mathbb{R}^2\) into \(\mathbb{R}^3\) (or \(\mathbb{R}^n\) into \(\mathbb{R}^{n+1}\)), translation becomes a linear map in the higher-dimensional space.

This is the central idea of homogeneous coordinates (§7.3). First, we characterise the class of maps that combines linear part and translation — the affine maps of §7.2.

7.2 Affine Maps: \(\mathbf{y} = A\mathbf{x} + \mathbf{t}\)

An affine map is a linear transformation followed by a translation:

\[\mathbf{y} = A\mathbf{x} + \mathbf{t}\]

where \(A\) is an \(m \times n\) matrix (the linear part) and \(\mathbf{t} \in \mathbb{R}^m\) is the translation vector.

Every linear map is a special affine map (with \(\mathbf{t} = \mathbf{0}\)). Every pure translation is a special affine map (with \(A = I\)). The general affine map combines both.


7.2.1 What Affine Maps Preserve

Affine maps preserve parallelism but not angles or distances (unless \(A\) happens to be an isometry).

The key property: if \(\mathbf{p}\), \(\mathbf{q}\), \(\mathbf{r}\) are collinear with \(\mathbf{q} = (1-\lambda)\mathbf{p} + \lambda\mathbf{r}\), then their images satisfy the same ratio:

\[f(\mathbf{q}) = (1-\lambda)\,f(\mathbf{p}) + \lambda\,f(\mathbf{r})\]

This is why affine maps are used in image warping, mesh deformation, and shape interpolation — they keep parallel lines parallel and preserve ratios along line segments.

import numpy as np
import matplotlib.pyplot as plt

A = np.array([[1.5, 0.5],
              [0.3, 1.2]])   # shape (2, 2)
t = np.array([1.0, 0.5])    # shape (2,)

def affine(x):
    return A @ x + t

# Parallel lines: two horizontal strips
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

colors = ['#4a90d9', 'tomato', '#2ecc71', '#e67e22']
for i, y0 in enumerate([0.0, 0.5, 1.0, 1.5]):
    xs = np.linspace(0, 1, 50)
    pts = np.array([[x, y0] for x in xs])               # shape (50, 2)
    pts_t = np.array([affine(p) for p in pts])          # shape (50, 2)
    axes[0].plot(pts[:,0], pts[:,1], color=colors[i], lw=2)
    axes[1].plot(pts_t[:,0], pts_t[:,1], color=colors[i], lw=2)

for ax, title in [(axes[0], 'Input: parallel lines'),
                  (axes[1], r'Output: still parallel after affine map')]:
    ax.set_xlim(-0.2, 2.5); ax.set_ylim(-0.2, 3.0)
    ax.set_aspect('equal')
    ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.grid(True, alpha=0.2)
    ax.set_title(title, fontsize=10)
    ax.set_xlabel('$x_1$'); ax.set_ylabel('$x_2$')

fig.suptitle(r'Affine maps preserve parallelism', fontsize=11)
fig.tight_layout()
plt.show()


7.2.2 The Six Types of 2D Affine Maps

Any 2D affine map \(\mathbf{y} = A\mathbf{x} + \mathbf{t}\) decomposes into:

Component Description Matrix \(A\) Translation \(\mathbf{t}\)
Translation Slide without rotating \(I\) \(\mathbf{t} \neq \mathbf{0}\)
Rotation Rotate about a point \(R_\theta\) Depends on centre
Scaling Stretch / shrink \(\text{diag}(s_x, s_y)\) \(\mathbf{0}\)
Shear Tilt along one axis \(\begin{bmatrix}1&k\\0&1\end{bmatrix}\) \(\mathbf{0}\)
Reflection Mirror across a line Ortho, \(\det=-1\) \(\mathbf{0}\) (line through origin)
General affine Any invertible \(A\) plus \(\mathbf{t}\) Invertible Arbitrary
import numpy as np
import matplotlib.pyplot as plt

# Unit square for reference
sq = np.array([[0,0],[1,0],[1,1],[0,1],[0,0]], dtype=float)  # shape (5, 2)

theta = np.pi / 4
configs = [
    ('Translation',  np.eye(2),                          np.array([1.5, 0.5])),
    ('Rotation 45°', np.array([[np.cos(theta),-np.sin(theta)],
                                [np.sin(theta), np.cos(theta)]]),  np.zeros(2)),
    ('Scale (2, 0.5)',np.array([[2.,0.],[0.,0.5]]),       np.zeros(2)),
    ('Shear',        np.array([[1.,0.8],[0.,1.]]),         np.zeros(2)),
    ('Reflection y', np.array([[-1.,0.],[0.,1.]]),         np.zeros(2)),
    ('General',      np.array([[1.2,0.4],[0.2,0.9]]),      np.array([0.5,-0.3])),
]

fig, axes = plt.subplots(2, 3, figsize=(12, 7))
axes = axes.flatten()

for ax, (name, M, tv) in zip(axes, configs):
    pts_t = (M @ sq.T).T + tv   # shape (5, 2)
    ax.plot(sq[:,0], sq[:,1], '--', color='#aaaaaa', lw=1.5, label='Original')
    ax.fill(pts_t[:-1,0], pts_t[:-1,1], alpha=0.25, color='#4a90d9')
    ax.plot(pts_t[:,0], pts_t[:,1], color='#4a90d9', lw=2, label='Transformed')
    ax.set_xlim(-2.2, 3.0); ax.set_ylim(-1.5, 2.5)
    ax.set_aspect('equal')
    ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.grid(True, alpha=0.2)
    ax.set_title(name, fontsize=10)
    ax.set_xlabel('$x_1$'); ax.set_ylabel('$x_2$')

fig.suptitle('Six 2D affine maps applied to the unit square', fontsize=12)
fig.tight_layout()
plt.show()


7.2.3 Affine Maps in Machine Learning

Data augmentation: random affine transforms (small rotations + translations + scales) applied to training images prevent overfitting. Libraries like torchvision.transforms.RandomAffine implement exactly \(A\mathbf{x} + \mathbf{t}\).

Attention mechanisms: the attention score matrix is not affine, but the linear projection layers \(Q = XW_Q\), \(K = XW_K\), \(V = XW_V\) (plus bias) are affine.

Feature normalisation: batch norm, layer norm — each is an affine rescaling \(\hat{x}_i \leftarrow \gamma x_i + \beta\).


7.2.4 Inverse of an Affine Map

If \(A\) is invertible, the affine map \(\mathbf{y} = A\mathbf{x} + \mathbf{t}\) has a well-defined inverse:

\[\mathbf{x} = A^{-1}(\mathbf{y} - \mathbf{t}) = A^{-1}\mathbf{y} - A^{-1}\mathbf{t}\]

The inverse is also an affine map, with linear part \(A^{-1}\) and translation \(-A^{-1}\mathbf{t}\).

import numpy as np

A = np.array([[1.5, 0.5],
              [0.3, 1.2]])   # shape (2, 2)
t = np.array([1.0, 0.5])    # shape (2,)

# Inverse affine
A_inv      = np.linalg.inv(A)   # shape (2, 2)
t_inv      = -A_inv @ t         # shape (2,)

# Verify: forward then inverse should recover original point
x_test = np.array([3., -1.])    # shape (2,)
y_test = A @ x_test + t
x_recovered = A_inv @ y_test + t_inv

print(f"x_test       = {x_test}")
print(f"Forward y    = A @ x + t = {y_test.round(6)}")
print(f"Inverse x    = A_inv @ y + t_inv = {x_recovered.round(6)}")
print(f"Recovered correctly: {np.allclose(x_test, x_recovered)}")
x_test       = [ 3. -1.]
Forward y    = A @ x + t = [5.  0.2]
Inverse x    = A_inv @ y + t_inv = [ 3. -1.]
Recovered correctly: True

7.3 Homogeneous Coordinates

The key insight that unifies linear and affine maps is deceptively simple: embed \(\mathbb{R}^n\) into \(\mathbb{R}^{n+1}\) by appending a \(1\) to every point. In this enlarged space, translation becomes a linear operation — representable as an \((n+1) \times (n+1)\) matrix.

This trick, called homogeneous coordinates, is the backbone of computer graphics (OpenGL, Vulkan, Metal), robotics (ROS2 transforms), and camera projection (Chapter 10).


7.3.1 The Embedding

A 2D point \(\mathbf{p} = [x, y]^T\) is embedded as:

\[\tilde{\mathbf{p}} = \begin{bmatrix}x \\ y \\ 1\end{bmatrix} \in \mathbb{R}^3\]

A 2D direction vector \(\mathbf{v} = [v_x, v_y]^T\) is embedded as:

\[\tilde{\mathbf{v}} = \begin{bmatrix}v_x \\ v_y \\ 0\end{bmatrix} \in \mathbb{R}^3\]

The bottom entry — \(1\) for points, \(0\) for vectors — is the homogeneous coordinate. This distinction is not cosmetic; it has deep geometric meaning and prevents a pervasive class of bugs in robotics and graphics code (Chapter 9).


7.3.2 Why Points Get 1 and Vectors Get 0

Points have a location. Translating them changes where they are.

Vectors (direction, velocity, force) have no location. Translating a velocity vector makes no physical sense — a force of \([3, 0]^T\) N points in the same direction regardless of where it acts.

The \(w\)-component encodes this: the \(3\times 3\) translation matrix will add \(t_x\) and \(t_y\) to the top entries only when \(w = 1\). When \(w = 0\), translation has no effect.

import numpy as np

# 3x3 homogeneous translation matrix for 2D
def T_mat(tx, ty):
    return np.array([[1., 0., tx],
                     [0., 1., ty],
                     [0., 0., 1.]])   # shape (3, 3)

tx, ty = 3.0, -1.5
M = T_mat(tx, ty)

# A point at (2, 4)
point = np.array([2., 4., 1.])   # shape (3,) — w=1

# A direction vector (1, 0)
vec   = np.array([1., 0., 0.])   # shape (3,) — w=0

print("Translation matrix M ="); print(M)
print(f"\nM @ point = {M @ point}  →  translated to ({2+tx}, {4+ty})")
print(f"M @ vec   = {M @ vec}   →  unchanged (direction vectors don't translate)")
Translation matrix M =
[[ 1.   0.   3. ]
 [ 0.   1.  -1.5]
 [ 0.   0.   1. ]]

M @ point = [5.  2.5 1. ]  →  translated to (5.0, 2.5)
M @ vec   = [1. 0. 0.]   →  unchanged (direction vectors don't translate)

7.3.3 The General Rule: Projecting Back to \(\mathbb{R}^n\)

If \(w \neq 0\), a homogeneous vector \([x, y, w]^T\) represents the 2D point:

\[\mathbf{p} = \left[\frac{x}{w},\; \frac{y}{w}\right]^T\]

This division by \(w\) is called perspective division and becomes crucial in camera projection (Chapter 10). For now, since we always maintain \(w = 1\), it is just: strip the last entry.

import numpy as np

def to_homogeneous(pts):
    """Add w=1 column.  pts: shape (n, 2)  ->  (n, 3)"""
    ones = np.ones((pts.shape[0], 1))
    return np.hstack([pts, ones])

def from_homogeneous(pts_h):
    """Divide by w, drop last column.  pts_h: (n, 3) -> (n, 2)"""
    w = pts_h[:, -1:]
    return pts_h[:, :-1] / w

pts_2d = np.array([[1., 2.],
                   [3., 0.],
                   [-1., 4.]])   # shape (3, 2)

pts_h = to_homogeneous(pts_2d)
print("Homogeneous form:\n", pts_h)

pts_back = from_homogeneous(pts_h)
print("\nBack to 2D:\n", pts_back)
print("Exact recovery:", np.allclose(pts_2d, pts_back))
Homogeneous form:
 [[ 1.  2.  1.]
 [ 3.  0.  1.]
 [-1.  4.  1.]]

Back to 2D:
 [[ 1.  2.]
 [ 3.  0.]
 [-1.  4.]]
Exact recovery: True

7.3.4 Why This Is “Projective” Geometry

In homogeneous coordinates, all scalar multiples of a vector represent the same point:

\[\begin{bmatrix}2\\4\\1\end{bmatrix} \sim \begin{bmatrix}4\\8\\2\end{bmatrix} \sim \begin{bmatrix}1\\2\\0.5\end{bmatrix}\]

They all represent the 2D point \((2, 4)\) after dividing by \(w\).

When \(w \to 0\), the point moves to infinity — it becomes a direction. Points at infinity represent the vanishing points of parallel lines (Chapter 11). The homogeneous coordinate \(w\) is what connects the algebra of matrices to the geometry of perspective.

import numpy as np
import matplotlib.pyplot as plt

# Show equivalent representations of the same point
pt = np.array([2., 4.])   # shape (2,) — 2D point

w_vals = [0.1, 0.5, 1.0, 2.0, 4.0]
homog  = np.array([[pt[0]*w, pt[1]*w, w] for w in w_vals])  # shape (5, 3)

fig, ax = plt.subplots(figsize=(6, 4))
sc = ax.scatter(homog[:,0], homog[:,1],
                c=w_vals, cmap='Blues', s=100, zorder=5)
for row, w in zip(homog, w_vals):
    ax.annotate(f'w={w}', xy=(row[0]+0.1, row[1]+0.2), fontsize=8)

ax.set_xlabel('$x$ (= $p_x \\cdot w$)')
ax.set_ylabel('$y$ (= $p_y \\cdot w$)')
ax.set_title(r'All these points represent $(2, 4)$ in homogeneous coords', fontsize=10)
ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
ax.grid(True, alpha=0.2)
plt.colorbar(sc, label='w value')
plt.tight_layout()
plt.show()


7.3.5 Points vs. Vectors: Why It Matters

This distinction will reappear in §9.4 with real consequences. For a preview:

  • Transforming a point with the full \(4 \times 4\) matrix applies both rotation and translation.
  • Transforming a direction vector (normal, velocity, ray direction) should apply only the rotation — the \(w=0\) entry ensures this automatically.

Forgetting the distinction leads to a pervasive class of bugs: normals getting translated (producing incorrect lighting in shaders), velocities getting shifted (producing incorrect physics), and ray directions getting offset (producing wrong intersection tests).

7.4 The 3×3 Homogeneous Matrix for 2D

With homogeneous coordinates in place, every 2D affine map

\[\mathbf{y} = A\mathbf{x} + \mathbf{t}\]

can be written as a single \(3 \times 3\) matrix multiplication:

\[\begin{bmatrix}y_1\\y_2\\1\end{bmatrix} = \underbrace{\begin{bmatrix}a_{11}&a_{12}&t_1\\a_{21}&a_{22}&t_2\\0&0&1\end{bmatrix}}_{M} \begin{bmatrix}x_1\\x_2\\1\end{bmatrix}\]

The bottom row \([0, 0, 1]\) is fixed — it ensures the \(w\)-component stays 1 throughout (assuming the input has \(w=1\)).


7.4.1 The Building Blocks

All standard 2D transforms have compact \(3\times 3\) forms:

import numpy as np

def T2d(tx, ty):
    """Pure translation."""
    return np.array([[1., 0., tx],
                     [0., 1., ty],
                     [0., 0., 1.]])   # shape (3, 3)

def R2d(theta):
    """Counter-clockwise rotation by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.],
                     [s,  c, 0.],
                     [0., 0., 1.]])   # shape (3, 3)

def S2d(sx, sy):
    """Uniform or non-uniform scaling."""
    return np.array([[sx, 0., 0.],
                     [0., sy, 0.],
                     [0., 0., 1.]])   # shape (3, 3)

def Hx2d(k):
    """Horizontal shear by factor k."""
    return np.array([[1., k, 0.],
                     [0., 1., 0.],
                     [0., 0., 1.]])   # shape (3, 3)

# Print each matrix
for name, M in [('Translation (3, -2)', T2d(3., -2.)),
                ('Rotation 45°',        R2d(np.pi/4)),
                ('Scale (2, 0.5)',       S2d(2., 0.5)),
                ('Shear (k=1)',          Hx2d(1.))]:
    print(f"{name}:\n{M.round(4)}\n")
Translation (3, -2):
[[ 1.  0.  3.]
 [ 0.  1. -2.]
 [ 0.  0.  1.]]

Rotation 45°:
[[ 0.7071 -0.7071  0.    ]
 [ 0.7071  0.7071  0.    ]
 [ 0.      0.      1.    ]]

Scale (2, 0.5):
[[2.  0.  0. ]
 [0.  0.5 0. ]
 [0.  0.  1. ]]

Shear (k=1):
[[1. 1. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

7.4.2 Applying the Matrix to Points

To transform a batch of 2D points, convert to homogeneous, multiply, convert back:

import numpy as np
import matplotlib.pyplot as plt

def to_h(pts):
    """pts: (n,2) -> (n,3)"""
    return np.hstack([pts, np.ones((len(pts), 1))])

def from_h(pts_h):
    """pts_h: (n,3) -> (n,2)"""
    return pts_h[:, :2] / pts_h[:, 2:]

def R2d(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,-s,0.],[s,c,0.],[0.,0.,1.]])

def T2d(tx, ty):
    return np.array([[1.,0.,tx],[0.,1.,ty],[0.,0.,1.]])

# Unit square
sq = np.array([[0,0],[1,0],[1,1],[0,1],[0,0]], dtype=float)  # shape (5, 2)
sq_h = to_h(sq)                                               # shape (5, 3)

# Rotate 30° about image centre (0.5, 0.5)
#   Step 1: translate centre to origin
#   Step 2: rotate
#   Step 3: translate back
cx, cy = 0.5, 0.5
M = T2d(cx, cy) @ R2d(np.pi/6) @ T2d(-cx, -cy)   # shape (3, 3)

sq_t = from_h((M @ sq_h.T).T)   # shape (5, 2)

fig, ax = plt.subplots(figsize=(5, 5))
ax.fill(sq[:-1,0], sq[:-1,1], alpha=0.2, color='#4a90d9')
ax.plot(sq[:,0], sq[:,1], color='#4a90d9', lw=2, label='Original')
ax.fill(sq_t[:-1,0], sq_t[:-1,1], alpha=0.2, color='tomato')
ax.plot(sq_t[:,0], sq_t[:,1], color='tomato', lw=2, label='Rotated 30° about centre')
ax.scatter([cx],[cy], color='black', s=80, zorder=5, label='Centre of rotation')
ax.set_xlim(-0.5, 1.8); ax.set_ylim(-0.5, 1.8)
ax.set_aspect('equal')
ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
ax.grid(True, alpha=0.2)
ax.set_xlabel('$x_1$'); ax.set_ylabel('$x_2$')
ax.set_title('Rotation about an off-origin point via homogeneous coords')
ax.legend(fontsize=9)
plt.tight_layout()
plt.show()

Without homogeneous coordinates, rotating about a point other than the origin requires three separate operations and careful bookkeeping. With them, the three steps compose into a single \(3 \times 3\) matrix.


7.4.3 Composition by Multiplication

Chaining affine transforms is just matrix multiplication in homogeneous space:

import numpy as np

def T2d(tx, ty):
    return np.array([[1.,0.,tx],[0.,1.,ty],[0.,0.,1.]])

def R2d(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,-s,0.],[s,c,0.],[0.,0.,1.]])

def S2d(sx, sy):
    return np.array([[sx,0.,0.],[0.,sy,0.],[0.,0.,1.]])

# Compose: scale, then rotate 45°, then translate
M = T2d(2., 1.) @ R2d(np.pi/4) @ S2d(0.5, 2.)   # shape (3, 3)
print("Composed 3×3 matrix (Scale→Rotate→Translate):")
print(M.round(4))

# Extract the linear part (top-left 2x2) and translation (top-right column)
A_linear = M[:2, :2]   # shape (2, 2)
t_vec    = M[:2,  2]   # shape (2,)
print(f"\nLinear part A:\n{A_linear.round(4)}")
print(f"Translation t: {t_vec.round(4)}")
print(f"Bottom row (should be [0,0,1]): {M[2,:]}")
Composed 3×3 matrix (Scale→Rotate→Translate):
[[ 0.3536 -1.4142  2.    ]
 [ 0.3536  1.4142  1.    ]
 [ 0.      0.      1.    ]]

Linear part A:
[[ 0.3536 -1.4142]
 [ 0.3536  1.4142]]
Translation t: [2. 1.]
Bottom row (should be [0,0,1]): [0. 0. 1.]

The bottom row \([0, 0, 1]\) is an invariant: composing any two valid affine matrices preserves this row exactly.


7.4.4 Visualising the Grid Under All Building Blocks

import numpy as np
import matplotlib.pyplot as plt

def to_h(pts):
    return np.hstack([pts, np.ones((len(pts), 1))])

def from_h(pts_h):
    return pts_h[:, :2] / pts_h[:, 2:]

# Grid of points
s = np.linspace(0, 1, 10)
pts = np.array([[u, v] for u in s for v in s])   # shape (100, 2)
pts_h = to_h(pts)                                  # shape (100, 3)

theta = np.pi / 4
matrices = {
    'Translation (0.5, 0.3)': np.array([[1.,0.,0.5],[0.,1.,0.3],[0.,0.,1.]]),
    'Rotation 45°':           np.array([[np.cos(theta),-np.sin(theta),0.],
                                         [np.sin(theta), np.cos(theta),0.],
                                         [0.,0.,1.]]),
    'Scale (2, 0.5)':         np.array([[2.,0.,0.],[0.,0.5,0.],[0.,0.,1.]]),
    'Shear (k=0.8)':          np.array([[1.,0.8,0.],[0.,1.,0.],[0.,0.,1.]]),
}

fig, axes = plt.subplots(1, 4, figsize=(14, 3.5))
for ax, (name, M) in zip(axes, matrices.items()):
    pts_t = from_h((M @ pts_h.T).T)
    ax.scatter(pts_t[:,0], pts_t[:,1], s=8, color='#4a90d9', alpha=0.6)
    ax.scatter(pts[:,0],   pts[:,1],   s=4, color='#aaaaaa', alpha=0.3)
    ax.set_xlim(-0.5, 2.5); ax.set_ylim(-0.5, 1.5)
    ax.set_aspect('equal')
    ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
    ax.grid(True, alpha=0.2)
    ax.set_title(name, fontsize=9)

fig.suptitle('3×3 homogeneous matrices applied to a grid', fontsize=11)
fig.tight_layout()
plt.show()

7.5 The 4×4 Homogeneous Matrix for 3D

The same trick extends to 3D. Every 3D affine map \(\mathbf{y} = R\mathbf{x} + \mathbf{t}\) is written as a \(4 \times 4\) homogeneous matrix:

\[\begin{bmatrix}y_1\\y_2\\y_3\\1\end{bmatrix} = \underbrace{\begin{bmatrix}r_{11}&r_{12}&r_{13}&t_1\\ r_{21}&r_{22}&r_{23}&t_2\\ r_{31}&r_{32}&r_{33}&t_3\\ 0&0&0&1\end{bmatrix}}_{T} \begin{bmatrix}x_1\\x_2\\x_3\\1\end{bmatrix}\]

This \(4 \times 4\) matrix is the workhorse of 3D robotics and graphics. Every robot joint, every camera pose, every object in a game engine is stored and composed using exactly this structure.


7.5.1 Building 3D Homogeneous Matrices

import numpy as np

def T3d(tx, ty, tz):
    """Pure 3D translation."""
    M = np.eye(4)           # shape (4, 4)
    M[:3, 3] = [tx, ty, tz]
    return M

def Rx3d(alpha):
    """Rotation about x-axis by alpha radians."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[1., 0., 0., 0.],
                     [0.,  c, -s, 0.],
                     [0.,  s,  c, 0.],
                     [0., 0., 0., 1.]])   # shape (4, 4)

def Ry3d(beta):
    """Rotation about y-axis by beta radians."""
    c, s = np.cos(beta), np.sin(beta)
    return np.array([[ c, 0., s, 0.],
                     [0., 1., 0., 0.],
                     [-s, 0., c, 0.],
                     [0., 0., 0., 1.]])   # shape (4, 4)

def Rz3d(gamma):
    """Rotation about z-axis by gamma radians."""
    c, s = np.cos(gamma), np.sin(gamma)
    return np.array([[c, -s, 0., 0.],
                     [s,  c, 0., 0.],
                     [0., 0., 1., 0.],
                     [0., 0., 0., 1.]])   # shape (4, 4)

def S3d(sx, sy, sz):
    """3D scaling."""
    return np.diag([sx, sy, sz, 1.]).astype(float)   # shape (4, 4)

# Example: translate then rotate about z
M = T3d(1., 2., 0.) @ Rz3d(np.pi/4)   # shape (4, 4)
print("T @ Rz(45°):")
print(M.round(4))
print(f"\nBottom row: {M[3,:]}")
T @ Rz(45°):
[[ 0.7071 -0.7071  0.      1.    ]
 [ 0.7071  0.7071  0.      2.    ]
 [ 0.      0.      1.      0.    ]
 [ 0.      0.      0.      1.    ]]

Bottom row: [0. 0. 0. 1.]

7.5.2 The Anatomy of a Pose Matrix

A \(4 \times 4\) homogeneous matrix encodes a complete 3D rigid-body pose:

\[T = \begin{bmatrix}\underbrace{R}_{3\times3\text{ rotation}} & \underbrace{\mathbf{t}}_{3\times1\text{ translation}} \\ \mathbf{0}^T & 1\end{bmatrix}\]

  • The top-left \(3 \times 3\) block \(R\) is a rotation matrix (\(R^TR=I\), \(\det R = +1\)).
  • The top-right column \(\mathbf{t}\) is the translation (position of the child frame’s origin in the parent frame).
  • The bottom row is always \([0, 0, 0, 1]\).
import numpy as np

def Rz3d(gamma):
    c, s = np.cos(gamma), np.sin(gamma)
    return np.array([[c,-s,0.],[s,c,0.],[0.,0.,1.]])

# Build a pose matrix: rotate 30° about z, translate to (1, 2, 3)
R = Rz3d(np.pi / 6)     # shape (3, 3)
t = np.array([1., 2., 3.])   # shape (3,)

T = np.eye(4)       # shape (4, 4)
T[:3, :3] = R
T[:3,  3] = t

print("Pose matrix T ="); print(T.round(4))

# Extract parts back
R_extracted = T[:3, :3]   # shape (3, 3)
t_extracted = T[:3,  3]   # shape (3,)

print(f"\nR orthogonal?  R^T R = I: {np.allclose(R_extracted.T @ R_extracted, np.eye(3))}")
print(f"det(R) = {np.linalg.det(R_extracted):.6f}  (should be +1)")
print(f"t = {t_extracted}")
Pose matrix T =
[[ 0.866 -0.5    0.     1.   ]
 [ 0.5    0.866  0.     2.   ]
 [ 0.     0.     1.     3.   ]
 [ 0.     0.     0.     1.   ]]

R orthogonal?  R^T R = I: True
det(R) = 1.000000  (should be +1)
t = [1. 2. 3.]

7.5.3 Applying a Pose to Points and Vectors

import numpy as np

def Rz3d(gamma):
    c, s = np.cos(gamma), np.sin(gamma)
    return np.array([[c,-s,0.],[s,c,0.],[0.,0.,1.]])

R = Rz3d(np.pi / 4)
T = np.eye(4)
T[:3,:3] = R
T[:3, 3] = np.array([1., 0., 0.5])

# A 3D point: w=1
point_h = np.array([0., 0., 0., 1.])   # shape (4,) — origin

# A 3D direction vector: w=0
vec_h = np.array([1., 0., 0., 0.])     # shape (4,) — x-direction

pt_out  = T @ point_h
vec_out = T @ vec_h

print("Input point  (w=1):", point_h[:3])
print("Output point (w=1):", pt_out[:3].round(4),
      " — translated AND rotated")
print()
print("Input vector (w=0):", vec_h[:3])
print("Output vector(w=0):", vec_out[:3].round(4),
      " — only rotated, NOT translated")
Input point  (w=1): [0. 0. 0.]
Output point (w=1): [1.  0.  0.5]  — translated AND rotated

Input vector (w=0): [1. 0. 0.]
Output vector(w=0): [0.7071 0.7071 0.    ]  — only rotated, NOT translated

The \(w=0\) entry in the vector automatically prevents the translation from being applied — no special-case code needed.


7.5.4 Visualising 3D Affine Motion

import numpy as np
import matplotlib.pyplot as plt

def Ry3d(beta):
    c, s = np.cos(beta), np.sin(beta)
    return np.array([[ c,0.,s,0.],
                     [0.,1.,0.,0.],
                     [-s,0.,c,0.],
                     [0.,0.,0.,1.]])

def T3d(tx, ty, tz):
    M = np.eye(4)
    M[:3, 3] = [tx, ty, tz]
    return M

# A simple 3D object: a cube wireframe (corners only)
corners = np.array([[x,y,z] for x in [0,1] for y in [0,1] for z in [0,1]],
                   dtype=float)   # shape (8, 3)
corners_h = np.hstack([corners, np.ones((8,1))])   # shape (8, 4)

# Transformation: rotate about y then translate
M = T3d(1.5, 0., 0.) @ Ry3d(np.pi / 5)   # shape (4, 4)
corners_t = (M @ corners_h.T).T           # shape (8, 4)

fig = plt.figure(figsize=(9, 4))
for idx, (pts, title, col) in enumerate([
        (corners,     'Original cube',    '#4a90d9'),
        (corners_t[:,:3], 'After Ry→Translate', 'tomato')]):
    ax = fig.add_subplot(1, 2, idx+1, projection='3d')
    ax.scatter(pts[:,0], pts[:,1], pts[:,2], color=col, s=40)
    # Draw edges (pairs of corners differing in one coordinate)
    for i in range(8):
        for j in range(i+1, 8):
            diff = np.abs(corners[i] - corners[j])
            if diff.sum() == 1:   # adjacent corners
                pi = pts[i] if col=='#4a90d9' else corners_t[i,:3]
                pj = pts[j] if col=='#4a90d9' else corners_t[j,:3]
                ax.plot([pi[0],pj[0]],[pi[1],pj[1]],[pi[2],pj[2]],
                        color=col, lw=1.5, alpha=0.7)
    ax.set_title(title, fontsize=10)
    ax.set_xlabel('x'); ax.set_ylabel('y'); ax.set_zlabel('z')

fig.suptitle('4×4 homogeneous matrix transforms a 3D cube', fontsize=11)
fig.tight_layout()
plt.show()


7.5.5 How Robotics Frameworks Use This

In ROS2, every coordinate frame relationship is stored as a geometry_msgs/TransformStamped, which encodes exactly the \(4 \times 4\) pose matrix (as a quaternion + translation). The tf2 library multiplies these matrices when you call lookup_transform — chaining the robot’s kinematic tree from world frame to sensor frame.

In OpenGL, the MVP (Model-View-Projection) pipeline is three \(4 \times 4\) matrix multiplications: \(P \cdot V \cdot M \cdot \tilde{\mathbf{p}}\), applying the model transform, then the view transform, then the projection.

7.6 Composing Affine Chains

In robotics and graphics, transformations are always chained: world → body → sensor → feature. Each link is a \(4\times4\) matrix, and the total transform is their product. This section works through a concrete 3-step chain with full numerical verification.

The critical lesson: order is not reversible. Translate-then-rotate produces a different result from rotate-then-translate.


7.6.1 Order Matters: A Concrete Failure

import numpy as np

def T2d(tx, ty):
    return np.array([[1.,0.,tx],[0.,1.,ty],[0.,0.,1.]])

def R2d(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,-s,0.],[s,c,0.],[0.,0.,1.]])

theta = np.pi / 3   # 60 degrees

# Two orderings of the same two operations
TR = T2d(2., 1.) @ R2d(theta)   # rotate FIRST, then translate
RT = R2d(theta) @ T2d(2., 1.)   # translate FIRST, then rotate

print("Translate AFTER Rotate  (T @ R):")
print(TR.round(4))
print("\nTranslate BEFORE Rotate (R @ T):")
print(RT.round(4))
print("\nAre they equal?", np.allclose(TR, RT))
Translate AFTER Rotate  (T @ R):
[[ 0.5   -0.866  2.   ]
 [ 0.866  0.5    1.   ]
 [ 0.     0.     1.   ]]

Translate BEFORE Rotate (R @ T):
[[ 0.5    -0.866   0.134 ]
 [ 0.866   0.5     2.2321]
 [ 0.      0.      1.    ]]

Are they equal? False
import numpy as np
import matplotlib.pyplot as plt

def T2d(tx, ty):
    return np.array([[1.,0.,tx],[0.,1.,ty],[0.,0.,1.]])

def R2d(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,-s,0.],[s,c,0.],[0.,0.,1.]])

def apply_h(M, pts):
    """pts: (n,2) -> transformed (n,2)"""
    pts_h = np.hstack([pts, np.ones((len(pts),1))])
    out   = (M @ pts_h.T).T
    return out[:,:2]

sq = np.array([[0,0],[1,0],[1,1],[0,1],[0,0]], dtype=float)
theta = np.pi / 3

TR = T2d(2., 1.) @ R2d(theta)
RT = R2d(theta) @ T2d(2., 1.)

sq_TR = apply_h(TR, sq)
sq_RT = apply_h(RT, sq)

fig, ax = plt.subplots(figsize=(7, 6))
ax.plot(sq[:,0],    sq[:,1],    '--', color='#aaaaaa', lw=1.5, label='Original')
ax.fill(sq_TR[:-1,0], sq_TR[:-1,1], alpha=0.25, color='#4a90d9')
ax.plot(sq_TR[:,0], sq_TR[:,1], color='#4a90d9', lw=2, label='Rotate → Translate (T@R)')
ax.fill(sq_RT[:-1,0], sq_RT[:-1,1], alpha=0.25, color='tomato')
ax.plot(sq_RT[:,0], sq_RT[:,1], color='tomato', lw=2, label='Translate → Rotate (R@T)')

ax.set_xlim(-2, 4); ax.set_ylim(-2, 4)
ax.set_aspect('equal')
ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
ax.grid(True, alpha=0.2)
ax.set_xlabel('$x_1$'); ax.set_ylabel('$x_2$')
ax.set_title('Translate-then-rotate $\\neq$ rotate-then-translate', fontsize=11)
ax.legend(fontsize=9)
plt.tight_layout()
plt.show()


7.6.2 A Three-Step Chain

Scenario: a camera is mounted on a robot arm. The robot:

  1. Translates the camera from the arm base to the joint: \(T_1\)
  2. Rotates the camera at the joint: \(R\)
  3. Translates the camera to the final sensor position: \(T_2\)

Compute the total transform and apply it to a point cloud.

import numpy as np

def T3d(tx, ty, tz):
    M = np.eye(4)
    M[:3, 3] = [tx, ty, tz]
    return M

def Rz3d(gamma):
    c, s = np.cos(gamma), np.sin(gamma)
    return np.array([[c,-s,0.,0.],[s,c,0.,0.],[0.,0.,1.,0.],[0.,0.,0.,1.]])

# Step 1: translate to joint (0.3 m forward in x)
T1 = T3d(0.3, 0., 0.)   # shape (4, 4)

# Step 2: rotate at joint (30° about z)
R  = Rz3d(np.pi / 6)    # shape (4, 4)

# Step 3: translate to sensor (0.1 m along local x after rotation)
T2 = T3d(0.1, 0., 0.)   # shape (4, 4)

# Total: T2 applied LAST (rightmost = applied first to the point)
M_total = T2 @ R @ T1   # shape (4, 4)
print("Total transform  T2 @ R @ T1:")
print(M_total.round(4))

# Verify: where does the origin end up?
p_origin = np.array([0., 0., 0., 1.])   # shape (4,)
p_final  = M_total @ p_origin
print(f"\nOrigin under total transform: {p_final[:3].round(4)}")

# Manual step-by-step
p1 = T1 @ p_origin
p2 = R  @ p1
p3 = T2 @ p2
print(f"Step-by-step:                 {p3[:3].round(4)}")
print(f"Match: {np.allclose(p_final, p3)}")
Total transform  T2 @ R @ T1:
[[ 0.866  -0.5     0.      0.3598]
 [ 0.5     0.866   0.      0.15  ]
 [ 0.      0.      1.      0.    ]
 [ 0.      0.      0.      1.    ]]

Origin under total transform: [0.3598 0.15   0.    ]
Step-by-step:                 [0.3598 0.15   0.    ]
Match: True

7.6.3 Inverting a Chain

The inverse of a chain reverses all steps:

\[(T_2 R T_1)^{-1} = T_1^{-1} R^{-1} T_2^{-1}\]

For pose matrices (\(R\) orthogonal), the inverse is:

\[T^{-1} = \begin{bmatrix}R^T & -R^T\mathbf{t} \\ \mathbf{0}^T & 1\end{bmatrix}\]

This is cheap — no matrix inversion required, just transpose and multiply.

import numpy as np

def Rz3d(gamma):
    c, s = np.cos(gamma), np.sin(gamma)
    return np.array([[c,-s,0.,0.],[s,c,0.,0.],[0.,0.,1.,0.],[0.,0.,0.,1.]])

def T3d(tx, ty, tz):
    M = np.eye(4)
    M[:3, 3] = [tx, ty, tz]
    return M

def pose_inverse(T):
    """Efficient inverse for a 4x4 pose matrix (R block + t column)."""
    T_inv = np.eye(4)
    R = T[:3, :3]   # shape (3, 3)
    t = T[:3,  3]   # shape (3,)
    T_inv[:3, :3] = R.T
    T_inv[:3,  3] = -R.T @ t
    return T_inv

M = T3d(0.1, 0., 0.) @ Rz3d(np.pi/6) @ T3d(0.3, 0., 0.)

M_inv_fast  = pose_inverse(M)
M_inv_numpy = np.linalg.inv(M)

print("Efficient inverse:")
print(M_inv_fast.round(6))
print("\nnp.linalg.inv:")
print(M_inv_numpy.round(6))
print(f"\nMatch: {np.allclose(M_inv_fast, M_inv_numpy)}")
print(f"M @ M_inv = I? {np.allclose(M @ M_inv_fast, np.eye(4))}")
Efficient inverse:
[[ 0.866025  0.5       0.       -0.386603]
 [-0.5       0.866025  0.        0.05    ]
 [ 0.        0.        1.        0.      ]
 [ 0.        0.        0.        1.      ]]

np.linalg.inv:
[[ 0.866025  0.5       0.       -0.386603]
 [-0.5       0.866025  0.        0.05    ]
 [ 0.        0.        1.        0.      ]
 [ 0.        0.        0.        1.      ]]

Match: True
M @ M_inv = I? True

7.6.4 Robot Kinematics Preview

In a robot arm with \(n\) joints, each joint \(i\) contributes one \(4\times4\) matrix \(T_i(\theta_i)\) parameterised by the joint angle. The end-effector pose in world frame is:

\[T_\text{ee} = T_1(\theta_1)\; T_2(\theta_2)\; \cdots\; T_n(\theta_n)\]

This is the forward kinematics equation. Each \(T_i\) is an affine transform in homogeneous coordinates — exactly what this chapter is building toward (Chapter 9).

import numpy as np

def T3d(tx, ty, tz):
    M = np.eye(4); M[:3,3]=[tx,ty,tz]; return M

def Rz3d(gamma):
    c,s=np.cos(gamma),np.sin(gamma)
    return np.array([[c,-s,0.,0.],[s,c,0.,0.],[0.,0.,1.,0.],[0.,0.,0.,1.]])

# 3-link planar robot arm: all rotations about z, links of length 1
def fk(theta1, theta2, theta3):
    T1 = Rz3d(theta1)
    T2 = T3d(1., 0., 0.) @ Rz3d(theta2)   # move to next joint
    T3 = T3d(1., 0., 0.) @ Rz3d(theta3)
    T_ee = T3d(1., 0., 0.)                  # end-effector offset
    return T1 @ T2 @ T3 @ T_ee

# Compute end-effector position for a specific joint configuration
T = fk(np.pi/6, np.pi/4, -np.pi/3)
origin = np.array([0., 0., 0., 1.])
ee_pos = T @ origin
print(f"End-effector position: {ee_pos[:3].round(4)}")

# Sweep theta1 and plot the reachable workspace
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(5, 5))
for t1 in np.linspace(0, 2*np.pi, 60):
    for t2 in np.linspace(0, 2*np.pi, 20):
        pos = (fk(t1, t2, 0.) @ origin)[:2]
        ax.scatter(pos[0], pos[1], s=1, color='#4a90d9', alpha=0.1)
ax.set_aspect('equal')
ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
ax.grid(True, alpha=0.2)
ax.set_title('Reachable workspace of a 2-joint arm (sweeping $\\theta_1$, $\\theta_2$)', fontsize=9)
ax.set_xlabel('x'); ax.set_ylabel('y')
plt.tight_layout()
plt.show()
End-effector position: [2.0908 1.7247 0.    ]

7.7 Application: 2D Image Stitching

Two cameras, two drone photos of the same crop field — slightly shifted and rotated. A \(3 \times 3\) affine matrix, estimated from matched keypoints, is all it takes to warp one image onto the other’s coordinate frame and stitch them together. This section walks through the full pipeline from scratch.


7.7.1 Why Affine Is the Right Model

A drone photo of flat ground, seen from nearly overhead, is well-modelled by a 2D affine transform rather than a full perspective (projective) transform:

  • The scene is planar (flat field).
  • The camera is nearly orthographic at altitude.
  • Residual tilt introduces only slight shear and scale.

A \(3 \times 3\) homogeneous matrix — exactly what this chapter builds — gives us rotation, scale, shear, and translation in a single matrix multiply.


7.7.2 Step 1 — Synthetic Scene (No Real Image Required)

To keep the pipeline runnable without external data, we generate a synthetic “field” image: a gradient background with a grid of crop rows.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

rng = np.random.default_rng(42)

H, W = 200, 200   # image height, width (pixels)

# Background: soft green gradient
xx, yy = np.meshgrid(np.linspace(0, 1, W), np.linspace(0, 1, H))
base_r = (0.55 + 0.10 * yy).clip(0, 1)   # shape (H, W)
base_g = (0.72 + 0.08 * xx).clip(0, 1)   # shape (H, W)
base_b = (0.40 + 0.05 * yy).clip(0, 1)   # shape (H, W)

# Crop rows: horizontal dark bands every 20 pixels
row_mask = ((np.arange(H) % 20) < 4)[:, None]   # shape (H, 1)
row_mask = np.broadcast_to(row_mask, (H, W))      # shape (H, W)

field_r = np.where(row_mask, base_r * 0.5, base_r)   # shape (H, W)
field_g = np.where(row_mask, base_g * 0.7, base_g)   # shape (H, W)
field_b = np.where(row_mask, base_b * 0.5, base_b)   # shape (H, W)

field = np.stack([field_r, field_g, field_b], axis=2)   # shape (H, W, 3)

# Add a few landmark dots for keypoint matching
landmarks = np.array([[40, 50], [80, 40], [150, 60],
                       [50, 130], [120, 150], [170, 120]],
                     dtype=float)   # shape (6, 2)  col=x, row=y

fig, ax = plt.subplots(figsize=(4, 4))
ax.imshow(field, origin='upper')
ax.scatter(landmarks[:, 0], landmarks[:, 1],
           color='yellow', s=60, zorder=5, label='Keypoints')
ax.set_title('Reference field image (image A)', fontsize=10)
ax.legend(fontsize=8)
plt.tight_layout()
plt.show()


7.7.3 Step 2 — Simulating a Second Camera View

The second camera is shifted right by 40 px, shifted down by 20 px, and rotated 8° relative to the first. We apply a known affine transform to the reference image to generate image B, then pretend we don’t know the matrix.

import numpy as np
from scipy.ndimage import affine_transform
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

H, W = 200, 200

# Re-create field image
xx, yy = np.meshgrid(np.linspace(0, 1, W), np.linspace(0, 1, H))
base_r = (0.55 + 0.10 * yy).clip(0, 1)
base_g = (0.72 + 0.08 * xx).clip(0, 1)
base_b = (0.40 + 0.05 * yy).clip(0, 1)
row_mask = ((np.arange(H) % 20) < 4)[:, None]
row_mask = np.broadcast_to(row_mask, (H, W))
field_r = np.where(row_mask, base_r * 0.5, base_r)
field_g = np.where(row_mask, base_g * 0.7, base_g)
field_b = np.where(row_mask, base_b * 0.5, base_b)
field = np.stack([field_r, field_g, field_b], axis=2)   # shape (H, W, 3)

# True transform: rotate 8° + translate (40, 20) in pixel coords
theta = np.deg2rad(8.)
c, s = np.cos(theta), np.sin(theta)

# scipy.ndimage.affine_transform applies the *inverse* transform
# (pull model: for each output pixel, find source pixel)
# We want: output = R @ input + t  =>  source = R^T @ (output - t)
R_true = np.array([[c, -s], [s, c]])   # shape (2, 2) — forward rotation
tx, ty = 40., 20.                       # forward translation (x-right, y-down)

# For ndimage (row, col) convention: swap x<->y in translation
R_nd = np.array([[c, s], [-s, c]])         # shape (2, 2) — inverse rotation
t_nd = np.array([-ty*c - tx*s, ty*s - tx*c])  # shape (2,) — offset in row/col

# Warp each channel independently
warped_channels = []
for ch in range(3):
    w = affine_transform(field[:, :, ch],          # shape (H, W)
                         R_nd,
                         offset=t_nd,
                         output_shape=(H, W),
                         mode='constant', cval=0.)
    warped_channels.append(w)
img_B = np.stack(warped_channels, axis=2)   # shape (H, W, 3)

fig, axes = plt.subplots(1, 2, figsize=(9, 4))
axes[0].imshow(field, origin='upper')
axes[0].set_title('Image A (reference)', fontsize=10)
axes[1].imshow(img_B, origin='upper')
axes[1].set_title('Image B (shifted 40px right, 20px down, 8° rotated)', fontsize=10)
for ax in axes:
    ax.axis('off')
fig.suptitle('Two overlapping views of the same crop field', fontsize=11)
plt.tight_layout()
plt.show()


7.7.4 Step 3 — Corresponding Keypoints

In a real pipeline, feature detectors (SIFT, ORB) find matching points automatically. Here we compute the correspondences analytically from the known transform so the algebra is transparent.

import numpy as np

# Keypoints in image A (column x, row y)
pts_A = np.array([[40., 50.], [80., 40.], [150., 60.],
                  [50., 130.], [120., 150.], [170., 120.]])   # shape (6, 2)

theta = np.deg2rad(8.)
c, s  = np.cos(theta), np.sin(theta)
R_true = np.array([[c, -s], [s, c]])   # shape (2, 2)
t_true = np.array([40., 20.])          # shape (2,)

# Corresponding keypoints in image B
# (in real life these come from a feature matcher)
pts_B = (R_true @ pts_A.T).T + t_true   # shape (6, 2)

print("Keypoints in A (x, y):")
print(pts_A.round(1))
print("\nCorresponding keypoints in B (x, y):")
print(pts_B.round(1))
print(f"\nKnown transform: R(8°) + t=({t_true[0]}, {t_true[1]})")
Keypoints in A (x, y):
[[ 40.  50.]
 [ 80.  40.]
 [150.  60.]
 [ 50. 130.]
 [120. 150.]
 [170. 120.]]

Corresponding keypoints in B (x, y):
[[ 72.7  75.1]
 [113.7  70.7]
 [180.2 100.3]
 [ 71.4 155.7]
 [138.  185.2]
 [191.6 162.5]]

Known transform: R(8°) + t=(40.0, 20.0)

7.7.5 Step 4 — Estimating the Affine Matrix from Correspondences

Given \(n \geq 3\) matched points, the \(3 \times 3\) affine matrix \(M\) (mapping \(A\)-frame to \(B\)-frame) satisfies:

\[\begin{bmatrix}x'_i \\ y'_i \\ 1\end{bmatrix} = M \begin{bmatrix}x_i \\ y_i \\ 1\end{bmatrix}\]

For each pair, the first two rows give two linear equations in the 6 unknowns \(\{a_{11}, a_{12}, t_x, a_{21}, a_{22}, t_y\}\). With \(n\) point pairs we get \(2n\) equations — a least-squares problem once \(n > 3\).

\[\underbrace{\begin{bmatrix}x_1 & y_1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_1 & y_1 & 1 \\ \vdots & & & & & \vdots \\ x_n & y_n & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_n & y_n & 1\end{bmatrix}}_{C \;(2n \times 6)} \begin{bmatrix}a_{11}\\a_{12}\\t_x\\a_{21}\\a_{22}\\t_y\end{bmatrix} = \underbrace{\begin{bmatrix}x'_1\\y'_1\\\vdots\\x'_n\\y'_n\end{bmatrix}}_{\mathbf{b} \;(2n,)}\]

import numpy as np

pts_A = np.array([[40., 50.], [80., 40.], [150., 60.],
                  [50., 130.], [120., 150.], [170., 120.]])   # shape (6, 2)

theta = np.deg2rad(8.)
c, s  = np.cos(theta), np.sin(theta)
R_true = np.array([[c, -s], [s, c]])   # shape (2, 2)
t_true = np.array([40., 20.])          # shape (2,)
pts_B  = (R_true @ pts_A.T).T + t_true   # shape (6, 2)

n = len(pts_A)   # 6 correspondences

# Build the constraint matrix C (shape 2n x 6) and RHS b (shape 2n,)
C = np.zeros((2 * n, 6))   # shape (12, 6)
b = np.zeros(2 * n)        # shape (12,)

for i, (pa, pb) in enumerate(zip(pts_A, pts_B)):
    x,  y  = pa
    xp, yp = pb
    # Row for x'
    C[2*i,   :] = [x, y, 1., 0., 0., 0.]
    b[2*i]      = xp
    # Row for y'
    C[2*i+1, :] = [0., 0., 0., x, y, 1.]
    b[2*i+1]    = yp

# Solve in least-squares sense (overdetermined 12x6 system)
params, _, _, _ = np.linalg.lstsq(C, b, rcond=None)   # shape (6,)

# Reshape into 3x3 homogeneous matrix
M_est = np.array([[params[0], params[1], params[2]],
                  [params[3], params[4], params[5]],
                  [0.,        0.,        1.       ]])   # shape (3, 3)

print("Estimated affine matrix M (A -> B):")
print(M_est.round(6))

# Ground truth
M_true = np.array([[c, -s, t_true[0]],
                   [s,  c, t_true[1]],
                   [0., 0., 1.      ]])   # shape (3, 3)
print("\nTrue affine matrix:")
print(M_true.round(6))
print(f"\nMax error: {np.abs(M_est - M_true).max():.2e}")
Estimated affine matrix M (A -> B):
[[ 0.990268 -0.139173 40.      ]
 [ 0.139173  0.990268 20.      ]
 [ 0.        0.        1.      ]]

True affine matrix:
[[ 0.990268 -0.139173 40.      ]
 [ 0.139173  0.990268 20.      ]
 [ 0.        0.        1.      ]]

Max error: 2.84e-14

With 6 correspondences (12 equations for 6 unknowns), the least-squares fit recovers the true matrix to near-machine-precision.


7.7.6 Step 5 — Warping and Stitching

To stitch: warp image B back into image A’s coordinate frame using \(M^{-1}\), then blend the overlap region.

import numpy as np
from scipy.ndimage import affine_transform
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

H, W = 200, 200

# Re-create both images
xx, yy = np.meshgrid(np.linspace(0, 1, W), np.linspace(0, 1, H))
base_r = (0.55 + 0.10 * yy).clip(0, 1)
base_g = (0.72 + 0.08 * xx).clip(0, 1)
base_b = (0.40 + 0.05 * yy).clip(0, 1)
row_mask = ((np.arange(H) % 20) < 4)[:, None]
row_mask = np.broadcast_to(row_mask, (H, W))
field_r = np.where(row_mask, base_r * 0.5, base_r)
field_g = np.where(row_mask, base_g * 0.7, base_g)
field_b = np.where(row_mask, base_b * 0.5, base_b)
img_A = np.stack([field_r, field_g, field_b], axis=2)   # shape (H, W, 3)

theta = np.deg2rad(8.)
c, s  = np.cos(theta), np.sin(theta)
R_nd  = np.array([[c, s], [-s, c]])   # shape (2, 2)
t_nd  = np.array([-20.*c - 40.*s, 20.*s - 40.*c])   # shape (2,)
warped_channels = []
for ch in range(3):
    w = affine_transform(img_A[:, :, ch], R_nd, offset=t_nd,
                         output_shape=(H, W), mode='constant', cval=0.)
    warped_channels.append(w)
img_B = np.stack(warped_channels, axis=2)   # shape (H, W, 3)

# Inverse warp: bring img_B back into img_A's frame
# M_est (A->B), so M_est_inv warps B->A
# scipy uses (row, col) — M_est is in (x=col, y=row), swap axes
# For the inverse warp of img_B onto img_A frame:
#   output_pixel (r,c) in A-frame  <-  M_est_inv applied to (r,c)
R_inv_nd = np.array([[c, -s], [s, c]])   # shape (2, 2) — R_true (forward)
# offset: for inverse warp, offset = -R_inv * t_true (in row/col order)
t_true_rc = np.array([20., 40.])          # (row offset, col offset)
t_inv_nd  = -R_inv_nd @ t_true_rc         # shape (2,)

warped_B_to_A = []
for ch in range(3):
    w = affine_transform(img_B[:, :, ch], R_inv_nd, offset=t_inv_nd,
                         output_shape=(H, W), mode='constant', cval=0.)
    warped_B_to_A.append(w)
img_B_warped = np.stack(warped_B_to_A, axis=2)   # shape (H, W, 3)

# Blend: where both images have signal, average; else take whichever is present
mask_A = (img_A.max(axis=2) > 0.01).astype(float)   # shape (H, W)
mask_B = (img_B_warped.max(axis=2) > 0.01).astype(float)   # shape (H, W)
w_A = mask_A / (mask_A + mask_B + 1e-8)   # shape (H, W)
w_B = mask_B / (mask_A + mask_B + 1e-8)   # shape (H, W)

mosaic = (w_A[:, :, None] * img_A
        + w_B[:, :, None] * img_B_warped).clip(0, 1)   # shape (H, W, 3)

fig, axes = plt.subplots(1, 3, figsize=(13, 4))
axes[0].imshow(img_A, origin='upper'); axes[0].set_title('Image A', fontsize=10)
axes[1].imshow(img_B, origin='upper'); axes[1].set_title('Image B', fontsize=10)
axes[2].imshow(mosaic, origin='upper'); axes[2].set_title('Stitched mosaic', fontsize=10)
for ax in axes:
    ax.axis('off')
fig.suptitle('Affine image stitching via 3×3 homogeneous matrix', fontsize=11)
plt.tight_layout()
plt.show()


7.7.7 Step 6 — How Accurate Is the Warp?

The residual between stitched and reference images quantifies alignment quality.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
from scipy.ndimage import affine_transform

H, W = 200, 200
xx, yy = np.meshgrid(np.linspace(0, 1, W), np.linspace(0, 1, H))
base_r = (0.55 + 0.10 * yy).clip(0, 1)
base_g = (0.72 + 0.08 * xx).clip(0, 1)
base_b = (0.40 + 0.05 * yy).clip(0, 1)
row_mask = ((np.arange(H) % 20) < 4)[:, None]
row_mask = np.broadcast_to(row_mask, (H, W))
img_A = np.stack([
    np.where(row_mask, base_r * 0.5, base_r),
    np.where(row_mask, base_g * 0.7, base_g),
    np.where(row_mask, base_b * 0.5, base_b)], axis=2)   # shape (H, W, 3)

theta = np.deg2rad(8.)
c, s = np.cos(theta), np.sin(theta)

# Warp img_A forward (make img_B)
R_nd = np.array([[c, s], [-s, c]])
t_nd = np.array([-20.*c - 40.*s, 20.*s - 40.*c])
img_B = np.stack([affine_transform(img_A[:,:,ch], R_nd, offset=t_nd,
                                   output_shape=(H,W), mode='constant', cval=0.)
                  for ch in range(3)], axis=2)   # shape (H, W, 3)

# Warp img_B back to A frame
R_inv = np.array([[c, -s], [s, c]])
t_inv = -R_inv @ np.array([20., 40.])
img_B_back = np.stack([affine_transform(img_B[:,:,ch], R_inv, offset=t_inv,
                                        output_shape=(H,W), mode='constant', cval=0.)
                       for ch in range(3)], axis=2)   # shape (H, W, 3)

# Only compare pixels where both have signal (interior of both views)
valid = (img_A.max(axis=2) > 0.05) & (img_B_back.max(axis=2) > 0.05)
residual = np.abs(img_A - img_B_back).mean(axis=2)   # shape (H, W)
residual_valid = residual[valid]

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
im = axes[0].imshow(residual, origin='upper', cmap='hot', vmin=0, vmax=0.1)
axes[0].set_title('Pixel residual |A − warped B|', fontsize=10)
axes[0].axis('off')
plt.colorbar(im, ax=axes[0], fraction=0.046, pad=0.04)

axes[1].hist(residual_valid, bins=40, color='#4a90d9', edgecolor='white', lw=0.5)
axes[1].axvline(residual_valid.mean(), color='tomato', lw=2,
                label=f'Mean = {residual_valid.mean():.4f}')
axes[1].set_xlabel('Pixel residual'); axes[1].set_ylabel('Count')
axes[1].set_title('Distribution of alignment error', fontsize=10)
axes[1].legend(fontsize=9)
axes[1].grid(True, alpha=0.2)
axes[1].axhline(0, color='#333333', lw=0.4, alpha=0.5)

fig.suptitle('Warp quality: near-zero residual confirms exact affine alignment', fontsize=11)
plt.tight_layout()
plt.show()


7.7.8 What Breaks the Affine Model

Affine stitching fails when:

Violation Effect Fix
Lens distortion Barrel/pincushion warp Pre-calibrate and undistort
Scene is not planar Parallax between foreground/background Use homography (projective) or depth-aware stitching
Large rotation (>15°) Few inlier matches RANSAC robust estimation
Motion blur / rolling shutter Non-rigid deformation Dense optical flow

The \(3 \times 3\) affine model is the right starting point for flat scenes and small camera motions — exactly the case in drone agricultural surveys.


7.7.9 Kaggle Exploration

Dataset: Aerial Image Dataset — orthophoto tiles from drone surveys.

# Assumes: two overlapping tiles already loaded as img_A, img_B (H x W x 3 arrays)
import numpy as np
from skimage.feature import ORB, match_descriptors
from skimage.color import rgb2gray
from skimage.measure import ransac
from skimage.transform import AffineTransform

# 1. Detect ORB keypoints and descriptors
detector = ORB(n_keypoints=200)

detector.detect_and_extract(rgb2gray(img_A))
kp_A, desc_A = detector.keypoints, detector.descriptors

detector.detect_and_extract(rgb2gray(img_B))
kp_B, desc_B = detector.keypoints, detector.descriptors

# 2. Match descriptors
matches = match_descriptors(desc_A, desc_B, cross_check=True)

# 3. Robust affine estimation via RANSAC
pts_A_matched = kp_A[matches[:, 0]]   # shape (m, 2) — row, col
pts_B_matched = kp_B[matches[:, 1]]   # shape (m, 2)

model, inliers = ransac(
    (pts_A_matched, pts_B_matched),
    AffineTransform, min_samples=3,
    residual_threshold=2., max_trials=200
)

print(f"Inliers: {inliers.sum()} / {len(inliers)}")
print("Estimated transform matrix:")
print(model.params.round(4))   # shape (3, 3) — the 3x3 affine matrix

# 4. Warp and stitch using skimage.transform.warp
from skimage.transform import warp
img_B_aligned = warp(img_B, model.inverse)   # shape (H, W, 3)

This pipeline — ORB features, descriptor matching, RANSAC, affine warp — is the backbone of every commercial drone orthomosaic tool. The math is entirely Chapter 7.

7.8 Exercises and Solutions

7.8.1 Exercises

Exercise 7.1 (Translation breaks linearity) The translation map \(f(\mathbf{x}) = \mathbf{x} + \mathbf{t}\) with \(\mathbf{t} \neq \mathbf{0}\) fails all three axioms of linearity. (a) Show that \(f(\mathbf{0}) \neq \mathbf{0}\), violating the fixed-origin axiom. (b) Show that \(f(\alpha \mathbf{x}) \neq \alpha f(\mathbf{x})\) for general \(\alpha\), violating scaling. (c) Is there any non-zero scalar \(\alpha\) for which \(f(\alpha \mathbf{x}) = \alpha f(\mathbf{x})\)?


Exercise 7.2 (Composing affine maps) Let \(f(\mathbf{x}) = A_1 \mathbf{x} + \mathbf{t}_1\) and \(g(\mathbf{x}) = A_2 \mathbf{x} + \mathbf{t}_2\). (a) Write a closed-form expression for \(g(f(\mathbf{x}))\) as a single affine map \(\mathbf{y} = A \mathbf{x} + \mathbf{t}\). (b) Express \(A\) and \(\mathbf{t}\) in terms of \(A_1, A_2, \mathbf{t}_1, \mathbf{t}_2\). (c) Using homogeneous \(3 \times 3\) matrices \(M_1\) and \(M_2\), verify that \((M_2 M_1)\tilde{\mathbf{x}} = \widetilde{g(f(\mathbf{x}))}\).


Exercise 7.3 (Homogeneous coordinate distinction) Consider the \(3 \times 3\) matrix \(M = T \cdot R\) where \(T\) is a translation and \(R\) is a rotation. (a) Apply \(M\) to the point \(\tilde{\mathbf{p}} = [1, 2, 1]^T\). What happens to the translation? (b) Apply \(M\) to the vector \(\tilde{\mathbf{v}} = [1, 0, 0]^T\). What happens to the translation? (c) A graphics programmer stores a surface normal as \([n_x, n_y, 1]^T\) (accidentally using \(w=1\) instead of \(w=0\)). Describe the bug that results when this normal is transformed by a camera pose matrix.


Exercise 7.4 (Order matters) Let \(T = T_{2d}(3, 0)\) (translate right 3) and \(R = R_{2d}(\pi/2)\) (rotate 90° CCW). (a) Compute \(TR\) (rotate first, then translate) and apply it to \([1, 0, 1]^T\). (b) Compute \(RT\) (translate first, then rotate) and apply it to \([1, 0, 1]^T\). (c) Verify your answers match the geometric interpretation: where does the point \((1, 0)\) land in each case?


Exercise 7.5 (Efficient pose inverse) The \(4 \times 4\) pose matrix is

\[T = \begin{bmatrix}R & \mathbf{t} \\ \mathbf{0}^T & 1\end{bmatrix}\]

where \(R\) is a rotation matrix (\(R^T R = I\), \(\det R = +1\)). (a) Verify by direct block multiplication that

\[T^{-1} = \begin{bmatrix}R^T & -R^T\mathbf{t} \\ \mathbf{0}^T & 1\end{bmatrix}\]

  1. Count the floating-point multiplications for the efficient formula versus np.linalg.inv (which uses LU decomposition in \(O(n^3)\) operations). For a \(4 \times 4\) pose matrix, how many multiplications does each approach require?
  2. A robotics system running at 1 kHz calls this inverse 10,000 times per second. Why does the efficient formula matter?

Exercise 7.6 (Affine map from three points) An affine map in 2D is uniquely determined by the images of three non-collinear points. Given: \(A\) maps \((0,0) \mapsto (1,1)\), \((1,0) \mapsto (2.5, 1.3)\), \((0,1) \mapsto (1.2, 2.4)\). (a) Write the \(3 \times 6\) linear system that determines the 6 parameters of the affine map. (b) Solve it (by hand or code) and write the corresponding \(3 \times 3\) homogeneous matrix. (c) Verify your matrix maps the three source points to the three target points.


Exercise 7.7 (Rotation about an arbitrary 3D point) In 3D, rotating about a point \(\mathbf{c} = [c_x, c_y, c_z]^T\) by angle \(\theta\) about the \(z\)-axis follows the same translate-rotate-untranslate pattern as in 2D. (a) Write the sequence of \(4 \times 4\) matrices whose product gives the rotation about \(\mathbf{c}\). (b) Implement Rc_3d(cx, cy, cz, theta) using only T3d and Rz3d. (c) Verify: the point \(\mathbf{c}\) itself is a fixed point of the transform.


Exercise 7.8 (Forward kinematics by hand) A 2-link planar robot arm has link lengths \(L_1 = 1\) and \(L_2 = 0.5\). Both joints rotate about the \(z\)-axis.

The end-effector pose in world frame is: \[T_\text{ee} = R_z(\theta_1) \cdot T(L_1, 0, 0) \cdot R_z(\theta_2) \cdot T(L_2, 0, 0)\]

  1. For \(\theta_1 = \pi/3\) (60°) and \(\theta_2 = -\pi/6\) (-30°), compute \(T_\text{ee}\) numerically.
  2. Multiply \(T_\text{ee} \cdot [0, 0, 0, 1]^T\) to find the end-effector position.
  3. Verify geometrically: draw the arm and check that the result makes sense.

7.8.2 Solutions

Solution 7.1

import numpy as np

t = np.array([2., 3.])   # shape (2,)

# (a) f(0) = 0 + t != 0
print("(a) f(0) =", np.zeros(2) + t, "≠ [0, 0]")

# (b) f(alpha*x) vs alpha*f(x)
alpha = 3.
x = np.array([1., 1.])   # shape (2,)
lhs = alpha * x + t                        # f(alpha*x) = alpha*x + t
rhs = alpha * (x + t)                      # alpha*f(x) = alpha*(x + t)
print(f"(b) f(alpha*x) = {lhs}  vs  alpha*f(x) = {rhs}  Equal: {np.allclose(lhs, rhs)}")

# (c) f(alpha*x) = alpha*f(x)  =>  alpha*x + t = alpha*(x+t) = alpha*x + alpha*t  =>  t = alpha*t  =>  alpha=1
# Only alpha=1 works (trivially). For alpha!=1, scaling fails.
print("(c) f(alpha*x) = alpha*f(x)  iff  t = alpha*t  iff  alpha = 1")
print("    So no non-trivial scalar preserves scaling — translation is never linear.")
(a) f(0) = [2. 3.] ≠ [0, 0]
(b) f(alpha*x) = [5. 6.]  vs  alpha*f(x) = [ 9. 12.]  Equal: False
(c) f(alpha*x) = alpha*f(x)  iff  t = alpha*t  iff  alpha = 1
    So no non-trivial scalar preserves scaling — translation is never linear.

Solution 7.2

import numpy as np

# (a)–(b): g(f(x)) = A2(A1 x + t1) + t2 = (A2 A1) x + (A2 t1 + t2)
# So  A = A2 @ A1,  t = A2 @ t1 + t2

A1 = np.array([[2., 0.], [0., 1.]])   # shape (2, 2)
A2 = np.array([[1., 1.], [0., 1.]])   # shape (2, 2)
t1 = np.array([1., 0.])               # shape (2,)
t2 = np.array([0., 2.])               # shape (2,)

A  = A2 @ A1          # shape (2, 2)
t  = A2 @ t1 + t2     # shape (2,)
print(f"(a)–(b) g∘f:  A = {A}  t = {t}")

# (c) Verify via 3×3 homogeneous matrices
def to_hom33(A, t):
    M = np.eye(3)       # shape (3, 3)
    M[:2,:2] = A
    M[:2, 2] = t
    return M

M1 = to_hom33(A1, t1)   # shape (3, 3)
M2 = to_hom33(A2, t2)   # shape (3, 3)
M  = M2 @ M1             # shape (3, 3)
M_expected = to_hom33(A, t)   # shape (3, 3)
print(f"(c) M2 @ M1 matches closed form: {np.allclose(M, M_expected)}")
print("M2 @ M1 ="); print(M.round(4))
(a)–(b) g∘f:  A = [[2. 1.]
 [0. 1.]]  t = [1. 2.]
(c) M2 @ M1 matches closed form: True
M2 @ M1 =
[[2. 1. 1.]
 [0. 1. 2.]
 [0. 0. 1.]]

Solution 7.3

import numpy as np

def T2d(tx, ty):
    return np.array([[1.,0.,tx],[0.,1.,ty],[0.,0.,1.]])

def R2d(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,-s,0.],[s,c,0.],[0.,0.,1.]])

M = T2d(5., 3.) @ R2d(np.pi/4)   # shape (3, 3)

p = np.array([1., 2., 1.])   # point — w=1
v = np.array([1., 0., 0.])   # vector — w=0

print("(a) M @ point:", (M @ p).round(4), "— translation IS applied (w=1)")
print("(b) M @ vector:", (M @ v).round(4), "— only rotation, translation NOT applied (w=0)")

# (c) Bug: normal stored with w=1 gets spuriously translated
# Surface normal should transform as n_out = R^T n (for rigid body)
# If someone uses w=1, the 5-unit translation corrupts the normal direction
bug_normal = np.array([0., 1., 1.])   # w=1 — should be w=0!
print("(c) Normal with w=1 (buggy):", (M @ bug_normal).round(4),
      "— translation added, normal no longer unit-length or correct direction")
correct_normal = np.array([0., 1., 0.])   # w=0 — correct
print("    Normal with w=0 (correct):", (M @ correct_normal).round(4))
(a) M @ point: [4.2929 5.1213 1.    ] — translation IS applied (w=1)
(b) M @ vector: [0.7071 0.7071 0.    ] — only rotation, translation NOT applied (w=0)
(c) Normal with w=1 (buggy): [4.2929 3.7071 1.    ] — translation added, normal no longer unit-length or correct direction
    Normal with w=0 (correct): [-0.7071  0.7071  0.    ]

Solution 7.4

import numpy as np

def T2d(tx, ty):
    return np.array([[1.,0.,tx],[0.,1.,ty],[0.,0.,1.]])

def R2d(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,-s,0.],[s,c,0.],[0.,0.,1.]])

T  = T2d(3., 0.)       # shape (3, 3)
R  = R2d(np.pi / 2)    # shape (3, 3)
pt = np.array([1., 0., 1.])   # shape (3,)

# (a) TR = T @ R: rotate first, then translate
TR = T @ R   # shape (3, 3)
print("(a) TR @ [1,0,1]:", (TR @ pt).round(4), "→ point: (3, 1)")

# (b) RT = R @ T: translate first, then rotate
RT = R @ T   # shape (3, 3)
print("(b) RT @ [1,0,1]:", (RT @ pt).round(4), "→ point: (0, 4)")

# (c) Geometric verification
# (a): rotate (1,0) -> (0,1), then translate right 3 -> (3, 1) ✓
# (b): translate (1,0)+3 -> (4,0), then rotate 90° CCW -> (0, 4) ✓
print("\n(c) Geometric check:")
print("    Rotate (1,0)→(0,1), then translate right 3 → (3, 1) ✓")
print("    Translate (1,0)→(4,0), then rotate 90°CCW → (0, 4) ✓")
(a) TR @ [1,0,1]: [3. 1. 1.] → point: (3, 1)
(b) RT @ [1,0,1]: [0. 4. 1.] → point: (0, 4)

(c) Geometric check:
    Rotate (1,0)→(0,1), then translate right 3 → (3, 1) ✓
    Translate (1,0)→(4,0), then rotate 90°CCW → (0, 4) ✓

Solution 7.5

import numpy as np

# (a) Verify by block multiplication
# T @ T_inv = [R | t] [R^T | -R^T t] = [R R^T | R(-R^T t) + t]
#             [0 | 1] [0   | 1      ]   [0     | 1             ]
#                                       = [I | -t + t] = [I | 0]
#                                         [0 | 1     ]   [0 | 1]  ✓

def Rz3d(gamma):
    c, s = np.cos(gamma), np.sin(gamma)
    return np.array([[c,-s,0.,0.],[s,c,0.,0.],[0.,0.,1.,0.],[0.,0.,0.,1.]])

def T3d(tx, ty, tz):
    M = np.eye(4); M[:3,3]=[tx,ty,tz]; return M

def pose_inverse(T):
    T_inv = np.eye(4)     # shape (4, 4)
    R = T[:3, :3]         # shape (3, 3)
    t = T[:3,  3]         # shape (3,)
    T_inv[:3, :3] = R.T
    T_inv[:3,  3] = -R.T @ t
    return T_inv

T = T3d(2., -1., 3.) @ Rz3d(np.pi/5)   # shape (4, 4)
T_inv = pose_inverse(T)
print("(a) T @ T_inv = I?", np.allclose(T @ T_inv, np.eye(4)))

# (b) Operation count
# Efficient: R.T @ t = 9 multiplications + 6 additions; copy R^T = 9 assignments
# LU inversion (4x4): ~(4^3)/3 ≈ 21 multiplications for factorisation alone,
# then forward/back substitution ~2*(4^2) = 32 → ~50 total vs ~15 for pose_inverse
print("(b) Efficient: ~15 multiplications;  LU inv (4×4): ~50 multiplications")

# (c) 1 kHz × 10k calls/s — cumulative savings matter; also: no numerical instability
print("(c) 10,000 × 3× speedup = significant on embedded/real-time hardware;")
print("    pose_inverse also has zero numerical error for exact rotation matrices")
(a) T @ T_inv = I? True
(b) Efficient: ~15 multiplications;  LU inv (4×4): ~50 multiplications
(c) 10,000 × 3× speedup = significant on embedded/real-time hardware;
    pose_inverse also has zero numerical error for exact rotation matrices

Solution 7.6

import numpy as np

# Source points
src = np.array([[0., 0.], [1., 0.], [0., 1.]])   # shape (3, 2)

# Target points
dst = np.array([[1., 1.], [2.5, 1.3], [1.2, 2.4]])   # shape (3, 2)

# (a) Build 6×6 system: 3 points × 2 equations each
n = len(src)
C = np.zeros((2*n, 6))   # shape (6, 6)
b = np.zeros(2*n)         # shape (6,)

for i, (ps, pd) in enumerate(zip(src, dst)):
    x, y   = ps
    xp, yp = pd
    C[2*i,   :] = [x, y, 1., 0., 0., 0.]
    b[2*i]      = xp
    C[2*i+1, :] = [0., 0., 0., x, y, 1.]
    b[2*i+1]    = yp

# (b) Solve the square 6×6 system
params = np.linalg.solve(C, b)   # shape (6,)
M = np.array([[params[0], params[1], params[2]],
              [params[3], params[4], params[5]],
              [0.,        0.,        1.       ]])   # shape (3, 3)
print("(b) Homogeneous affine matrix:")
print(M.round(4))

# (c) Verify
for ps, pd in zip(src, dst):
    ph = np.array([ps[0], ps[1], 1.])   # shape (3,)
    out = M @ ph                          # shape (3,)
    print(f"  {ps}{out[:2].round(4)}  expected {pd}  "
          f"match: {np.allclose(out[:2], pd)}")
(b) Homogeneous affine matrix:
[[1.5 0.2 1. ]
 [0.3 1.4 1. ]
 [0.  0.  1. ]]
  [0. 0.] → [1. 1.]  expected [1. 1.]  match: True
  [1. 0.] → [2.5 1.3]  expected [2.5 1.3]  match: True
  [0. 1.] → [1.2 2.4]  expected [1.2 2.4]  match: True

Solution 7.7

import numpy as np

def T3d(tx, ty, tz):
    M = np.eye(4); M[:3,3]=[tx,ty,tz]; return M

def Rz3d(gamma):
    c,s=np.cos(gamma),np.sin(gamma)
    return np.array([[c,-s,0.,0.],[s,c,0.,0.],[0.,0.,1.,0.],[0.,0.,0.,1.]])

# (a) Rotation about point c about z-axis:
#   1. Translate c to origin: T(-cx, -cy, -cz)
#   2. Rotate:                Rz(theta)
#   3. Translate back:        T(cx, cy, cz)
# Full: T(c) @ Rz(theta) @ T(-c)

# (b) Implementation
def Rc_3d(cx, cy, cz, theta):
    return T3d(cx, cy, cz) @ Rz3d(theta) @ T3d(-cx, -cy, -cz)   # shape (4, 4)

cx, cy, cz = 2., 3., 0.
theta = np.pi / 3
M = Rc_3d(cx, cy, cz, theta)
print("(b) Rotation-about-point matrix:")
print(M.round(4))

# (c) Verify c is a fixed point
c_h = np.array([cx, cy, cz, 1.])   # shape (4,)
c_out = M @ c_h                     # shape (4,)
print(f"\n(c) M @ c = {c_out[:3].round(6)}  (should equal c = [{cx},{cy},{cz}])")
print(f"    Fixed point: {np.allclose(c_out[:3], [cx, cy, cz])}")

# Extra: a point away from c DOES move
p_h = np.array([cx + 1., cy, cz, 1.])   # shape (4,)
print(f"\n    Point c+(1,0,0): {(M @ p_h)[:3].round(4)}  (not a fixed point)")
(b) Rotation-about-point matrix:
[[ 0.5    -0.866   0.      3.5981]
 [ 0.866   0.5     0.     -0.2321]
 [ 0.      0.      1.      0.    ]
 [ 0.      0.      0.      1.    ]]

(c) M @ c = [2. 3. 0.]  (should equal c = [2.0,3.0,0.0])
    Fixed point: True

    Point c+(1,0,0): [2.5   3.866 0.   ]  (not a fixed point)

Solution 7.8

import numpy as np
import matplotlib.pyplot as plt

def T3d(tx, ty, tz):
    M = np.eye(4); M[:3,3]=[tx,ty,tz]; return M

def Rz3d(gamma):
    c,s=np.cos(gamma),np.sin(gamma)
    return np.array([[c,-s,0.,0.],[s,c,0.,0.],[0.,0.,1.,0.],[0.,0.,0.,1.]])

L1, L2 = 1.0, 0.5
t1 = np.pi/3    # 60°
t2 = -np.pi/6   # -30°

# (a) Compute T_ee
T_ee = Rz3d(t1) @ T3d(L1, 0., 0.) @ Rz3d(t2) @ T3d(L2, 0., 0.)   # shape (4, 4)
print("(a) T_ee:")
print(T_ee.round(4))

# (b) End-effector position
origin_h = np.array([0., 0., 0., 1.])   # shape (4,)
ee_pos = T_ee @ origin_h                  # shape (4,)
print(f"\n(b) End-effector position: {ee_pos[:3].round(4)}")

# Analytical check:
# Joint 1 at origin, pointing at angle t1=60°
# Link 1 reaches: (L1*cos(t1), L1*sin(t1)) = (0.5, 0.866)
# Link 2 at angle t1+t2 = 30°: adds (L2*cos(30°), L2*sin(30°)) = (0.433, 0.25)
j1 = np.array([L1*np.cos(t1), L1*np.sin(t1)])
ee = j1 + np.array([L2*np.cos(t1+t2), L2*np.sin(t1+t2)])
print(f"    Analytical check (2D): {np.round(ee, 4)}")
print(f"    Match: {np.allclose(ee_pos[:2], ee)}")

# (c) Plot
fig, ax = plt.subplots(figsize=(5, 5))
base = np.array([0., 0.])
j1_pos = np.array([L1*np.cos(t1), L1*np.sin(t1)])
ee_2d  = ee

ax.plot([0, j1_pos[0]], [0, j1_pos[1]], '-o',
        color='#4a90d9', lw=3, ms=8, label=f'Link 1 (L={L1})')
ax.plot([j1_pos[0], ee_2d[0]], [j1_pos[1], ee_2d[1]], '-o',
        color='tomato', lw=3, ms=8, label=f'Link 2 (L={L2})')
ax.scatter(*base, color='black', s=100, zorder=6, label='Base (origin)')
ax.scatter(*ee_2d, color='#2ecc71', s=100, zorder=6, label=f'EE {ee_2d.round(3)}')
ax.set_xlim(-0.3, 1.4); ax.set_ylim(-0.3, 1.4)
ax.set_aspect('equal')
ax.axhline(0, color='#333333', lw=0.4, alpha=0.5)
ax.axvline(0, color='#333333', lw=0.4, alpha=0.5)
ax.grid(True, alpha=0.2)
ax.set_xlabel('x'); ax.set_ylabel('y')
ax.set_title(r'2-link arm: $\theta_1=60°$, $\theta_2=-30°$', fontsize=10)
ax.legend(fontsize=8)
plt.tight_layout()
plt.show()
(a) T_ee:
[[ 0.866 -0.5    0.     0.933]
 [ 0.5    0.866  0.     1.116]
 [ 0.     0.     1.     0.   ]
 [ 0.     0.     0.     1.   ]]

(b) End-effector position: [0.933 1.116 0.   ]
    Analytical check (2D): [0.933 1.116]
    Match: True