5 The Collatz Conjecture: Simple Rules, Deep Mystery

Pick any positive integer. If it is even, divide by 2. If it is odd, multiply by 3 and add 1. Repeat until you reach 1. That is the entire rule. You could explain it to a middle-schooler in thirty seconds, yet as of 2026 no one has proved it always works. Paul Erdős called the problem “absolutely hopeless” (Lagarias 1985), and yet it has a magnetic pull. Computing reveals its hidden structure far faster than pure thought can. Let us see what we find.

Veritasium
The Simplest Math Problem No One Can Solve
youtu.be/094y1Z2wpJg

Explains the Collatz conjecture, shows the wild trajectories of different starting numbers, and explores why mathematicians believe but cannot prove it always reaches 1.

5.1 A Rule Anyone Can Follow

The Collatz function is the map defined by

\[f(n) = \begin{cases} n / 2 & \text{if } n \text{ is even,} \\ 3n + 1 & \text{if } n \text{ is odd.} \end{cases}\]

Starting from a positive integer $n_0$, we form the Collatz sequence $n_0, f(n_0), f(f(n_0)), \ldots$ and keep applying $f$ until we reach 1.

Example with $n_0 = 5$. Since 5 is odd, $f(5) = 16$. Since 16 is even, $f(16) = 8$. Continuing: $8 \to 4 \to 2 \to 1$. The full sequence is $\{5, 16, 8, 4, 2, 1\}$, reaching 1 in five steps.

Example with $n_0 = 6$. $6 \to 3 \to 10 \to 5 \to 16 \to 8 \to 4 \to 2 \to 1$. Eight steps.

Example with $n_0 = 27$. The sequence rises as high as 9232 at step 77 before finally crashing down to 1 – after 111 steps. It is remarkable that a number as small as 27 can produce such a long journey.

The Collatz conjecture, first stated by Lothar Collatz in 1937 (Lagarias 1985), claims that for every positive integer $n_0$ the sequence eventually reaches 1. The conjecture has been verified computationally for every starting value below $3 \times 10^{20}$ (Barina 2021), yet no proof exists.

One family is trivially easy: if $n_0 = 2^k$, the rule halves it $k$ times to reach 1, with no odd steps along the way. Powers of 2 always terminate, and they terminate quickly. Everything else is a mystery.

5.2 Implementing the Rule as a Loop

The while loop introduced in Section 1.1 is a natural fit: keep applying the rule while $n \neq 1$, collecting each value. The % operator from Section 2.2 tells us whether $n$ is even or odd. The // operator performs integer (floor) division, discarding any remainder: 7 // 2 is 3, not 3.5.

def collatz_sequence(n):
    seq = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        seq.append(n)
    return seq

for start in [5, 6, 27]:
    seq = collatz_sequence(start)
    print(f"n={start:3d}: {len(seq)-1:4d} steps, "
          f"max={max(seq):6d}")

n=  5:    5 steps, max=    16
n=  6:    8 steps, max=    16
n= 27:  111 steps, max=  9232

Research Example: How Wild Is the Journey From 27 to 1?

The rule is completely deterministic — every step is forced. Does n = 27 travel quietly to 1, or does it take a spectacular detour first?

# uses: collatz_sequence()
import matplotlib.pyplot as plt

seq27 = collatz_sequence(27)
plt.style.use('default')
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(seq27, lw=1, color='steelblue')
ax.axhline(1, color='gray', lw=0.7, ls='--')
ax.set_xlabel("Step")
ax.set_ylabel("Value")
ax.set_title("Collatz sequence starting from n = 27")
fig.tight_layout()
plt.show()

The sequence rockets to 9232 — over 340 times its starting value — at step 77, then crashes to 1 after 111 steps. Every step was forced by the previous one, yet the trajectory looks completely random. You just traced the journey that has captivated mathematicians for 90 years.

Research Example: Do All Paths Lead to 1?

Stack twenty different Collatz trajectories on a single chart — do they all converge, or do some escape upward?

import matplotlib.pyplot as plt

def collatz_sequence(n):
    seq = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        seq.append(n)
    return seq

starts = [7, 11, 17, 19, 25, 27, 31, 37, 41, 43,
          47, 53, 59, 67, 73, 79, 83, 89, 97, 101]
cmap_f = plt.get_cmap('viridis', len(starts))
fig, ax = plt.subplots(figsize=(9, 5))
for i, s in enumerate(starts):
    seq = collatz_sequence(s)
    ax.plot(seq, lw=0.9, alpha=0.75, color=cmap_f(i))
ax.set_xlabel("Step")
ax.set_ylabel("Value (log scale)")
ax.set_yscale('log')
ax.set_title(
    "Twenty Collatz sequences: same rule, utterly different paths"
)
fig.tight_layout()
plt.show()

Figure 5.1: Twenty Collatz sequences overlaid — same rule, twenty utterly different paths, all eventually crashing into 1. Colors run from dark blue (small starts) to bright yellow (larger starts).

Every thread — no matter how high it climbs or how long it wanders — crashes into 1. Stacking twenty paths turns a conjecture into something you can almost feel. And you produced the whole picture in three lines per trajectory.

5.3 Stopping Times

The stopping time (or total stopping time) of $n$ is the number of steps needed to reach 1: the smallest $k$ such that $f^k(n) = 1$.

def stopping_time(n):
    count = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        count += 1
    return count

for n in [1, 2, 5, 6, 27]:
    print(f"stopping_time({n:3d}) = {stopping_time(n)}")

stopping_time(  1) = 0
stopping_time(  2) = 1
stopping_time(  5) = 5
stopping_time(  6) = 8
stopping_time( 27) = 111

Research Example: Which Numbers Take the Longest to Reach 1?

Plot every stopping time from n = 1 to 1,000 — do slow numbers cluster together, or spike up at random with no warning?

import matplotlib.pyplot as plt

def stopping_time(n):
    count = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        count += 1
    return count

N = 1000
ns = list(range(1, N + 1))
times = [stopping_time(n) for n in ns]
fig, ax = plt.subplots(figsize=(9, 4))
sc = ax.scatter(ns, times, s=2, c=times, cmap='viridis', alpha=0.8)
cb = fig.colorbar(sc, ax=ax, pad=0.01)
cb.set_label('Stopping time', fontsize=9)
ax.set_xlabel("Starting value $n$")
ax.set_ylabel("Stopping time")
ax.set_title(
    "Mystery in the stopping times — n = 1 to 1,000",
    fontsize=11
)
fig.tight_layout()
plt.show()

Figure 5.2: Stopping times for n = 1 to 1,000. Most numbers reach 1 in under 100 steps, but isolated spikes — n = 27 being the most dramatic at 111 steps — appear with no pattern.

The spike at n = 27 towers over its neighbors at stopping time 111 — more than four times the local average. No formula predicts it. That unpredictability is the Collatz mystery captured in one image, and you produced it in under ten lines.

Research Example: What Does the Full Spread of Journey Lengths Look Like?

Compute all 10,000 stopping times up to n = 10,000 and plot the distribution — is it symmetric, or skewed with a long tail of stubborn numbers?

import matplotlib.pyplot as plt

BLUE = '#1f77b4'
RED  = '#d62728'

def stopping_time(n):
    count = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        count += 1
    return count

all_times = [stopping_time(n) for n in range(1, 10_001)]
mean_t = sum(all_times) / len(all_times)
fig, ax = plt.subplots(figsize=(9, 4))
ax.hist(all_times, bins=60, color=BLUE, edgecolor='none', alpha=0.85)
ax.axvline(mean_t, color=RED, lw=1.5, ls='--',
           label=f'mean = {mean_t:.1f} steps')
ax.set_xlabel('Stopping time (steps to reach 1)')
ax.set_ylabel('Count')
ax.set_title('Stopping-time distribution, n = 1 to 10,000')
ax.legend(fontsize=9)
fig.tight_layout()
plt.show()

Figure 5.3: Stopping-time distribution for n = 1 to 10,000 — right-skewed with a heavy tail; the dashed line marks the mean.

Most numbers reach 1 in under 150 steps, but a stubborn right tail stretches well past 200 — and that heavy tail is where the mystery hides. You just computed 10,000 stopping times and plotted the full picture in a handful of lines.

5.4 Delay Records

A delay record is a starting value $n$ whose stopping time strictly exceeds the stopping time of every smaller starting value – a new all-time champion for length.

# uses: stopping_time()
records = []
best = -1
for n in range(1, 10_001):
    t = stopping_time(n)
    if t > best:
        best = t
        records.append((n, t))

print(f"{'n':>8}  {'stopping time':>14}")
for n, t in records[:20]:
    print(f"{n:8d}  {t:14d}")

       n   stopping time
       1               0
       2               1
       3               7
       6               8
       7              16
       9              19
      18              20
      25              23
      27             111
      54             112
      73             115
      97             118
     129             121
     171             124
     231             127
     313             130
     327             143
     649             144
     703             170
     871             178

The delay records grow slowly and irregularly. Notice that $n = 27$ sets a record (111 steps) that is not broken until $n = 54 = 2 \times 27$, which adds exactly one extra halving step at the beginning. Delay records where $n = 2m$ and $m$ is itself a record are called parachute numbers – they inherit their length from their half.

Research Example: How Rare Are Genuine Delay Records?

How does the record stopping time grow as n increases — in steady stair-steps, or in long flat stretches broken by rare dramatic leaps?

import matplotlib.pyplot as plt

def stopping_time(n):
    count = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        count += 1
    return count

records = []
best = -1
for n in range(1, 10_001):
    t = stopping_time(n)
    if t > best:
        best = t
        records.append((n, t))

rec_n = [r[0] for r in records]
rec_t = [r[1] for r in records]

fig, ax = plt.subplots(figsize=(8, 3))
ax.step(rec_n, rec_t, where='post',
        color='steelblue', lw=1.5)
ax.set_xlabel("Starting value n")
ax.set_ylabel("Stopping time")
ax.set_title(
    "Delay record stopping times (n up to 10,000)"
)
fig.tight_layout()
plt.show()

The staircase has long flat stretches — hundreds of numbers that all fall below the current record — then the bar jumps suddenly when a new champion appears. Records are genuinely rare, and the step function captures that rarity at a glance.

Research Example: How High Does a Sequence Soar Before Coming Home?

A related question: how high does a sequence soar before descending? The ratio $\text{alt}(n)/n$ measures it — most numbers barely rise, but a few explode to hundreds of times their starting value.

import matplotlib.pyplot as plt

def collatz_sequence(n):
    seq = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        seq.append(n)
    return seq

N = 1000
alt_ratios = [max(collatz_sequence(n)) / n for n in range(1, N + 1)]
fig, ax = plt.subplots(figsize=(9, 4))
sc = ax.scatter(range(1, N + 1), alt_ratios, s=3, c=alt_ratios,
                cmap='viridis', alpha=0.8)
cb = fig.colorbar(sc, ax=ax, pad=0.01)
cb.set_label('alt(n) / n', fontsize=9)
ax.set_xlabel('Starting value n')
ax.set_ylabel('Maximum reached / starting value (log scale)')
ax.set_yscale('log')
ax.set_title(
    'Altitude ratio: how high does the sequence soar?'
)
fig.tight_layout()
plt.show()

Figure 5.4: Altitude ratio alt(n)/n for n = 1 to 1,000 — most sequences barely exceed their start, but isolated numbers rocket to 300× their launch value.

A handful of numbers rocket 300× above their starting value before crashing home — and you found them all in one scatter plot. The rule is always the same; the journey is never predictable.

5.5 The Collatz Graph

Instead of tracing a sequence forward, we can work backward from 1: which numbers flow into 1? Which flow into those? This reverse view builds the Collatz tree (Martin 2011).

Under the forward rule, any even number $2m$ maps to $m$. Going backward: every $m$ has $2m$ as a predecessor. Additionally, any odd number $q$ satisfying $3q + 1 = n$, meaning $q = (n-1)/3$, is a predecessor of $n$ – provided $(n-1)$ is divisible by 3 and $(n-1)/3$ is a positive odd integer.

def collatz_preds(n):
    """Return all k with collatz_step(k) == n."""
    result = [2 * n]
    if n > 1 and (n - 1) % 3 == 0:
        m = (n - 1) // 3
        if m >= 1 and m % 2 == 1:
            result.append(m)
    return result

# Who maps to 16?
print("Predecessors of 16:", collatz_preds(16))
for p in collatz_preds(16):
    step = p // 2 if p % 2 == 0 else 3 * p + 1
    print(f"  f({p}) = {step}")

Predecessors of 16: [32, 5]
  f(32) = 16
  f(5) = 16

We build the backward tree with a breadth-first search (BFS), expanding predecessors level by level. collections.deque is a double-ended queue that pops from the left in constant time, which makes BFS efficient.

# uses: collatz_preds()
from collections import deque

MAX_DEPTH = 12

children_map = {}
node_level = {1: 0}
level_nodes = {0: [1]}

queue = deque([(1, 0)])
while queue:
    n, d = queue.popleft()
    if d >= MAX_DEPTH:
        continue
    for p in collatz_preds(n):
        if p not in node_level:
            node_level[p] = d + 1
            children_map.setdefault(n, []).append(p)
            level_nodes.setdefault(d + 1, []).append(p)
            queue.append((p, d + 1))

total = sum(len(v) for v in level_nodes.values())
print(f"Total nodes in tree: {total}")
print(f"\n{'depth':>7}  {'nodes':>6}")
for d in range(MAX_DEPTH + 1):
    print(f"{d:7d}  {len(level_nodes[d]):6d}")

Total nodes in tree: 47

  depth   nodes
      0       1
      1       1
      2       1
      3       1
      4       1
      5       2
      6       2
      7       4
      8       4
      9       6
     10       6
     11       8
     12      10

Research Example: Does the Backward Collatz Tree Branch Fast or Slow?

The tree fans out as depth grows — each level contains more nodes than the last, hinting at exponential growth. Does the count per level double like a perfect binary tree, or does it grow more slowly?

from collections import deque
import matplotlib.pyplot as plt

BLUE = '#1f77b4'

def collatz_preds(n):
    result = [2 * n]
    if n > 1 and (n - 1) % 3 == 0:
        m = (n - 1) // 3
        if m >= 1 and m % 2 == 1:
            result.append(m)
    return result

MAX_DEPTH = 12
children_map = {}
node_level = {1: 0}
level_nodes = {0: [1]}
queue = deque([(1, 0)])
while queue:
    n, d = queue.popleft()
    if d >= MAX_DEPTH:
        continue
    for p in collatz_preds(n):
        if p not in node_level:
            node_level[p] = d + 1
            children_map.setdefault(n, []).append(p)
            level_nodes.setdefault(d + 1, []).append(p)
            queue.append((p, d + 1))

depths = list(range(MAX_DEPTH + 1))
counts = [len(level_nodes[d]) for d in depths]
fig, ax = plt.subplots(figsize=(9, 4))
bars = ax.bar(depths, counts, color=BLUE, alpha=0.85)
ax.set_xlabel('Depth (steps backward from 1)')
ax.set_ylabel('Nodes at this depth')
ax.set_title('Backward Collatz tree: nodes per depth level')
for bar, cnt in zip(bars, counts):
    ax.text(bar.get_x() + bar.get_width() / 2, cnt + 0.3, str(cnt),
            ha='center', va='bottom', fontsize=8)
fig.tight_layout()
plt.show()

Figure 5.5: Nodes per depth level in the backward Collatz tree rooted at 1. The tree branches roughly by a factor of 1.5 per level — approaching (but never reaching) the factor of 2 implied by the halving rule alone.

The count grows — but not by a clean factor of 2. Only some nodes gain an odd predecessor; the rest only double. That 1.5× branching rate is a fingerprint of the Collatz rule itself, and you measured it in four lines.

Research Example: What Does the Collatz Tree Actually Look Like?

Now we assign positions (depth along $x$, index within level along $y$) and draw the tree — what shape does it take when you can see every branch at once?

from collections import deque
import matplotlib.pyplot as plt

def collatz_preds(n):
    result = [2 * n]
    if n > 1 and (n - 1) % 3 == 0:
        m = (n - 1) // 3
        if m >= 1 and m % 2 == 1:
            result.append(m)
    return result

MAX_DEPTH = 12
children_map = {}
node_level = {1: 0}
level_nodes = {0: [1]}
queue = deque([(1, 0)])
while queue:
    n, d = queue.popleft()
    if d >= MAX_DEPTH:
        continue
    for p in collatz_preds(n):
        if p not in node_level:
            node_level[p] = d + 1
            children_map.setdefault(n, []).append(p)
            level_nodes.setdefault(d + 1, []).append(p)
            queue.append((p, d + 1))

pos = {}
for d, group in level_nodes.items():
    w = len(group)
    for i, node in enumerate(group):
        pos[node] = (d, i - (w - 1) / 2.0)

fig, ax = plt.subplots(figsize=(13, 5))
for parent, kids in children_map.items():
    x1, y1 = pos[parent]
    for kid in kids:
        x2, y2 = pos[kid]
        ax.plot([x1, x2], [y1, y2], '-',
                color='lightgray', lw=0.8, zorder=1)
for node, (x, y) in pos.items():
    ax.plot(x, y, 'o', ms=13, color='steelblue',
            zorder=2)
    ax.text(x, y, str(node), ha='center',
            va='center', fontsize=5,
            color='white', zorder=3)
ax.set_xlim(-0.5, MAX_DEPTH + 0.5)
ax.set_xlabel("Steps from 1 (backward)")
ax.set_title(
    "Collatz backward tree, depth 12 from root 1"
)
ax.set_yticks([])
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
fig.tight_layout()
plt.show()

Every branch fans left from 1, and every node carries a specific number — you built this entire picture with a breadth-first search and a handful of ax.plot calls. The Collatz conjecture says this tree, extended forever, contains every positive integer exactly once.

Every node in this tree eventually reaches 1 under the Collatz rule. If the conjecture is true, the complete tree – extended to infinite depth with no node-value cutoff – contains every positive integer exactly once. The branching pattern you see is real structure, not randomness: it encodes which numbers have the “bonus” odd predecessor $(n-1)/3$.

5.6 Why Is This So Hard?

Paul Erdős (1913–1996)
CC BY 3.0, Kmhkmh, via Wikimedia Commons

The Collatz conjecture has resisted proof for nearly 90 years despite massive computational evidence. Here is why.

The heuristic argument. Whenever $n$ is odd, $3n+1$ is always even (odd times 3 is odd; odd plus 1 is even). So every odd step is immediately followed by at least one halving step. Over two steps, an odd $n$ becomes approximately $3n/2$ – larger. But $(3n+1)/2$ is often even again, yielding a second halving: approximately $3n/4$ – smaller. The pattern “one tripling, two halvings” multiplies $n$ by $3 \times \frac{1}{2} \times \frac{1}{2} = \frac{3}{4}$. Since $3/4 < 1$, this heuristic cycle produces a net decrease.

We can measure this directly. The geometric mean of the step ratios $n_{i+1}/n_i$ tells us the typical factor per step:

import math

log_ratios = []
for n in range(2, 501):
    seq = collatz_sequence(n)
    for i in range(len(seq) - 1):
        log_ratios.append(math.log(seq[i + 1] / seq[i]))

geo_mean = math.exp(sum(log_ratios) / len(log_ratios))
print(f"Steps analyzed:       {len(log_ratios)}")
print(f"Geometric mean ratio: {geo_mean:.4f}")
print(f"(below 1 means sequences tend to shrink)")

Steps analyzed:       26143
Geometric mean ratio: 0.9049
(below 1 means sequences tend to shrink)

The geometric mean is less than 1, confirming that sequences trend downward on average. But “on average” is not “always,” and that gap is exactly where the difficulty lives.

The deep obstacle. The Collatz rule mixes multiplication (the $3n$ and $/2$ steps) with addition (the $+1$). These two worlds have separate algebraic structures, and their interaction is notoriously resistant to analysis. A proof would need to show that no number – not even one too large to test – can sustain a sequence that never reaches 1. Numbers that cycle without reaching 1, or that grow without bound, are theoretically possible but have never been found.

Terence Tao (b. 1975)
Public domain, The White House, via Wikimedia Commons

Tao’s 2019 result. In 2019, Terence Tao proved that for “almost all” starting values (in a precise probabilistic sense covering 100% of integers), the Collatz sequence eventually drops below any fixed bound you choose (Tao 2022). This is the strongest rigorous result to date. It stops just short of the full conjecture – it does not rule out a single rare counterexample hiding beyond the reach of any computer.

Numberphile
UNCRACKABLE? The Collatz Conjecture
youtu.be/5mFpVDpKX70

A mathematician explains why the Collatz problem has resisted every proof attempt and why even Terence Tao’s partial result left the conjecture wide open.

Alan Turing (1912–1954)
Public domain, Elliott & Fry, via Wikimedia Commons

The halting problem parallel. Turing showed in 1936 (Turing 1936) that no algorithm can decide, in general, whether an arbitrary computer program will ever stop running. The Collatz rule is one specific program, not an arbitrary one, so Turing’s theorem does not directly apply. But the comparison is instructive: a rule that iterates in an unpredictable, pseudo-random way resembles a general computation, and that is precisely why it resists the kind of clean inductive arguments that work for most problems in number theory.

5.7 Further Research Topics

The following topics are listed in order of increasing difficulty. Start by exploring computationally, form a conjecture from what you observe, then try to explain it.

1. Powers of 2. Compute stopping_time(2**k) for $k = 1, 2, \ldots, 20$. Observe that the answer is always $k$. Write a proof by induction: if $n = 2^k$, then every step is a halving (the odd rule never fires), so after exactly $k$ halvings we reach $2^0 = 1$.

(Adapted from Martin (2011).)

3. Even vs. odd starting values. Compute the average stopping time separately for even and odd starting values below 10000. Which group takes longer on average? Notice that every even $n = 2m$ satisfies $\text{stopping\_time}(2m) = \text{stopping\_time}(m) + 1$. Use this to explain the difference between the two distributions without computing a single stopping time from scratch.