Mathematical induction, a practical guide

I've recently been teaching mathematical induction to second-year university students of informatics. I'd like to summarize some of the background around induction proofs, as the subject slightly more complicated than a single class warrants.

Preliminaries

Predicate

Let $A$, be a set. A predicate on $A$ is a subset of $A$.

The difference between a predicate and a subset lies in the way we talk about them. We say a predicate $P$ holds for an element $a$ of $A$, if $a$ is an element of $P$. We write '$P$ holds for $a$' as $P(a)$.

For example, ${ x \in \mathbb{N} \mid x < 5}$ ($= {0, 1, 2, 3, 4}$), is a predicate on $\mathbb{N}$. This predicate can be written in terms of a single logical expression involving a 'parameter' from $\mathbb{N}$ as follows: $$ P(n) = n < 5 $$ We can therefore abbreviate it as $\overset{<5}{\triangleleft}{A}$, $\lambda x \rightarrow x < 5$ or simply $(< 5)$ using some notational abuse.

Binary Relation

We will restrict our attention to internal binary relations:

Let $A$ be a set. A binary relation on $A$ is a subset of $A \times A$.

The difference between a binary relation and a subset of $A \times A$ again lies only in the way we talk about them. If $R$ is the symbol for a given binary relation, we abbreviate $(a, b) \in R$ to $a R b$.

For example, 'divides' ($\mid$) is a binary relation on $\mathbb{N}$ that contains the tuples $(a, b)$ for which $a \mid b$ holds.

Reflexivity

We call a relation $R$ on a set $A$ reflexive if the following property holds: $$\forall a \in A: (a, a) \in R$$

Well-founded relation

We will need well-founded relations in well-founded induction so let's define them up front.

Let $A$ be a set and $\prec$ a binary relation on $A$. We call $\prec$ well-founded if there don't exist any so-called 'infinitely descending chains' in $A$. In other words, the following property holds for $\prec$. $$ \neg \left( \exists \left{ a_i \in A \mid i \in \mathbb{N} \right }:\ \forall i \in \mathbb{N}:\ a_{i+1} \prec a_i \right) $$

Exercise: Prove that a non-empty well-founded relation cannot be reflexive.

Otherwise we can take an arbitrary element $a$ and build the infinite descending chain of only $a$'s.

Note that this means that a non-empty well-founded relation can never be an order as orders are defined to be reflexive.

Minimal element

A minimal element is usually defined for partial orders, but we can define it for a well-founded relation as well:

A minimal element $m$ of a set $A$ according to a well-founded relation $\prec$ is an element as follows: $$ \neg \exists a \in A:\ a \prec m $$

Note that if we were to define the minimal element of a partial order in the same way, we would have to add the requirement that $a$ should be different from $m$ in the defining formula, otherwise there could never be any minimal elements for a partial order because partial orders are reflexive by definition.

Well-founded induction

The must fundamental kind of induction that I teach is well-founded induction.

Let $A$ be a set and $\prec$ a well-founded relation on that set $A$. Let $P$ be a predicate on the set $A$.

We now have an inference rule as follows:

$$ \left( \forall a \in A:
\left( \forall b \in A:\ b \prec a \Rightarrow P(b) \right) \Rightarrow P(a) \right) \Rightarrow \left( \forall a \in A:
P(a) \right) $$

An intuitive explanation

An intuitive explanation for why this inference rule makes sense can be stated as follows. (Of course this could never be the real proof because that would make for a cyclic argument.)

We assume the left-hand side of the implication ($\Rightarrow$) and prove the right-hand side.

Let $M_0$ be the subset of $A$ that contains only the minimal elements of $A$. In other words, let $M_0$ be the following subset of $A$: $$ M_0 = \left{ m \in A \mid \neg \left(\exists n \in A:\ n \prec m \right) \right} $$

Note that for all the elements of this set, $P$ must hold according to the our assumption.

We now define the subset $M_i$ of $A$ for an $i \in \mathbb{N}0$ as follows: $$ M_i = \left{ m \in A \mid \forall n \in A:\ n \prec m \Rightarrow n \in M{i-1} \right} $$ Intuitively this is the next set of elements for which we prove that $P$ holds. We use our assumpution again to prove that $P$ holds for all elements in $M_{i}$ because it holds for all elements in $M_{i-1}$. If we go on like this, eventually we will have shown that $P$ holds for all elements of $A$.

Why a well-founded relation

Note that if $\prec$ would not be well-founded, then by definition there must exist an infinite descending chain of elements in $A$.

If there exists an infinite descending chain of elements in $A$, then there need not exist any minimal elements. (Allthough the non-existence of minimal elements is not a necessary requirement for there to be a problem, this is the simplest way of showing that at least well-foundedness is necessary for induction.) If there aren't any minimal elements in $A$ then there is no way to start the recursive argument that we build in the previous section.

Suppose, for example, that we would use a reflexive relation, then $\left(\forall b \in A:\ b \prec a \Rightarrow P(b)\right) \Rightarrow P(a)$ is automatically true for all elements $a$ in $A$ and all predicates $P$ on $A$ while $\forall a \in A: P(a)$ may not be true.

Structural Induction

Structural definition is a special case of well-founded induction where the set $A$ is a set of terms in some recursively defined language of terms. For example, $A$ could be the set of trees in Haskell:

data Tree
  = Leaf
  | Branch Tree Tree

The well-founded relation that is then used is called a subterm relation $\sqsubset$. It is usually defined as the transitive closure (smallest relation that includes the original but is also transitive) of the obvious well-founded relation that defines any term to be a subterm in another term if it is part of the recursive definition of that term.

For example, in the case of Trees, that would mean that $\sqsubset$ is defined as the transitive closure of the relation recursively defined as follows:

$\forall$ t1, t2 :: Tree : t1 $\sqsubset$ Branch t1 t2 and t2 $\sqsubset$ Branch t1 t2

Because structural induction is just a special case of well-founded induction, we can use the same inference rule.

Exercise: Define a subterm relation on Haskell's list structure:

A possible subterm relation is the transitive closure of the relation recusively defined as follows: $\forall$ e :: a, l :: [a] : l $\sqsubset$ (a:l)

Exercise: Find the subterms of [1,2,3]. Is [1,2] a subterm?

The subterms are [2,3], [3] and []. [1, 2] is *not* a subterm as per this definition.

A note on structural induction on data in Haskell

Inductively defined terms in a language are usually defined to be finite in depth. As it would be undicidable in general to check whether a function produces only finite terms, and Haskell uses lazy evaluation anyway, Haskell's types naturally contain infinite terms as well.

However, for a subterm relation to be well-founded, $A$ must contain only terms of finite depth. Otherwise there trivially exist inifinite descending chains of subterms. As such we have to assume terms in depth when writing induction proofs.

Natural induction

The (strong) natural induction inference rule is just a special case of structural induction as per the von Neumann construction of natural numbers.

Alternatively you could say that it's a special case of well-founded induction by proving that $<$ on is a well-founded relation on $\mathbb{N}$.

How to write an induction proof when your grades depend on it

A proof is for humans. The languages we use for communication with other humans tend to have enough redundancy built into them such that even when words are omitted in speech, communication can still proceed safely. This implies that we tend to consider communication, as we would use it in speech, slightly redundant when we write it down. There is enough context, and words will not just dissapear in text, to make out what was meant.

In the same way, a mathematically inclined person will be able to figure out from the context what you meant with a proof. This comes in handy in literature, but can be confusing in class.

In class, the person grading your solution does not care as much for checkng whether the theorem holds (through the proof) as they do for checking whether you wrote down a correct proof (even if it's already clear that the theorem holds), understood the material and should be awarded a good grade as a result. The way you write down proofs should be different.

Concretely:

When in doubt, be more verbose.
Quantify all variables explicitly.
Specify which kind of induction you are trying to use. (General well-founded induction, strong structural induction, strong or weak natural induction)
Write down the set $A$ you want to inductively prove a predicate over.
Write down the predicate $P$ that you want to prove for all elements of that set.
Make sure that the 'argument' to the predicate is in fact in the set that your predicate is on. That is, $P$ is in fact a predicate on $A$.
Define the well-founded relation that you will be using and prove that it is in fact well-founded if it's not a commonly-known-to-be-well-founded relation like $<$.
If you make a case distinction, write down that you are trying to make a case distinction and prove that the union of the cases forms the universe that you would like to make the case distinguish in. For example, $x = 0$ and $x \neq 0$ are complementary but $x = 0$ and $x > 0$ may not be.
In a case distinction at the top-level, write down the predicate again for each specific case. For example, in the case of Haskell lists, write out what $P($[]$)$ means explicitly for the base case.
In the case of weak natural induction, write down the induction hypothesis explicitly if you refer to it.

A general schema for a structural induction proof

Let $Q$ be the description of some predicate on a set of terms $A$. This description will usually not be given a name separately. Suppose we are trying to prove $\forall a \in A: Q(a)$. The rest of this section is a general schema for a proof that uses strong structural induction. The blanks, indicated by the fact that they are written in code style, are the parts that you fill in for a concrete proof. The rest can remain virtually untouched.

Theorem: $\forall a \in A: Q(a)$

Proof:

Define $P$ to be a predicate on the set $A$ as follows:

[Definition of P]

We prove this theorem by strong structural induction of the predicate $P$ over the set $A$ using the following subterm relation $\sqsubset$.

[Definition of ⊏]

Note that $\sqsubset$ is in fact well-founded:

[Proof that ⊏ is well-founded]

Let $a$ be an arbitrary element of $A$. We distinguish the following cases:

(Base cases) $a$ is of the form minimal element of A. We prove P(a) in this specific case:

Proof that P(a) holds.

(Inductive case) $a$ is of the form construction from subterms b, c, ... in A for some subterms b, c, ... in $A$. We prove P(a) in this specific case from P(b) ∧ P(c) ∧ ... (which we call the induction hypothesis).

Proof that P(a) holds.

(Conclusion) By the inference rule of strong strucural induction and the cases numbered 1 and 2 above, we conclude that $P$ holds for all elements of $A$.

QED.