I've recently been teaching mathematical induction to second-year university students of informatics. I'd like to summarize some of the background around induction proofs, as the subject slightly more complicated than a single class warrants.
Preliminaries
Predicate
Let $A$, be a set. A predicate on $A$ is a subset of $A$.
The difference between a predicate and a subset lies in the way we talk about them. We say a predicate $P$ holds for an element $a$ of $A$, if $a$ is an element of $P$. We write '$P$ holds for $a$' as $P(a)$.
For example, ${ x \in \mathbb{N} \mid x < 5}$ ($= {0, 1, 2, 3, 4}$), is a predicate on $\mathbb{N}$. This predicate can be written in terms of a single logical expression involving a 'parameter' from $\mathbb{N}$ as follows: $$ P(n) = n < 5 $$ We can therefore abbreviate it as $\overset{<5}{\triangleleft}{A}$, $\lambda x \rightarrow x < 5$ or simply $(< 5)$ using some notational abuse.
Binary Relation
We will restrict our attention to internal binary relations:
Let $A$ be a set. A binary relation on $A$ is a subset of $A \times A$.
The difference between a binary relation and a subset of $A \times A$ again lies only in the way we talk about them. If $R$ is the symbol for a given binary relation, we abbreviate $(a, b) \in R$ to $a R b$.
For example, 'divides' ($\mid$) is a binary relation on $\mathbb{N}$ that contains the tuples $(a, b)$ for which $a \mid b$ holds.
Reflexivity
We call a relation $R$ on a set $A$ reflexive if the following property holds: $$\forall a \in A: (a, a) \in R$$
Well-founded relation
We will need well-founded relations in well-founded induction so let's define them up front.
Let $A$ be a set and $\prec$ a binary relation on $A$. We call $\prec$ well-founded if there don't exist any so-called 'infinitely descending chains' in $A$. In other words, the following property holds for $\prec$. $$ \neg \left( \exists \left{ a_i \in A \mid i \in \mathbb{N} \right }:\ \forall i \in \mathbb{N}:\ a_{i+1} \prec a_i \right) $$
Exercise: Prove that a non-empty well-founded relation cannot be reflexive.
Minimal element
A minimal element is usually defined for partial orders, but we can define it for a well-founded relation as well:
A minimal element $m$ of a set $A$ according to a well-founded relation $\prec$ is an element as follows: $$ \neg \exists a \in A:\ a \prec m $$
Note that if we were to define the minimal element of a partial order in the same way, we would have to add the requirement that $a$ should be different from $m$ in the defining formula, otherwise there could never be any minimal elements for a partial order because partial orders are reflexive by definition.
Well-founded induction
The must fundamental kind of induction that I teach is well-founded induction.
Let $A$ be a set and $\prec$ a well-founded relation on that set $A$. Let $P$ be a predicate on the set $A$.
We now have an inference rule as follows:
$$ \left( \forall a \in A:
\left( \forall b \in A:\ b \prec a \Rightarrow P(b) \right) \Rightarrow P(a) \right) \Rightarrow \left( \forall a \in A:
P(a) \right) $$
An intuitive explanation
An intuitive explanation for why this inference rule makes sense can be stated as follows. (Of course this could never be the real proof because that would make for a cyclic argument.)
We assume the left-hand side of the implication ($\Rightarrow$) and prove the right-hand side.
Let $M_0$ be the subset of $A$ that contains only the minimal elements of $A$. In other words, let $M_0$ be the following subset of $A$: $$ M_0 = \left{ m \in A \mid \neg \left(\exists n \in A:\ n \prec m \right) \right} $$
Note that for all the elements of this set, $P$ must hold according to the our assumption.
We now define the subset $M_i$ of $A$ for an $i \in \mathbb{N}0$ as follows: $$ M_i = \left{ m \in A \mid \forall n \in A:\ n \prec m \Rightarrow n \in M{i-1} \right} $$ Intuitively this is the next set of elements for which we prove that $P$ holds. We use our assumpution again to prove that $P$ holds for all elements in $M_{i}$ because it holds for all elements in $M_{i-1}$. If we go on like this, eventually we will have shown that $P$ holds for all elements of $A$.
Why a well-founded relation
Note that if $\prec$ would not be well-founded, then by definition there must exist an infinite descending chain of elements in $A$.
If there exists an infinite descending chain of elements in $A$, then there need not exist any minimal elements. (Allthough the non-existence of minimal elements is not a necessary requirement for there to be a problem, this is the simplest way of showing that at least well-foundedness is necessary for induction.) If there aren't any minimal elements in $A$ then there is no way to start the recursive argument that we build in the previous section.
Suppose, for example, that we would use a reflexive relation, then $\left(\forall b \in A:\ b \prec a \Rightarrow P(b)\right) \Rightarrow P(a)$ is automatically true for all elements $a$ in $A$ and all predicates $P$ on $A$ while $\forall a \in A: P(a)$ may not be true.
Structural Induction
Structural definition is a special case of well-founded induction where the set $A$ is a set of terms in some recursively defined language of terms. For example, $A$ could be the set of trees in Haskell:
data Tree
= Leaf
| Branch Tree Tree
The well-founded relation that is then used is called a subterm relation $\sqsubset$. It is usually defined as the transitive closure (smallest relation that includes the original but is also transitive) of the obvious well-founded relation that defines any term to be a subterm in another term if it is part of the recursive definition of that term.
For example, in the case of Tree
s, that would mean that $\sqsubset$ is defined as the transitive closure of the relation recursively defined as follows:
$\forall$ t1, t2 :: Tree
: t1
$\sqsubset$ Branch t1 t2
and t2
$\sqsubset$ Branch t1 t2
Because structural induction is just a special case of well-founded induction, we can use the same inference rule.
Exercise: Define a subterm relation on Haskell's list structure:
Exercise: Find the subterms of [1,2,3]
. Is [1,2]
a subterm?
A note on structural induction on data in Haskell
Inductively defined terms in a language are usually defined to be finite in depth. As it would be undicidable in general to check whether a function produces only finite terms, and Haskell uses lazy evaluation anyway, Haskell's types naturally contain infinite terms as well.
However, for a subterm relation to be well-founded, $A$ must contain only terms of finite depth. Otherwise there trivially exist inifinite descending chains of subterms. As such we have to assume terms in depth when writing induction proofs.
Natural induction
The (strong) natural induction inference rule is just a special case of structural induction as per the von Neumann construction of natural numbers.
Alternatively you could say that it's a special case of well-founded induction by proving that $<$ on is a well-founded relation on $\mathbb{N}$.
How to write an induction proof when your grades depend on it
A proof is for humans. The languages we use for communication with other humans tend to have enough redundancy built into them such that even when words are omitted in speech, communication can still proceed safely. This implies that we tend to consider communication, as we would use it in speech, slightly redundant when we write it down. There is enough context, and words will not just dissapear in text, to make out what was meant.
In the same way, a mathematically inclined person will be able to figure out from the context what you meant with a proof. This comes in handy in literature, but can be confusing in class.
In class, the person grading your solution does not care as much for checkng whether the theorem holds (through the proof) as they do for checking whether you wrote down a correct proof (even if it's already clear that the theorem holds), understood the material and should be awarded a good grade as a result. The way you write down proofs should be different.
Concretely:
When in doubt, be more verbose.
Quantify all variables explicitly.
Specify which kind of induction you are trying to use. (General well-founded induction, strong structural induction, strong or weak natural induction)
Write down the set $A$ you want to inductively prove a predicate over.
Write down the predicate $P$ that you want to prove for all elements of that set.
Make sure that the 'argument' to the predicate is in fact in the set that your predicate is on. That is, $P$ is in fact a predicate on $A$.
Define the well-founded relation that you will be using and prove that it is in fact well-founded if it's not a commonly-known-to-be-well-founded relation like $<$.
If you make a case distinction, write down that you are trying to make a case distinction and prove that the union of the cases forms the universe that you would like to make the case distinguish in. For example, $x = 0$ and $x \neq 0$ are complementary but $x = 0$ and $x > 0$ may not be.
In a case distinction at the top-level, write down the predicate again for each specific case. For example, in the case of Haskell lists, write out what $P($
[]
$)$ means explicitly for the base case.In the case of weak natural induction, write down the induction hypothesis explicitly if you refer to it.
A general schema for a structural induction proof
Let $Q$ be the description of some predicate on a set of terms $A$. This description will usually not be given a name separately. Suppose we are trying to prove $\forall a \in A: Q(a)$. The rest of this section is a general schema for a proof that uses strong structural induction. The blanks, indicated by the fact that they are written in code style
, are the parts that you fill in for a concrete proof. The rest can remain virtually untouched.
Theorem: $\forall a \in A: Q(a)$
Proof:
Define $P$ to be a predicate on the set $A$ as follows:
[Definition of P]
We prove this theorem by strong structural induction of the predicate $P$ over the set $A$ using the following subterm relation $\sqsubset$.
[Definition of ⊏]
Note that $\sqsubset$ is in fact well-founded:
[Proof that ⊏ is well-founded]
Let $a$ be an arbitrary element of $A$. We distinguish the following cases:
(Base cases) $a$ is of the form
minimal element of A
. We proveP(a) in this specific case
:
Proof that P(a) holds.
(Inductive case) $a$ is of the form
construction from subterms b, c, ... in A
for somesubterms b, c, ...
in $A$. We proveP(a) in this specific case
fromP(b) ∧ P(c) ∧ ...
(which we call the induction hypothesis).
Proof that P(a) holds.
(Conclusion) By the inference rule of strong strucural induction and the cases numbered 1 and 2 above, we conclude that $P$ holds for all elements of $A$.
QED.