If, like me, you're interested in writing extremely safe code, you've probably been annoyed about function's contract before. In this post, I present a way of improving the correctness of elegant code and the safety of correct code.
I will take a mathematical function as an example because that's simpler, but the point stands for functions concerning business logic as well. I will use Haskell to show some code but the idea I present applies to languages other than Haskell as well. Even imperative and/or untyped languages.
Arguing with a programmer
Mathematical functions are quite easily designed with respect to error handling, right? When I ask you to implement the greatest common divisor function, you may have to look up Euclid's algorithm, but you have a general idea of the expected result. You may implement something like this:
gcd :: Integer -> Integer -> Integer
gcd a b = gcd b $ rem a b
This code elegantly embodies the idea of Euclid's algorithm, but it doesn't even terminate. When I ask you to revise this code you will add the base case:
gcd :: Integer -> Integer -> Integer
gcd a 0 = a
gcd a b = gcd b $ rem a b
Now this code terminates, but it's incorrect.
First off: What is the value of gcd 0 0
? gcd 0 0
is undefined, but this code will evaluate it to 0
. When I demand you fix it, you grudgingly add a case for that as well:
gcd :: Integer -> Integer -> Integer
gcd 0 0 = error "The greatest common divisor of 0 and 0 does not exist."
gcd a 0 = a
gcd a b = gcd b $ rem a b
gcd
is defined for negative numbers, but this code will not produce the correct result. In fact it will evaluate gcd (-25) (-5)
to -5
while the greatest common divisor is mathematically defined to be a positive number.
You argue that people who use this function will know what they're doing, but when I press you on it, you change the code again to fix this issue:
gcd :: Integer -> Integer -> Integer
gcd 0 0 = error "The greatest common divisor of 0 and 0 does not exist."
gcd a 0 = abs a
gcd a b
| a < 0 || b < 0 = gcd (abs a) (abs b)
| otherwise = gcd b $ rem a b
You look at the code some more and decide that its functionality is correctly defined. At this point I, annoyingly, argue that this simple function should be total. We can prove that it terminates for any input, but after writing this proof, we want to be sure that that means that our code is safe to run.
This code is not safe to run. Annoyed by my persistence, you make what you promise to yourself is the last change to the code:
gcd :: Integer -> Integer -> Maybe Integer
gcd 0 0 = Nothing
gcd a 0 = Just $ abs a
gcd a b
| a < 0 || b < 0 = gcd (abs a) (abs b)
| otherwise = gcd b $ rem a b
Now I will argue that Maybe Integer
is not really the right type for the result of this function because the result of gcd
is always positive. I argue that we should define a more informative type Natural
and make the return type Maybe Natural
.
At this point you're ready to start flipping tables so I stop arguing and leave it at this.
The result of my pestering
We've gone from some simple code that embodies the idea of the algorithm...
gcd :: Integer -> Integer -> Integer
gcd a 0 = a
gcd a b = gcd b $ rem a b
... to some rather unwieldy code that performs safely:
gcd :: Integer -> Integer -> Maybe Integer
gcd 0 0 = Nothing
gcd a 0 = Just $ abs a
gcd a b
| a < 0 || b < 0 = gcd (abs a) (abs b)
| otherwise = gcd b $ rem a b
In the process, you went from happily looking at elegant code to disliking the code and figuratively pelting me with vegetables.
Solutions?
How then, do we write code that covers all cases without removing the elegance?
One obvious solution would be to keep the function simple, partial and only partially correct:
-- | Greatest common divisor
--
-- Only produces a correct result for positive inputs a and b where at least
-- one of a and b is non-zero. Will return bogus results otherwise.
gcd :: Integer -> Integer -> Integer
gcd a 0 = a
gcd a b = gcd b $ rem a b
This way we move the responsibility of handling the errors to the user. This is not really a solution, the problem is just moved outside the scope of your work. Unfortunately this is still the most commonly used 'solution'.
Closely related to this solution, but already much better, is to annotate gcd
with some Liquid Haskell annotations to ensure that the arguments are in the correct subset of Integer
. This is still not a solution for users that don't use Liquid Haskell, nor is its use (and thus the safety) enforced.
A second option is to write correct code to define a partial function:
-- | Greatest common divisor
--
-- Will error when both arguments are zero.
gcd :: Integer -> Integer -> Integer
gcd 0 0 = error "The greatest common divisor of 0 and 0 does not exist."
gcd a 0 = abs a
gcd a b
| a < 0 || b < 0 = gcd (abs a) (abs b)
| otherwise = gcd b $ rem a b
This approach is often used in Prelude
because it allows for the easiest use of the code. It is beginner-friendly and good enough for simple situations.
For safe applications, this solution is still unsatisfactory. Ideally we would like to avoid non-total functions. How do we make this safe code elegant again?
Here's my proposition: We can separate the core logic from the error handling, add enough comments and make the code safe. Remember, writing more code is not bad as long as it makes the code clearer.
Here we put the embodiment of the algorithm in a where clause, along with a comment about its shortcomings, and write the error-handling around it:
-- | Greatest common divisor
--
-- Returns Nothing when both arguments are zero and Just $gcd(a, b)$ otherwise.
gcd :: Integer -> Integer -> Maybe Integer
gcd 0 0 = Nothing
gcd x y
-- Mathematically: gcd a b == gcd (abs a) (abs b)
= Just $ gcd' (abs x) (abs y)
where
-- Only works if:
-- * At least one of the arguments is non-zero
-- * Both arguments are positive
0 = a
gcd' a = gcd' b $ a `rem` b gcd' a b
Safe
If you're interested in writing safer code, I suggest you have a look the Safe library on Hackage. Put its safe alternatives in your own Prelude
and systematically remove the non-total functions.