The undefined trick

Date 2021-09-10

This post describes "the undefined trick" for introducing compile errors when the number of fields in a record is changed, in Haskell.

Example situation: HTML Description

Suppose you are working on SEO for a website. The website displays blog posts, one per page. You're tasked with writing the code that generates the <meta name="description"> tag that helps determine what your blog post will look like in Google's search results. For example, this blog post might show up as follows:

[Image: example search results]

Here the <meta name="description"> tag would be:

<meta
  name="description"
  content="This post describes &quot;the undefined trick&quot; for introducing compile errors when the number of fields in a record is changed, in Haskell.">

Now suppose that you store blog posts in memory as some sort of record:

data BlogPost = BlogPost
  { blogPostDate :: !Day
  , blogPostTitle :: !Text
  , blogPostDescription :: !Text
  } deriving (Show, Eq, Generic)

Google's SEO guidelines around meta descriptions specifically mention:

Make sure that every page on your site has a meta description.

Include clearly tagged facts in the description.

These snippets are also truncated to at most 160 characters. So it is very important that we put some thought into how to generate these descriptions. Furthermore, every part of a BlogPost value is potentially relevant to the description.

We will want to write a function like this, to generate the descriptions:

blogPostHtmlDescription :: BlogPost -> Text
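
As a point of reference, here is a minimal sketch of what such a function could look like. The output format, the `maxDescriptionLength` name and the choice to simply cut the text off at 160 characters are assumptions made for illustration, not the site's actual implementation. The record is redeclared with a plain `Text` description (and without `Generic`) so that the block stands on its own.

```haskell
import Data.Text (Text)
import qualified Data.Text as T
import Data.Time.Calendar (Day, fromGregorian, showGregorian)

data BlogPost = BlogPost
  { blogPostDate :: !Day
  , blogPostTitle :: !Text
  , blogPostDescription :: !Text
  } deriving (Show, Eq)

-- Assumed limit, based on the 160-character figure mentioned above.
maxDescriptionLength :: Int
maxDescriptionLength = 160

-- Hypothetical format: "<title> (<date>): <description>", cut off at the limit.
blogPostHtmlDescription :: BlogPost -> Text
blogPostHtmlDescription bp =
  T.take maxDescriptionLength $
    T.concat
      [ blogPostTitle bp
      , T.pack " ("
      , T.pack (showGregorian (blogPostDate bp))
      , T.pack "): "
      , blogPostDescription bp
      ]

-- Example value for trying the function out in GHCi.
examplePost :: BlogPost
examplePost =
  BlogPost
    { blogPostDate = fromGregorian 2021 9 10
    , blogPostTitle = T.pack "The undefined trick"
    , blogPostDescription = T.pack "A post."
    }
```

Note that this function reads every field through its accessor, so nothing in it mentions how many fields the record has.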

Say we've come up with some smart way of generating a nice description. All is well and good, until someone adds another field to the BlogPost record, maybe a list of tags, like this:

data BlogPost = BlogPost
  { blogPostDate :: !Day
  , blogPostTitle :: !Text
  , blogPostDescription :: !Text
  , blogPostTags :: ![Text] -- <-- New field!!
  } deriving (Show, Eq, Generic)

These tags could be very useful to take into account for the <meta name="description">, but whoever added the field did not consider this, so potentially very valuable SEO is left on the table.

It would be nice if we could somehow guarantee, in the definition of blogPostHtmlDescription, that if anyone added a field to the BlogPost type, they would be forced to reconsider whether it needs to be taken into account in this function.

Other examples include:

  • When implementing hash, you will want to consider every field of the record, otherwise your hash function contains trivial collisions.
  • When implementing toJSON, you will want to consider including every field in the JSON.Value, otherwise you might lose information via serialisation.
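
To make the hash example concrete, here is a small sketch using only base. The `Post` type, the toy `hashString` function and both `hashPost` variants are hypothetical names invented for this illustration. It is not how a real hashing library works.

```haskell
import Data.List (foldl')

-- Hypothetical record: postTags is the field that was added later.
data Post = Post
  { postTitle :: String
  , postTags :: [String]
  } deriving (Show, Eq)

-- Toy string hash, for illustration only.
hashString :: String -> Int
hashString = foldl' (\acc c -> acc * 31 + fromEnum c) 7

-- Written before postTags existed and never updated: ignores the tags.
hashPostStale :: Post -> Int
hashPostStale p = hashString (postTitle p)

-- Takes every field into account.
hashPost :: Post -> Int
hashPost p =
  foldl' (\acc t -> acc * 31 + hashString t) (hashString (postTitle p)) (postTags p)

untagged, tagged :: Post
untagged = Post "x" []
tagged = Post "x" ["haskell"]
```

Here `untagged` and `tagged` are different values, yet `hashPostStale` gives them the same hash, while `hashPost` tells them apart. The stale version still compiles without any complaint.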

General problem

If a function myFunc takes a product type as an argument, and it is important that every field of this input is used (or at least considered), nothing warns you that myFunc needs another look when a new field is added to that type.

Solutions

Each of the following solutions can help with the problems laid out above, by having some part of CI yell at you if you have not provided evidence that you have considered the new field when one is added to your record.

Roundtrip tests

Roundtrip tests are by far the best way to solve this issue, when they are applicable. Forgetting to serialise the new field in a ToJSON instance is problematic, but a roundtrip test for JSON serialisation requires a FromJSON instance as well. That FromJSON instance has to construct the record, so you will get a compile error until you have implemented deserialisation for the new field. Once you have fixed that compile error, the roundtrip tests will fail until serialisation is updated too. (Important caveat: You need to be using property tests that generate non-trivial values for the new field, for this to work. Something like Test.Syd.Validity.Aeson.jsonSpecOnValid with validity-based generators, for example.)

The part of CI that will yell at you if you have forgotten to implement serialisation for your new field will be your tests, which now fail. We could actually have faster feedback, at compile-time (see below).
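
The idea can be sketched without any JSON library. In the block below, a list of key/value pairs stands in for a JSON object, and `Post`, `Object`, `encodePostStale`, `encodePost`, `decodePost` and `roundtrips` are all hypothetical names. A real setup would use aeson instances and a property-testing library instead.

```haskell
-- Hypothetical record: postTags is the newly added field.
data Post = Post
  { postTitle :: String
  , postTags :: [String]
  } deriving (Show, Eq)

-- Stand-in for a JSON object.
type Object = [(String, String)]

-- Serialisation that was not updated for the new field.
encodePostStale :: Post -> Object
encodePostStale p = [("title", postTitle p)]

-- Serialisation that includes the new field.
-- Assumes tags contain no whitespace, since they are joined with spaces.
encodePost :: Post -> Object
encodePost p = [("title", postTitle p), ("tags", unwords (postTags p))]

-- Deserialisation must build a Post, so it had to be updated for the
-- new field before it would compile. A missing "tags" key means no tags.
decodePost :: Object -> Maybe Post
decodePost o = do
  title <- lookup "title" o
  pure (Post title (maybe [] words (lookup "tags" o)))

-- The roundtrip property for a given encoder.
roundtrips :: (Post -> Object) -> Post -> Bool
roundtrips enc p = decodePost (enc p) == Just p
```

With the stale encoder, `roundtrips encodePostStale (Post "t" ["a"])` is `False`, which is the failing test described above. `roundtrips encodePostStale (Post "t" [])` is still `True`, which is why the generators need to produce non-trivial values for the new field.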

Positional arguments

Another solution would be to use positional arguments, like so:

blogPostHtmlDescription :: BlogPost -> Text
blogPostHtmlDescription (BlogPost date title description) =

This way, when you add a field, you will get a compile error saying that the BlogPost constructor takes 4 arguments but only 3 have been given. This works, but is sub-optimal because now your code is sensitive to changing the order of the fields in the BlogPost type. For example, if you switch around the title and description fields (which have the same type), your code will still compile but will now be subtly wrong.
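
The ordering problem can be reproduced in a few lines. In this sketch, `Post` and `PostReordered` are hypothetical stand-ins for the same record before and after someone swaps two fields of the same type. They are two separate types only so that both versions fit in one file.

```haskell
-- Original field order.
data Post = Post
  { postTitle :: String
  , postDescription :: String
  }

summary :: Post -> String
summary (Post title description) = title ++ ": " ++ description

-- The same record after the two fields were swapped in the declaration.
data PostReordered = PostReordered
  { reorderedDescription :: String
  , reorderedTitle :: String
  }

-- The positional pattern was left untouched. It still type-checks,
-- but the name `title` is now bound to the description and vice versa.
summaryReordered :: PostReordered -> String
summaryReordered (PostReordered title description) = title ++ ": " ++ description
```

Building both values with record syntax, with title "T" and description "D", `summary` yields "T: D" while `summaryReordered` yields "D: T", and the compiler is happy with both.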

The undefined trick

The undefined trick is my favourite way to solve this problem, even if it looks like a bit of a hack.

It consists of using unused pattern bindings:

blogPostHtmlDescription :: BlogPost -> Text
blogPostHtmlDescription bp =
  let BlogPost _ _ _ = undefined
  in

This way the compiler will still yell at you, saying that the BlogPost pattern takes 4 arguments but only 3 have been given, but no brittleness is introduced with respect to field ordering.

Furthermore, even if you decide that the new field is not relevant, your code reviewer will still see this in the commit diff:

- let BlogPost _ _ _ = undefined
+ let BlogPost _ _ _ _ = undefined

Your code reviewer can see this and know that the new field could have been relevant here but you decided it was not. Now you have an artefact of that decision. Moreover, you can be more confident that you did not forget to change something that CI would not have yelled about.
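
Putting the pieces together, a complete version might look like the sketch below. The description format and `examplePost` are assumptions made for illustration. The block also assumes that the `Strict` language extension is off, so that the `let` pattern binding stays lazy and `undefined` is never evaluated.

```haskell
import Data.Text (Text)
import qualified Data.Text as T
import Data.Time.Calendar (Day, fromGregorian)

data BlogPost = BlogPost
  { blogPostDate :: !Day
  , blogPostTitle :: !Text
  , blogPostDescription :: !Text
  , blogPostTags :: ![Text]
  } deriving (Show, Eq)

blogPostHtmlDescription :: BlogPost -> Text
blogPostHtmlDescription bp =
  -- One wildcard per field. The binding is lazy and binds nothing, so
  -- `undefined` is never forced. Its only job is to stop compiling when
  -- the number of fields changes.
  let BlogPost _ _ _ _ = undefined
   in T.take 160 $
        T.concat
          [ blogPostTitle bp
          , T.pack " ["
          , T.intercalate (T.pack ", ") (blogPostTags bp)
          , T.pack "] "
          , blogPostDescription bp
          ]

-- Example value for trying the function out in GHCi.
examplePost :: BlogPost
examplePost =
  BlogPost
    { blogPostDate = fromGregorian 2021 9 10
    , blogPostTitle = T.pack "The undefined trick"
    , blogPostDescription = T.pack "A post."
    , blogPostTags = [T.pack "haskell", T.pack "records"]
    }
```

The fields themselves are read through their accessors, so reordering them in the declaration does not change the result. Only adding or removing a field breaks the `let` line. This version happens to ignore the date, and the four wildcards record that this was a choice.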
