Announcing conformance

This post announces the conformance library factored out from ical. It implements RFC 2119 in order to help you write implementations for other specifications. The conformance library exists to let you write a single parser that you can then run in multiple modes: strict and lenient.

Robustness and testing

If you have ever implemented a specification, you have probably seen a section like this:

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

This section makes it clear how common terms like "MUST" should be interpreted. It is to be read in the context of the robustness principle:

Be conservative in what you do, be liberal in what you accept from others.

Let's say you've implemented a specification. Usually that means that your implementation does some things (produces or renders data, performs API calls) and accepts some things (parses data, serves API calls).

The least you can do to suggest that you implement a specification and adhere to the robustness principle is test that:

You can parse the data that you produce strictly.
You can parse others' data leniently.

Approached naively, this would involve two implementations of a parser: A strict one and a lenient one.

The conformance library exists to let you write a single parser that you can then run in multiple modes: strict and lenient.

Robustness in the face of violations

There is a second problem that the conformance library solves: The problem of powerful implementers violating the specification in externally fixable ways.

For example, take the case of the "Production Identifier" (PRODID) property of calendars in the Internet Calendaring (RFC 5545) specification. The specification says:

Conformance: The property MUST be specified once in an iCalendar object.

And we know from RFC 2119 what "MUST" means:

MUST: This word, or the terms "REQUIRED" or "SHALL", mean that the definition is an absolute requirement of the specification.

So the specification(s) are very clear: Any implementation is supposed to reject a calendar without the PRODID property, right?

Along comes Apple, which (still, as of the publication of this post) exports this calendar file without any PRODID property:

BEGIN:VCALENDAR
VERSION:2.0
X-WR-CALNAME:with Syd
X-APPLE-CALENDAR-COLOR:#ff2d55
END:VCALENDAR

Assuming that we don't really need the PRODID for our application, we are faced with a difficult choice as an implementer:

Reject this calendar, and just never allow users to integrate with Apple.
Accept this calendar, and break the specification, contributing to this problem.

Of course we could try contacting Apple, but that doesn't help in the meantime until they (never) fix the issue. It's also not just Apple who does this. I've seen clearly invalid calendar files from Google Calendar, Microsoft Outlook, Booking.com, and Fastmail as well.

Instead we are forced to at least give users the option of accepting invalid data. The conformance library solves this by providing an "extra lenient" mode for parsers in which fixable errors like these are fixed by "guessing" a fix.
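To make the idea of "guessing" a fix concrete, here is a minimal sketch that uses only base and does not use the conformance library at all. The names Mode, checkProdId, and guessedProdId, as well as the placeholder PRODID value, are made up for this illustration. The check is deliberately naive: it only looks for a line starting with "PRODID:" and ignores property parameters and line folding.

```haskell
import Data.List (isPrefixOf)

-- How forgiving the check should be (hypothetical names for this sketch).
data Mode = Strict | Lenient
  deriving (Show, Eq)

-- Placeholder value used when a PRODID has to be guessed (an assumption).
guessedProdId :: String
guessedProdId = "PRODID:-//unknown//unknown//EN"

-- Check the content lines of one calendar for a PRODID property.
-- Returns the (possibly repaired) lines plus notes about what was fixed.
checkProdId :: Mode -> [String] -> Either String ([String], [String])
checkProdId mode ls
  | any ("PRODID:" `isPrefixOf`) ls = Right (ls, [])
  | otherwise = case mode of
      Strict -> Left "Missing PRODID property."
      Lenient -> Right (insertAfterBegin ls, ["Missing PRODID property; guessed one."])
  where
    -- Put the guessed property right after the BEGIN:VCALENDAR line.
    insertAfterBegin (begin : rest) = begin : guessedProdId : rest
    insertAfterBegin [] = [guessedProdId]

-- The calendar from the example above, one content line per element.
appleCalendar :: [String]
appleCalendar =
  [ "BEGIN:VCALENDAR"
  , "VERSION:2.0"
  , "X-WR-CALNAME:with Syd"
  , "X-APPLE-CALENDAR-COLOR:#ff2d55"
  , "END:VCALENDAR"
  ]
```

With this sketch, checkProdId Strict appleCalendar rejects the calendar, while checkProdId Lenient appleCalendar accepts it and reports that a PRODID was guessed, so the caller can still tell that the input was invalid.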

API Overview

The core of the conformance library is the ConformT monad transformer. It has these type parameters:

newtype ConformT ue fe w m a
                  ^  ^ ^ ^- The underlying parsing monad.
                  |  |  \-- The type for "SHOULD", "SHOULD NOT", and "OPTIONAL" warnings
                  |   \---- The type for fixable errors, in case "MUST" or "MUST NOT" is violated in a fixable way.
                   \------- The type for unfixable errors, in case "MUST" or "MUST NOT" is violated in an unfixable way.

If any of the three ue, fe, w are not necessary, you can use Void for them. For example, if you don't use the warnings:

type MyParser a = ConformT Error FixableError Void UnderlyingParser a
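Why Void works here can be sketched with base alone: since no value of type Void can be constructed, the corresponding part of a result is known to be empty or impossible. The helper names dropWarnings and noUnfixable below are hypothetical and not part of the library; the tuple shape is only modelled on the result types shown in this post.

```haskell
import Data.Void (Void, absurd)

-- With w = Void, no warning value can be constructed, so the warning list
-- in a result carries no information and can be dropped.
dropWarnings :: (a, ([fe], [Void])) -> (a, [fe])
dropWarnings (a, (fixes, _)) = (a, fixes)

-- With ue = Void, the 'Left' case cannot occur; 'absurd' discharges it.
noUnfixable :: Either Void a -> a
noUnfixable = either absurd id
```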

You can then write your parser as you would, using lift where you do the underlying parsing:

lift :: P a -> ConformT ue fe w P a

You can emit warnings:

emitWarning :: W -> ConformT ue fe W P ()

You can emit fixable errors. These halt execution when the parser is run strictly or normally, but not when it is run leniently.

emitFixableError :: FE -> ConformT ue FE w P () 

... and you can error with unfixable errors:

unfixableError :: UE -> ConformT UE fe w P a

Once you have your parser, you can run it strictly to turn any warnings into errors. This lets you test that you only produce data without warnings:

runConformTStrict ::
       ConformT ue fe w P a
    -> P (Either (Either ue ([fe], [w])) a)

You can also run your parser normally, which lets you parse data from any compliant implementer and obtain the warnings:

runConformT ::
       ConformT ue fe w P a
    -> P (Either (Either ue fe) (a, [w]))    

You can run your parser leniently, fixing any errors from non-compliant implementers:

runConformTLenient ::
       ConformT ue fe w P a
    -> P (Either ue (a, ([fe], [w])))

Lastly, you can even choose which fixable errors you want to fix and which you don't, at runtime, by supplying a predicate that decides which fixable errors to fix:

runConformTFlexible ::
  (fe -> P Bool) ->
  ConformT ue fe w P a ->
  P (Either (Either ue fe) (a, ([fe], [w])))
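The role of the predicate is perhaps easiest to see in a toy model. The sketch below is not the library's implementation: it is a small pure monad written with base only, and every name in it (Mini, warn, fixable, unfixable, Fix, demo) is made up for illustration. Unlike runConformTFlexible, its predicate is pure (fe -> Bool) rather than running in the underlying monad.

```haskell
-- A toy, pure model of a parser that can be run with a "which errors to fix"
-- predicate. This is NOT the conformance library's code.
newtype Mini ue fe w a = Mini
  { runMini :: (fe -> Bool) -> Either (Either ue fe) (a, ([fe], [w])) }

instance Functor (Mini ue fe w) where
  fmap f (Mini g) = Mini $ \p -> fmap (\(a, notes) -> (f a, notes)) (g p)

instance Applicative (Mini ue fe w) where
  pure a = Mini $ \_ -> Right (a, ([], []))
  mf <*> ma = mf >>= \f -> fmap f ma

instance Monad (Mini ue fe w) where
  Mini g >>= k = Mini $ \p -> do
    (a, (fes1, ws1)) <- g p
    (b, (fes2, ws2)) <- runMini (k a) p
    -- Accumulate the fixed errors and warnings of both steps, in order.
    pure (b, (fes1 ++ fes2, ws1 ++ ws2))

-- Record a warning; never halts.
warn :: w -> Mini ue fe w ()
warn w = Mini $ \_ -> Right ((), ([], [w]))

-- Record a fixable error if the predicate allows fixing it, halt otherwise.
fixable :: fe -> Mini ue fe w ()
fixable fe = Mini $ \shouldFix ->
  if shouldFix fe then Right ((), ([fe], [])) else Left (Right fe)

-- Always halt.
unfixable :: ue -> Mini ue fe w a
unfixable ue = Mini $ \_ -> Left (Left ue)

-- Two hypothetical fixable errors for a calendar-like format.
data Fix = MissingProdId | LowerCaseName
  deriving (Show, Eq)

-- A stand-in "parser" that hits both fixable errors and one warning.
demo :: Mini String Fix String Int
demo = do
  fixable MissingProdId
  warn "Unknown property ignored."
  fixable LowerCaseName
  pure 42
```

In this model, runMini demo (const True) behaves like a lenient run and returns the value with both fixes recorded, runMini demo (const False) halts on the first fixable error like a normal run, and runMini demo (== MissingProdId) fixes the missing PRODID but halts on LowerCaseName.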

Example

Suppose we have a simple 'language' specification that includes the usual reference to RFC 2119 for interpreting words like "MUST":

A code is defined as two characters.
The characters MUST be alphabetic characters.
The first character MUST be upper-case.
The second character SHOULD be upper-case.

Examples:

AB
De

You can now implement a parser like this:

{-# LANGUAGE LambdaCase #-}

module Example where

import Conformance
import Control.Monad
import Data.Char as Char

myParser :: String -> Conform String String String (Char, Char)
myParser = \case
  [c1, c2] -> do
    -- "MUST be alphabetic" cannot be repaired, so a violation is unfixable.
    let checkAlpha c =
          if Char.isAlpha c
            then pure ()
            else unfixableError $ "Not an alphabetic character: " ++ show c
    checkAlpha c1
    -- "MUST be upper-case" can be repaired by upper-casing the character.
    c1' <-
      if Char.isUpper c1
        then pure c1
        else do
          emitFixableError "The first character is not upper-case."
          pure $ Char.toUpper c1
    checkAlpha c2
    -- "SHOULD be upper-case" only warrants a warning.
    when (not (Char.isUpper c2)) $ emitWarning "The second character is not upper-case."
    pure (c1', c2)
  _ -> unfixableError "Did not specify exactly two characters."

Here we used an unfixable error for violations that we cannot fix: The wrong number of characters, or non-alphabetic characters.

We used a fixable error for a violation that we can fix: The first character is not upper-case. We can fix that by making the character upper-case with toUpper. (Note that this only works because the characters are alphabetic.)

Lastly, we used a warning for a "SHOULD" in the spec: The second character is not upper-case.

We can then run our parser on some examples:

Strictly:

ghci> runConformStrict $ myParser "AB"
Right ('A','B')
ghci> runConformStrict $ myParser "Ab"
Left (Right ([], ["The second character is not upper-case."]))
ghci> runConformStrict $ myParser "aa"
Left (Right (["The first character is not upper-case."], []))
ghci> runConformStrict $ myParser "A1"
Left (Left "Not an alphabetic character: '1'")

Normally:

ghci> runConform $ myParser "AB"
Right (('A','B'),[])
ghci> runConform $ myParser "Ab"
Right (('A','b'),["The second character is not upper-case."])
ghci> runConform $ myParser "aa"
Left (Right "The first character is not upper-case.")
ghci> runConform $ myParser "A1"
Left (Left "Not an alphabetic character: '1'")

Leniently:

ghci> runConformLenient $ myParser "AB"
Right (('A','B'), ([], []))
ghci> runConformLenient $ myParser "Ab"
Right (('A','b'), ([], ["The second character is not upper-case."]))
ghci> runConformLenient $ myParser "aa"
Right (('A','a'), (["The first character is not upper-case."], ["The second character is not upper-case."]))
ghci> runConformLenient $ myParser "A1"
Left "Not an alphabetic character: '1'"

Conclusion

The conformance library lets you write specification-compliant parsers while also letting you test that your own output is strictly specification-compliant. It can be found on Hackage and on GitHub.

For usage examples, you can have a look at my ical implementation.