Announcing autodocodec

This post announces the new autodocodec library and its companions. The library was inspired by aeson's vulnerability, the response to my rules for sustainable Haskell, and the excellently documented tomland library by Kowainik. It is already used in production at CS-SYD and is ready for others to try.

Inspiration

When the blog post about aeson's vulnerability came out, I was reminded of Chris Done's idea of streaming JSON without an intermediary type to avoid the vulnerability. The idea was to declare what to parse, rather than how, and have the parsing library do the parsing for you.

My rules for sustainable Haskell used to specify "write your JSON serialisation instances manually, and add roundtrip tests and golden tests". After much debate, we have concluded that generated instances + roundtrip tests + golden tests are fine as well. However, this still did not sit right with me because both generic programming and Template Haskell are more complex than necessary for this task, in my opinion. In what follows we will want to write our instances by hand after all, because we need to document our codecs.

Other than just encoding and decoding, we also noticed a pressing need for good, definitely-correct documentation for the formats that we use. We needed to be able to supply Swagger2/OpenAPI3 documentation for the consumers of our APIs. For configuration files, we also wanted to be able to show nice human-readable documentation of what is expected in the configuration file.

This idea of self-documenting codecs has been discovered independently a few more times:

I have also written a self-documenting decoder (without encoding) before: yamlparse-applicative, which I get to deprecate today.

This seemed like a good enough indication that it was worthwhile to spend a few weeks putting together a solid library that we could use in production.

During my early exploration, I got stuck a few times and noticed that the tomland library had already solved most of the problems that I had gotten stuck on.

That is all to say that this library could not have existed without the giants whose shoulders I've gotten to stand on when writing it.

Enter Autodocodec

Autodocodec is short for "Auto-documenting-codec", or "self(auto)-documenting encoders and decoders". Writing a Codec from autodocodec lets you encode values, decode them, and document the parser, all with a single value.
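To make "a single value" concrete, here is a toy model that uses nothing but base. It is a sketch of the general idea only, not autodocodec's actual implementation: the Codec type, the Obj stand-in for JSON, the User record and the .= operator defined here are all made up for illustration. The point is that the codec is plain data, so an encoder, a decoder and a documentation generator can each walk the same value.

```haskell
{-# LANGUAGE GADTs #-}

-- A flat key-value object; a stand-in for JSON, only for this sketch.
type Obj = [(String, String)]

-- A toy codec: 'input' is the type we encode, 'output' the type we decode.
data Codec input output where
  -- A required key together with its human-readable description.
  Field :: String -> String -> Codec String String
  Pure :: output -> Codec input output
  Ap :: Codec input (a -> b) -> Codec input a -> Codec input b
  -- Adapt the encoded input and the decoded output of an inner codec.
  Dimap :: (newIn -> oldIn) -> (oldOut -> newOut) -> Codec oldIn oldOut -> Codec newIn newOut

instance Functor (Codec input) where
  fmap = Dimap id

instance Applicative (Codec input) where
  pure = Pure
  (<*>) = Ap

-- Hypothetical operator: say which part of the record a field codec encodes.
infixl 8 .=
(.=) :: Codec field output -> (record -> field) -> Codec record output
c .= getter = Dimap getter id c

-- Interpretation 1: encoding.
encode :: Codec input output -> input -> Obj
encode c input = case c of
  Field key _ -> [(key, input)]
  Pure _ -> []
  Ap cf ca -> encode cf input ++ encode ca input
  Dimap f _ inner -> encode inner (f input)

-- Interpretation 2: decoding.
decode :: Codec input output -> Obj -> Either String output
decode c obj = case c of
  Field key _ -> maybe (Left ("missing key: " ++ key)) Right (lookup key obj)
  Pure a -> Right a
  Ap cf ca -> decode cf obj <*> decode ca obj
  Dimap _ g inner -> g <$> decode inner obj

-- Interpretation 3: documentation.
document :: Codec input output -> [String]
document c = case c of
  Field key doc -> [key ++ ": " ++ doc]
  Pure _ -> []
  Ap cf ca -> document cf ++ document ca
  Dimap _ _ inner -> document inner

-- A made-up record and the one codec value that describes it.
data User = User {userName :: String, userEmail :: String}
  deriving (Show, Eq)

userCodec :: Codec User User
userCodec =
  User
    <$> Field "name" "the user's display name" .= userName
    <*> Field "email" "where to reach the user" .= userEmail
```

Loading this in GHCi and evaluating encode userCodec, decode userCodec and document userCodec shows all three behaviours coming from userCodec alone. Because the key names are written once, the encoder and the decoder cannot disagree about them.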

Here is an example type:

data Example = Example
  { exampleText :: Text,
    exampleBool :: Bool,
    exampleRequiredMaybe :: Maybe Text,
    exampleOptional :: Maybe Text,
    exampleOptionalOrNull :: Maybe Text,
    exampleOptionalWithDefault :: Text,
    exampleOptionalWithNullDefault :: [Text],
    exampleSingleOrList :: [Text]
  }
  deriving (Show, Eq, Generic)

We can implement a HasCodec instance from autodocodec like so:

instance HasCodec Example where
  codec =
    object "Example" $
      Example
        <$> requiredField "text" "a text" .= exampleText
        <*> requiredField "bool" "a bool" .= exampleBool
        <*> requiredField "maybe" "a maybe text" .= exampleRequiredMaybe
        <*> optionalField "optional" "an optional text" .= exampleOptional
        <*> optionalFieldOrNull "optional-or-null" "an optional-or-null text" .= exampleOptionalOrNull
        <*> optionalFieldWithDefault "optional-with-default" "foobar" "an optional text with a default" .= exampleOptionalWithDefault
        <*> optionalFieldWithOmittedDefault "optional-with-null-default" [] "an optional list of texts with a default empty list where the empty list would be omitted" .= exampleOptionalWithNullDefault
        <*> optionalFieldWithOmittedDefaultWith "single-or-list" (singleOrListCodec codec) [] "an optional list that can also be specified as a single element" .= exampleSingleOrList

Now you can use toJSONViaCodec to encode a value of type Example to JSON in the same way that this instance would:

instance ToJSON Example where
  toJSON Example {..} =
    JSON.object $
      concat
        [ [ "text" JSON..= exampleText,
            "bool" JSON..= exampleBool,
            "maybe" JSON..= exampleRequiredMaybe,
            "optional-with-default" JSON..= exampleOptionalWithDefault
          ],
          [ "optional" JSON..= opt
            | opt <- maybeToList exampleOptional
          ],
          [ "optional-or-null" JSON..= opt
            | opt <- maybeToList exampleOptionalOrNull
          ],
          [ "optional-with-null-default" JSON..= exampleOptionalWithNullDefault
            | not (null exampleOptionalWithNullDefault)
          ],
          [ case exampleSingleOrList of
              [e] -> "single-or-list" JSON..= e
              l -> "single-or-list" JSON..= l
            | not (null exampleSingleOrList)
          ]
        ]

You can also use parseJSONViaCodec to decode a JSON Value in the same way that this instance would:

instance FromJSON Example where
  parseJSON = JSON.withObject "Example" $ \o ->
    Example
      <$> o JSON..: "text"
      <*> o JSON..: "bool"
      <*> o JSON..: "maybe"
      <*> o JSON..:? "optional"
      <*> o JSON..:? "optional-or-null"
      <*> o JSON..:? "optional-with-default" JSON..!= "foobar"
      <*> o JSON..:? "optional-with-null-default" JSON..!= []
      <*> ( ((: []) <$> o JSON..: "single-or-list")
              <|> (o JSON..:? "single-or-list" JSON..!= [])
          )

Note that because we wrote only one HasCodec instance, we do not have to worry about typos anywhere. Encoding and decoding will "just" roundtrip.
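As a rough illustration of that roundtrip, here is a minimal sketch that pushes a value through toJSONViaCodec and back through parseJSONViaCodec. The Point type and the roundtrip helper are hypothetical names chosen for this sketch, and it assumes the autodocodec, aeson and text packages are available. It uses parseEither from Data.Aeson.Types to run the parser.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Autodocodec (HasCodec (..), object, parseJSONViaCodec, requiredField, toJSONViaCodec, (.=))
import qualified Data.Aeson.Types as JSON
import Data.Text (Text)

-- Hypothetical two-field type, deliberately smaller than Example.
data Point = Point
  { pointName :: Text,
    pointVisible :: Bool
  }
  deriving (Show, Eq)

-- The only place where the key names are written down.
instance HasCodec Point where
  codec =
    object "Point" $
      Point
        <$> requiredField "name" "a name" .= pointName
        <*> requiredField "visible" "whether the point is visible" .= pointVisible

-- Encode to a JSON Value via the codec, then decode it again via the same codec.
roundtrip :: Point -> Either String Point
roundtrip = JSON.parseEither parseJSONViaCodec . toJSONViaCodec
```

For a well-behaved codec, roundtrip p should give back Right p for any p. That is the property you would put in a roundtrip test.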

But wait, there is more. You also get some nice, colourful, human-readable documentation for this codec:

[Image: nice, colourful, human-readable documentation for the example type's codec]

And even better: you also get machine-readable documentation for this codec in the form of a ToSchema instance for Swagger2 and OpenAPI3 schemas.

In short, you can now do this (using DerivingVia):

data Example = [...]
  deriving stock (Show, Eq)
  deriving
    ( FromJSON, -- <- FromJSON instance for free.
      ToJSON, -- <- ToJSON instance for free.
      Swagger.ToSchema, -- <- Swagger schema for free.
      OpenAPI.ToSchema -- <- OpenAPI schema for free.
    )
    via (Autodocodec Example)

instance HasCodec Example where
  codec = [...]

(You can find more examples in the API usage test package.)

Documentation

I have heard a lot of complaining about how Haskell libraries are underdocumented, so I figured I would set a good example by over(?)-documenting autodocodec. I must admit that the result looks a lot nicer than I expected.

I made a lot of use of doctest to write sustainable documentation.

Tests

If you know my work, you know that I will definitely have enjoyed testing this library. Everything I would expect of a library like this is tested:

Encoding and decoding roundtrips through JSON.
For standard types, encoding behaves in the same way that aeson does.
Error messages for decoding are still good.
Generated human-readable documentation looks good.
Generated JSON schemas look good.
Generated Swagger schemas look good.
Generated OpenAPI schemas look good.
Generated values are accepted by the corresponding generated JSON schemas.
Generated values are accepted by the corresponding generated Swagger schemas.
Generated values are accepted by the corresponding generated OpenAPI schemas.
Encoding and decoding roundtrips through YAML.
We try to make sure that backward compatibility is maintained.
Codecs are more or less inspectable.
Encoding and decoding is still fast.

I used a lot of golden tests to test subjective things like "the generated documentation still looks good". You can find these golden tests in the API usage test package, in case you want to see what the generated documentation looks like.

Performance

If you care about the performance of serialisation, you should probably not be using JSON. But just because I knew someone would be annoying about this:

Yes, performance is fine, I checked.