Announcing safe-coloured-text 0.2.0.0 with a quick primer on character encodings

Date 2022-06-28

This post announces version 0.2.0.0 of the safe-coloured-text library, which lets you safely output coloured text to a terminal. The idea for this version came from a very smart and annoyingly sensible comment on reddit. The first version (0.1.0.0) required the user to use UTF8, a decision that is now considered a mistake. The new version (0.2.0.0) relaxes that requirement by using Text instead of ByteString.

A very colourful golden test

A quick primer on character encodings

Human language is seriously complex. Representing human language in computers is even more complex. This has to do with human history but more importantly also the history of computing and (historical) efficiency requirements. Here is an extremely simplified summary.

  • Human text consists of characters. [*]

  • Unicode assigns a number to every [*] character. We call these numbers code points.

  • A character encoding lets you map a sequence of code points from and to a sequence of octets (bytes [*]).

  • We would like encodings to be efficient for common use-cases like "English text only" or "Text with European languages only". [*]

  • UTF8 is a common encoding that is a good compromise for most use-cases.

  • UTF8 is not the standard everywhere, and even Haskell's text package used UTF16 internally until recently.

  • Systems try to specify the encoding that they want programs to use in various ways like, for example, the LANG environment variable.

[*]: Not really, but we've more or less been able to pretend so anyway.
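To make the primer a bit more concrete, here is a minimal sketch that shows the code points of a short piece of text and how many octets two different encodings need for it. It assumes the text and bytestring packages; the helper names codePoints and encodedSizes are made up for this illustration.

```haskell
module Main where

import qualified Data.ByteString as SB
import Data.Char (ord)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | The code points of a piece of text, as plain numbers.
-- (Hypothetical helper, only for illustration.)
codePoints :: T.Text -> [Int]
codePoints = map ord . T.unpack

-- | The number of octets needed to encode the text as
-- (UTF8, UTF16 little endian).
-- (Hypothetical helper, only for illustration.)
encodedSizes :: T.Text -> (Int, Int)
encodedSizes t =
  ( SB.length (TE.encodeUtf8 t),
    SB.length (TE.encodeUtf16LE t)
  )

main :: IO ()
main = do
  -- "h\233llo" is "hello" with an e-acute (U+00E9) as second character.
  let sample = T.pack "h\233llo"
  print (codePoints sample) -- [104,233,108,108,111]
  print (SB.unpack (TE.encodeUtf8 sample)) -- [104,195,169,108,108,111]
  print (encodedSizes sample) -- (6,10)
```

The same five code points take six octets in UTF8 and ten in UTF16, which is the kind of efficiency trade-off mentioned above.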

Relevant Haskell types

With that in mind, these are the relevant types in Haskell:

  • Char: A Unicode code point

  • String: A list of Chars: type String = [Char].

  • Text: Like String, but more performant for most use-cases. (Text also doesn't support certain code points, like unmatched UTF16 surrogate code points, in versions before text-2.0.)

  • ByteString: Like [Word8], but more performant for most use-cases.
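As a rough sketch of how these types relate: going between String and Text needs no encoding, while going between Text and ByteString requires choosing one, and decoding can fail. The example assumes the text and bytestring packages; the helper names asText, asOctets and decodeMaybe are made up for this illustration.

```haskell
module Main where

import qualified Data.ByteString as SB
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- | String to Text: no encoding involved.
-- (Hypothetical helper, only for illustration.)
asText :: String -> Text
asText = T.pack

-- | A ByteString is just a sequence of octets.
-- (Hypothetical helper, only for illustration.)
asOctets :: SB.ByteString -> [Word8]
asOctets = SB.unpack

-- | ByteString to Text: we have to pick an encoding (UTF8 here),
-- and not every sequence of octets is valid in that encoding.
-- (Hypothetical helper, only for illustration.)
decodeMaybe :: SB.ByteString -> Maybe Text
decodeMaybe = either (const Nothing) Just . TE.decodeUtf8'

main :: IO ()
main = do
  -- '\955' is the Greek small letter lambda (U+03BB).
  let lambda = asText "\955"
  print (T.length lambda) -- 1 character
  print (asOctets (TE.encodeUtf8 lambda)) -- [206,187], 2 octets
  print (decodeMaybe (TE.encodeUtf8 lambda) == Just lambda) -- True
  print (decodeMaybe (SB.pack [0xFF])) -- Nothing, 0xFF is never valid UTF8
```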

These types are different for Real and Important reasons. Some examples include:

Notable changes

Version 0.2.0.0 of the safe-coloured-text library

The default output of the safe-coloured-text library is now Text instead of ByteString. Existing functions are deprecated according to the following scheme:

  • renderChunks is now a deprecated synonym of renderChunksUtf8BSBuilder.

  • renderChunksUtf8BSBuilder is a new function that outputs a ByteString.Builder.

  • renderChunksBuilder is a new function that outputs a Text.Builder.

  • renderChunksText is a new function that outputs a Text.

  • renderChunksBS is now a deprecated synonym of renderChunksUtf8BS.

  • renderChunksUtf8BS is a new function that outputs a ByteString.

Note that the new version of the library requires you to choose an encoding in order to continue outputting raw bytes, but does not break reverse dependencies that want to keep using renderChunks or renderChunksBS.
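In practice the difference could look something like the sketch below. It assumes that the Text.Colour module exports chunk, fore, green and the TerminalCapabilities constructors alongside the rendering functions listed above, and that the terminal capabilities are hard-coded rather than detected; treat it as an illustration rather than a reference.

```haskell
module Main where

import qualified Data.ByteString as SB
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Text.Colour

-- | Two chunks: a green "Hello" followed by unstyled text.
-- (The name 'greeting' is only for illustration.)
greeting :: [Chunk]
greeting =
  [ fore green (chunk (T.pack "Hello")),
    chunk (T.pack ", world\n")
  ]

main :: IO ()
main = do
  -- New default: render to Text and let the output handle
  -- decide how to encode it.
  TIO.putStr (renderChunksText With8Colours greeting)
  -- Explicitly opting in to UTF8: render straight to bytes.
  SB.putStr (renderChunksUtf8BS With8Colours greeting)
```

Code that still calls renderChunks or renderChunksBS should keep compiling, with a deprecation warning pointing at the Utf8 variants.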

Version 0.2.0.0 of the autodocodec-yaml library

The autodocodec-yaml library lets you output a schema for a JSON (and YAML) codec in a nice and colourful way. The functions that output these nicely coloured schemas now produce Text values instead of ByteStrings.

A nice and colourful yamlschema

Version 0.11.0.0 of the sydtest library

The sydtest testing framework now tries to respect the system's locale by using the functions in Data.Text.IO instead of outputting UTF8 bytes directly.
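The difference between the two approaches can be sketched as follows. This is not sydtest's actual code, only an illustration using the text and bytestring packages; the names outputUtf8 and outputLocale are made up.

```haskell
module Main where

import qualified Data.ByteString as SB
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.IO as TIO
import System.IO (hGetEncoding, stdout)

-- | Always writes UTF8 octets, whatever the system's locale says.
-- (Hypothetical helper, only for illustration.)
outputUtf8 :: Text -> IO ()
outputUtf8 = SB.putStr . TE.encodeUtf8

-- | Writes text using the encoding of the stdout handle, which
-- the runtime initialises from the system's locale.
-- (Hypothetical helper, only for illustration.)
outputLocale :: Text -> IO ()
outputLocale = TIO.putStr

main :: IO ()
main = do
  -- Show which encoding the handle is currently using.
  enc <- hGetEncoding stdout
  putStrLn ("stdout encoding: " ++ maybe "binary" show enc)
  -- For ASCII-only text both approaches produce the same octets.
  -- They only differ for non-ASCII text on a non-UTF8 locale.
  let message = T.pack "all tests passed\n"
  outputLocale message
  outputUtf8 message
```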
