A configuration loading scheme for tools in Haskell

Using arguments, configurations, settings, options or instructions in any language can be a struggle. There are quite a few libraries that solve just this problem, and most of them involve either some global state or an object that is carried around through the program. Seeing as neither of these options is feasible or scalable in Haskell (when implemented naively), one has to think twice about how exactly to do it. In this blogpost I propose a general scheme to deal with all of these.

This is a very practical blogpost. The scheme is geared toward small command-line tools but can also be used more broadly. It is possible to apply the idea presented here directly, but the reader is encouraged to modify the proposed scheme to suit their needs.

Definitions and Types

Commands

There are different ways to specify modified behavior of a program. The first of these that the user comes in contact with is the set of command-line arguments.

Regular command-line arguments are never prefixed by -. They serve to specify commands and arguments to commands. Examples include the file.txt in cat file.txt and deploy config.sus in spark deploy config.sus.

These commands are represented by a data type Command that is strictly a sum-type. The definition of this data type might look like this:

data Command
    = CommandParse FilePath
    | CommandDeploy FilePath
    | CommandCompare FilePath FilePath

For the record, the "no - prefix" rule has two exceptions: --help and --version.

Flags

Next up are the closely related command-line flags. They are always prefixed by - for short flags and -- for long flags. Examples include switches like --verbose. We use a data type Flags to represent the relevant values of the command-line flags and arguments. The definition of this data type might look like this:

data Flags
    = Flags
    { flagVerbose :: Bool
    } 

Except for the flags --help and --version, flags should never specify commands. They should only modify the behavior already specified by commands.

Arguments

A Command value and a Flags value together make the Arguments.

type Arguments = (Command, Flags)

We will use this definition later.

Configurations

The last piece of configuration comes in the form of configuration files. A common example of their contents is the URLs of the servers to fetch data from. The contents of a configuration file are represented by a data type Configuration.

data Configuration 
  = Configuration
  { confServerUrl  :: ByteString
  , confServerPort :: Int
  }

I leave the actual encoding of Configurations in bytes on disk up to the user.

Putting all of these together into instructions

You could drag a triple of type (Command, Flags, Configuration) around in the entire program, but this approach has some disadvantages:

It is entirely possible that not all (Command, Flags, Configuration) triples represent valid settings for your program.
This is a triple and not a single value. You would have to take the value you need out of the right part of the triple every time you use it later and that is somewhat cumbersome.

The proposed solution to this problem uses a data type called Settings and a type synonym type Instructions = (Command, Settings). The idea is to combine the Command, Flags and Configuration values together into a Settings value. Exactly how this can be done is discussed in the next section.

Constructing settings

To construct the Settings value we need to keep the following in mind:

We can get the command-line arguments to the program with the System.Environment.getArgs :: IO [String] function.
Where to find the configuration file may need to depend on the command-line arguments.

The goal is to build an IO action of type IO Instructions or IO (Either Error Instructions), where the Instructions contain the Settings.

First we construct an Arguments value from the arguments with a function with the following signature:

getArguments :: [String] -> Either Error Arguments

This function embodies the idea that not all combinations of command-line arguments are necessarily valid. As for implementing this function, which I leave to the reader, I recommend optparse-applicative.
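As a rough illustration of the shape of getArguments, here is a dependency-free sketch that does not use optparse-applicative. The command names parse, deploy and compare, the -v short flag, the error messages and the choice of type Error = String are all assumptions made for this example. A real tool would likely want the help text and error reporting that a proper parsing library provides.

```haskell
import Data.List (isPrefixOf, partition)

-- Assumption: the post never defines Error, a String is used here.
type Error = String

data Command
    = CommandParse FilePath
    | CommandDeploy FilePath
    | CommandCompare FilePath FilePath
    deriving (Show, Eq)

data Flags
    = Flags
    { flagVerbose :: Bool
    } deriving (Show, Eq)

type Arguments = (Command, Flags)

-- | Split the raw arguments into flags (prefixed by '-') and the rest,
-- then parse both parts separately.
getArguments :: [String] -> Either Error Arguments
getArguments strs = do
    let (flagStrs, commandStrs) = partition ("-" `isPrefixOf`) strs
    flags <- parseFlags flagStrs
    command <- parseCommand commandStrs
    return (command, flags)

-- | Hypothetical command names, chosen for this sketch.
parseCommand :: [String] -> Either Error Command
parseCommand ["parse", file] = Right (CommandParse file)
parseCommand ["deploy", file] = Right (CommandDeploy file)
parseCommand ["compare", a, b] = Right (CommandCompare a b)
parseCommand other = Left ("Unrecognised command: " ++ unwords other)

-- | Start from the default flags and switch on what was given.
parseFlags :: [String] -> Either Error Flags
parseFlags = go (Flags { flagVerbose = False })
  where
    go flags [] = Right flags
    go flags (f : rest)
        | f `elem` ["-v", "--verbose"] = go (flags { flagVerbose = True }) rest
        | otherwise = Left ("Unrecognised flag: " ++ f)
```

With these definitions, getArguments ["deploy", "config.sus", "--verbose"] evaluates to Right (CommandDeploy "config.sus", Flags {flagVerbose = True}), while an unknown command or flag yields a Left with a message.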

Next is the configuration file. We use a function getConfiguration :: Arguments -> IO (Either Error Configuration) to read the configuration file, failing with a nice error message if anything goes wrong. The location of the configuration file can be a default value and/or specified by the Arguments.
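One possible way to write getConfiguration is sketched below. It assumes a deliberately simple, made-up file format (the URL on the first line, the port on the second), a hypothetical flagConfigFile field in Flags, a hypothetical default path config.txt, and type Error = String. None of these come from the scheme itself; the point is only to show the parsing kept pure and the file access kept in a thin IO wrapper.

```haskell
import Control.Exception (IOException, try)
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B
import Data.Maybe (fromMaybe)
import Text.Read (readMaybe)

-- Assumption: the post never defines Error, a String is used here.
type Error = String

-- Trimmed down to one constructor to keep the sketch short.
data Command = CommandDeploy FilePath

data Flags
    = Flags
    { flagVerbose    :: Bool
    , flagConfigFile :: Maybe FilePath -- hypothetical field, not in the post
    }

type Arguments = (Command, Flags)

data Configuration
  = Configuration
  { confServerUrl  :: ByteString
  , confServerPort :: Int
  } deriving (Show, Eq)

-- | Hypothetical default location of the configuration file.
defaultConfigFile :: FilePath
defaultConfigFile = "config.txt"

-- | Pure parser for the made-up two-line format: URL, then port.
parseConfiguration :: String -> Either Error Configuration
parseConfiguration contents = case lines contents of
    [url, portStr] -> case readMaybe portStr of
        Just port -> Right (Configuration (B.pack url) port)
        Nothing -> Left ("Invalid port: " ++ portStr)
    _ -> Left "Expected exactly two lines: a URL and a port"

-- | Read the file named by the flags (or the default) and parse it.
-- A failure to open the file becomes a Left instead of an exception.
getConfiguration :: Arguments -> IO (Either Error Configuration)
getConfiguration (_, flags) = do
    let path = fromMaybe defaultConfigFile (flagConfigFile flags)
    result <- try (readFile path) :: IO (Either IOException String)
    return $ case result of
        Left e -> Left ("Could not read " ++ path ++ ": " ++ show e)
        Right contents -> parseConfiguration contents
```

Because parseConfiguration is pure, it can be tested without touching the disk.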

To build the Instructions value, we write a function combineToInstructions :: Arguments -> Configuration -> Either Error Instructions. The goal is to write this function in such a way that it only produces a Right value if the resulting Instructions are valid (and to test that).
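To make this step concrete, here is a minimal sketch of combineToInstructions. The fields of Settings, the two validity checks (a port within 1 to 65535 and a non-empty URL) and type Error = String are assumptions chosen for illustration; which combinations count as valid depends entirely on your program.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B

-- Assumption: the post never defines Error, a String is used here.
type Error = String

data Command
    = CommandParse FilePath
    | CommandDeploy FilePath
    | CommandCompare FilePath FilePath
    deriving (Show, Eq)

data Flags
    = Flags
    { flagVerbose :: Bool
    }

type Arguments = (Command, Flags)

data Configuration
  = Configuration
  { confServerUrl  :: ByteString
  , confServerPort :: Int
  }

-- Hypothetical Settings type: one flat record that merges the
-- flags and the configuration file.
data Settings
    = Settings
    { setsVerbose    :: Bool
    , setsServerUrl  :: ByteString
    , setsServerPort :: Int
    } deriving (Show, Eq)

type Instructions = (Command, Settings)

-- | Only produce a Right value when the combined settings are valid.
combineToInstructions :: Arguments -> Configuration -> Either Error Instructions
combineToInstructions (command, flags) conf
    | port < 1 || port > 65535 = Left ("Port out of range: " ++ show port)
    | B.null url = Left "The server URL must not be empty"
    | otherwise = Right (command, settings)
  where
    port = confServerPort conf
    url = confServerUrl conf
    settings = Settings
        { setsVerbose = flagVerbose flags
        , setsServerUrl = url
        , setsServerPort = port
        }
```

Since the function is pure, the "only valid Instructions" property can be checked with ordinary unit or property tests.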

If not all of the above data types have a meaning in your specific use case, for example if you do not need any configuration files, you can always make the data type isomorphic to the unit type ().

data Configuration = Configuration

The last piece of the puzzle is then to write the getInstructions :: IO Instructions function.

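import System.Environment (getArgs)
import System.Exit (exitFailure)

getInstructions :: IO Instructions
getInstructions = do
  strArgs <- getArgs
  case getArguments strArgs of
    Left err -> die err
    Right args -> do
      eec <- getConfiguration args
      case eec >>= combineToInstructions args of
        Left err -> die err
        Right instr -> return instr
  where
    -- Print the error and exit with a failure code.
    -- putStrLn assumes that Error is a synonym for String.
    die err = do
        putStrLn err
        exitFailure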

You could also write a getInstructions :: IO (Either Error Instructions) and then handle failures later:

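import System.Environment (getArgs)

getInstructions :: IO (Either Error Instructions)
getInstructions = do
  strArgs <- getArgs
  case getArguments strArgs of
    -- Rebuild the Left: the Right type changes from Arguments to Instructions.
    Left err -> return (Left err)
    Right args -> do
      eec <- getConfiguration args
      return $ eec >>= combineToInstructions args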

Using instructions

Given a value of type Instructions, we pattern match on the Command part of the tuple in a dispatch :: Command -> ReaderT Settings IO () function. The main code then looks as follows:

main :: IO ()
main = do
  (command, settings) <- getInstructions
  runReaderT (dispatch command) settings

You may want to use the type synonym type Configured = ReaderT Settings IO.
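A possible shape for dispatch is sketched below, using ReaderT from the mtl package. The single-field Settings record and the handlers parse, deploy and compareFiles are placeholders invented for this example; they only report what they would do.

```haskell
import Control.Monad (when)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Reader (ReaderT, asks, runReaderT)

data Command
    = CommandParse FilePath
    | CommandDeploy FilePath
    | CommandCompare FilePath FilePath

-- Hypothetical Settings type; the scheme leaves its fields open.
data Settings
    = Settings
    { setsVerbose :: Bool
    }

type Configured = ReaderT Settings IO

-- | Pattern match on the command and hand over to a handler.
dispatch :: Command -> Configured ()
dispatch (CommandParse file) = parse file
dispatch (CommandDeploy file) = deploy file
dispatch (CommandCompare a b) = compareFiles a b

-- Placeholder handlers: each one only reports what it would do.
parse :: FilePath -> Configured ()
parse file = logVerbose ("Parsing " ++ file)

deploy :: FilePath -> Configured ()
deploy file = logVerbose ("Deploying " ++ file)

compareFiles :: FilePath -> FilePath -> Configured ()
compareFiles a b = logVerbose ("Comparing " ++ a ++ " with " ++ b)

-- | Print a message only when the verbose setting is switched on.
logVerbose :: String -> Configured ()
logVerbose msg = do
    verbose <- asks setsVerbose
    when verbose (liftIO (putStrLn msg))
```

For example, runReaderT (dispatch (CommandDeploy "config.sus")) (Settings True) prints Deploying config.sus, and prints nothing when the verbose setting is False. None of the handlers takes the Settings as an explicit argument.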

Using the settings

What exactly you do with the Settings of course depends entirely on what you would like your program to do. The Reader monad and the ReaderT monad transformer are very easy ways to carry the Settings along throughout the program without having to explicitly make them an argument to every function.

A note on the `MonadReader` type class

It may be tempting to define one giant monad transformer stack like this:

type MyStack = ReaderT Settings (WriterT MyLogData [...] (ExceptT Text IO) [...])

However, if you then write every function in this monad, every function gets stuck in IO. It becomes cumbersome to test these. Instead you would really like to only put functions in the parts of the monad stack they need. Pure functions don't need to be put in an IO transformer, functions that don't fail don't need to be put in ExceptT, etc.

This problem can be solved with the mtl-style monad transformer type classes. A function that requires IO gets a MonadIO m constraint and uses liftIO. A function that requires Settings gets a MonadReader Settings m constraint, and so on.

We can then put the dispatched function in the MyStack monad and use all the functions whose constraints are satisfied by this monad. Meanwhile we can still test the pure functions purely.
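The difference between the two kinds of constraints can be sketched as follows, again assuming the mtl package. The Settings fields and the functions serverAddress and logVerbose are hypothetical and exist only for this example.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad (when)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Reader (MonadReader, asks, runReader, runReaderT)

-- Hypothetical Settings type for this sketch.
data Settings
    = Settings
    { setsVerbose    :: Bool
    , setsServerPort :: Int
    }

-- | Needs only the Settings, so it also runs in the pure Reader monad.
serverAddress :: MonadReader Settings m => String -> m String
serverAddress host = do
    port <- asks setsServerPort
    return (host ++ ":" ++ show port)

-- | Needs the Settings and IO, so it asks for both and nothing more.
logVerbose :: (MonadReader Settings m, MonadIO m) => String -> m ()
logVerbose msg = do
    verbose <- asks setsVerbose
    when verbose (liftIO (putStrLn msg))
```

Here runReader (serverAddress "example.com") (Settings False 8080) evaluates to "example.com:8080" without any IO, while logVerbose has to be run in a stack that contains IO, for instance with runReaderT. Both functions can still be called from a larger stack such as MyStack.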

Conclusion

This concludes a scheme for settings in Haskell. Any feedback will be greatly appreciated.