Option parsing in Haskell, Part 2: A standard approach to settings in Haskell

I wrote the first part of this two-part series for FP Complete. It can be found on their website. That post describes how settings should be handled. This post will describe how to do that in Haskell in practice.

Following the suggested approach to passing settings

Following the first two pieces of advice comes very easily in Haskell. Passing settings as an argument, instead of storing them in global state, is almost a given. We would have to resort to System.IO.Unsafe.unsafePerformIO to go against this principle. The second piece of advice, "Use immutable settings", is also mostly a given. I will not even go into how you could go against it in Haskell.

The next two pieces of advice are certainly less trivial. We will use a single example in the rest of this blog post: Suppose we were to write a very simple program with some of the functionality of grep and call it my-grep. We would like to be able to handle two cases:

Find a string
Replace one string by another

Keep in mind that the example is just that: an example. Focus on the principles and use good judgement to write code differently as needed.

Purely functional argument parsing

Purely functional argument parsing may be easier to do in Haskell than in other languages, but we still have to make the conscious decision to do so.

First we define the appropriate types. We need a Command type that contains all the command-specific information that we can get from the command-line arguments. This information should be an accurate, unprocessed reflection of what is present in the command-line arguments. Use simple types, and err on the side of using Maybe for values that are optional instead of using default values.

Example:

data Command
  = CommandFind FindArgs
  | CommandReplace ReplaceArgs

data FindArgs = FindArgs String
data ReplaceArgs = ReplaceArgs String String

We will also need a type that represents the non-command-specific flags: Flags.

Example:

data Flags = Flags
  { flagConfigFile :: Maybe FilePath
  , flagVerbosity :: Maybe String
  }

Finally, we add one more type to package up the previous two:

data Arguments = Arguments Command Flags

Now we have to write a pure parsing function.

parseArguments :: [String] -> Either ArgError Arguments

The specifics of the error case are not as important as making sure that the error is a pure value and not just an exception. It is fine if we handle this ArgError by dying, because it is usually a good idea to stop the program if the argument parsing fails, but a pure function should exist for testing.

We suggest using optparse-applicative or optparse-simple to do the actual argument parsing. There are excellent tutorials in the optparse-applicative README, on 24 Days of Hackage and in the optparse-simple README.

When using optparse-applicative, please use help and fullDesc generously and set prefShowHelpOnError and prefShowHelpOnEmpty to True.
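As an illustration, here is a minimal sketch of how parseArguments could be built on optparse-applicative's pure entry point, execParserPure. Several details are assumptions made for the example and do not come from the design above: ArgError is simply the rendered error text, the sub-commands are named find and replace, the verbosity flag is spelled --verbosity, and the deriving clauses are only there to make testing easier.

```haskell
import Control.Applicative (optional)
import Options.Applicative

data Command
  = CommandFind FindArgs
  | CommandReplace ReplaceArgs
  deriving (Show, Eq)

newtype FindArgs = FindArgs String deriving (Show, Eq)
data ReplaceArgs = ReplaceArgs String String deriving (Show, Eq)

data Flags = Flags
  { flagConfigFile :: Maybe FilePath
  , flagVerbosity :: Maybe String
  } deriving (Show, Eq)

data Arguments = Arguments Command Flags deriving (Show, Eq)

-- Assumption: the error is just the rendered help or error message.
type ArgError = String

-- Pure: no IO, no exceptions, so it can be tested directly.
parseArguments :: [String] -> Either ArgError Arguments
parseArguments args =
  case execParserPure parserPrefs argumentsInfo args of
    Success a -> Right a
    -- Note that --help also ends up here; the caller decides how to exit.
    Failure failure -> Left (fst (renderFailure failure "my-grep"))
    CompletionInvoked _ -> Left "Shell completion is not handled in this sketch."

-- Show the help text on errors and when no arguments are given.
parserPrefs :: ParserPrefs
parserPrefs = prefs (showHelpOnError <> showHelpOnEmpty)

argumentsInfo :: ParserInfo Arguments
argumentsInfo =
  info
    (helper <*> (Arguments <$> parseCommand <*> parseFlags))
    (fullDesc <> progDesc "Find or replace strings")

parseCommand :: Parser Command
parseCommand =
  hsubparser $
    command "find" (info parseFind (progDesc "Find a string"))
      <> command "replace" (info parseReplace (progDesc "Replace one string by another"))

parseFind :: Parser Command
parseFind =
  CommandFind . FindArgs
    <$> strArgument (metavar "QUERY" <> help "The string to look for")

parseReplace :: Parser Command
parseReplace =
  fmap CommandReplace $
    ReplaceArgs
      <$> strArgument (metavar "ORIGINAL" <> help "The string to replace")
      <*> strArgument (metavar "REPLACEMENT" <> help "The string to put in its place")

-- Optional flags stay Maybe values; no defaults are filled in at this stage.
parseFlags :: Parser Flags
parseFlags =
  Flags
    <$> optional (strOption (long "config-file" <> metavar "FILE" <> help "Path to the configuration file"))
    <*> optional (strOption (long "verbosity" <> metavar "LEVEL" <> help "How much to log"))
```

Because parseArguments never touches IO, a test can call it with a plain list of strings and compare the result, and the real program can still choose to die on a Left.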

We suggest running stack build :my-program --file-watch --exec='my-program' to see what the output looks like while writing this part.

Pre-processing settings

Now that we have gathered the arguments, the next step is to gather the appropriate information from the environment, the configuration file(s), and possibly even other sources like the program name. In Haskell, we have access to the arguments via getArgs :: IO [String], to the program name via getProgName :: IO String, and to the environment via getEnvironment :: IO [(String, String)]. All of these functions live in IO, but it is important that we keep as much of the pre-processing as possible pure.

Gathering the relevant part of the environment

The environment is up first. We will define a new type that represents the relevant part of the environment. This type should be an accurate representation of the information found in the environment. Again: err on the side of using Maybe instead of default values.

data Environment = Environment
  { envVerbosity :: Maybe String
  , envConfigFile :: Maybe FilePath
    [...]
  }

We will parse the relevant part of the environment with a function that has the following type:

relevantEnvironment :: [(String, String)] -> Environment

Note that we do not allow the gathering from the environment to differ based on the arguments we just parsed. Also note that relevantEnvironment is pure and not allowed to error. This helps to ensure that it does not perform any processing yet.
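A minimal sketch of such a function, restricted to the two fields shown above, could look like this. MY_GREP_VERBOSITY is a hypothetical variable name chosen for the example; MY_GREP_CONFIG_FILE is the one mentioned later in this post.

```haskell
data Environment = Environment
  { envVerbosity :: Maybe String
  , envConfigFile :: Maybe FilePath
  } deriving (Show, Eq)

-- Only looks values up; it does not interpret or validate them,
-- so there is nothing here that could fail.
relevantEnvironment :: [(String, String)] -> Environment
relevantEnvironment env =
  Environment
    { envVerbosity = lookup "MY_GREP_VERBOSITY" env -- hypothetical name
    , envConfigFile = lookup "MY_GREP_CONFIG_FILE" env
    }
```

In the real program the input list would come from getEnvironment, while a test can pass any list of pairs it likes.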

Gathering the configuration

We approach gathering from the configuration in the same manner. First we define a type whose values represent the configuration that we may find in the configuration files. Similar to the Arguments and the Environment, Configuration should use Maybe values to signify when a certain value is not configured.

data Configuration = Configuration
  { confVerbosity :: Maybe String
    [...]
  }

Using the Arguments and Environment that we just gathered, we will get the configuration from configuration files with a function of this type:

getConfiguration
  :: Arguments -> Environment -> IO Configuration

Note that the IO is necessary for reading files, so we may as well die here if anything goes wrong with reading configuration files. You may want to return Maybe Configuration from the getConfiguration function in case you want to handle situations in which no configuration file exists yet. Note also that the previous steps were necessarily completed beforehand, because usually we would want to be able to override the config file path with a --config-file option or a MY_GREP_CONFIG_FILE environment variable.
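The part of getConfiguration that decides which file to read can itself stay pure. The sketch below shows one possible way to do that. The function name configFileCandidate is hypothetical, it takes Flags rather than the full Arguments for brevity, and the precedence (flag before environment variable) is an assumption that follows common convention rather than something this post prescribes.

```haskell
import Control.Applicative ((<|>))

data Flags = Flags
  { flagConfigFile :: Maybe FilePath
  , flagVerbosity :: Maybe String
  }

data Environment = Environment
  { envVerbosity :: Maybe String
  , envConfigFile :: Maybe FilePath
  }

-- The first Just wins: the flag is consulted before the environment variable.
-- Nothing means neither was given, so the caller falls back to a default location.
configFileCandidate :: Flags -> Environment -> Maybe FilePath
configFileCandidate flags env = flagConfigFile flags <|> envConfigFile env
```

Only the actual reading and decoding of the chosen file then needs to live in IO.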

Settings

The next step is to actually process these Arguments, Environment and Configuration values. We define a type Dispatch that contains all the command-specific settings that the program will use, a Settings type that contains all the non-command-specific settings, and an Instructions type that packages the two.

data Instructions
    = Instructions Dispatch Settings

data Dispatch
    = DispatchFind FindSettings
    | DispatchReplace ReplaceSettings

data FindSettings = FindSettings
    { findSetQuery :: Text
      [...]
    }

data ReplaceSettings = ReplaceSettings
    { replaceSetOriginal :: Text
    , replaceSetReplacement :: Text
      [...]
    }

data Settings = Settings
    { setVerbosity :: LogLevel
      [...]
    }

We then build the settings that our application will use from all this gathered information. The format should be easy for our program to use. We would prefer to get as much of the validation as possible out of the way by dying early. Building the settings happens with the combineToInstructions function:

combineToInstructions
  :: Command
  -> Flags
  -> Environment
  -> Configuration
  -> IO Instructions

Make sure that there is still no application logic in these settings. As an example: our little grep program needs to know in which files to look, and we want to be able to specify that using a directory name. Specifying foo as the directory in which to look should mean 'look in all the files inside this directory'. When processing that concept, we are allowed to pre-process a FilePath into a Path Abs Dir, but not to list the directory to get a [Path Abs File]. The first part is considered parsing. The second part is considered application logic.
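To make the combining step concrete, here is a sketch for the verbosity setting alone. It rests on several assumptions: LogLevel is a local stand-in for whichever type the real program uses, the accepted spellings and the default of LevelWarn are made up, the precedence (flag, then environment, then configuration file) is a common convention rather than a rule from this post, and the function names are hypothetical. The validation is kept in a pure function, with a thin IO wrapper that dies on failure.

```haskell
import Control.Applicative ((<|>))
import System.Exit (die)

-- Stand-in for the LogLevel type used in Settings.
data LogLevel = LevelDebug | LevelInfo | LevelWarn | LevelError
  deriving (Show, Eq)

data Flags = Flags
  { flagConfigFile :: Maybe FilePath
  , flagVerbosity :: Maybe String
  }

data Environment = Environment
  { envVerbosity :: Maybe String
  , envConfigFile :: Maybe FilePath
  }

newtype Configuration = Configuration
  { confVerbosity :: Maybe String
  }

newtype Settings = Settings
  { setVerbosity :: LogLevel
  } deriving (Show, Eq)

-- Validation happens here, once, instead of all over the application.
parseLogLevel :: String -> Either String LogLevel
parseLogLevel raw =
  case raw of
    "debug" -> Right LevelDebug
    "info" -> Right LevelInfo
    "warn" -> Right LevelWarn
    "error" -> Right LevelError
    _ -> Left ("Unknown verbosity: " ++ show raw)

-- Pure core: pick the most specific source, then validate it.
combineToSettingsPure :: Flags -> Environment -> Configuration -> Either String Settings
combineToSettingsPure flags env conf =
  case flagVerbosity flags <|> envVerbosity env <|> confVerbosity conf of
    Nothing -> Right (Settings LevelWarn) -- assumed default
    Just raw -> Settings <$> parseLogLevel raw

-- Thin IO wrapper: stop the program early if the settings are invalid.
combineToSettings :: Flags -> Environment -> Configuration -> IO Settings
combineToSettings flags env conf =
  either die pure (combineToSettingsPure flags env conf)
```

This is the first place where a default value appears; the Maybe values gathered earlier make it possible to tell "not specified" apart from "specified as the default".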

The API

To encapsulate all of this logic, we put it in an OptParse module. We export Instructions(..), Dispatch(..), Settings(..), FindSettings(..), ReplaceSettings(..) and a function called getInstructions:

getInstructions :: IO Instructions

This function is transitively responsible for all argument parsing, gathering from the environment and configuration file(s), and combining all of that information into the Instructions. Should anything go wrong, this function is allowed to die, so that the program only ever has to deal with valid settings.
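The order of the steps can be sketched as a small driver that takes the individual pieces as arguments. The names gatherInstructions and getInstructionsWith are hypothetical, the types are left polymorphic so that the sketch stands on its own, and the combining step here receives the parsed arguments as one value instead of a separate Command and Flags.

```haskell
import System.Environment (getArgs, getEnvironment)
import System.Exit (die)

-- Runs the steps in order on inputs that have already been read.
-- Keeping the raw inputs as parameters makes the whole pipeline testable.
gatherInstructions
  :: ([String] -> Either String args)    -- pure argument parsing
  -> ([(String, String)] -> env)         -- pure environment gathering
  -> (args -> env -> IO conf)            -- reading the configuration file(s)
  -> (args -> env -> conf -> IO instr)   -- combining and validating
  -> [String]
  -> [(String, String)]
  -> IO instr
gatherInstructions parseArgs relevantEnv getConf combine rawArgs rawEnv = do
  args <- either die pure (parseArgs rawArgs) -- die if the arguments are invalid
  let env = relevantEnv rawEnv
  conf <- getConf args env
  combine args env conf

-- The only part that reads the process arguments and environment.
getInstructionsWith
  :: ([String] -> Either String args)
  -> ([(String, String)] -> env)
  -> (args -> env -> IO conf)
  -> (args -> env -> conf -> IO instr)
  -> IO instr
getInstructionsWith parseArgs relevantEnv getConf combine = do
  rawArgs <- getArgs
  rawEnv <- getEnvironment
  gatherInstructions parseArgs relevantEnv getConf combine rawArgs rawEnv
```

With the functions from this post, getInstructions would then amount to one call to getInstructionsWith with parseArguments, relevantEnvironment, getConfiguration and a wrapper around combineToInstructions.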

Now we can write the application using the easy-to-use settings we wanted:

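import Control.Monad.Reader (runReaderT) -- from the mtl package
import MyGrep.OptParse

main :: IO ()
main = do
  -- getInstructions dies on invalid input, so everything past this line sees valid settings.
  Instructions dispatch settings <- getInstructions
  -- doSomethingWith is the application's entry point, run with the Settings as its reader environment.
  runReaderT (doSomethingWith dispatch) settings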