Safer paths, part 1 - valid and typed paths

Filepaths have been a pain in my neck for years. Paths are hard, overused, misused and mostly unsafe. In this post I present a newly released library that serves to make working with paths safer in the common use-case.

Misuse and scope

Have you ever thought about why we use paths in the first place? And why we use strings to denote them? Is it not weird that . means the ‘current’ directory (whatever that even means) and is also the separator for extensions? Is it not absurd that / is both the root of the path system and the separator? Paths are hard, but in any case, we are stuck with them now.

Paths are often overused or misused. By this I mean that they are used to denote more than just the location of a file(/…). Sometimes a path is a vital part of the meaning of the contents of the file. Sometimes the mere existence of a file at a certain path even has a meaning.

The safepath library is only to be used for real paths: paths that can point to a directory, a file, a device, a socket, etc. It is not meant to be used for glob patterns, for $PATH variables, etc. It also encourages safer use of paths: only absolute paths, no Strings and valid-by-construction Paths.

Data.FilePath versus System.FilePath

Data.FilePath uses an opaque datatype to represent a path instead of a plain String. This is the exact opposite of what System.FilePath does.

The safepath readme explains in full why Data.FilePath uses an opaque data type where System.FilePath uses String instead.

In short:

  • safepath only addresses ‘real’ paths.
  • safepath does not allow for invalid paths.
  • A subsequent library will have ‘wrappers’ for directory.
  • safepath encourages safe usage of paths: preferably no semantics in paths and no String juggling.
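To make "does not allow for invalid paths" concrete, here is a minimal sketch of an opaque, valid-by-construction path type. It is not safepath's actual code: the names ValidPath, parseValidPath and renderValidPath, and the validity rules themselves, are assumptions made up for this illustration.

```haskell
-- A hypothetical module, not part of safepath.
-- The ValidPath constructor is deliberately left out of the export list,
-- so the only way to obtain a value from outside is parseValidPath.
module ValidPathSketch
  ( ValidPath
  , parseValidPath
  , renderValidPath
  ) where

import Data.List (intercalate)

-- An absolute path stored as its pieces, never as a raw String.
newtype ValidPath = ValidPath [String]
  deriving (Show, Eq)

-- Assumed validity rules for this sketch: a piece is non-empty,
-- is not "." or "..", and contains no NUL character.
validPiece :: String -> Bool
validPiece p =
  not (null p) && p `notElem` [".", ".."] && '\0' `notElem` p

-- Split a string on '/' without dropping empty pieces.
splitOnSlash :: String -> [String]
splitOnSlash s = case break (== '/') s of
  (piece, [])       -> [piece]
  (piece, _ : rest) -> piece : splitOnSlash rest

-- Smart constructor: only absolute paths with valid pieces are accepted.
-- Empty pieces (as in "/a//b" or "/a/") are rejected in this sketch.
parseValidPath :: String -> Maybe ValidPath
parseValidPath "/" = Just (ValidPath [])
parseValidPath ('/' : rest)
  | all validPiece pieces = Just (ValidPath pieces)
  where
    pieces = splitOnSlash rest
parseValidPath _ = Nothing

-- Rendering is total, because every ValidPath is valid by construction.
renderValidPath :: ValidPath -> String
renderValidPath (ValidPath ps) = '/' : intercalate "/" ps
```

Because the constructor is hidden, code outside the module cannot juggle the underlying Strings. It has to go through the parser once and can rely on the result afterwards.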

Data.FilePath versus System.Path

Data.FilePath exposes an opaque data type with one type parameter: Path. This parameter can be instantiated in two ways:

type AbsPath = Path Absolute
type RelPath = Path Relative

This means that there is now a type-level distinction between absolute and relative paths. It ensures that the right type of path is always passed to a function. It should also encourage programmers to only ever use Absolute paths in the heart of their application.
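The following sketch shows how such a phantom type parameter lets the compiler catch mix-ups. Only Path, Absolute, Relative, AbsPath and RelPath come from the declarations above. The list-of-pieces representation, the </> operator and the render functions are assumptions of this sketch and may differ from what Data.FilePath actually provides.

```haskell
import Data.List (intercalate)

-- Phantom tags: they have no values and exist only at the type level.
data Absolute
data Relative

-- Assumed representation for this sketch: a list of path pieces.
-- The type parameter does not appear on the right-hand side.
newtype Path rel = Path [String]
  deriving (Show, Eq)

type AbsPath = Path Absolute
type RelPath = Path Relative

-- Hypothetical append: only a relative path may go on the right,
-- and the result keeps the kind of the left-hand side.
(</>) :: Path rel -> RelPath -> Path rel
Path a </> Path b = Path (a ++ b)

-- Rendering is split per kind, so a relative path can never be
-- printed with a leading slash by accident.
renderAbs :: AbsPath -> FilePath
renderAbs (Path ps) = '/' : intercalate "/" ps

renderRel :: RelPath -> FilePath
renderRel (Path []) = "."
renderRel (Path ps) = intercalate "/" ps

home :: AbsPath
home = Path ["home", "user"]

notes :: RelPath
notes = Path ["docs", "notes.txt"]

-- renderAbs (home </> notes)   type-checks
-- renderAbs notes              rejected: RelPath is not AbsPath
-- home </> home                rejected: right-hand side must be relative
```

In a real library the Path constructor would stay hidden, so that a value can only get its Absolute or Relative tag from a checked parser.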

This approach may look familiar. There are the path and pathtype libraries that do something similar, only with more type parameters.

These different parameters serve to also support:

  • Whether a path points to a file or a directory
  • What platform the path originates from

The Path in Data.FilePath does not make these distinctions for two reasons:

  • Whether a path points to a file or a directory cannot be determined from the path itself. Whether it points to a file or a directory on disk is not even an immutable fact. It is dangerous to pretend that Path Dir is any safer than ∃ t -> Path t. As such, Data.FilePath does not make this distinction.

  • Supporting different platforms’ paths in the same system is not a common enough use-case. The common use-case is to use the host platform’s paths.


A path is data with invariants

A path has invariants. Some can be encoded in the type, others have to be contained and heavily tested. Of course these invariants should be hidden from the user, but they should still be thoroughly tested. Looking at the other ‘safe’ path libraries, I see only a handful of tests and no property tests to check invariants.

Extreme testing

Using validity, genvalidity and genvalidity-hspec, safepath has been tested extremely heavily.

In terms of lines, there are many more for testing than there are for code. In absolute numbers, there are:

  • over 150 doctests
  • over 100 property tests
  • over 10000 unit tests for parsing and rendering

Doctests ensure that the semantics of functions are at least intuitively clear. Property tests look for edge cases that need to be handled and ensure that the invariants of the Data.FilePath types are maintained under all circumstances. Lastly, the unit tests serve to ensure that when the handling of the paths changes behind the scenes, the API retains its semantics.
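As an illustration of what such invariant properties look like, here is a dependency-free sketch. It does not use validity or genvalidity, and it is not taken from safepath's test suite: the AbsPath type, its validity rules and all function names are assumptions of this example. Instead of random generation it simply enumerates every short string over a small alphabet of troublesome characters.

```haskell
import Data.List (intercalate)

-- Hypothetical path type for this sketch only.
newtype AbsPath = AbsPath [String]
  deriving (Show, Eq)

-- The invariant: every piece is non-empty, is not "." or "..",
-- and contains no separator.
isValid :: AbsPath -> Bool
isValid (AbsPath ps) = all okPiece ps
  where
    okPiece p = not (null p) && p `notElem` [".", ".."] && '/' `notElem` p

splitOnSlash :: String -> [String]
splitOnSlash s = case break (== '/') s of
  (piece, [])       -> [piece]
  (piece, _ : rest) -> piece : splitOnSlash rest

parse :: String -> Maybe AbsPath
parse "/" = Just (AbsPath [])
parse ('/' : rest) =
  let p = AbsPath (splitOnSlash rest)
  in if isValid p then Just p else Nothing
parse _ = Nothing

render :: AbsPath -> String
render (AbsPath ps) = '/' : intercalate "/" ps

-- Property 1: whatever the parser accepts satisfies the invariant.
propParseValid :: String -> Bool
propParseValid s = maybe True isValid (parse s)

-- Property 2: rendering an accepted path gives back the input.
propRoundTrip :: String -> Bool
propRoundTrip s = maybe True (\p -> render p == s) (parse s)

-- All strings of length 0..n over the alphabet '/', '.', 'a'.
smallInputs :: Int -> [String]
smallInputs n = concatMap (\k -> sequence (replicate k "/.a")) [0 .. n]

-- Check both properties on every small input.
checkAll :: Int -> Bool
checkAll n = all (\s -> propParseValid s && propRoundTrip s) (smallInputs n)
```

In a real test suite the two properties would be handed to a property-testing library, which generates far more varied inputs than this small enumeration does.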

Published: August 14 2016

Category: Project
  • tags: paths, Haskell, safety