Filepaths have been a pain in my neck for years. Paths are hard, overused, misused and mostly unsafe. In this post I present a newly released library that serves to make working with paths safer in the common use-case.
Misuse and scope
Have you ever thought about why we use paths in the first place? And why we use strings to denote them? Is it not weird that '.' means the 'current' directory (whatever that even means) and is also the separator for extensions? Is it not absurd that '/' is both the root of the path system and the separator? Paths are hard, but in any case, we are stuck with them now.
Paths are often overused or misused. By this I mean that they are used to denote more than just the location of a file(/...). Sometimes a path is a vital part of the meaning of the contents of the file. Sometimes the mere existence of a file at a certain path even has a meaning.
The safepath library is only to be used for real paths: paths that can point to a directory, a file, a device, a socket, etc. It is not meant to be used for glob patterns, for $PATH values, etc. It also encourages safer use of paths: only absolute paths, no Strings, and valid-by-construction Paths.
Data.FilePath versus System.FilePath
Data.FilePath uses an opaque datatype to represent a path instead of a plain String. This is the exact opposite of what System.FilePath does.
The safepath readme fully addresses why Data.FilePath chose to use an opaque data type where System.FilePath chooses String instead.
In short:
safepath only addresses 'real' paths.
safepath does not allow for invalid paths.
A subsequent library will have 'wrappers' for directory.
safepath encourages safe usage of paths: preferably no semantics in paths and no String juggling.
Data.FilePath versus System.Path
Data.FilePath exposes a single opaque data type with one type parameter: Path. There are two possible instantiations of this type:
type AbsPath = Path Absolute
type RelPath = Path Relative
This means that there is now a type-level distinction between absolute and relative paths. It ensures that the right kind of path is always passed to a function. It should also encourage programmers to only ever use Absolute paths in the heart of their application.
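As a minimal sketch of how such a type-level distinction can work (this is not safepath's actual implementation), a phantom type parameter is enough. The list-of-pieces representation and the names `</>` and `renderAbs` are assumptions made for illustration only.

```haskell
-- Uninhabited tags that exist only at the type level.
data Absolute
data Relative

-- Hypothetical representation: a list of path pieces.
-- The parameter 'rel' is a phantom: it never appears on the right-hand side.
newtype Path rel = Path [String]
  deriving (Show, Eq)

type AbsPath = Path Absolute
type RelPath = Path Relative

-- Appending a relative path to an absolute one yields an absolute path.
-- Applying this to two absolute paths is rejected by the type checker.
(</>) :: AbsPath -> RelPath -> AbsPath
Path base </> Path rest = Path (base ++ rest)

-- Hypothetical POSIX-style renderer that only accepts absolute paths.
renderAbs :: AbsPath -> String
renderAbs (Path []) = "/"
renderAbs (Path ps) = concatMap ('/' :) ps
```

With these definitions, a function that takes an AbsPath cannot be handed a RelPath by accident: the mistake becomes a compile error instead of a runtime surprise.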
This approach may look familiar. The path and pathtype libraries do something similar, only with more type parameters.
These extra parameters also serve to distinguish:
Whether a path points to a file or a directory
What platform the path originates from
The Path in Data.FilePath does not make these distinctions for two reasons:
Whether a path points to a file or a directory cannot be determined from the path itself. Whether it points to a file or a directory on disk is not even an immutable fact. It is dangerous to pretend that Path Dir is any safer than ∃ t -> Path t. As such, Data.FilePath does not make this distinction.
Supporting different platforms' paths in the same system is not a common enough use-case. The common use-case is to use the host platform's paths.
Implementation
A path is data with invariants
A path has invariants. Some can be encoded in the type; others have to be contained behind the API and heavily tested. These invariants should be hidden from the user, but they should still be thoroughly tested. Looking at the other 'safe' path libraries, I see only a handful of tests and no property tests to check invariants.
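To illustrate what "contained" invariants can look like, here is a small sketch of a smart constructor. It is not taken from safepath: the names `Piece`, `mkPiece` and `mkAbsPath`, and the exact validity rules, are assumptions chosen for this example.

```haskell
-- Hypothetical validated path piece. In a real module the 'Piece' and
-- 'AbsPath' constructors would not be exported, so that users can only
-- build values through the checking functions below.
newtype Piece = Piece String
  deriving (Show, Eq)

newtype AbsPath = AbsPath [Piece]
  deriving (Show, Eq)

-- The invariant chosen for this sketch: a piece is non-empty, contains
-- no separator and no NUL character, and is neither "." nor "..".
isValidPiece :: String -> Bool
isValidPiece s =
  not (null s) && all (`notElem` "/\0") s && s `notElem` [".", ".."]

-- Smart constructor: the only way to obtain a Piece.
mkPiece :: String -> Maybe Piece
mkPiece s
  | isValidPiece s = Just (Piece s)
  | otherwise = Nothing

-- A path is valid by construction when every piece is valid.
mkAbsPath :: [String] -> Maybe AbsPath
mkAbsPath = fmap AbsPath . traverse mkPiece
```

The user only ever sees `Maybe AbsPath`, while the invariant itself lives in one predicate that tests can target directly.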
Extreme testing
Using validity, genvalidity and genvalidity-hspec, safepath has been tested extremely heavily.
In terms of lines, there are many more lines of tests than of code. In absolute numbers, there are:
over 150 doctests
over 100 property tests
over 10000 unit tests for parsing and rendering
Doctests ensure that the semantics of functions are at least intuitively clear. Property tests look for edge cases that need to be handled and ensure that the invariants of the Data.FilePath types are maintained under all circumstances. Lastly, the unit tests serve to ensure that when the handling of the paths changes behind the scenes, the API retains its semantics.
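To give a flavour of what an invariant-checking property looks like, here is a dependency-free sketch. The real test suite uses genvalidity generators; this stand-in enumerates a small set of inputs by hand instead. The type, the functions `parse` and `render`, and the properties themselves are hypothetical and do not reflect safepath's API.

```haskell
-- Hypothetical stand-in for the library's opaque type.
newtype AbsPath = AbsPath [String]
  deriving (Show, Eq)

-- The invariant under test: every piece is non-empty and slash-free.
isValid :: AbsPath -> Bool
isValid (AbsPath ps) = all (\p -> not (null p) && '/' `notElem` p) ps

render :: AbsPath -> String
render (AbsPath []) = "/"
render (AbsPath ps) = concatMap ('/' :) ps

-- Strict parser for this sketch: rejects relative input, empty pieces
-- and trailing slashes.
parse :: String -> Maybe AbsPath
parse "/" = Just (AbsPath [])
parse ('/' : rest)
  | isValid candidate = Just candidate
  | otherwise = Nothing
  where
    candidate = AbsPath (pieces rest)
parse _ = Nothing

-- Split on '/' while keeping empty pieces, so that the validity check sees them.
pieces :: String -> [String]
pieces s = case break (== '/') s of
  (p, []) -> [p]
  (p, _ : rest) -> p : pieces rest

-- Tiny exhaustive "generator": all paths of up to three pieces.
samples :: [AbsPath]
samples =
  [AbsPath ps | n <- [0 .. 3], ps <- sequence (replicate n ["a", "b.txt", ".hidden"])]

-- Property: rendering and then parsing gives back the same path.
prop_roundTrip :: AbsPath -> Bool
prop_roundTrip p = parse (render p) == Just p

-- Property: whatever the parser accepts satisfies the invariant.
prop_parsedIsValid :: String -> Bool
prop_parsedIsValid s = maybe True isValid (parse s)

allPropertiesHold :: Bool
allPropertiesHold =
  all prop_roundTrip samples
    && all prop_parsedIsValid ["/", "/a//b", "a/b", "/a/b", ""]
```

A real property test would replace `samples` with randomly generated values, which is what makes it good at finding edge cases nobody thought to write down.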