Safer paths, part 1 - valid and typed paths
Filepaths have been a pain in my neck for years. Paths are hard, overused, misused and mostly unsafe. In this post I present a newly released library that makes working with paths safer in the common use case.
Misuse and scope
Have you ever thought about why we use paths in the first place? And why we use strings to denote them? Is it not weird that "." means the ‘current’ directory (whatever that even means) and is also the separator for extensions? Is it not absurd that "/" is both the root of the file system and the separator? Paths are hard, but in any case, we are stuck with them now.
Paths are often overused or misused. By this I mean that they are used to denote more than just the location of a file(/…). Sometimes a path is a vital part of the meaning of the file's contents. Sometimes the mere existence of a file at a certain path carries meaning, as with a lock file.
safepath is only to be used for real paths: paths that can point to a directory, a file, a device, a socket, etc. It is not meant to be used for glob patterns, $PATH values, etc. It also encourages safer use of paths: only absolute paths, no
Strings and valid-by-construction
Data.FilePath versus System.FilePath
Data.FilePath uses an opaque datatype to represent a path instead of a plain String. This is the exact opposite of what
readme fully addresses why Data.FilePath chose to use an opaque data type where
- safepath only addresses ‘real’ paths.
- safepath does not allow for invalid paths.
- A subsequent library will have ‘wrappers’ for
- safepath encourages safe usage of paths: preferably no semantics in paths and no
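To make ‘opaque’ and ‘valid by construction’ concrete, here is a minimal Haskell sketch. It is not safepath's actual implementation: the names Path, mkPath, splitOn, checkSegment and renderPath are hypothetical, and the invariants it enforces (absolute only, no NUL byte, no empty, "." or ".." segments) are assumptions chosen for illustration. The point is that the data constructor would not be exported, so the only way to obtain a Path is through a function that checks its input.

```haskell
import Data.List (intercalate)

-- Hypothetical opaque path type (not safepath's actual definition).
-- In a real module the export list would expose 'Path' but not its
-- constructor, so 'mkPath' is the only way to obtain a value.
newtype Path = Path [String] -- segments: non-empty, no '/', no NUL
  deriving (Eq, Show)

-- Split a string on a separator, keeping empty chunks.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : more) -> chunk : splitOn c more

-- One possible segment invariant (an assumption): no "", "." or "..".
checkSegment :: String -> Maybe String
checkSegment seg
  | seg `elem` ["", ".", ".."] = Nothing
  | otherwise                  = Just seg

-- Smart constructor: only absolute, NUL-free, normalised paths get in.
mkPath :: String -> Maybe Path
mkPath "/" = Just (Path [])
mkPath ('/' : rest)
  | '\0' `elem` rest = Nothing
  | otherwise        = Path <$> traverse checkSegment (splitOn '/' rest)
mkPath _ = Nothing

-- Rendering is total: every Path value is valid by construction.
renderPath :: Path -> String
renderPath (Path segs) = '/' : intercalate "/" segs
```

With this, mkPath "/home/user" succeeds, while mkPath "home/user" and mkPath "/a//b" both return Nothing, and renderPath never has to deal with an invalid value.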
Data.FilePath versus System.Path
Data.FilePath’s opaque data type, Path, has one type parameter. There are two possible instantiations of it:
type AbsPath = Path Absolute
type RelPath = Path Relative
This gives a type-level distinction between absolute and relative paths. The type checker ensures that a function always receives the kind of path it expects. It should also encourage programmers to only ever use absolute paths at the heart of their application.
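The phantom-type idea can be sketched in a few lines of Haskell. This is a hypothetical illustration, not the library's API: parseAbs, parseRel, renderAbs, segments and the operator </> are made-up names (the operator only borrows its spelling from System.FilePath), and validation is deliberately left out to keep the focus on the type parameter.

```haskell
import Data.List (intercalate)

-- Empty data types used purely as type-level tags (phantom types).
data Absolute
data Relative

-- 'rel' never occurs on the right-hand side; it only tells the type
-- checker which kind of path this is. Validation is omitted here.
newtype Path rel = Path [String]
  deriving (Eq, Show)

type AbsPath = Path Absolute
type RelPath = Path Relative

-- Split on '/', dropping empty chunks (so "a//b" becomes ["a","b"]).
segments :: String -> [String]
segments s =
  case break (== '/') s of
    (chunk, [])       -> keep chunk []
    (chunk, _ : more) -> keep chunk (segments more)
  where
    keep c rest = if null c then rest else c : rest

parseAbs :: String -> Maybe AbsPath
parseAbs ('/' : rest) = Just (Path (segments rest))
parseAbs _            = Nothing

parseRel :: String -> Maybe RelPath
parseRel s@(c : _) | c /= '/' = Just (Path (segments s))
parseRel _                    = Nothing

-- Only "absolute </> relative" type-checks; "relative </> absolute"
-- and "absolute </> absolute" are rejected at compile time.
(</>) :: AbsPath -> RelPath -> AbsPath
Path a </> Path b = Path (a ++ b)

renderAbs :: AbsPath -> String
renderAbs (Path segs) = '/' : intercalate "/" segs
```

Because rel is only a tag, the distinction costs nothing at runtime: passing a RelPath where an AbsPath is expected, or writing file </> home the wrong way round, is a compile-time type error rather than a runtime surprise.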
System.Path uses additional type parameters to also encode:
- Whether a path points to a file or a directory
- What platform the path originates from
Data.FilePath does not make these distinctions, for two reasons:
Whether a path points to a file or a directory cannot be determined from the path itself. What it points to on disk is not even an immutable fact: a file can be removed and replaced by a directory of the same name at any time. It is dangerous to pretend that Path Dir is any safer than ∃ t. Path t. As such, Data.FilePath does not make this distinction.
Supporting different platforms’ paths in the same system is not a common enough use case. The common use case is to use the host platform’s paths.
A path is data with invariants
A path has invariants. Some can be encoded in the type; others have to be enforced behind the library's interface. Those should of course be hidden from the user, which makes it all the more important to test them thoroughly. Looking at the other ‘safe’ path libraries, I see only a handful of tests and no property tests to check invariants.
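As an illustration of what checking such an invariant can look like, here is a hypothetical property-style test for a sketch path type (defined inline so the block stands alone; it is not the library's type). A real suite would usually generate inputs with QuickCheck or a similar library; to stay dependency-free, this sketch instead enumerates every string up to length 6 over a small alphabet containing the troublesome characters, and checks two assumed invariants: anything that parses survives a render/parse round trip, and rendered paths are absolute with no empty segments.

```haskell
import Control.Monad (replicateM)
import Data.List (intercalate, isInfixOf)

-- Hypothetical opaque path type with a checking smart constructor.
newtype Path = Path [String]
  deriving (Eq, Show)

splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : more) -> chunk : splitOn c more

mkPath :: String -> Maybe Path
mkPath "/" = Just (Path [])
mkPath ('/' : rest)
  | '\0' `elem` rest = Nothing
  | otherwise        = Path <$> traverse ok (splitOn '/' rest)
  where
    ok seg = if seg `elem` ["", ".", ".."] then Nothing else Just seg
mkPath _ = Nothing

renderPath :: Path -> String
renderPath (Path segs) = '/' : intercalate "/" segs

-- Invariant 1: anything that parses survives a render/parse round trip.
propRoundTrip :: String -> Bool
propRoundTrip s = case mkPath s of
  Nothing -> True
  Just p  -> mkPath (renderPath p) == Just p

-- Invariant 2: a rendered path is absolute and has no empty segment.
propWellFormed :: String -> Bool
propWellFormed s = case mkPath s of
  Nothing -> True
  Just p  -> take 1 (renderPath p) == "/" && not ("//" `isInfixOf` renderPath p)

-- Exhaustive "small scope" inputs: every string of length 0..6 over an
-- alphabet that hits the interesting characters ('/', '.', NUL).
smallInputs :: [String]
smallInputs = concatMap (\n -> replicateM n "a/.\0") [0 .. 6]

-- Inputs that break an invariant (expected to be empty).
counterexamples :: [String]
counterexamples =
  [ s | s <- smallInputs, not (propRoundTrip s && propWellFormed s) ]
```

The properties are phrased as plain String -> Bool functions, so the same definitions could later be handed to a generator-based framework instead of the exhaustive enumeration.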
Measured in lines, there are many more lines of tests than lines of code. In absolute numbers, there are:
- over 150 doctests
- over 100 property tests
- over 10000 unit tests for parsing and rendering
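For readers unfamiliar with doctests: they are examples written in Haddock comments, prefixed with >>>, which the doctest tool evaluates and compares against the expected output on the following line. The sketch below shows that format on a hypothetical renderPath, together with a small table-driven unit test of the kind a parsing/rendering suite can be built from. Neither is taken from the library's own test suite.

```haskell
import Data.List (intercalate)

-- Hypothetical path type, as in a minimal sketch.
newtype Path = Path [String]
  deriving (Eq, Show)

-- | Render a path to a string.
--
-- The @>>>@ lines are doctests: the doctest tool evaluates each
-- expression and compares the result with the line that follows it.
--
-- >>> renderPath (Path ["home", "user", "notes.txt"])
-- "/home/user/notes.txt"
--
-- >>> renderPath (Path [])
-- "/"
renderPath :: Path -> String
renderPath (Path segs) = '/' : intercalate "/" segs

-- A table-driven unit test in the same spirit: (input, expected rendering).
renderCases :: [(Path, String)]
renderCases =
  [ (Path [], "/")
  , (Path ["etc"], "/etc")
  , (Path ["home", "user", "notes.txt"], "/home/user/notes.txt")
  ]

-- True when every row renders to its expected string.
renderCasesPass :: Bool
renderCasesPass = and [ renderPath p == expected | (p, expected) <- renderCases ]
```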
Doctests ensure that the semantics of functions are at least intuitively clear. Property tests look for edge cases that need to be handled and ensure that the invariants of the Data.FilePath types are maintained under all circumstances. Lastly, the unit tests serve to ensure that when the handling of the paths changes behind the scenes, the API retains its