The genvalidity
library and its companion libraries have recently gotten some nice random distribution and performance upgrades. This post will announce genvalidity-criterion
and give a simple overview of the performance improvements.
In a previous blogpost, I covered the size parameter for property testing generators in depth. That blogpost covers some important considerations for the improvements that have recently been made to genvalidity-*
so you may want to read that one first.
Generating a list.
Generating a list of elements in QuickCheck
looks like this:
listOf :: Gen a -> Gen [a]
=
listOf gen $ \n -> do
sized <- choose (0,n)
k replicateM k gen
In other words, the size parameter is not distributed across the list at all, but still used to generate the size of the list. This means that arbitrary :: Gen [a]
will generate lists of lengths up to the size, arbitrary :: Gen [[a]]
will generate lists with a quadratic number of elements with respect to the size parameter, etc.
Generating a list of elements in genvalidity
generates an arbitrary partition of the size parameter. That is to say, a list of integers that sum to the size parameter. This list is then used to distribute the total size across the elements of the list.
genListOf :: Gen a -> Gen [a]
=
genListOf func $ \n -> do
sized <- arbPartition n
pars $ \i -> resize i func forM pars
The latest version of genvalidity
(0.9.1.0, at the time of writing) still does this, but generates the arbitrary partition in a much more intelligent way. Previous versions would only generate fairly small lists, albeit with well distributed sizes.
arbPartition :: Int -> Gen [Int]
= go i >>= shuffle
arbPartition i where
go k| k <= 0 = pure []
| otherwise = do
<- choose (1, k)
first <- arbPartition $ k - first
rest return $ first : rest
Indeed, this code makes it unlikely that a list of a length around size
is generated. Most lengths would have been around size 5
.
The current version certainly also generates lists with a length up to the size parameter, still with well distributed sizes.
0 = pure []
arbPartition = genListLengthWithSize i >>= go i
arbPartition i where
go :: Int -> Int -> Gen [Int]
= do
go size len <- replicateM len $ choose (0, 1)
us let invs = map (invE 0.25) us
-- Rescale the sizes to (approximately) sum to the given size.
pure $ map (round . (* (fromIntegral size / sum invs))) invs
-- Use an exponential distribution for generating the
-- sizes in the partition.
invE :: Double -> Double -> Double
= - log (1 - u) / lambda
invE lambda u
-- Generate a list length with the given size
genListLengthWithSize :: Int -> Gen Int
= round . invT (fromIntegral maxLen) <$> choose (0, 1)
genListLengthWithSize maxLen where
-- Use a triangle distribution for generating the
-- length of the list
-- with minimum length '0', mode length '2'
-- and given max length.
invT :: Double -> Double -> Double
=
invT m u let a = 0
= m
b = 2
c = (c - a) / (b - a)
fc in if u < fc
then a + sqrt (u * (b - a) * (c - a) )
else b - sqrt ((1 - u) * (b - a) * (b - c))
This code is much more complex, and it requires some knowledge of the inverse transform sampling trick, triangle distributions and exponential distributions. It turns out that we can use some fancy maths to generate a value from any distribution as long as we know some particular things about it. In this code, we use a triangle distribution to generate the length of the list, and an exponential distribution to generate the size of the elements in the list.
In short: list sizes are now much more likely to be of a size closer to the value of the size parameter while the size parameter is still distributed accross the elements of the list.
Benchmarking generators with genvalidity-criterion
When property test suites are slow, usually the problem is that the generators are slow. Some of the genvalidity-*
generators could use improvements, so I set about speeding them up.
There is little that annoys me more about programming than developers spending time on performance. I have yet to meet a single developer who writes correct enough code that they should be allowed to care about performance. Nevertheless, sometimes performance is a feature, and in this case better performance will mean more correct code.
The first step, as usual, is to set up a feedback loop. How do I know if my performance improvements are indeed improvements? We use benchmarks. How do we benchmark property testing generators? Using genvalidity-criterion
.
The new genvalidity-criterion
allows you to simply benchmark generators. There are these two important benchmark combinators:
genValidBench @a :: Benchmark
genBench :: String -> Gen a -> Benchmark
Here is an example:
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE TypeApplications #-}
module Main where
import Criterion.Main as Criterion
import Data.GenValidity.Criterion
main :: IO ()
=
main
Criterion.defaultMain@()
[ genValidBench @Bool
, genValidBench @Ordering
, genValidBench @Char
, genValidBench @Int
, genValidBench @Word
, genValidBench ]
This new little library allowed me to accurately make the following improvements.
Speeding up generators
Speeding up genvalidity-text
Text used to be generated using Data.Text.pack <$> genValid
. That is to say, text was generated by generating a list of characters and then turning them into a Text
value.
We were able to speed up text generation 3-5x by using Data.Text.unfoldrN
directly. As a nice benefit, that approach works for any Gen Char
:
genTextBy :: Gen Char -> Gen Text
To generate a Char
in the default GenValid
instance, the size parameter is not considered. (The QuickCheck
instance mostly generates ASCII characters.)
Speeding up genvalidity-containers
Generating trees used to have a strong bias towards the root element being bigger than the sub forest. We managed to speed up generating trees slightly, while also fixing this problem. The size of the elements of the tree is now determined ahead of time, by generating a nonempty list first and then turning it into a tree randomly.
-- | Generate a tree of values that are generated as specified.
--
-- This takes the size parameter much better into account
genTreeOf :: Gen a -> Gen (Tree a)
= do
genTreeOf func <- genNonEmptyOf func
ne
turnIntoTree newhere
turnIntoTree :: NonEmpty a -> Gen (Tree a)
:| es) = do
turnIntoTree (e <- turnIntoGroups es
groups <- mapM turnIntoTree groups
subtrees pure (Node e subtrees)
turnIntoGroups :: [a] -> Gen [NonEmpty a]
= go []
turnIntoGroups where
go :: [a] -> [a] -> Gen [NonEmpty a]
=
go acc [] case NE.nonEmpty acc of
Nothing -> pure []
Just ne -> pure [ne]
:es) =
go acc (e
frequency1
[ ( do rest <- go [] es
, pure ((e :| acc) : rest))
4, go (e : acc) es)
, ( ]