The `genvalidity`

library and its companion libraries have recently gotten
some nice random distribution and performance upgrades.
This post will announce `genvalidity-criterion`

and give a simple overview of the performance improvements.

In a previous blogpost, I covered the size parameter for property testing generators in depth.
That blogpost covers some important considerations for the improvements that have recently been made to `genvalidity-*`

so you may want to read that one first.

### Generating a list.

Generating a list of elements in `QuickCheck`

looks like this:

```
listOf :: Gen a -> Gen [a]
listOf gen =
sized $ \n -> do
k <- choose (0,n)
replicateM k gen
```

In other words, the size parameter is not distributed across the list at all, but still used to generate the size of the list.
This means that `arbitrary :: Gen [a]`

will generate lists of lengths up to the size, `arbitrary :: Gen [[a]]`

will generate lists with a quadratic number of elements with respect to the size parameter, etc.

Generating a list of elements in `genvalidity`

generates an arbitrary partition of the size parameter.
That is to say, a list of integers that sum to the size parameter.
This list is then used to distribute the total size across the elements of the list.

```
genListOf :: Gen a -> Gen [a]
genListOf func =
sized $ \n -> do
pars <- arbPartition n
forM pars $ \i -> resize i func
```

The latest version of `genvalidity`

(0.9.1.0, at the time of writing) still does this, but generates the arbitrary partition in a much more intelligent way.
Previous versions would only generate fairly small lists, albeit with well distributed sizes.

```
arbPartition :: Int -> Gen [Int]
arbPartition i = go i >>= shuffle
where
go k
| k <= 0 = pure []
| otherwise = do
first <- choose (1, k)
rest <- arbPartition $ k - first
return $ first : rest
```

Indeed, this code makes it unlikely that a list of a length around `size`

is generated.
Most lengths would have been around size `5`

.

The current version certainly also generates lists with a length up to the size parameter, still with well distributed sizes.

```
arbPartition 0 = pure []
arbPartition i = genListLengthWithSize i >>= go i
where
go :: Int -> Int -> Gen [Int]
go size len = do
us <- replicateM len $ choose (0, 1)
let invs = map (invE 0.25) us
-- Rescale the sizes to (approximately) sum to the given size.
pure $ map (round . (* (fromIntegral size / sum invs))) invs
-- Use an exponential distribution for generating the
-- sizes in the partition.
invE :: Double -> Double -> Double
invE lambda u = - log (1 - u) / lambda
-- Generate a list length with the given size
genListLengthWithSize :: Int -> Gen Int
genListLengthWithSize maxLen = round . invT (fromIntegral maxLen) <$> choose (0, 1)
where
-- Use a triangle distribution for generating the
-- length of the list
-- with minimum length '0', mode length '2'
-- and given max length.
invT :: Double -> Double -> Double
invT m u =
let a = 0
b = m
c = 2
fc = (c - a) / (b - a)
in if u < fc
then a + sqrt (u * (b - a) * (c - a) )
else b - sqrt ((1 - u) * (b - a) * (b - c))
```

This code is much more complex, and it requires some knowledge of the inverse transform sampling trick, triangle distributions and exponential distributions. It turns out that we can use some fancy maths to generate a value from any distribution as long as we know some particular things about it. In this code, we use a triangle distribution to generate the length of the list, and an exponential distribution to generate the size of the elements in the list.

In short: list sizes are now much more likely to be of a size closer to the value of the size parameter while the size parameter is still distributed accross the elements of the list.

### Benchmarking generators with `genvalidity-criterion`

When property test suites are slow, usually the problem is that the generators are slow.
Some of the `genvalidity-*`

generators could use improvements, so I set about speeding them up.

There is little that annoys me more about programming than developers spending time on performance. I have yet to meet a single developer who writes correct enough code that they should be allowed to care about performance. Nevertheless, sometimes performance is a feature, and in this case better performance will mean more correct code.

The first step, as usual, is to set up a feedback loop.
How do I know if my performance improvements are indeed improvements?
We use benchmarks.
How do we benchmark property testing generators?
Using `genvalidity-criterion`

.

The new `genvalidity-criterion`

allows you to simply benchmark generators.
There are these two important benchmark combinators:

`genValidBench @a :: Benchmark`

`genBench :: String -> Gen a -> Benchmark`

Here is an example:

```
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE TypeApplications #-}
module Main where
import Criterion.Main as Criterion
import Data.GenValidity.Criterion
main :: IO ()
main =
Criterion.defaultMain
[ genValidBench @()
, genValidBench @Bool
, genValidBench @Ordering
, genValidBench @Char
, genValidBench @Int
, genValidBench @Word
]
```

This new little library allowed me to accurately make the following improvements.

### Speeding up generators

#### Speeding up `genvalidity-text`

Text used to be generated using `Data.Text.pack <$> genValid`

.
That is to say, text was generated by generating a list of characters and then turning them into a `Text`

value.

We were able to speed up text generation 3-5x by using `Data.Text.unfoldrN`

directly.
As a nice benefit, that approach works for any `Gen Char`

:

```
genTextBy :: Gen Char -> Gen Text
```

To generate a `Char`

in the default `GenValid`

instance, the size parameter is not considered.
(The `QuickCheck`

instance mostly generates ASCII characters.)

#### Speeding up `genvalidity-containers`

Generating trees used to have a strong bias towards the root element being bigger than the sub forest. We managed to speed up generating trees slightly, while also fixing this problem. The size of the elements of the tree is now determined ahead of time, by generating a nonempty list first and then turning it into a tree randomly.

```
-- | Generate a tree of values that are generated as specified.
--
-- This takes the size parameter much better into account
genTreeOf :: Gen a -> Gen (Tree a)
genTreeOf func = do
ne <- genNonEmptyOf func
turnIntoTree ne
where
turnIntoTree :: NonEmpty a -> Gen (Tree a)
turnIntoTree (e :| es) = do
groups <- turnIntoGroups es
subtrees <- mapM turnIntoTree groups
pure (Node e subtrees)
turnIntoGroups :: [a] -> Gen [NonEmpty a]
turnIntoGroups = go []
where
go :: [a] -> [a] -> Gen [NonEmpty a]
go acc [] =
case NE.nonEmpty acc of
Nothing -> pure []
Just ne -> pure [ne]
go acc (e:es) =
frequency
[ ( 1
, do rest <- go [] es
pure ((e :| acc) : rest))
, (4, go (e : acc) es)
]
```