CS SYD - Property testing in depth: genvalidity's fixed-size type generators

This post announces genvalidity-0.10.0.0 and recently released companion libraries. This post explores the changes to the generators of Int, Word, Float, Double, Natural and Integer.

Generating `Int` and `Word` values

The QuickCheck approach to generating values of type Int or Word uses the arbitrarySizedIntegral function.

-- | Generates an integral number. The number can be positive or negative
-- and its maximum absolute value depends on the size parameter.
arbitrarySizedIntegral :: Integral a => Gen a
arbitrarySizedIntegral =
  sized $ \n ->
  inBounds fromInteger (choose (-toInteger n, toInteger n))

Values of types Int8...Int64 and Word8...Word64 are generated using the arbitrarySizedBoundedIntegral function.

(You don't really need to understand how this works.)

-- | Generates an integral number from a bounded domain. The number is
-- chosen from the entire range of the type, but small numbers are
-- generated more often than big numbers. Inspired by demands from
-- Phil Wadler.
arbitrarySizedBoundedIntegral :: (Bounded a, Integral a) => Gen a
arbitrarySizedBoundedIntegral =
  withBounds $ \mn mx ->
  sized $ \s ->
    do let bits n | n == 0 = 0
                  | otherwise = 1 + bits (n `quot` 2)
           k = (toInteger s*(bits mn `max` bits mx `max` 40) `div` 80)
           -- computes x `min` (2^k), but avoids computing 2^k
           -- if it is too large
           x `minexp` k
             | bits x < k = x
             | otherwise = x `min` (2^k)
           -- x `max` (-2^k)
           x `maxexpneg` k = -((-x) `minexp` k)
       n <- choose (toInteger mn `maxexpneg` k, toInteger mx `minexp` k)
       return (fromInteger n)

All that matters from the above piece of code is that you realise that both of these generators mostly only generate very small values. Observe:

λ> sample (arbitrary :: Gen Word)
0
2
2
6
8
8
1
5
10
7
8

λ> sample (arbitrary :: Gen Word64)
1
1
5
10
10
193
405
1536
172
13414
7990

In particular, values around minBound (for IntX) or maxBound are never tried. As a result, the following property just passes:

λ> quickCheck $ \n -> abs (n :: Int) >= 0 
+++ OK, passed 100 tests.

This is a clear example of a bad generator that hides a real failure. Indeed, a better generator may have generated minBoud :: Int to find the following failure:

λ> minBound :: Int
-9223372036854775808
λ> abs (minBound :: Int)
-9223372036854775808
λ> abs (minBound :: Int) >= 0
False

The new genUnchecked Int instance uses a more sophisticated approach. It does not distinguish between Int and Int64, and uses the following helper function:

-- | Generate Int, Int8, Int16, Int32 and Int64 values smartly.
--
-- * Some at the border
-- * Some around zero
-- * Mostly uniformly
genIntX :: forall a. (Integral a, Bounded a, Random a) => Gen a
genIntX =
  frequency
    [ (1, extreme)
    , (1, small)
    , (8, uniform)
    ]
  where
    extreme :: Gen a
    extreme = sized $ \s -> oneof
      [ choose (maxBound - fromIntegral s, maxBound)
      , choose (minBound, minBound + fromIntegral s)
      ]
    small :: Gen a
    small = sized $ \s -> choose (- fromIntegral s, fromIntegral s)
    uniform :: Gen a
    uniform = choose (minBound, maxBound)

This generator generates small values, just like QuickCheck does, 10% of the time. It generates extreme values, around the bounds, 10% of the time. And finally, it generates uniformly distributed values the rest of the time.

This generator does not hide the failure from before, indeed it finds the minBound counterexample:

λ> quickCheck $ forAll genUnchecked $ \n -> abs (n :: Int) >= 0
*** Failed! Falsified (after 2 tests):
-9223372036854775808

To generate Word values, the new genvalidity generators use a similar approach, but obviously do not try to generate negative values.

Generating `Float` and `Double` values

Generating floating point values has been controversial already.

To generate a floating point number, QuickCheck uses the arbitrarySizedFractional function.

-- | Generates a fractional number. The number can be positive or negative
-- and its maximum absolute value depends on the size parameter.
arbitrarySizedFractional :: Fractional a => Gen a
arbitrarySizedFractional =
  sized $ \n ->
    let n' = toInteger n in
      do b <- choose (1, precision)
         a <- choose ((-n') * b, n' * b)
         return (fromRational (a % b))
 where
  precision = 9999999999999 :: Integer

The biggest problem with this generator is that it mostly only generates small values. As a result, we get the following false negative:

λ> quickCheck $ \f -> (f :: Double) < 200
+++ OK, passed 100 tests.

The next big problem is that this generator never generates denormalised values. Hence the following false negative (with a counterexample):

λ> quickCheck $ \f -> f == (f :: Double)
+++ OK, passed 100 tests.
λ> let inf = read "NaN" :: Double in inf == inf
False

The last problem is that this generator does not generate values around the bounds of Float or Double. This poses the following false negative:

λ> quickCheck $ \f -> length (show (round (f :: Double) :: Integer)) < 100
+++ OK, passed 100 tests.
λ> Prelude> length (show (round (1E200)))
200

Granted, this last one does not sound like a big problem, but a problem like this has already occurred in practice in a central package in the ecosystem.

In genvalidity, the generators are implemented as genFloatX castWord32ToFloat for Float and genFloatX castWord64ToDouble for Double:

-- | Generate floating point numbers smartly:
--
-- * Some denormalised
-- * Some around zero
-- * Some around the bounds
-- * Some by encoding an Integer and an Int to a floating point number.
-- * Some accross the entire range
-- * Mostly uniformly via the bitrepresentation
--
-- The function parameter is to go from the bitrepresentation to the floating point value.
genFloatX
  :: forall a w. (Read a, RealFloat a, Bounded w, Random w)
  => (w -> a)
  -> Gen a
genFloatX func =
  frequency
    [ (1, denormalised)
    , (1, small)
    , (1, aroundBounds)
    , (1, uniformViaEncoding)
    , (6, reallyUniform)
    ]
  where
    denormalised :: Gen a
    denormalised =
      elements
        [ read "NaN"
        , read "Infinity"
        , read "-Infinity"
        , read "-0"
        ]
    -- This is what Quickcheck does,
    -- but inlined so QuickCheck cannot change
    -- it behind the scenes in the future.
    small :: Gen a
    small = sized $ \n -> do
      let n' = toInteger n
      let precision = 9999999999999 :: Integer
      b <- choose (1, precision)
      a <- choose ((-n') * b, n' * b)
      pure (fromRational (a % b))
    upperSignificand :: Integer
    upperSignificand = floatRadix (0.0 :: a) ^ floatDigits (0.0 :: a)
    lowerSignificand :: Integer
    lowerSignificand = - upperSignificand -- Floating point numbers are symmetric
    (lowerExponent, upperExponent) = floatRange (0.0 :: a)
    aroundBounds :: Gen a
    aroundBounds = do
      s <- sized $ \n -> oneof
        [ choose (lowerSignificand, lowerSignificand + fromIntegral n)
        , choose (upperSignificand - fromIntegral n, upperSignificand)
        ]
      e <- sized $ \n -> oneof
        [ choose (lowerExponent, lowerExponent + n)
        , choose (upperExponent - n, upperExponent)
        ]
      pure $ encodeFloat s e
    uniformViaEncoding :: Gen a
    uniformViaEncoding = do
      s <- choose (lowerSignificand, upperSignificand)
      e <- choose $ floatRange (0.0 :: a)
      pure $ encodeFloat s e
    -- Not really uniform, but good enough
    reallyUniform :: Gen a
    reallyUniform = func <$> choose (minBound, maxBound)

This generator generates some denormalised values, some small values (like QuickCheck does), some values around the boundaries of the type, some values via the encoding (no denormalised values), but most values in a truly uniform manner. You may wonder what the point is of the truly uniform generator if there is already the uniformViaEncoding generator. The uniformViaEncoding generator will not generate different NaN or Infinity values, while the realyUniform generator certainly will generate different NaN values.

I am pleased to say that this new generator has turned all the above false negatives into true positives:

λ> quickCheck $ forAllUnchecked $ \f -> (f :: Double) < 200
*** Failed! Falsified (after 1 test and 867 shrinks):
6.953710695e12

λ> quickCheck $ forAllUnchecked $ \f -> (f :: Double) == f
*** Failed! Falsified (after 2 tests):
NaN

λ> quickCheck $ forAllUnchecked $ \f -> length (show (round (f :: Double) :: Integer)) < 100
*** Failed! Falsified (after 2 tests and 458 shrinks):    
1.0005000767033805e99

Here is a sample of what is being generated:

λ> sample (genUnchecked :: Gen Double)                   
NaN
-1.5382021342352537e-256
-1.4064318498034637e-185
-1.8756650198989648
2.3970675652314379e297
-Infinity
-0.0
1.5822616745384519e58
0.0
-4.436269974742419
Infinity

Generating `Integer` and `Natural`

Generating Integer and Natural values in QuickCheck also uses the arbitrarySizedIntegral and arbitrarySizedNatural functions. That means that these generators have the same problems as discussed above, but most importantly: They never generate the values that these types were made to contain. If your application uses Integer or Natural values, then I will argue that that means you expect to be dealing with values that are too big (or too small) for Int64 (or Word64). Unfortunately, the QuickCheck arbitrary generator will never generate those:

λ> sample (arbitrary :: Gen Integer)
0
-2
0
-4
3
0
8
-3
-15
-6
-2

The new genvalidity generator treats Integer like more of a list-like type:

genInteger :: Gen Integer
genInteger = sized $ \s -> oneof $
  (if s >= 10 then (genBiggerInteger :) else id)
    [ genIntSizedInteger
    , small
    ]
  where
    small = sized $ \s ->  choose (- toInteger s, toInteger s)
    genIntSizedInteger = toInteger <$> (genIntX :: Gen Int)
    genBiggerInteger = sized $ \s ->do
      (a, b, c) <- genSplit3 s
      ai <- resize a genIntSizedInteger
      bi <- resize b genInteger
      ci <- resize c genIntSizedInteger
      pure $ ai * bi + ci

It generates small integers, Int-sized integers and bigger integers:

λ> sample (genUnchecked :: Gen Integer)
0
1892136562016313391
-3
3
8567266908859058210
78953779483980475410579428557318571161
-8861699909278588009
-366627378773369755
8322693498186903360
-6
-12194936973173690077167751826204497798

Real world impact

If you are already using validity-based testing (kudos to you), then you should expect some more test failures when you upgrade to the latest version.

This upgrade has already found bugs in validity itself, in smos, in mergeless, in mergeful, in persistent, in intray and in tickler. This is essentially every project I tried it out on before releasing and one more (persistent).

Property testing in depth: genvalidity's fixed-size type generators

Generating Int and Word values

Generating Float and Double values

Generating Integer and Natural

Real world impact

Generating `Int` and `Word` values

Generating `Float` and `Double` values

Generating `Integer` and `Natural`