Testing the Super User Spark with HSpec

Date 2016-02-28

After having built a working system that I could use, I wanted the Super User Spark to be extremely safe and correct. Types had already helped me a lot. Now I turned to testing, rigorous testing.

Hspec

For the testing framework, I chose hspec. Hspec integrates with QuickCheck and HUnit and offers a friendly DSL for defining tests.
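
A test suite is then just an ordinary executable that runs a Spec. A minimal, self-contained example (not spark code, just the shape of an hspec suite):

import Test.Hspec

main :: IO ()
main = hspec $
    describe "reverse" $ do
        it "reverses a list" $
            reverse [1, 2, 3 :: Int] `shouldBe` [3, 2, 1]
        it "is its own inverse" $
            reverse (reverse "spark") `shouldBe` "spark"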

Small-scale testing

Unit tests

When I first tried to implement tests for my code, I noticed that most of the code wasn't very test-friendly. I was using an unwieldy monad transformer stack and hardly any code stayed out of IO. Piece by piece, I refactored out as much pure code as I could and started to write some unit tests.

eol
  succeeds for a line feed
  succeeds for a carriage return
  succeeds for a CRLF
  fails for the empty string
  fails for spaces
  fails for tabs
  fails for linespace
describe "eol" $ do
    it "succeeds for a line feed" $ do
        shouldSucceed eol "\n"
    it "succeeds for a carriage return" $ do
        shouldSucceed eol "\r"
    it "succeeds for a CRLF" $ do
        shouldSucceed eol "\r\n"
    it "fails for the empty string" $ do
        shouldFail eol ""
    ...

Property tests

I started with parser tests and I quickly noticed that writing unit tests for parsers is incredibly boring, especially for combinations of parsers. Property tests offered a nice shortcut. Now I could test the linespace parser for an arbitrary number of spaces and tabs instead of just the few unit test cases I would write.

linespace
  succeeds for spaces
  succeeds for tabs
  succeeds for mixtures of spaces and tabs
describe "linespace" $ do
    it "succeeds for spaces" $ do
        forAll (listOf $ pure ' ') $ shouldSucceed linespace
    it "succeeds for tabs" $ do
        forAll (listOf $ pure '\t') $ shouldSucceed linespace
    ...

Property test failures

One of the nice features of property tests is that the tester, if anthropomorphised, becomes obnoxious.

Me: This property should hold for ANY STRING, please test that.

Quickcheck: Okay, let's see.
Quickcheck: ...
Quickcheck: YOUR CODE IS WRONG, THAT PROPERTY DOESN'T HOLD FOR THIS STRING:
  "s¨E{¨hUÿ¸§lóFgh[è//XZ["î"

Me: Okay, but they're probably not going to use that as a filepath and ...

Quickcheck: IT ALSO DOESN'T HOLD FOR THIS STRING:
  "Ùá³53KìbWn/Ws9u`¸Í:c6îG¢!tG!wåDã4CXy:l19»ê|«FLh{±Bl,M Qu "

Me: Well, that's just ...

Quickcheck: AND THIS ONE:
  "òcm_#ne8p®yµöï.c3æïE<½ÞÐ2íOlçúw¡ÿ´+³sjii;MiëԻ_L)N}]N\Ó@*
   Ûоúm³Øçq§0u´b7µÙÒuAõLnχbY¥~FH¨`KIîjZ>jòÌúÈѤÝH§ÅF¡%àôށC¼
   ƐæÇě1?K°][ºãmÌqjóY¾UùOI7©RÉ#'¬%¦ñúLGõ5iÑÈdzåR¤ɲ|ÚN脪7˘'HÄ
   MdûÇÊôeñ«lۈEØøýh¨ûµ´6v®P}ÐXðյ¹ÒZo?¤µ-j§K¾ßâl|Y׶;ÅH)dâGà 
   $¾îÌՑ¹UrßÈؽýq1yDÙmיYý`^AYMt9O!¦õy¦Ï)±iú¸Ú@{8ôJA?Q¯yî»É3%"

Me: OKAY! okay, I get it!

Tests for impure functions

The most important part of spark is of course the deployer: the part that actually does something. That also makes it arguably the most important part to test.

I'd rather not test impure functions at all, but since I had to, I refactored the impure code into pure code that outputs instructions describing what to do once IO is available. That way I could test the side effects on a per-instruction basis. For example, here are some snippets from the refactored code:

data Instruction = Instruction FilePath FilePath DeploymentKind
    deriving (Show, Eq)

performDeployment :: Instruction -> IO ()
performDeployment (Instruction src dst kind)
  = case kind of
      LinkDeployment -> link src dst
      CopyDeployment -> copy src dst

copy :: FilePath -> FilePath -> IO ()
link :: FilePath -> FilePath -> IO ()

Now I could test these two impure functions:

let sandbox = "test_sandbox"
let setup = createDirectoryIfMissing True sandbox
let teardown = removeDirectoryRecursive sandbox

beforeAll_ setup $ afterAll_ teardown $ do
  describe "copy" $ do
    it "successfully copies this file" $ do
      withCurrentDirectory sandbox $ do
        let src = "testfile"
        let dst = "testcopy"
        writeFile src "This is a file."

        diagnoseFp src `shouldReturn` IsFile
        diagnoseFp dst `shouldReturn` Nonexistent

        copy src dst -- Under test

        diagnoseFp src `shouldReturn` IsFile
        diagnoseFp dst `shouldReturn` IsFile

        dsrc <- diagnose src
        ddst <- diagnose dst
        diagnosedHashDigest ddst `shouldBe` diagnosedHashDigest dsrc

        removeFile src
        removeFile dst

        diagnoseFp src `shouldReturn` Nonexistent
        diagnoseFp dst `shouldReturn` Nonexistent

Regression tests

I found quite a few bugs while testing spark, and I added a regression test for each bug I found. Some of these tests look somewhat funny if you don't know their story.

Somewhere in the deployment process, directories and their files needed to be hashed recursively. I naively implemented this with lazy ByteStrings. As a result, I got this error:

resource exhausted (Too many open files)

Apparently lazy ByteString's readFile keeps the file handle open until the contents have been fully consumed. After I replaced the lazy ByteStrings with strict ByteStrings, all was well. Now there is a test that gives you the following output:

hashFilePath
  has no problem with hashing a directory of 20000 files
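
The essence of the fix, as a simplified sketch (the function name and the use of the hashable package's hash are illustrative; spark's actual hashing code differs, but the key change is the same: strict reads instead of lazy ones):

import qualified Data.ByteString as SB
import Data.Hashable (hash)

-- Strict readFile reads the whole file and closes the handle right away,
-- so hashing a directory of thousands of files no longer runs into the
-- file descriptor limit. Data.ByteString.Lazy.readFile, in contrast,
-- keeps the handle open until the lazy contents are fully forced.
hashFileStrictly :: FilePath -> IO Int
hashFileStrictly path = do
    contents <- SB.readFile path
    return $ hash contents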

Automatic black-box tests

To lower the barrier to entry for people who have found a problem and would like to contribute, I implemented some automatic black-box tests.

Binary black-box tests

For the parser and the compiler, there are some binary tests. These tests pass or fail based on whether the parser/compiler succeeds or fails. They only require you to add a source file to the appropriate directory.

Correct successful parse examples
  test_resources/shouldParse/with_quotes.sus
  test_resources/shouldParse/short_syntax.sus
  test_resources/shouldParse/littered_with_comments.sus
  test_resources/shouldParse/empty_card.sus
  test_resources/shouldCompile/bash.sus
  test_resources/shouldCompile/complex.sus
Correct unsuccessful parse examples
  test_resources/shouldNotParse/empty_file.sus
  test_resources/shouldNotParse/missing_implementation.sus
Correct successful compile examples
  test_resources/shouldCompile/bash.sus
  test_resources/shouldCompile/complex.sus
Correct unsuccessful compile examples
  test_resources/shouldNotParse/empty_file.sus
  test_resources/shouldNotParse/missing_implementation.sus
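
Generating these tests mostly amounts to listing the files in a directory and creating one test case per file. A sketch of that shape (shouldParseFile is a hypothetical helper that fails the test when parsing fails; the helpers in the actual spark test suite are named differently):

import Control.Monad (forM_)
import System.Directory (getDirectoryContents)
import System.FilePath ((</>), takeExtension)
import Test.Hspec

-- One test case per .sus file in the directory: the test passes if and
-- only if the parser succeeds on the file's contents.
parseSuccessSpecs :: Spec
parseSuccessSpecs = describe "Correct successful parse examples" $ do
    let dir = "test_resources/shouldParse"
    files <- runIO $ getDirectoryContents dir
    forM_ (filter ((== ".sus") . takeExtension) files) $ \file ->
        it (dir </> file) $ shouldParseFile (dir </> file) -- hypothetical helper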

Exact black-box tests

For the compiler, there are some black-box tests that require you to add both a card file and the exact desired compilation result to the appropriate directory.

exact tests
  test_resources/exact_compile_test_src/bash.sus
  test_resources/exact_compile_test_src/alternatives.sus
  test_resources/exact_compile_test_src/internal_sparkoff.sus
  test_resources/exact_compile_test_src/nesting.sus
  test_resources/exact_compile_test_src/sub.sus
  test_resources/exact_compile_test_src/sub/subfile.sus
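
Each of these boils down to compiling the card and comparing the result with the checked-in expectation. Schematically (compileFile and the ".res" extension for the expected result are assumptions, not necessarily what spark uses):

-- Compile the card and compare against the expected output stored next to
-- it. compileFile is a hypothetical wrapper around the compiler that
-- returns the compilation result as a String.
exactCompileTest :: FilePath -> Spec
exactCompileTest src = it src $ do
    expected <- readFile (src ++ ".res")
    actual <- compileFile src
    actual `shouldBe` expected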

End-to-end tests

To ensure end-to-end correctness, there are also some tests that cover the full spectrum of operation, from argument parsing to interaction with the file system.

This kind of test simply calls the main :: IO () function, wrapped in withArgs :: [String] -> IO a -> IO a to supply specific arguments.

EndToEnd
  standard bash card test
    parses correctly
    compiles correctly
    checks without exceptions
    deploys correctly
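
Schematically, such a test looks like this (the subcommand names and the card path are illustrative, inferred from the output above rather than copied from the actual suite):

import System.Environment (withArgs)
import Test.Hspec
import Spark (main) -- hypothetical module name for spark's own main

endToEndSpec :: Spec
endToEndSpec = describe "standard bash card test" $ do
    -- Each test runs the real main exactly as a user would from the shell,
    -- only with the arguments supplied programmatically.
    it "parses correctly" $
        withArgs ["parse", "test_resources/end_to_end/bash.sus"] main
    it "compiles correctly" $
        withArgs ["compile", "test_resources/end_to_end/bash.sus"] main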