Date 2021-10-24

This post outlines different types and levels of testing that you can add to your testing portfolio.

The central assumption is that the marginal effectiveness of tests drops as you add more similar tests to your portfolio. So to get the most bang for your buck, you want to add as many different types of tests to your portfolio as possible.

### Purpose of a testing portfolio

To write code that works, you want to give it every opportunity to break during development, so that developers can fix the breakage before the code breaks for customers. Testing gives your code an opportunity to break earlier.

These are some other ways to phrase this:

• Testing lets you provide evidence that your code is not broken in the ways that you check for.
• Testing helps you to check that your code doesn't not work. (Note the intentional double negative.)

### Principles of a testing portfolio

Different types of testing can be categorised by the capabilities they require. Some tests can be written as a single boolean expression that should evaluate to True, others need a file system, others still should be run against the staging version of the system under test.

What follows are guiding principles for your testing portfolio.

(Note that all of the tests below can also be example-based tests, table-based tests, randomised property tests, exhaustive property tests, and so on.)

#### Principle of simplicity

Tests should be written with the least number of required capabilities.

For example: If a piece of code can be tested (without mocking!) without the file system, then that is preferable to writing the same test in a way that it would interact with the file system. This helps you to discover bugs sooner.
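
As a minimal sketch of this principle (the function names here are hypothetical, not from any particular code base): keeping the logic separate from the file access means the test needs nothing but a string.

```python
from pathlib import Path


def count_todo_lines(text: str) -> int:
    """Pure logic: count the lines that contain a TODO marker."""
    return sum(1 for line in text.splitlines() if "TODO" in line)


def count_todo_lines_in_file(path: Path) -> int:
    """Thin wrapper that needs the file system; kept as small as possible."""
    return count_todo_lines(path.read_text())


def test_count_todo_lines() -> None:
    # No file system and no mocking: the test only needs a string.
    assert count_todo_lines("a\n# TODO fix\nb\nTODO again") == 2
    assert count_todo_lines("") == 0
```

Only the thin wrapper would then need a test that touches the file system.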

When a bug is found using a complex test, it can also be helpful to write increasingly granular regression tests to find the problem before fixing it.

#### Principle of least capability

Tests should be run with the least number of capabilities provided.

For example: Do not make the network available to tests that should not require it.

By providing too many capabilities, you can introduce flakiness and will most likely slow down your test suite. You will also make it more difficult to debug any code that is found lacking.
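
To illustrate the idea inside a single process, here is a rough sketch that makes socket creation fail while a test runs. The names `no_network` and `NetworkDisabledError` are my own inventions for this example, and patching a module is much weaker than a real sandbox that withholds the network from the outside.

```python
import socket
from contextlib import contextmanager
from unittest import mock


class NetworkDisabledError(RuntimeError):
    """Raised when a test tries to use the network without permission."""


def _refuse(*args, **kwargs):
    raise NetworkDisabledError("this test was not given network access")


@contextmanager
def no_network():
    """Make creating sockets fail for the duration of the block."""
    # The patch is undone automatically when the block exits.
    with mock.patch.object(socket, "socket", _refuse):
        yield


def test_socket_creation_is_refused() -> None:
    with no_network():
        try:
            socket.socket()
        except NetworkDisabledError:
            return
    raise AssertionError("expected socket creation to be refused")
```

A test that unexpectedly reaches for the network now fails immediately with a clear message, instead of passing on some machines and timing out on others.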

In what follows, I will use nix as an example system that you can use to minimise the provided capabilities. You can use any such system, but you must not neglect this aspect of your portfolio, otherwise all your tests are bound to become end-to-end tests.

### Levels of testing

This is an overview of the typical levels of testing that you will want to add to your portfolio. Note that this is not a comprehensive list, but rather a starting point for thinking about your own portfolio.

The attributes by which we categorise levels of testing are the following:

• Purity: Is the test side-effect free?
• Local resources: Can the test use local resources like a local file system, the local network?
• Pollution: Does the test cause test pollution?
• Internet: Can the test use the non-local network?
• System under test: Which system is under test? Is it the version that is currently being developed, or one in an environment like staging?
• Location of test: Which system is doing the testing? Is it the version that is currently being developed, or one in an environment like staging?

#### Level 0: Language-embedded tests

Language-embedded tests are not usually called tests. They are all the things that your programming language takes care of for you. In any sensible language, this includes:

1. Your code parses, i.e. has no syntax errors.
2. Your code type-checks, i.e. does not contain type errors.
3. Your code compiles, i.e. can be run at all.

(If your language of choice does not have these, good luck making anything that works. COUGH Python COUGH)

                             | Pure | Filesystem | Pollution | Internet | Against  | Where    | How
-----------------------------|------|------------|-----------|----------|----------|----------|------------------
Language-embedded tests     | Yes  | No         | No        | No       | Local    | Local    | during nix build


#### Level 1: Pure tests

Pure tests are a boolean expression that should evaluate to True. These tests do not use any resources at all. Pure tests can be run as part of a regular test suite as supported by your language of choice.

They should be run in parallel with parallelism only limited by the hardware they are run on. They can be executed in a sandbox like the one that a nix-build provides.

                             | Pure | Filesystem | Pollution | Internet | Against  | Where    | How                               |
-----------------------------|------|------------|-----------|----------|----------|----------|-----------------------------------|
Pure tests                  | Yes  | No         | No        | No       | Local    | Local    | during nix build                  |

• Pure tests should never be flaky.
• Pure tests will probably be the fastest. Estimate about 100-1000 per second.

Pure tests can catch small-scale (but no less important) logical errors.
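
A small sketch of what pure tests could look like, using a hypothetical `clamp` function as the code under test. The first test is example-based, the second is an exhaustive property test over a deliberately tiny domain.

```python
from itertools import product


def clamp(value: int, low: int, high: int) -> int:
    """Restrict value to the inclusive range [low, high]."""
    return max(low, min(value, high))


def test_clamp_example() -> None:
    # Example-based: a single boolean expression that should be True.
    assert clamp(15, 0, 10) == 10


def test_clamp_exhaustive_property() -> None:
    # Exhaustive property: check every combination in a small domain.
    small = range(-5, 6)
    for value, low, high in product(small, repeat=3):
        if low <= high:
            assert low <= clamp(value, low, high) <= high
```

Neither test touches a file, a clock or a socket, so both can run in parallel and inside a sandbox without any setup.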

#### Level 2: Local tests

Local tests are a piece of code that is considered passing if it does not crash. They may use any local resource, as long as they take care of test pollution. They can be run as part of a regular test suite as supported by your language of choice.

This means that they may use the local file system as long as:

1. They clean up after themselves.
2. Other tests do not interact with the same part of the file system.

This is usually done using temporary directories.
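
For instance, a sketch with a hypothetical `save_report` function: each test gets its own temporary directory, which is removed again when the test finishes.

```python
import tempfile
from pathlib import Path


def save_report(directory: Path, name: str, body: str) -> Path:
    """Write a report file into the given directory and return its path."""
    target = directory / f"{name}.txt"
    target.write_text(body)
    return target


def test_save_report() -> None:
    # A fresh directory per test: no other test shares this path.
    with tempfile.TemporaryDirectory() as tmp:
        written = save_report(Path(tmp), "summary", "all good")
        assert written.read_text() == "all good"
    # Leaving the block deletes the directory and everything in it.
    assert not written.exists()
```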

Local tests may also use the local network to set up a real local server as long as:

1. The server and its dependencies are torn down cleanly.
2. Other tests do not interact with the same server.

This is usually done by choosing an arbitrary open port to have the server listen on.
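
One way to do this, sketched here with Python's built-in HTTP server and a made-up `HealthHandler`: binding to port 0 asks the operating system for any free port, so parallel tests do not have to agree on port numbers.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical server under test: answers every GET with 'ok'."""

    def do_GET(self) -> None:
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the test output quiet


def test_health_endpoint() -> None:
    # Port 0 lets the operating system pick a free port for this test.
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", "/")
        response = connection.getresponse()
        assert response.status == 200
        assert response.read() == b"ok"
        connection.close()
    finally:
        # Tear the server down cleanly, even if an assertion failed.
        server.shutdown()
        server.server_close()
        thread.join()
```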

Local tests should be run in parallel with parallelism limited by the resources that they consume. They can also be executed in a sandbox like the one that a nix-build provides, as long as the dependencies can be made available reproducibly.

                             | Pure | Filesystem | Pollution | Internet | Against  | Where    | How
-----------------------------|------|------------|-----------|----------|----------|----------|------------------
Local tests                 | No   | Yes        | No        | No       | Local    | Local    | during nix build

• Local tests should never be flaky.
• Local tests will probably be slower than pure tests, but quite fast still. Estimate about 100 per minute.

Local tests can catch medium-scale errors, as well as resource usage errors, test pollution issues and thread safety issues.

#### Level 3: Local End-to-end tests

Local end-to-end tests are a system-wide test that can span multiple machines across a virtual network that unites them. They may use any resource within that virtual network. They are the first level of end-to-end test in that they do not avoid test pollution, but instead want to test an entire system from start to finish.

Local end-to-end tests may not access the internet as a whole, but instead have to stay inside their virtual network.

To run such tests, one can use NixOS tests to virtualise the machines and network under test.

                             | Pure | Filesystem | Pollution | Internet | Against  | Where    | How
-----------------------------|------|------------|-----------|----------|----------|----------|---------------
Local end-to-end tests      | No   | Yes        | Yes       | No       | Local    | Local    | in nixos test

• Local end-to-end tests could be flaky because of timing issues. When this happens, it likely points to a real problem.
• Local end-to-end tests are slow. Estimate 1 minute per test.

Local end-to-end tests can catch problems that only happen in a real system but not in integration tests. They are also a good way to test deployment because your system is deployed to a VM.

#### Level 4: Remote end-to-end tests

Remote end-to-end tests run tests against a deployed system in a non-production environment like staging. They may access the internet (including that deployed environment).

(One can use the same code that they use for their local end-to-end tests but run the test against staging instead of a virtual network.)
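
A sketch of how the same test code could be pointed at different targets. The environment variable `E2E_BASE_URL`, the default address and the staging host name are all assumptions made up for this example.

```python
import os
from typing import Mapping

# Hypothetical address of the system inside the local virtual network.
DEFAULT_BASE_URL = "http://127.0.0.1:8080"


def target_base_url(environ: Mapping[str, str] = os.environ) -> str:
    """Pick the system under test: the local default unless overridden."""
    return environ.get("E2E_BASE_URL", DEFAULT_BASE_URL).rstrip("/")


def endpoint(path: str, environ: Mapping[str, str] = os.environ) -> str:
    """Build the full URL of an endpoint on the chosen system."""
    return f"{target_base_url(environ)}/{path.lstrip('/')}"


def test_endpoint_selection() -> None:
    # Without the variable, tests target the local system.
    assert endpoint("/health", {}) == "http://127.0.0.1:8080/health"
    # With the variable set, the same code targets staging instead.
    staging = {"E2E_BASE_URL": "https://staging.example.com/"}
    assert endpoint("health", staging) == "https://staging.example.com/health"
```

The test bodies themselves stay identical; only the configuration decides which system they exercise.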

To run such tests, one can use a local script with dependencies provided by something like an impure nix-shell.

                             | Pure | Filesystem | Pollution | Internet | Against  | Where    | How
-----------------------------|------|------------|-----------|----------|----------|----------|--------------------------------
Remote end-to-end tests     | No   | Yes        | Yes       | Yes      | Staging  | Local    | from impure nix-shell, locally

• Remote end-to-end tests could be flaky because of all sorts of reasons. Such flakiness does not always point to a real problem. Consider unreliable networks for example.
• Remote end-to-end tests are slow, and their speed is also bound by the performance of the system under test, including the load it is under. Estimate 1-10 minutes per test.

Remote end-to-end tests can catch problems that only happen in a real system that carries over state across tests. For example: database migration issues that can only become apparent when a database already exists. They are a good way to test issues with long-lived state.

#### Level 5: Continuous end-to-end tests

Continuous end-to-end tests run as a real deployment in a non-production environment like staging. They test another system that is also in a non-production environment, like staging.

It is good practice to run these tests periodically, for example every night, instead of just once.

                             | Pure | Filesystem | Pollution | Internet | Against  | Where    | How
-----------------------------|------|------------|-----------|----------|----------|----------|------------------------
Continuous end-to-end tests | No   | Yes        | Yes       | Yes      | Staging  | Staging  | In separate deployment

• Continuous end-to-end tests can be flaky for any number of reasons. Such flakiness does not always point to a real problem.
• Continuous end-to-end tests are slow, but run asynchronously. They may take up to the entire period of their recurrence.

Continuous end-to-end tests can be used to provide evidence of compatibility:

Environment of the tests | Environment of the system under test | Purpose
-------------------------|--------------------------------------|-----------------------------------------
Testing                  | Testing                              | Current compatibility of the new system
Testing                  | Staging                              | Backward compatibility
Staging                  | Testing                              | Forward compatibility
Staging                  | Staging                              | Current compatibility of the old system


Continuous end-to-end tests can catch compatibility problems, problems that only occur across longer spans of time, as well as issues with long-running deployments.
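
The compatibility matrix above can also be written down as a small function, for example to label the results of each scheduled run. This is only a sketch; the function name and the lowercase environment names are my own choices.

```python
from itertools import product

ENVIRONMENTS = ("testing", "staging")


def compatibility_purpose(tests_env: str, system_env: str) -> str:
    """Say what a run of tests from one environment against another shows."""
    if tests_env == system_env:
        age = "new" if tests_env == "testing" else "old"
        return f"current compatibility of the {age} system"
    if tests_env == "testing":
        # New tests against the old system.
        return "backward compatibility"
    # Old tests against the new system.
    return "forward compatibility"


def all_combinations() -> dict:
    """Enumerate every (tests, system under test) pair with its purpose."""
    return {
        (tests_env, system_env): compatibility_purpose(tests_env, system_env)
        for tests_env, system_env in product(ENVIRONMENTS, repeat=2)
    }
```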

### Conclusion

• There are different levels of testing, and it is good to know about them.
• Write tests at many different levels; it gets you more results for your effort.
• Constrain flakiness where possible.

### Appendix: Overview

                             | Pure | Filesystem | Pollution | Internet | Against  | Where    | How
-----------------------------|------|------------|-----------|----------|----------|----------|--------------------------------
Language-embedded tests     | Yes  | No         | No        | No       | Local    | Local    | during nix build
Pure tests                  | Yes  | No         | No        | No       | Local    | Local    | during nix build
Local tests                 | No   | Yes        | No        | No       | Local    | Local    | during nix build
Local end-to-end tests      | No   | Yes        | Yes       | No       | Local    | Local    | in nixos test
Remote end-to-end tests     | No   | Yes        | Yes       | Yes      | Staging  | Local    | from impure nix-shell, locally
Continuous end-to-end tests | No   | Yes        | Yes       | Yes      | Staging  | Staging  | In separate deployment

