Nix does not guarantee reproducibility

This post serves to show that Nix does not guarantee that builds are reproducible so that we may learn and improve our builds.

Some of the following problems are not solvable, although in such cases there may be ways to mitigate them.

However, it is important that we don't lie about them when evangelising Nix.

Why bother writing this?

I have been building and deploying software with Nix since 2018, so I have seen a lot of Nix evangelism. I use Nix to achieve what I think is a competitive advantage in being able to ship faster and more reliably, and I recommend using Nix to my clients as well.

However, I also see exaggerated claims about Nix, specifically the idea that "Nix guarantees reproducibility". This can lead newcomers to try Nix, only to be disappointed when they find out that it does not. These would-be users can then also (justifiably) distrust Nix users because "if they can't even be honest about this, what else are they lying about?".

This situation is frustrating because Nix represents real improvements to reproducibility, but is not a silver bullet. As such, driving would-be users away with lies does them a disservice.

This post serves as proof that Nix does indeed not guarantee reproducibility, but that should not stop you from using it for the sake of improved reproducibility.

Reproducibility

It is not quite clear what is meant when people say "Nix guarantees reproducibility", so below are different definitions, each with examples showing that Nix does not guarantee them.

Reproducible successes

Definition

A build is reproducibly successful if and only if "If it builds successfully once, it will always build successfully." holds.

Counterexamples

Resource no longer available

unavailablePage = builtins.fetchurl {
  url = "https://vine.co/MyUserName";
  sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
};

Fixed-output derivations produce the same result every time if they succeed (and hashing is not broken). However, there's nothing to guarantee that the output can indeed be reproduced at all. Sometimes resources on the internet become unavailable for reasons entirely beyond our control.

In this build, for example, we try to fetch a web page that is no longer available (a Vine user profile).

$ nix build .#unreproduciblePackages.x86_64-linux.unavailablePage
error: unable to download 'https://vine.co/MyUserName': HTTP error 404

This build would have succeeded once upon a time, but will no longer succeed.

A real-world example of where this happens quite often is old versions of LaTeX libraries. If you write a book in LaTeX and package it with Nix, you need to keep the build up to date in order to be able to build the book again in the future.

One possible mitigation would be to use Nix's caching mechanism to make sure that you transparently cache all required resources in-house. This can work as long as you can share this cache with anyone else who wants to perform the same build, and you don't even need to trust the cache because you have specified the hash of the result.
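
A related and simpler variant of this idea, sketched here under stated assumptions, is to list an in-house mirror as a fallback URL instead of relying on a binary cache. The sketch uses pkgs.fetchurl from nixpkgs (rather than builtins.fetchurl as above) because that fetcher accepts a list of urls. The mirror host name is hypothetical, and the hash is a placeholder that has to be replaced with the real one.

```nix
# Sketch only: mirror.example.internal is a hypothetical in-house mirror.
{ pkgs ? import <nixpkgs> { } }:

pkgs.fetchurl {
  # URLs are tried in order, so the in-house copy is used once the
  # original location has disappeared.
  urls = [
    "https://vine.co/MyUserName"
    "https://mirror.example.internal/vine.co/MyUserName"
  ];
  # Placeholder: replace with the hash of the page as originally fetched.
  # Whichever URL answers, the content must match this hash.
  hash = pkgs.lib.fakeHash;
}
```

Because the hash pins the content, the mirror only needs to be available, not trusted.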

Random failure

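randomSuccess = pkgs.runCommand "random-success" { } ''
  # $RANDOM is uniform over 0..32767, so a numeric comparison against 16000
  # fails about half of the time. Note that ">" inside [[ ]] would compare
  # strings rather than numbers, hence -gt.
  if [[ "$RANDOM" -gt 16000 ]]
  then
    exit 1
  else
    echo "true" > $out
  fi
'';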

When builds use randomness, whether they succeed or not may depend on that randomness. In such a case, the build will sometimes fail where previously it would have passed.

In this build, for example, we access randomness to fail about half of the time:

$ nix build .#unreproduciblePackages.x86_64-linux.randomSuccess
error: builder for '/nix/store/p8a5a8fijg7qh464c0mvvh5yzndsx7vm-random-success.drv' failed with exit code 1
$ echo $?
1

$ nix build .#unreproduciblePackages.x86_64-linux.randomSuccess
$ echo $?
0

A real-world example of where this might happen is builds that run randomised property tests without a fixed seed. Indeed, when you run property tests, it is important to choose a fixed seed in your builds.
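
As a minimal sketch of what fixing the seed looks like, here is the same kind of random build with bash's generator seeded explicitly. The seed value is an arbitrary assumption, and a real property-testing library would have its own way of passing a seed, which is not shown here.

```nix
# Sketch only: the seed value 42 is an arbitrary, hypothetical choice.
{ pkgs ? import <nixpkgs> { } }:

pkgs.runCommand "seeded-random-success" { } ''
  # Assigning to RANDOM seeds bash's generator, so the values that follow
  # are the same on every run with the same bash version.
  RANDOM=42
  if [[ "$RANDOM" -gt 16000 ]]
  then
    exit 1
  else
    echo "true" > $out
  fi
''
```

With the seed fixed, the build either always succeeds or always fails for a given bash, instead of flipping between the two.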

Nix could try to mitigate this problem by not making randomness available to non-fixed-output derivations, but should not do that because that would mean that Nix builds could never generate secrets. Perhaps Nix could make a new type of alternative build for producing randomness so that this issue could be compartmentalised, but it is not clear whether that would be an effective solution to any real problem.

Reproducible failures

Definition

A build fails reproducibly if and only if "If it fails to build once, it will always fail to build." holds.

Counterexamples

Benchmarks

When (naively) running a benchmark in a Nix build, and failing if the benchmarked software is not fast enough, you will find that the build succeeds (or not) depending on how powerful the build machine is. Indeed, the build might pass on a beefy machine while failing on a laptop.

Nix might be able to mitigate this issue by requiring certain hardware before attempting a build, but this would still only be a partial solution because the hardware could be shared or otherwise compromised.
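
A naive benchmark gate of this kind could look like the following sketch. The workload and the 500 ms budget are hypothetical. The requiredSystemFeatures attribute is the existing way to restrict a derivation to builders that advertise a given feature, and I am assuming here that only suitable machines are configured to advertise "benchmark".

```nix
# Sketch only: the workload and the 500 ms budget are hypothetical.
{ pkgs ? import <nixpkgs> { } }:

pkgs.runCommand "benchmark-gate" {
  # Only builders that advertise the "benchmark" system feature may run
  # this derivation. This restricts where it builds; it does not make the
  # machine any faster or any less busy.
  requiredSystemFeatures = [ "benchmark" ];
} ''
  start=$(date +%s%N)
  # Stand-in workload for the software being benchmarked.
  for i in $(seq 1 100000); do :; done
  end=$(date +%s%N)

  elapsed_ms=$(( (end - start) / 1000000 ))
  if (( elapsed_ms > 500 ))
  then
    echo "too slow: $elapsed_ms ms" >&2
    exit 1
  fi
  echo "fast enough" > $out
''
```

Whether this build passes still depends on what else the builder is doing at the time, which is exactly why the mitigation is only partial.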

Resource usage

Some (all?) builds will fail if run on a machine with insufficient resources. Indeed, builds need some amount of memory and some amount of disk space to succeed.

Nix could maybe mitigate this issue by learning about the resource requirements that builds have and failing early. However, this would not fix the issue because of the randomness problem outlined in a previous section.

Reproducible results

Definition

A build has reproducible results if and only if "If it builds successfully, the result will be bit-for-bit identical to any other successful build." holds.

You might think that a weaker version of this definition could be "If it builds successfully, the result will be practically equivalent to any other successful build". However, adversarially speaking, this is not a weaker definition but instead an equivalent definition.

Indeed, it cannot be determined whether a non-bit-for-bit-equal build is "functionally equivalent" in general, so we must assume that they are not.

Counterexamples

Producing randomness

randomOutput = pkgs.runCommand "random-output" { } ''
  echo $RANDOM > $out
'';

A build might produce randomness as part of its output. As such, the output could be different across builds.

In the following build, a different number is produced every time, and we can see (with --rebuild) that Nix can tell that it's not a deterministic build:

$ nix build .#unreproduciblePackages.x86_64-linux.randomOutput --rebuild
error: derivation '/nix/store/aisn9vhwqlkay45zj2p3v6h9yhjb6ll2-random-output.drv' may not be deterministic: output '/nix/store/s6y0k5kdmiwy6jrv7bqjgsgrhd21s2my-random-output' differs

A real-world example of this is a build that produces a test "secret" key.
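
Such a build could look like the following sketch, which assumes that a hex-encoded 32-byte value is an acceptable test key and uses openssl from nixpkgs to produce it.

```nix
# Sketch only: a throwaway key for tests, never for production use.
{ pkgs ? import <nixpkgs> { } }:

pkgs.runCommand "test-secret-key" { nativeBuildInputs = [ pkgs.openssl ]; } ''
  # 32 random bytes, hex-encoded; every build yields a different file.
  openssl rand -hex 32 > $out
''
```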

Nix could try to mitigate this problem by not making randomness available to non-fixed-output derivations, but it should not do that, because it would amount to a backdoor in builds. Indeed, one could then predict any secret that a build generates, making it no longer secret.

Producing output based on multithreading

Some builds produce different output based on how threads are scheduled. The GHC Haskell compiler does this, for example. The following is a build of a Haskell package (this applies to almost any Haskell package):

$ nix build .#unreproduciblePackages.x86_64-linux.multithreadedOutput --rebuild
error: derivation '/nix/store/iqxgnqjm57qpfxnlncghirapqm6gg0y8-validity-0.12.0.1.drv' may not be deterministic: output '/nix/store/ibkkj6xxdhdgw3rn1bs6iizyq6ivq0jx-validity-0.12.0.1' differs

This is a longstanding GHC issue, and the underlying problem is not at all unique to GHC. As long as GHC is bug-free, this shouldn't matter for results.

Nix could mitigate this issue by running all builds on a single core, but it should not, because that would slow down builds massively. It also wouldn't necessarily help because running a build on one core does not prevent GHC from spawning multiple green threads anyway.
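
To try the single-threaded idea for one package rather than for all builds, something like the following sketch might be a starting point. I am assuming that the nixpkgs Haskell builder honours enableParallelBuilding for this package, and whether this actually makes the output deterministic is also an assumption, for the reasons given above.

```nix
# Sketch only: whether this removes the nondeterminism is an assumption.
{ pkgs ? import <nixpkgs> { } }:

pkgs.haskell.lib.overrideCabal pkgs.haskellPackages.validity (old: {
  # Ask the nixpkgs Haskell builder not to pass a parallel -j flag to GHC.
  enableParallelBuilding = false;
})
```

Building this with --rebuild, as above, is a way to check whether the output still differs between builds.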

Conclusion

The strongest claim I want to make about how Nix works is:

Nix does the opposite of what you would do if you were deliberately trying to muck things up.

In other words: Nix is the best chance I have to do builds in a sane way.

Any claim similar to "Nix makes builds reproducible" or "You can rely on Nix builds to be reproducible" is evidently (see above) false.

References

This text was originally posted on GitHub. Each of the builds above can still be found in the flake.nix there.