Testing agenix-rekey
What's agenix-rekey
One notorious problem on NixOS systems is secret management. You want all your configuration as part of your nix config, but that config will then end up in your /nix/store, where everything is world readable. Or even worse, you want to publish your config on GitHub so others can copy what you're doing. So what do you do with secret values? With your passwords, API keys, your access tokens?
One solution to this problem is agenix. Instead of the unencrypted file, your store and public repository contain age encrypted files. As age can use ssh keys for encryption, agenix uses your system host key (the one in /etc/ssh/ssh_host_ed25519_key) for the cryptography. On bootup (or any system activation) agenix then decrypts the files from the store to a predefined location, where your programs can read and use them.
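As a minimal sketch of what this looks like in a NixOS configuration, assuming the agenix NixOS module is already imported: the secret name, file path and user below are hypothetical.

```nix
# NixOS module fragment; assumes the agenix module is already part of the system.
{ config, ... }:
{
  # Hypothetical secret: only the age-encrypted file ends up in the
  # repository and in the nix store.
  age.secrets.alice-password.file = ./secrets/alice-password.age;

  # Hypothetical consumer: it only references the path of the decrypted
  # file, so the plaintext itself never enters the store.
  users.users.alice = {
    isNormalUser = true;
    hashedPasswordFile = config.age.secrets.alice-password.path;
  };
}
```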
There are a few downsides to this, however. Firstly, you need to use your host key for encryption. This is usually not a problem, as it's never supposed to leave your system and should be kept secret, but I still prefer having my secret keys on a hardware key, such as a yubikey, for additional safety. This way there is no way to leak them. Secondly, to edit your files you need your private key. That is not a problem for secrets intended for your current host, but becomes one when changing secrets for another host. And suddenly problem one also becomes relevant again, as you're now forced to copy around your private keys.
So oddlama and I developed a frontend for agenix called agenix-rekey. The idea is that the secrets in our repository are encrypted for the yubikey and will then be rekeyed, that is decrypted and reencrypted, for a different recipient, in this case the host on which they are needed.
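A host configuration using this could look roughly like the following sketch. The option names follow the agenix-rekey README as far as I remember it; the key, the identity file and the paths are placeholders.

```nix
# NixOS module fragment; assumes the agenix and agenix-rekey modules are imported.
{ config, ... }:
{
  age.rekey = {
    # Public ssh host key of the machine that should be able to decrypt
    # the rekeyed secrets (placeholder value).
    hostPubkey = "ssh-ed25519 AAAA...";
    # Identity the repository secrets are encrypted for, e.g. a yubikey
    # identity stub (hypothetical file name).
    masterIdentities = [ ./yubikey-identity.pub ];
    # Keep the rekeyed secrets in the repository, one directory per host.
    storageMode = "local";
    localStorageDir = ./. + "/secrets/rekeyed/${config.networking.hostName}";
  };

  # Hypothetical secret, encrypted for the master identity and rekeyed
  # for this host.
  age.secrets.alice-password.rekeyFile = ./secrets/alice-password.age;
}
```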
This works great. It has some problems of course, but in general we are very happy with it. It has even gained some additional capabilities, such as secret generation, which is another thing unsolved in upstream NixOS.
One glaring issue so far, however, has been the lack of tests. Most of the project consists of generating increasingly complicated bash scripts, a language known for some weird edge cases. And with another PR open that adds more logic to the scripts, I thought it was time to try and add some testing.
Nix tests
Something to note about tests in nix is that they are just builds: if the build succeeds, the test passed; if it didn't, the test failed. This has, among others, one obvious advantage: builds are usually executed in a sandbox, so any side effects the test may have will not make it to your actual system. Since agenix-rekey does have some side effects on the system, I knew I wanted to use the nix way of doing tests, to avoid having to deal with these effects.
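To illustrate the "a test is just a build" idea, here is a minimal sketch that is unrelated to agenix-rekey. The derivation name and the use of jq are made up for the example.

```nix
# A test expressed as a plain build: the build succeeds only if every
# command in the script succeeds.
{ pkgs ? import <nixpkgs> { } }:
pkgs.runCommand "example-check" { nativeBuildInputs = [ pkgs.jq ]; } ''
  # The actual "test": extract a field and compare it.
  result=$(echo '{"a": 1}' | jq .a)
  [ "$result" = "1" ]

  # A build has to produce an output, so create an empty one on success.
  touch $out
''
```

Building this with nix-build runs the check inside the sandbox; a failing comparison simply shows up as a failed build.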
Actually there are two approaches to nix testing that you may encounter. On one side there is the normal nix way: just a build that does some things and tests some conditions. These are your normal nix package tests, which just run the upstream package's tests. Then there are NixOS tests, which are usually used in NixOS to test modules. Here you have a complete framework for setting up VMs, configuring them, and running them, with your actual test being invariants that the running system has to fulfill.
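For comparison, a NixOS test could be sketched like this. The test name and the checked condition are made up for illustration; the node is an ordinary NixOS configuration and the test script is Python.

```nix
# A VM-based test: boots one machine and checks an invariant on it.
{ pkgs ? import <nixpkgs> { } }:
pkgs.testers.runNixOSTest {
  name = "example-vm-test";

  # Each node is a full NixOS configuration that is booted as a VM.
  nodes.machine = { ... }: {
    services.openssh.enable = true;
  };

  # The invariants the running system has to fulfill.
  testScript = ''
    machine.wait_for_unit("sshd.service")
    # Once sshd is up, the host key should have been generated.
    machine.succeed("test -f /etc/ssh/ssh_host_ed25519_key")
  '';
}
```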
They each have their own advantages and disadvantages. The normal tests are of course way more lightweight, as they don't need to spawn at least one VM for each test. But they are run in the normal nix sandbox, which has some limitations. Most notably you cannot call nix in it unless you specifically enable the recursive-nix option.
This is relevant for us because the way agenix-rekey works is by calling nix internally to query the secrets it needs to reencrypt. So I either had to make the test dependent on recursive-nix or use the VM based approach.
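For reference, the recursive-nix variant would look roughly like the sketch below. It assumes the builder has the experimental recursive-nix feature enabled; I did not go down this route, so treat it as untested.

```nix
# Sketch only: a build that calls nix from inside the sandbox.
{ pkgs ? import <nixpkgs> { } }:
pkgs.runCommand "recursive-example"
  {
    nativeBuildInputs = [ pkgs.nix ];
    # Only builders that advertise this feature will accept the build.
    requiredSystemFeatures = [ "recursive-nix" ];
  }
  ''
    # Evaluate a trivial expression with the nix inside the sandbox.
    nix-instantiate --eval -E '1 + 1' > $out
  ''
```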
I chose the second option, also because it allows me to write things into the system, e.g. the ssh host key, that I am prohibited from accessing in the normal nix sandbox.
So easy, right? Spawn a VM, copy the flake into it, call agenix rekey or whatever script I want to test, and see if it returns some secrets.
Humble beginnings
The basic setup was as easy as expected, but it soon turned out the rest was going to be harder than I thought.
First we need two flakes: the 'testing' flake, where the test is defined and run, and the 'tested' flake, containing the definitions to be tested. The testing flake will then spawn the VM, copy the tested flake into it and run a predefined script. This script should then usually use agenix-rekey in some way and check the outputs, or check if agenix itself is able to successfully activate afterwards. This, however, came with some problems.
NixOS tests, like any normal nix build, do not have internet access during building. This is done to ensure reproducibility: if you want to download anything you need to use a fixed output derivation. Basically you need to tell nix the hash of your resulting artifact in advance. This is not easily possible for tests, as by design they will be run with a different input every time, and thus their output will change.
The solution is to make sure everything that's needed in the VM is downloaded and built beforehand, and then given as an input to the test, so it does not need to access the internet.
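One way to hand prebuilt paths to the VM is sketched below, under the assumption that the virtualisation.additionalPaths option of the NixOS VM module is used for this. As far as I understand, it makes the closure of the listed paths available and registered in the VM's store. The testedFlakeSrc argument is hypothetical.

```nix
# Sketch of a VM test that needs no network access.
# `testedFlakeSrc` is a hypothetical argument: a store path (as a string)
# containing the source of the tested flake.
{ pkgs, testedFlakeSrc }:
pkgs.testers.runNixOSTest {
  name = "offline-example";

  nodes.machine = { ... }: {
    # Everything listed here is built on the host before the VM starts.
    virtualisation.additionalPaths = [ testedFlakeSrc pkgs.age ];
  };

  testScript = ''
    # Both paths are usable inside the VM without downloading anything.
    machine.succeed("test -e ${testedFlakeSrc}/flake.nix")
    machine.succeed("${pkgs.age}/bin/age --version")
  '';
}
```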
There are two things agenix-rekey needs to access when executing. On one hand it needs the flake inputs: as a flake-only project we expect to be able to partly evaluate a local flake, and for this to function the flake inputs need to be cached. On the other hand the script itself needs some runtime inputs, such as age, so it can fulfill its purpose.
Flake inputs
A flake is basically just a function, taking inputs and computing outputs. These inputs are themselves either flakes or some other fixed output derivation. A flake.lock file is used to lock the inputs' hashes and revisions.
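A minimal flake makes the "function" view visible. The input URL and the chosen package are just examples.

```nix
# flake.nix: `inputs` declares what is fetched (and pinned in flake.lock),
# `outputs` is a function from the fetched inputs to the flake's results.
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }: {
    # Example output: re-export a package from the nixpkgs input.
    packages.x86_64-linux.default = nixpkgs.legacyPackages.x86_64-linux.hello;
  };
}
```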
In an ideal world there would be an import_flake nix builtin that parses the flake.lock and generates the necessary input derivations from it. That however does not exist as far as I know (flake-compat does something close to it I think, but I don't think it exposes a way to access the input derivations easily).
But there is a way to override an input, basically telling nix: use this folder as the input instead of downloading it from the internet. This is useful, because it means we just need a checked out version of the input, which we can copy into the VM and set as the version to use. Luckily we have a way to get nix to give us a checked out version of any flake input: just use it as an input to the testing flake.
Thus we can add any inputs the tested flake needs as an input to the testing flake, copy them to the VM and set them as overrides.
However this means that the testing flake inputs have to be a superset of the tested flake's inputs.
This is the reason why the tests are in a separate flake from agenix-rekey itself: we did not want everyone using agenix-rekey to have to download all the test dependencies, and flakes don't have any notion of dev dependencies.
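The override approach described above can be sketched as a testing flake like the following. This is not the actual agenix-rekey test setup: the ./tested-flake directory, the test name and the single nixpkgs input are assumptions made for the example. It relies on the --override-input flag of the nix CLI.

```nix
# Sketch of a testing flake that evaluates a tested flake offline in a VM.
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    # Every input of the tested flake has to be listed here as well.
  };

  outputs = { self, nixpkgs, ... }:
    let
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};
      # Hypothetical location of the tested flake inside this repository.
      testedFlake = "${./tested-flake}";
    in
    {
      checks.${system}.override-example = pkgs.testers.runNixOSTest {
        name = "override-input-example";

        nodes.machine = { ... }: {
          nix.settings.experimental-features = [ "nix-command" "flakes" ];
          # Make the tested flake and the checked out input available in the VM.
          virtualisation.additionalPaths = [ testedFlake nixpkgs.outPath ];
        };

        testScript = ''
          # Work on a writable copy of the tested flake.
          machine.succeed("cp -r ${testedFlake} /tmp/flake && chmod -R u+w /tmp/flake")
          # Point the nixpkgs input at the local copy instead of the network.
          machine.succeed(
              "nix flake metadata /tmp/flake --override-input nixpkgs path:${nixpkgs}"
          )
        '';
      };
    };
}
```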
Another problem is transitive inputs. As inputs to flakes are usually flakes, they can have inputs themselves, which are of course also needed when evaluating the flake. The only way I know of to make these inputs overridable is to have them follow a top-level flake input, ensuring the input structure is flat.
Flat inputs
To minimize storage, flakes allow you to override transitive inputs with your own input. Say you have flake inputs like this:
    inputs = {
      nixpkgs = {
        url = "...";
      };
      agenix = {
        url = "...";
        # illustrative only: agenix pins its own nixpkgs in its flake.lock
        inputs.nixpkgs = "...";
      };
    };
Here we have nixpkgs as an input to our flake, as well as agenix. agenix however also has nixpkgs as an input. This results in two nixpkgs as inputs to our flake: our own and the one agenix locked in its flake.lock. These might be the same by accident, but this is highly unlikely, especially for nixpkgs with its high number of commits.
We can however force agenix to use our nixpkgs instead of its own:
    inputs = {
      nixpkgs = {
        url = "...";
      };
      agenix = {
        url = "...";
        inputs.nixpkgs.follows = "nixpkgs";
      };
    };
This way agenix will use whatever the top-level nixpkgs input is; if we override that, the override will be used for everything.
Using this trick, as long as we override all top-level inputs, we can be sure the flake can be evaluated.
Runtime inputs
The other type of dependency needed are the runtime dependencies of the rekey scripts. These mostly consist of age, or rage if you so wish, and various coreutils.
They are needed in two scenarios.
On one hand you might want to call something as part of your test. This is rather easy to solve by just adding it to the runtimeInputs of the test script. This will ensure the tools are added to the PATH and can then easily be called.
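As a sketch of the first scenario, assuming the test script is built with writeShellApplication from nixpkgs: the script name and body are made up and only show that the tools are found on the PATH.

```nix
# A test script whose tools are provided through runtimeInputs.
{ pkgs ? import <nixpkgs> { } }:
pkgs.writeShellApplication {
  name = "example-test-script";

  # These packages are put on the PATH of the generated script.
  runtimeInputs = [ pkgs.age pkgs.coreutils ];

  text = ''
    workdir=$(mktemp -d)

    # Generate a throwaway key and derive its recipient.
    age-keygen -o "$workdir/key.txt"
    recipient=$(age-keygen -y "$workdir/key.txt")

    # Round trip: encrypt for the recipient, decrypt with the identity.
    echo "secret" | age -r "$recipient" -o "$workdir/secret.age"
    test "$(age -d -i "$workdir/key.txt" "$workdir/secret.age")" = "secret"
  '';
}
```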
The other case is more difficult. While the flake input override allowed us to evaluate the flake, we also need to be able to build the agenix-rekey scripts without downloading anything. This means we need to ensure that all needed derivations are already part of the nix store.
Normal nix derivations are uniquely identified by their inputs, which include all dependencies, but also all build instructions and meta information. By overriding the flake inputs with our own we have already ensured that the most basic inputs are the same. So as long as we instantiate nixpkgs the same way and don't introduce any weird overrides, we can also just add the needed dependencies to our runtimeInputs and it should just work.
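A minimal sketch of what "instantiating nixpkgs the same way" could mean, assuming the tested flake uses a plain nixpkgs without overlays or extra config. The function argument and the package list are hypothetical.

```nix
# `nixpkgs` is assumed to be the same input that is handed to the tested
# flake as an override.
{ nixpkgs }:
let
  system = "x86_64-linux";
  # Plain instantiation without overlays or config, so the resulting
  # derivations should match what the tested flake evaluates to.
  pkgs = import nixpkgs { inherit system; };
in
{
  # NixOS module fragment for the test VM: the tools the rekey scripts
  # depend on are built beforehand and shipped with the VM.
  virtualisation.additionalPaths = [ pkgs.age pkgs.coreutils ];
}
```

If the tested flake applied an overlay to age, the store paths would differ and the build inside the VM would try to fetch or rebuild it, which fails without network access.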