DANE broke

21. March 2025

Intro

For reasons unknown I am running my own mailserver on my own hardware. This forces me to indulge in quite a bit of setup to keep SPF, DKIM and everything in between happy, of course with the added fun factor that a single error might earn you a place on a spam list, making your domain, your IP and probably your whole existence completely unusable.

So you might understand my shock when a few weeks ago I received an email to the postmaster address from a DANE survey warning me of an "Email server technical issue".

Now, as hinted at above, modern email has quite a few extensions and protocols that try to make it secure and as free of spam as possible. I guess protocols developed 40 years ago don't quite hold up to modern standards or attacks. And since, god forbid, we break backwards compatibility, we just have to keep adding new semi-useful extensions. Don't want to make it too easy now, would we?

Anyway. Never having heard of DANE, I guessed it was another one of those extensions. The email contained the message:

The TLSA RRsets of some of your email servers do not match their actual
certificate chains.

which substantiated my suspicion. It also told me that it had something to do with DNS, RRsets being shorthand for Resource Record sets, which are the basic information sets of DNS.

At this point I was confused. Stalwart, my current mailserver of choice, gave me the DNS records it said I should use, and they had seemingly been working fine for the last year. What changed? Did I do something wrong, or did Stalwart? Had it actually worked previously, or was it always broken and everyone just ignored it?

So let's see what DANE is about and what went wrong.

DANE

DNS-based Authentication of Named Entities, DANE for short, is an extension for DNS that allows you to associate certificates with your domain.

Why do we want this

Hopefully I don't have to explain why we want certificates (TL;DR: encryption good, man-in-the-middle bad). But certificates are only as secure as the entities that can issue them. Those entities are usually defined by the list of certificate authorities included in your browser or operating system. For Firefox, as an example, this includes basically every large telecommunications provider in the world, Google, Amazon and a bunch of other smaller companies most people have never heard of.

But they didn't add Honest Achmed. I mean he literally has honest in his name, why wouldn't you trust him?

Currently any one of those can generate a valid certificate for your website or mail server, and thus if any one of them has a security breach, your mail and website are impacted. Pretty annoying. DANE is one of the solutions to this problem. While not supported in any of the major browsers, it seems to be somewhat in use for mail.

Specifics

DANE uses DNS records to define which certificates should be accepted for a given DNS name. These records, called TLSA records, contain four fields: three numbers declaring what is restricted and how, followed by the data the certificate has to match.

Usage

The first field describes what we are restricting. DANE can be used in addition to or as an alternative to validation against a certificate authority. A value of 0 or 1 means that the certificate still has to be accepted by your local trust store, whereas a value of 2 or 3 means that if the DANE entry matches, you do not have to verify the chain against a trusted CA.

The other dimension is whether the end certificate itself has to match or whether a match further up the chain is sufficient. With a value of 1 or 3 the match has to be done on the end certificate, whereas with 0 or 2 any CA certificate in the chain can match; this includes intermediates and the root.

Selector

The second number sets whether the whole certificate should be checked (0) or whether it's sufficient to check the public key (1).

Matching type

The last number describes whether the matching data is contained in full (0) or whether only a SHA-256 (1) or SHA-512 (2) hash of it is included.

Data

Lastly, the entry contains the data that the selected part of the certificate has to match.
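To make the four fields concrete, here is a minimal Python sketch that splits the data part of a TLSA record into its fields and names them. The class and function names are made up for this example, and the hex data is a dummy value; the short labels (PKIX-TA, DANE-EE, ...) are the acronyms RFC 7218 assigns to the usage values.

```python
from typing import NamedTuple

USAGES = {
    0: "PKIX-TA: CA validation plus a pinned CA certificate in the chain",
    1: "PKIX-EE: CA validation plus a pinned end certificate",
    2: "DANE-TA: pinned CA certificate in the chain, no trust store needed",
    3: "DANE-EE: pinned end certificate, no trust store needed",
}
SELECTORS = {0: "full certificate", 1: "public key"}
MATCHING_TYPES = {0: "full data", 1: "SHA-256 hash", 2: "SHA-512 hash"}


class Tlsa(NamedTuple):
    usage: int
    selector: int
    matching_type: int
    data: bytes


def parse_tlsa(rdata: str) -> Tlsa:
    """Parse the 'usage selector matching-type hexdata' part of a record."""
    usage, selector, matching_type, *hex_parts = rdata.split()
    # Zone files may split the hex data into several whitespace-separated chunks.
    return Tlsa(int(usage), int(selector), int(matching_type),
                bytes.fromhex("".join(hex_parts)))


def describe(record: Tlsa) -> str:
    """Turn the three numbers into a human-readable sentence."""
    return (f"{USAGES[record.usage]}; compares the "
            f"{SELECTORS[record.selector]} as "
            f"{MATCHING_TYPES[record.matching_type]}")


# The hex data here is a dummy value, not a real hash.
example = parse_tlsa("3 1 1 " + "ab" * 32)
print(describe(example))
```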

Problem

Now that we know this, what went wrong with Stalwart? Well, Stalwart by default tries to include as many TLSA entries as possible. This results in a list of 8 different entries, namely these:

3 0 1
3 0 2
3 1 1
3 1 2
2 0 1
2 0 2
2 1 1
2 1 2

These then contain the respective values for the certificate in use at the time they are generated. The only number relevant for us is basically the first. We don't care whether the public key or the whole certificate is restricted, nor do we care whether the full data or just a hash is compared. We do, however, care whether the end certificate or a certificate further up the chain has to match.
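How those values come about can be sketched in a few lines of Python. The selector picks the bytes to compare and the matching type decides whether they are hashed. The function name is invented for this example, and the two byte strings are placeholders, not real certificate data; in practice they would be the DER encoding of the certificate and of its public key.

```python
import hashlib


def association_data(cert_der: bytes, spki_der: bytes,
                     selector: int, matching_type: int) -> str:
    """Return the hex data field for one selector / matching type combination."""
    if selector == 0:
        selected = cert_der      # whole certificate
    elif selector == 1:
        selected = spki_der      # public key only
    else:
        raise ValueError(f"unknown selector {selector}")

    if matching_type == 0:
        result = selected        # full data, unhashed
    elif matching_type == 1:
        result = hashlib.sha256(selected).digest()
    elif matching_type == 2:
        result = hashlib.sha512(selected).digest()
    else:
        raise ValueError(f"unknown matching type {matching_type}")
    return result.hex()


# Placeholder bytes standing in for real DER data.
leaf_cert = b"placeholder leaf certificate"
leaf_spki = b"placeholder leaf public key"

# The four "3 x y" combinations from the list above, data shortened for display.
for selector in (0, 1):
    for matching_type in (1, 2):
        data = association_data(leaf_cert, leaf_spki, selector, matching_type)
        print(f"3 {selector} {matching_type} {data[:16]}...")
```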

I am using LetsEncrypt for my certificates, and I'm not even managing them myself but am using lego with the help of the NixOS ACME module. This means that every few weeks, or days once the switch to short-lived certificates is done, I automatically get a new certificate without doing anything. This of course implies I'm also not changing my DNS entries for every new certificate.

This explains why things suddenly broke. My old certificate expired and the new one did not match anymore. But it had been working for a while before that, definitely longer than the 90 days my certificates are valid for, so why did it survive the previous renewals?

This is where the 2 and 3 become relevant. Recall that 2 means we match a CA certificate in the chain (intermediate or root), while 3 demands an exact match on the end certificate. Turns out the 3 entries had actually been broken since the first certificate renewal after setting up the server. I just never noticed because with DANE, as long as one entry passes, it's a success.

Then what changed with the latest switchover? LetsEncrypt signed my certificate with a different intermediate. Stalwart puts the parent certificate your current certificate is signed by in the 2 entries, which for me used to be LetsEncrypt's E5 certificate. But LetsEncrypt has multiple intermediate certificates, and the latest certificate I got was signed by E6. So it was just by chance that it worked this far, because all my previous certificates were signed by the same intermediate, which was also contained in the TLSA records. With the latest one, however, I was suddenly using a different intermediate and none of the entries matched anymore.
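The whole failure can be replayed with a small Python sketch. It is a deliberate simplification: the selector is ignored, only usages 2 and 3 with SHA-256 are modelled, and the byte strings are placeholders standing in for real certificates. The function names are invented for this example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def dane_ok(chain: list[bytes], records: list[tuple[int, str]]) -> bool:
    """Return True if at least one record matches the presented chain.

    chain[0] is the end certificate, the rest are the CA certificates above it.
    Each record is a (usage, sha256 hex) pair.
    """
    leaf, issuers = chain[0], chain[1:]
    for usage, pinned in records:
        if usage == 3 and sha256_hex(leaf) == pinned:
            return True
        if usage == 2 and any(sha256_hex(ca) == pinned for ca in issuers):
            return True
    return False


# Placeholder byte strings standing in for certificates.
E5, E6 = b"intermediate E5", b"intermediate E6"
first_leaf = b"leaf #1"

# Records as published once at setup time and never updated.
records = [(3, sha256_hex(first_leaf)), (2, sha256_hex(E5))]

at_setup = dane_ok([first_leaf, E5], records)       # both entries match
renewed_same = dane_ok([b"leaf #2", E5], records)   # only the 2 entry matches
renewed_other = dane_ok([b"leaf #3", E6], records)  # nothing matches
print(at_setup, renewed_same, renewed_other)
```

The three calls mirror the timeline described here: everything matches at setup, only the 2 entry survives a renewal under the same intermediate, and nothing matches once the intermediate changes.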

Solutions

The best one

The best solution would be to actually pin the exact certificate or key you're using. This way not even LetsEncrypt can MITM you. From the same people that warned me of my misconfiguration comes this approach. Basically, to deal with DNS propagation, you should always have two records declaring the public key to accept: one for the key currently in use, and one for the next key. On certificate switch you generate your new certificate with the private key corresponding to the public key in the DNS entry, which you generated earlier. You then delete the now old entry and generate a new key pair for next time, whose public key you add to the DANE records. This way you don't have a day of downtime when you switch while the DNS servers catch up to your changes.
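As a rough Python sketch of that bookkeeping: the class below keeps a current and an upcoming key and always publishes a pin for both. Everything here is illustrative; the names are invented, key generation is replaced by random bytes, and issuing the certificate and updating DNS are left out entirely.

```python
import hashlib
import secrets
from dataclasses import dataclass


def new_key() -> bytes:
    # Stand-in for real key generation: random bytes instead of a key pair.
    return secrets.token_bytes(32)


def pin(public_key: bytes) -> str:
    """Build a '3 1 1' entry (end certificate, public key, SHA-256)."""
    return "3 1 1 " + hashlib.sha256(public_key).hexdigest()


@dataclass
class Rollover:
    current: bytes   # key used by the certificate that is live right now
    upcoming: bytes  # key the next certificate will use, already published

    def records(self) -> set[str]:
        """The two TLSA entries that should be in DNS at any given time."""
        return {pin(self.current), pin(self.upcoming)}

    def rotate(self) -> None:
        """Call when the new certificate (using `upcoming`) goes live."""
        self.current = self.upcoming
        self.upcoming = new_key()


state = Rollover(current=new_key(), upcoming=new_key())
before = state.records()
state.rotate()

# The key now in use was already pinned in the records published earlier,
# so resolvers with a stale copy of the records still see a match.
print(pin(state.current) in before)
```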

While this would be optimal, it is annoying to do with LetsEncrypt certificates, as I would have to make sure the certificates use private keys provided by me and would have to remember to change the DNS entries on each rollover.

For now I'm fine with the somewhat less secure option of just pinning the LetsEncrypt intermediate public keys with 2 1 * entries. This website quite nicely provides a list of all keys currently in use. This should work for a while. I just have to keep a lookout for when LetsEncrypt generates a new public key, which I will then have to add to my DNS entries.
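Generating those entries could look roughly like the following Python sketch. The hostname mail.example.org and the key bytes are placeholders, and the function name is made up; the only fixed convention used is that TLSA records for SMTP live under _25._tcp.<MX host>.

```python
import hashlib


def tlsa_lines(mx_host: str, intermediate_spkis: dict[str, bytes],
               port: int = 25) -> list[str]:
    """Build one '2 1 1' zone-file line per intermediate public key."""
    owner = f"_{port}._tcp.{mx_host}."
    lines = []
    for name, spki_der in sorted(intermediate_spkis.items()):
        digest = hashlib.sha256(spki_der).hexdigest()
        # The intermediate's name goes into a zone-file comment for readability.
        lines.append(f"{owner} IN TLSA 2 1 1 {digest} ; {name}")
    return lines


# Placeholder bytes; real input would be the DER-encoded public keys
# of the intermediates.
spkis = {"E5": b"placeholder key E5", "E6": b"placeholder key E6"}

for line in tlsa_lines("mail.example.org", spkis):
    print(line)
```

When a new intermediate key shows up, it only needs to be added to the dictionary and the output published again.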

Other sources

Wikipedia

common_mistakes

LE forum
