Thursday, April 17, 2014

software has bugs... now what?

The recent bugs uncovered in TLS/SSL implementations were received in the blogosphere with a quest for perfectly secure implementations that have no bugs. That is the modern quest for perpetual motion. Nevertheless, very few were bothered by the fact that the applications' only line of defence was a handful of TLS/SSL implementations. We design software in a way that OpenSSL or GnuTLS becomes the Maginot Line of security, and when they fail, they fail catastrophically.

So the question that I find more interesting is: can we live with libraries that have bugs? Can we design resilient software that will operate despite serious bugs, and will give us the time necessary for a fix? In other words, could an application design have mitigated or neutralized the catastrophic bugs we saw? Let's look at each case separately.

Mitigating attacks that expose memory (heartbleed)

The heartbleed attack allows an attacker to obtain a random portion of the server process's memory (there is a nice illustration in xkcd). In that attack all the data held within a server process is at risk, including user data and cryptographic keys. Could an attack of this scale be avoided?

One approach is to avoid putting all of our eggs in one basket; that's a long-time human practice, which is also used in software design. OpenSSH's privilege separation, and the isolation of private keys using smart cards or software security modules, are two prominent examples. The defence in that design is that the unprivileged process's memory contains only data related to the current user's session, but no privileged information such as passwords or the server's private key. Could we have a similar design for an SSL server? We already have a similar design for an SSL VPN server, in which revealing the worker processes' memory cannot reveal the server's private key. These designs come at a cost though, and that cost is performance, as they need to rely on slow Inter-Process Communication (IPC) for basic functionality. Nevertheless, it is interesting to see whether we can re-use the good elements of these designs in existing servers.
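The key-isolation idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not how OpenSSH or any real server does it: the names key_holder and IsolatedSigner are hypothetical, an HMAC stands in for the private-key operation, the key is a hard-coded placeholder, and the "fork" start method assumes a POSIX system. The point is that the worker holds a pipe to the key holder and never the key, so leaking the worker's memory does not leak the key, and that every operation costs an IPC round-trip.

```python
import hashlib
import hmac
import multiprocessing as mp


def key_holder(conn):
    """Privileged process: the only place where the secret key lives."""
    # Placeholder key (assumption); a real server would load its private key here.
    secret_key = b"server-private-key (stand-in)"
    while True:
        message = conn.recv()
        if message is None:  # shutdown request from the worker
            break
        # HMAC stands in for a private-key operation such as a signature.
        conn.send(hmac.new(secret_key, message, hashlib.sha256).digest())
    conn.close()


class IsolatedSigner:
    """Handle used by the worker process; it holds a pipe, never the key."""

    def __init__(self):
        ctx = mp.get_context("fork")  # POSIX-only assumption
        self._conn, child_conn = ctx.Pipe()
        self._proc = ctx.Process(target=key_holder, args=(child_conn,), daemon=True)
        self._proc.start()
        child_conn.close()  # the worker keeps only its own end of the pipe

    def sign(self, message: bytes) -> bytes:
        self._conn.send(message)  # one IPC round-trip per key operation
        return self._conn.recv()

    def close(self):
        self._conn.send(None)
        self._proc.join()


if __name__ == "__main__":
    signer = IsolatedSigner()
    print(signer.sign(b"handshake transcript").hex())
    signer.close()
```

A real deployment would also drop privileges in the worker and restrict what the key holder is willing to sign; the sketch shows only the memory separation.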

A few years ago, in a joint effort of the Computer Security and Industrial Cryptography research group of KU Leuven and Red Hat, we produced a software security module (softhsm) for the Linux kernel, whose purpose was to prevent a leak of server memory to an adversary from revealing the server's private keys. Unfortunately we failed to convince the kernel developers of its usefulness. That wasn't the first attempt at such a module. A user-space security module already existed, called LSM-PKCS11. Similarly to the one we proposed, it would provide access to the private key, but the operations would be performed in an isolated process (instead of the kernel). If such a module were in place in popular TLS/SSL servers, there would be no need to regenerate the server's certificates after a heartbleed-type attack. So what can we do to use such a mechanism in existing software?

The previous projects have been dead for quite some time, but there are newer modules, like OpenDNSSEC's SoftHSM, which are active. Unfortunately SoftHSM is not a module that enforces any type of isolation. Thus a wrapper PKCS #11 module over SoftHSM (or any other software HSM) that enforces process isolation between the keys and the application using them would be a good starting point. GnuTLS and NSS provide support for using PKCS #11 private keys, and adding support for this module in Apache's mod_gnutls or mod_nss would be trivial. OpenSSL would still need some support for PKCS #11 modules to use it.

Note, however, that such an approach would take care of the leakage of the server's private key, but would not prevent user data (e.g., user passwords) from being leaked. That of course could be handled by a similar isolation approach in the web server, or even by the same module (though not over the PKCS #11 API).

So if the solution is that simple, why isn't it already deployed? Well, there is always a catch, and that catch, as I mentioned before, is performance. A simple PKCS #11 module that enforces process isolation would introduce overhead (due to IPC), and would most certainly become a bottleneck. That could be unacceptable for many high-load sites, but on the other hand it could be a good driver to optimize the IPC code paths.

Mitigating an authentication issue

The question here is what could be done about the bugs found in GnuTLS and Apple's SSL implementation, which allowed certain certificates to always pass authentication. That's related to a PKI failure, and in fact the same defences required to mitigate a PKI failure can be used. A PKI failure is typically a CA compromise (e.g., the DigiNotar issue).

One approach is again along the same lines as above: avoid reliance on a single authentication method. That is, use two-factor authentication. Instead of relying only on PKI, combine password (or shared-key) authentication with PKI over TLS. Unfortunately, that's not as simple as exchanging a password over TLS, since that is vulnerable to eavesdropping once the PKI is compromised. To achieve two-factor authentication with TLS, one can negotiate a session using TLS-PSK (or SRP), and then renegotiate on top of it using the server's certificate and PKI. That way both factors are accounted for, and the compromise of either factor doesn't affect the other.

Of course, the above is a quite over-engineered authentication scenario; it requires significant changes to today's applications and also imposes a significant penalty. That is a penalty in performance and network communication, as twice the authentication now requires twice the round-trips. However, a simpler approach is to rely on trust on first use, that is, SSH-style authentication, on top of PKI. Use PKI to verify new keys, but follow the SSH approach with previously known keys. That way, if the PKI verification is compromised at some point, it would affect only new sessions to unknown hosts. Moreover, to remain undetected the adversary is forced to operate a man-in-the-middle attack indefinitely; otherwise, if we have been under attack, a discrepancy in the server key will be detected on the first connection to the real server.
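The "PKI for new keys, SSH approach for known keys" policy can be written down as a small decision function. This is a minimal sketch under assumptions: PinStore and its check method are hypothetical names, the store is kept in memory only, and the boolean pki_verified is assumed to come from whatever certificate verification the TLS library performs.

```python
import hashlib


class PinStore:
    """Remembers the key fingerprint first seen for each host (SSH-style)."""

    def __init__(self):
        self._pins = {}  # host -> hex SHA-256 fingerprint of the server key

    def check(self, host, server_key, pki_verified):
        """Return 'new', 'match' or 'mismatch'.

        Raises ValueError when an unknown host fails PKI verification.
        """
        fingerprint = hashlib.sha256(server_key).hexdigest()
        pinned = self._pins.get(host)
        if pinned is None:
            # Unknown host: PKI is the only evidence available, so require it.
            if not pki_verified:
                raise ValueError(f"{host}: PKI verification failed for unknown host")
            self._pins[host] = fingerprint
            return "new"
        # Known host: the pinned key decides, whatever PKI says now.
        return "match" if fingerprint == pinned else "mismatch"
```

With this policy a bug or compromise in PKI verification only matters the first time a host is seen. A "mismatch" may be an attack or a legitimate key change, so the sketch leaves the decision to the caller, much as SSH leaves it to the user.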

In conclusion, it seems that we can have software that is resilient to bugs, no matter how serious, but that software will come at a performance penalty and will require more conservative designs. While there are applications that mainly target performance and could not always benefit from the above ideas, there are several applications that can sacrifice a few extra milliseconds of processing for a design that is more resilient to attacks.