Wednesday, December 3, 2014

A quick overview of GnuTLS development in 2014

2014 was a very interesting year in the development of GnuTLS. On the development side, this year we have incorporated patches with fixes or enhanced functionality from more than 25 people according to openhub, and the main focus was moving GnuTLS 3.3.x from next to a stable release. That version had quite a number of new features, the most prominent being:
  • Initialization on a library constructor,
  • The introduction of verification profiles, i.e., enforcing rules not only on the session parameters such as ciphersuites and protocol version, but also on the peer's certificate (e.g., only accepting a session which provides a certificate with a 2048-bit key),
  • The addition of support for DNS name constraints in certificates, i.e., enforcing rules that restrict an intermediate CA to issuing certificates only for a specific DNS domain,
  • The addition of support for trust modules using PKCS #11,
  • Better integration with PKCS #11 smart cards and Hardware Security Modules (HSMs); I believe we currently have some of the best support for, and integration with, smart cards and HSMs among crypto libraries. We owe that mostly to the support of PKCS #11 URLs, which allowed the high-level APIs that accepted files to work transparently with HSMs and smart cards, without any API changes,
  • Support for the algorithms required by FIPS 140-2.
Most of these are described in the original announcement mail, some others were completed and added gradually.
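The DNS name-constraints check mentioned above boils down to suffix matching at label boundaries: an RFC 5280 dNSName constraint permits a name when the constraint equals the name or is a parent domain of it. A minimal sketch in Python (the helper names are hypothetical illustrations, not the GnuTLS API):

```python
def dns_name_matches(name: str, constraint: str) -> bool:
    """True if `name` falls under a single RFC 5280 dNSName constraint.

    A constraint of "example.com" permits "example.com" itself and any
    subdomain such as "www.example.com", but not "badexample.com".
    """
    name = name.lower().rstrip(".")
    constraint = constraint.lower().rstrip(".")
    return name == constraint or name.endswith("." + constraint)


def permitted_by_constraints(name, permitted):
    # With a non-empty permitted set, the name must match at least one entry.
    return any(dns_name_matches(name, c) for c in permitted)
```

So an intermediate CA constrained to "example.org" can issue for "ca.example.org" but any certificate it issues for "example.net" fails verification.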

A long-requested feature that was implemented was the removal of the need for explicit library initialization. That relocated the previously explicit call of gnutls_global_init() to a library constructor, and eliminated a cumbersome requirement of previous GnuTLS versions. As a result, it removed the need for locks and coordination in a large application which may use multiple libraries that depend on GnuTLS. However, it had an unexpected side-effect. Because initialization was moved to a constructor, which is executed before an application enters main(), server applications which closed all file descriptors on startup would also close the file descriptor held by GnuTLS for reading /dev/urandom. That's a pretty nasty bug, and it was noticed because in CUPS the descriptor which replaced the /dev/urandom one was not readable. It was solved by re-opening the descriptor on the first call of gnutls_global_init(), but the issue demonstrates the advantage and simplicity of the system-call approach to the random generator, i.e., getentropy() or getrandom(). In fact, adding support for getentropy() reduced the relevant code to 8 lines, from the 75 lines of the file-based approach.
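The hazard can be demonstrated in a few lines of Python: once a cached descriptor is closed behind a library's back, the kernel hands the same number to the next open(), so the library silently reads from the wrong file. (The file names are only for illustration; this is not GnuTLS code.)

```python
import os

# A library constructor opens /dev/urandom and caches the descriptor.
urandom_fd = os.open("/dev/urandom", os.O_RDONLY)

# The daemon's startup code then "helpfully" closes all descriptors...
os.close(urandom_fd)

# ...and POSIX guarantees the next open() returns the lowest free number,
# so the cached descriptor now refers to a completely different file.
other_fd = os.open(os.devnull, os.O_RDONLY)
assert other_fd == urandom_fd

# Reading through the stale descriptor yields /dev/null's contents:
# zero bytes, which would be catastrophic for a random generator.
assert os.read(urandom_fd, 16) == b""
os.close(other_fd)
```

A system-call-based generator (getrandom()/getentropy(), which os.urandom() uses on modern systems) has no descriptor to lose, which is exactly why it simplified the GnuTLS code so much.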

Another significant addition, at least for the client side, is the support for trust modules, available via PKCS #11. Trust modules, like p11-kit trust, allow clients to perform server certificate authentication using a method very similar to what the NSS library uses. That is, a system-wide database of certificates, such as the p11-kit trust module, can be queried for the trusted CAs and intermediate anchors. These anchors may have private attached extensions, e.g., restricting the name constraints of a CA, or restricting the scope of a CA to code signing, etc., in addition to, or in place of, the scope set by the CA itself. That has very important implications for system-wide CA management. CAs can be limited in scope, or per application, and can even be restricted to a few DNS top-level domains using name constraints. Most importantly, the database can be shared across libraries (in fact, in Fedora 21 the trust database is shared between GnuTLS and NSS), significantly simplifying administration, even though the tools to manage it are primitive for the moment. That was a long-time vision of Stef Walter, who develops p11-kit, and I'm pretty happy the GnuTLS part of it was completed quite well.

This was also the first year I attempted to document and publicize the goals for the next GnuTLS release, which is 3.4.0 and will be released around next March. They can be seen in our gitorious wiki pages. Hopefully all the points will be delivered, although a few of the remaining points rely on a new release of the nettle crypto library, and on an update of c-ares to support DNSSEC.

Dependency-wise, GnuTLS is moving to a tighter integration with nettle, which provides one of the fastest ECC implementations out there; I find it unfortunate, though, that there seems to be no intention of collaboration between the two GNU crypto libraries - nettle and libgcrypt - meaning that optimizations in libgcrypt stay in libgcrypt, and the same holds for nettle. GNU Libtasn1 is seeking a co-maintainer (or even better, someone to rethink and redesign the project, I'd add), so feel free to step up if you're up to the job. As it is now, I'm taking care of any fixes needed for that project.

On the other hand, we also had quite serious security vulnerabilities fixed this year. These varied from certificate verification issues to a heap overflow. A quick categorization of the serious issues follows.
  • Issues discovered with manual auditing (GNUTLS-SA-2014-1, GNUTLS-SA-2014-2). These were certificate verification issues, discovered as part of a manual audit of the code. Thanks to Suman Jana, and also to Red Hat, which provided me with the time for the audit.
  • Issues discovered by Fuzzers (GNUTLS-SA-2014-3, GNUTLS-SA-2014-5):
    • Codenomicon provided a copy of their test suite, valid for a few months, and that helped discover the first bug above,
    • Sean Buford of Google reported to me that using AFL he identified a heap corruption issue in the encoding of ECC parameters; I have not used AFL myself, but it seems to be quite an impressive piece of software,
  • Other/protocol issues (GNUTLS-SA-2014-4), i.e., POODLE. Although that is not really a GnuTLS or TLS protocol issue, POODLE takes advantage of the downgrade dance used in several applications.
Even though the sample is small, and only accounts for the issues with an immediate security impact, I believe this view of the issues shows the importance of fuzzers today. In a project of 180k lines of code, it is not feasible to fully rely on manual auditing, except for some crucial parts of it. While no security vulnerabilities were discovered via static analysis tools like clang analyzer or Coverity, we had quite a few other fixes thanks to these tools.

So overall, 2014 was quite a productive year, both in the number of new features added and in the number of bugs fixed. New techniques were used to verify the source code, and new test cases were added to our test suite. What is encouraging is that more and more people test GnuTLS, report bugs and provide patches. A big thank you to all of the people who contributed.

Wednesday, October 15, 2014

What about POODLE?

Yesterday POODLE was announced, a fancifully named new attack on the SSL 3.0 protocol which relies on applications using a non-standard fallback mechanism, typically found in browsers. The attack takes advantage of
  • a vulnerability in the CBC mode of SSL 3.0, which has been known for a decade,
  • a non-standard fallback mechanism (often called the downgrade dance).
So the novel and crucial part of the attack is the exploitation of the non-standard fallback mechanism. What is that, you may ask. I'll try to explain it in the next paragraph. Note that in the following paragraphs I'll use the term SSL protocol to cover TLS as well, since TLS is simply a newer version of SSL.

The SSL protocol has a negotiation mechanism that wouldn't allow a fallback to SSL 3.0 between a client and a server that both support a newer variant (e.g., TLS 1.1). That mechanism detects modifications by man-in-the-middle attackers, and with it the POODLE attack would have been thwarted. However, a limited set of clients perform a custom protocol fallback, the downgrade dance, which is straightforward but insecure. That set of clients seems to be mostly browsers; in order to negotiate an acceptable TLS version, they follow something along these lines:
  1. Connect using the highest SSL version (e.g., TLS 1.2)
  2. If that fails set the version to be TLS 1.1 and reconnect
  3. ...
  4. until there are no options and SSL 3.0 is used.
That's a non-standard way to negotiate TLS, and as the POODLE attack demonstrates, it is insecure. An attacker can interrupt the first connection and make it seem like a failure, forcing a fallback to a weaker protocol. The good news is that mostly browsers use this construct, so few other applications should be affected.
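The fallback loop above, and how an active attacker abuses it, can be sketched in a few lines (a toy model, not browser code):

```python
def downgrade_dance(versions, connect):
    """Try protocol versions from highest to lowest, reconnecting on any
    failure -- including a failure injected by an active attacker."""
    for version in versions:
        if connect(version):
            return version
    return None  # no acceptable version at all


# An attacker who can reset connections simply makes every attempt
# "fail" until the client offers the weakest protocol.
def attacked_connect(version):
    return version == "SSL 3.0"


negotiated = downgrade_dance(
    ["TLS 1.2", "TLS 1.1", "TLS 1.0", "SSL 3.0"], attacked_connect)
assert negotiated == "SSL 3.0"  # forced all the way down
```

Nothing in this loop authenticates *why* a connection failed, which is exactly the property the in-protocol negotiation provides and the dance throws away.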

Why do browsers use this construct then? In their defence, there have been serious bugs in the standard SSL and TLS protocol fallback implementations of widespread software. For example, when TLS 1.2 came out, we realized that our TLS 1.2-enabled client in GnuTLS couldn't connect to a large part of the internet. A few large sites would refuse to talk to the GnuTLS client because it advertised TLS 1.2 as its highest supported protocol. The bug was on the server side: those servers closed the connection when they encountered a protocol newer than their own, instead of negotiating their highest supported version (in accordance with the TLS protocol). It took a few years before TLS 1.2 was enabled by default in GnuTLS, and even then we had a hard time convincing users who encountered connection failures that it was a server bug. The truth is that users don't care whose bug it is; they will simply use software that just works.

A long time has passed since then (TLS 1.2 was published in 2008), and today almost all public servers follow the TLS protocol negotiation. So this may be the time for browsers to get rid of that relic of the past. Unfortunately, that isn't the case. The IETF TLS working group is now trying to standardize counter-measures for the browser negotiation trickery. Even though I have become more of a pragmatist since 2008, I believe that forcing counter-measures into every TLS implementation, just because there used to be (or may still be) broken servers on the Internet, not only prolongs the life of an insecure out-of-protocol work-around, but creates waste. That is, it creates a code dump called TLS protocol implementations, filled with hacks and work-arounds, just because of a few broken implementations. As Florian Weimer puts it, all applications pay a tax of extra code, potentially introducing new bugs, and, even more worryingly, potentially introducing more compatibility issues, just because some servers on the Internet have chosen not to follow the protocol.

Are there, however, any counter-measures one can use to avoid the attack without introducing an additional fallback signalling mechanism? As previously mentioned, if you are using the SSL protocol the recommended way, no work-around is needed; you are safe. If for any reason you want to use the insecure non-standard protocol negotiation, make sure that no insecure protocols like SSL 3.0 are in the negotiated set, or, if disabling SSL 3.0 isn't an option, ensure that it is only allowed when negotiated as a fallback (e.g., offer TLS 1.0 + SSL 3.0, and only then accept SSL 3.0).
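That last suggestion can be expressed as a small policy check: SSL 3.0 is acceptable only when it was reached as a genuine fallback, i.e., only after every stronger offered version has already been attempted. A sketch with hypothetical names (this is not a GnuTLS API):

```python
def accept_version(negotiated, attempted, offered):
    """Accept SSL 3.0 only as a genuine fallback.

    `attempted` holds the versions already tried and failed in this
    negotiation; `offered` is the set the client is willing to speak.
    """
    if negotiated != "SSL 3.0":
        return True  # a modern protocol version is always acceptable
    stronger = [v for v in offered if v != "SSL 3.0"]
    # Reject an unsolicited, direct SSL 3.0 negotiation outright.
    return all(v in attempted for v in stronger)


offered = ["TLS 1.0", "SSL 3.0"]
# Direct SSL 3.0 with TLS 1.0 never tried: reject.
assert not accept_version("SSL 3.0", [], offered)
# SSL 3.0 reached only after TLS 1.0 failed: accept as fallback.
assert accept_version("SSL 3.0", ["TLS 1.0"], offered)
```

This only narrows the exposure rather than eliminating it, since an active attacker can still fake the failures; disabling SSL 3.0 entirely remains the clean fix.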

In any case, this attack has provided the incentive to remove SSL 3.0 from public servers on the Internet. Given that, and its known vulnerabilities, SSL 3.0 will no longer be included by default in the upcoming GnuTLS 3.4.0.

[Last update 2014-10-20]

PS. There are some recommendations to work around the issue by using RC4 instead of a block cipher in SSL 3.0. That would defeat the current attack, but it closes one door by opening another; RC4 is a broken cipher, and there are known attacks which recover plaintext from it.

Thursday, April 17, 2014

software has bugs... now what?

The recent bugs uncovered in TLS/SSL implementations were received in the blogosphere with a quest for perfectly secure implementations that have no bugs. That is the modern quest for perpetual motion. Nevertheless, very few were bothered by the fact that applications' only line of security defence was a handful of TLS/SSL implementations. We design software in a way that openssl or gnutls becomes the Maginot line of security, and when it fails, it fails catastrophically.

So the question I find more interesting is: can we live with libraries that have bugs? Can we design resilient software that will operate despite serious bugs, and will provide us with the necessary time for a fix? In other words, could an application design have mitigated or neutralized the catastrophic bugs we saw? Let's see each case separately.


Mitigating attacks that expose memory (heartbleed)

The heartbleed attack allows an attacker to obtain a random portion of a server's memory (there is a nice illustration in xkcd). In that attack, all the data held within the server process are at risk, including user data and cryptographic keys. Could an attack of this scale have been avoided?

One approach is to avoid putting all of our eggs in one basket; that's a long-time human practice which is also used in software design. OpenSSH's privilege separation and the isolation of private keys using smart cards or software security modules are two prominent examples. The defence in that design is that the unprivileged process memory contains only data related to the current user's session, and no privileged information such as passwords or the server's private key. Could we have a similar design for an SSL server? We already have a similar design for an SSL VPN server, where revealing the worker processes' memory couldn't possibly reveal the private key. These designs come at a cost, though: performance, as they need to rely on slow Inter-Process Communication (IPC) for basic functionality. Nevertheless, it is interesting to see whether we can re-use the good elements of these designs in existing servers.



A few years ago, in a joint effort of the Computer Security and Industrial Cryptography research group of KU Leuven and Red Hat, we produced a software security module (softhsm) for the Linux kernel, whose purpose was to prevent a leak of server memory from revealing the server's private keys to an adversary. Unfortunately, we failed to convince the kernel developers of its usefulness. That wasn't the first attempt at such a module. A user-space security module, called LSM-PKCS11, already existed. Similarly to the one we proposed, it would provide access to the private key, but the operations would be performed in an isolated process (instead of the kernel). Had such a module been in place in popular TLS/SSL servers, there would have been no need to regenerate the servers' certificates after a heartbleed-type attack. So what can we do to use such a mechanism in existing software?

The previous projects have been dead for quite some time, but there are newer modules, like OpenDNSSEC's SoftHSM, which are active. Unfortunately, softhsm is not a module that enforces any type of isolation. Thus a wrapper PKCS #11 module over softhsm (or any other software HSM) that enforces process isolation between the keys and the application using them would be a good starting point. GnuTLS and NSS provide support for using PKCS #11 private keys, and adding support for such a module in apache's mod_gnutls or mod_nss would be trivial. OpenSSL would still need some support for PKCS #11 modules to use it.

Note, however, that such an approach would take care of the leakage of the server's private key, but would not prevent user data (e.g., user passwords) from being leaked. That could of course be handled by a similar isolation approach in the web server, or even by the same module (though not over the PKCS #11 API).

So if the solution is that simple, why isn't it already deployed? Well, there is always a catch; and that catch, as I mentioned before, is performance. A simple PKCS #11 module that enforces process isolation would introduce overhead (due to IPC), and is most certainly going to become a bottleneck. That could be unacceptable for many high-load sites, but on the other hand it could be a good driver to optimize the IPC code paths.


Mitigating an authentication issue

The question here is what could be done about the bugs found in GnuTLS and Apple's SSL implementation, which allowed certain certificates to always succeed in authentication. That is related to a PKI failure, and in fact the same defences required to mitigate a PKI failure can be used. A PKI failure is typically a CA compromise (e.g., the DigiNotar issue).

One approach is again along the same lines as above: avoid reliance on a single authentication method. That is, use two-factor authentication. Instead of relying only on PKI, combine password (or shared-key) authentication with PKI over TLS. Unfortunately, that's not as simple as a password key exchange over TLS, since that is vulnerable to eavesdropping once the PKI is compromised. To achieve two-factor authentication with TLS, one can negotiate a session using TLS-PSK (or SRP), and renegotiate on top of it using the server's certificate and PKI. That way both factors are accounted for, and the compromise of either factor doesn't affect the other.

Of course, the above is quite an over-engineered authentication scenario; it requires significant changes to today's applications, and also imposes a significant penalty, both in performance and in network communication, as two authentications now require twice the round-trips. However, a simpler approach is to rely on trust on first use, or simply SSH-style authentication on top of PKI. That is, use PKI to verify new keys, but follow the SSH approach for previously known keys. That way, if the PKI verification is compromised at some point, it would affect only new sessions to unknown hosts. Moreover, for an attack to remain undetected the adversary is forced to operate a man-in-the-middle attack indefinitely; and if we have been under attack, a discrepancy in the server key will be detected on the first connection to the real server.
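The trust-on-first-use layering amounts to a few lines of logic. In the sketch below, `pki_ok` stands for the result of ordinary certificate-chain validation, and the store and function names are hypothetical illustrations, not a real API:

```python
known_hosts = {}  # host name -> pinned certificate/key fingerprint


def verify_peer(host, fingerprint, pki_ok):
    """SSH-style trust-on-first-use layered on top of PKI validation."""
    pinned = known_hosts.get(host)
    if pinned is None:
        if not pki_ok:
            return False          # unknown host: the PKI is all we have
        known_hosts[host] = fingerprint
        return True               # pin the key on first use
    # Known host: even an attacker holding a "valid" certificate for this
    # name is rejected unless the key matches the pinned one.
    return pinned == fingerprint


# First contact: PKI-valid, so the key gets pinned.
assert verify_peer("example.com", "aa:bb", pki_ok=True)
# Later, a compromised CA signs a different key for the same name...
assert not verify_peer("example.com", "cc:dd", pki_ok=True)
```

The second assertion is the whole point: once a key is pinned, a CA compromise alone is no longer sufficient to impersonate the server.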


In conclusion, it seems that we can have software resilient to bugs, no matter how serious, but that software will come with a performance penalty, and will require more conservative designs. While there are applications that mainly target performance and could not always benefit from the above ideas, there are several applications that can sacrifice a few extra milliseconds of processing for a design more resilient to attacks.