Thursday, May 16, 2013

Salsa20 and UMAC in TLS

Lately, while implementing and deploying an SSL VPN server, I realized that even for a peer-to-peer connection the resources spent on encryption on the two ARM systems I used were excessive. These ARM processors have no instructions to speed up AES and SHA1, and were spending most of their resources on encrypting and authenticating the exchanged packets.

What can be done in such a case? The SSL VPN server used DTLS, which runs over UDP and restricts the packet size to the path MTU (typically around 1400 bytes if we want to avoid fragmentation and reassembly), and thus spends considerable resources on packetizing long data streams, since each packet is a separate record with its own encryption and authentication overhead. Since the packet size cannot be changed, what remains is to improve the encryption and authentication speed. Unfortunately, switching to a more lightweight cipher available in TLS, such as RC4, is not an option, as it is not available in DTLS. TLS and DTLS mostly share the same set of ciphersuites, but RC4 cannot be used in DTLS because its keystream must be consumed strictly in order, so a lost or reordered datagram would leave the receiver out of sync. Overall, we cannot do much with the algorithms currently defined for DTLS; we need to step outside the TLS protocol box.
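To make the RC4 restriction concrete, here is a minimal Python sketch (a toy illustration, not a deployment). RC4 keeps a single internal state that advances with every keystream byte, so both ends must consume the keystream in exactly the same order. Over TCP that always holds, but a DTLS receiver that loses a datagram starts decrypting the next one at the wrong keystream offset. The key and packet contents are made-up values, and `rc4_keystream` and `xor_with` are hypothetical helpers, not functions from GnuTLS or nettle.

```python
def rc4_keystream(key: bytes):
    """Yield RC4 keystream bytes: key scheduling (KSA), then generation (PRGA)."""
    s = list(range(256))
    j = 0
    # Key-scheduling algorithm: permute the state using the key.
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation: the state changes with every output byte.
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]


def xor_with(stream, data: bytes) -> bytes:
    """XOR data with the next len(data) bytes taken from the keystream generator."""
    return bytes(b ^ next(stream) for b in data)


key = b"hypothetical session key"
packets = [b"first datagram", b"second datagram"]

# The sender encrypts both packets with one continuous keystream, as in TLS.
sender = rc4_keystream(key)
ciphertexts = [xor_with(sender, p) for p in packets]

# A receiver that sees every packet, in order, stays in sync.
receiver = rc4_keystream(key)
in_order = [xor_with(receiver, c) for c in ciphertexts]

# A receiver that never got the first datagram starts at the wrong keystream offset.
lossy_receiver = rc4_keystream(key)
after_loss = xor_with(lossy_receiver, ciphertexts[1])

print("in-order receiver recovers plaintext:", in_order == packets)
print("receiver after a lost datagram recovers plaintext:", after_loss == packets[1])
```

The in-order receiver recovers both packets, while the receiver that missed the first datagram decrypts the second one into garbage. A datagram protocol therefore needs a cipher whose keystream for any record can be generated independently.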

Some time ago there was an EU-sponsored competition for stream ciphers (a class of ciphers typically chosen for their performance), and Salsa20, one of its winners, was recently added to nettle (the cryptographic library GnuTLS uses) by Simon Josefsson, who came up with the idea of adding such a fast stream cipher to TLS. While modifying GnuTLS to take advantage of Salsa20, I also considered moving away from HMAC (the comparatively slow message authentication mechanism TLS uses) and using the UMAC construction, which comes with a security proof and offers impressive performance. My initial attempt to port the UMAC reference code (which was far from ideal) motivated the author of nettle, Niels Moeller, to reimplement UMAC in a cleaner way. As a result, both Salsa20 and UMAC are now included in nettle and are used by GnuTLS 3.2.0. The results are quite impressive.
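Much of UMAC's speed comes from its first hashing layer, NH, which needs only 32-bit additions and one 32x32-to-64-bit multiplication per pair of message words, instead of running a compression function over every 64-byte block as HMAC-SHA1 does. The Python sketch below shows NH alone, pairing words four apart within each 8-word block as described in RFC 4418. It assumes the message and key have already been split into 32-bit integers, and it omits everything else in a full UMAC (the further hash layers, key derivation, and the nonce-dependent pad), so it is not a MAC by itself. The function name `nh` is illustrative.

```python
from typing import Sequence

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def nh(msg_words: Sequence[int], key_words: Sequence[int]) -> int:
    """NH hash over 32-bit words, using the RFC 4418 word pairing.

    msg_words: message as 32-bit integers, length a multiple of 8 (32 bytes).
    key_words: at least as many 32-bit key words as message words.
    Returns a 64-bit integer.
    """
    if len(msg_words) % 8 != 0:
        raise ValueError("NH processes whole 8-word (32-byte) blocks")
    if len(key_words) < len(msg_words):
        raise ValueError("key must be at least as long as the message")
    y = 0
    for base in range(0, len(msg_words), 8):
        for j in range(4):
            # Add key to message words modulo 2^32, pairing word j with word j + 4.
            a = (msg_words[base + j] + key_words[base + j]) & MASK32
            b = (msg_words[base + j + 4] + key_words[base + j + 4]) & MASK32
            # One 64-bit product per two message words, accumulated modulo 2^64.
            y = (y + a * b) & MASK64
    return y


print(nh([1, 2, 3, 4, 5, 6, 7, 8], [0] * 8))  # 1*5 + 2*6 + 3*7 + 4*8 = 70
```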

The Salsa20 with UMAC96 ciphersuites were 2-3 times faster than any AES variant used in TLS, and outperformed even RC4-SHA1, the fastest ciphersuite defined in the TLS protocol. The results on an Intel i3 are shown below (they can be reproduced using gnutls-cli --benchmark-tls-ciphers). Note that SHA1 in a ciphersuite name means HMAC-SHA1, and Salsa20/12 is the reduced 12-round variant of Salsa20 (the full cipher uses 20 rounds) that was among the eSTREAM competition winners.
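A minimal Python sketch of the Salsa20 core, written from the published Salsa20 description, shows that Salsa20/12 and Salsa20 differ only in the `rounds` argument. It also illustrates two properties relevant here: the core uses only 32-bit additions, rotations and XORs (no lookup tables or special instructions), and each 64-byte keystream block is computed directly from the key, nonce and a block counter, so any block can be generated independently, unlike RC4. The 256-bit-key state layout is assumed; how the proposed ciphersuites derive nonces or counters from TLS record sequence numbers is not shown, and this is not the nettle implementation. All function names are illustrative.

```python
import struct

MASK32 = 0xFFFFFFFF
# The four Salsa20 constants for 256-bit keys, read as little-endian words.
SIGMA = struct.unpack("<4I", b"expand 32-byte k")


def rotl32(v, n):
    """Rotate a 32-bit word left by n bits."""
    return ((v << n) & MASK32) | (v >> (32 - n))


def quarterround(y, a, b, c, d):
    """Salsa20 quarter-round on state indices a, b, c, d, in place (add-rotate-xor only)."""
    y[b] ^= rotl32((y[a] + y[d]) & MASK32, 7)
    y[c] ^= rotl32((y[b] + y[a]) & MASK32, 9)
    y[d] ^= rotl32((y[c] + y[b]) & MASK32, 13)
    y[a] ^= rotl32((y[d] + y[c]) & MASK32, 18)


def salsa20_core(words, rounds=20):
    """Salsa20 hash of 16 words; rounds=12 gives Salsa20/12, rounds=20 full Salsa20."""
    x = list(words)
    for _ in range(rounds // 2):  # each iteration is one double round
        # Column round
        quarterround(x, 0, 4, 8, 12)
        quarterround(x, 5, 9, 13, 1)
        quarterround(x, 10, 14, 2, 6)
        quarterround(x, 15, 3, 7, 11)
        # Row round
        quarterround(x, 0, 1, 2, 3)
        quarterround(x, 5, 6, 7, 4)
        quarterround(x, 10, 11, 8, 9)
        quarterround(x, 15, 12, 13, 14)
    # Feed-forward: add the input words back in, modulo 2^32.
    return [(x[i] + words[i]) & MASK32 for i in range(16)]


def salsa20_block(key, nonce, counter, rounds=20):
    """64-byte keystream block number `counter` for a 32-byte key and 8-byte nonce."""
    k = struct.unpack("<8I", key)
    n = struct.unpack("<2I", nonce)
    state = [SIGMA[0], *k[:4], SIGMA[1], *n,
             counter & MASK32, counter >> 32,
             SIGMA[2], *k[4:], SIGMA[3]]
    return struct.pack("<16I", *salsa20_core(state, rounds))


key = bytes(range(32))  # hypothetical key
nonce = bytes(8)        # hypothetical nonce
# Block 7 is computed directly, without generating blocks 0..6 first.
block7 = salsa20_block(key, nonce, 7, rounds=12)
print(len(block7))  # 64
```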

Performance on 1400-byte packets

The results of OpenConnect VPN transfers between two PCs connected over 100-Mbit Ethernet are as follows.

Performance of a VPN transfer over Ethernet

Ciphersuite              Mbits/sec    CPU load (top)
None (plain transfer)    94           8%

While the throughput difference between Salsa20 and AES-128-CBC is not impressive (AES throughput was already reasonably high), the difference in the server's CPU load is significant.

Would such ciphersuites also be useful to a wider set of applications than VPNs? I believe the answer is yes, and not only for performance reasons. This year new attacks were devised against the AES-128-CBC-SHA1 and RC4-SHA1 ciphersuites in TLS (the Lucky Thirteen timing attack on CBC mode and the RC4 keystream-bias attacks) that cannot be easily worked around. For AES-128-CBC-SHA1 there are some hacks that reduce the impact of the known attacks, but they are hacks, not a solution. TLS would therefore benefit from a new set of ciphersuites to replace the old ones with known issues. Moreover, even if we consider RC4 a viable solution today (which it is not), the DTLS protocol cannot take advantage of it, and datagram applications such as VPNs need to rely on the much slower AES-128-GCM.

We therefore see several advantages in this new set of ciphersuites, and together with Simon Josefsson and Joachim Strombergson we plan to propose to the IETF TLS Working Group the adoption of a set of Salsa20-based ciphersuites. The WG chairs asked us to present our work at the IETF 87 meeting in Berlin, so I plan to travel there to present our current Internet-Draft.

If you support defining these ciphersuites in TLS, we need your help. If you are an IETF participant, please attend the TLS Working Group meeting and indicate your support. Also, if you have any feedback on the approach, or can suggest another area where this work could be useful, please drop me an email or leave a comment below mentioning your name and any affiliation.

Moreover, at current prices a lightning trip to Berlin on these dates would cost at least 800 euros, including the IETF single-day registration. As this is not part of our day jobs, any contribution that would help partially cover those expenses is welcome.