Thursday, December 15, 2011

Self-modifying code using GCC

One of my research topics last year was self-modifying code mainly for obfuscation. Having seen how self-modification is implemented for a variety of programs, I could say that most existing techniques implement the self-modification in assembly, or in the most high level case, in C using inline assembly.

I'm not a good assembly programmer so I always try to move things to "high-level" C. Unfortunately I haven't managed to implement self modification with standard C, but can be done by using an interesting GCC extension. That is the label to pointer or '&&' extension. This provides the address of a C label as a pointer. Using that we can implement self modification without the use of assembler. The idea is demonstrated in the code snippet below.

One of its problems is that if the labels are not used in the code they are removed by the GCC optimizer (even with -O0) and the address to pointer just returns a dummy value. For that the labels have to be used in dummy code. The better the optimizer in GCC becomes the harder the work around. In this test we use the value of argc (any other external value would do) to ensure the labels stay put.

The code was inspired by a self-modifying code example using inline assembly, but unfortunately I can no longer find it in order to provide proper references.

/* 
 * A self-modifying code snippet that uses C
 * and the GCC label to pointer (&&) extension.
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdint.h>

int main(int argc, char **argv)
{
 int (*my_printf) (const char *format, ...);
 void (*my_exit) (int);
 void *page =
     (void *) ((unsigned long) (&&checkpoint) &
        ~(getpagesize() - 1));

 /* mark the code section we are going to overwrite
  * as writable.
  */
 mprotect(page, getpagesize(), PROT_READ | PROT_WRITE | PROT_EXEC);

 /* Use the labels to avoid having GCC
  * optimize them out */
 switch (argc) {
   case 33:
     goto checkpoint;
   case 44:
     goto newcode;
   case 55:
     goto newcode_end;
   default:
     break;
 }

 /* Replace code in checkpoint with code from
  * newcode.
  */
 memcpy(&&checkpoint, &&newcode, &&newcode_end - &&newcode);

checkpoint:
 printf("Good morning!\n");
 return 1;

newcode:
 my_printf = &printf;
 (*(my_printf)) ("Good evening\n");

 my_exit = &exit;
 (*(my_exit)) (0);

newcode_end:
 return 2;
}

Dedicated to the anonymous referee who insisted into adding this example to our article.

Monday, December 12, 2011

Generating Diffie-Hellman parameters

Starting with gnutls 3.0 the Diffie-Hellman parameter generation has been changed. That was mandated by the move from libgcrypt to nettle. Nettle didn't support Diffie-Hellman parameter generation, so I had to implement it within gnutls. For that I generate a prime of the form p=2wq+1, where w and q are also primes and q has the size in bits of the security parameter (could be 160,256 etc. bits, based on the size of p). Then I search for a generator of the q subgroup using an algorithm which typically gives a large generator --few bits less than the size of p.

This method has the advantage that a server when selecting a private key value x, instead of selecting 0 < x < p-1, it can select a smaller x within the multiplicative subgroup of order q, i.e., 0 < x < q-1. The security level of x is that of the security parameter, which in gnutls is calculated as in ECRYPT recommendations. However until now we never wrote the size of the security parameter in the Diffie-Hellman parameters file, so it was impossible for the server to guess the order of q, since the PKCS #3 file only contained the generator and the prime. However PKCS #3 has a  privateValueLength field exactly for this purpose, but it is not used by gnutls or any other implementation I'm aware of.

DHParameter ::= SEQUENCE {
  prime INTEGER, -- p
  base INTEGER, -- g
  privateValueLength INTEGER OPTIONAL 
}

By populating and using it, the performance improvement was quite impressive. The following table demonstrates that.

Prime
length
Private key
length
Transactions/sec
with DHE-RSA
12481248 122.75
1248160 189.91

So starting from 3.0.9 gnutls' generated parameters for Diffie-Hellman should perform better. However one question that I had to answer, is what is more important, keeping x small as we do, or having a small generator? Libgcrypt and openssl generate parameters in a way that the generator is kept small, e.g. g=5 or g=2. For that I generated 1248 bit parameters using openssl which happened to have g=2.

Prime
length
Generator
length
Private key
length
Transactions/sec
with DHE-RSA
12481246 bits1248 122.75
12481246 bits160 189.91
12482 bits1248 125.94

So it seems keeping the generator small doesn't really have an impact to performance comparing to using a smaller but still secure subgroup.

Thursday, December 8, 2011

The price to pay for perfect-forward secrecy

[EDIT: This article was written in 2011 and reflects the facts at the time; fortunately, since then the state of the Internet has improved, and PFS ciphersuites have become the norm]

Ok, the question seems to be what is perfect forward secrecy? Perfect forward secrecy (PFS) is a property of a cryptographic protocol, such as SSL (or better TLS). That property ensures that once the cryptographic keys of a participant are leaked the secrecy of past sessions remains. For example if the amazon.com private key is leaked your previous transactions with this site remain secure. That is a very interesting protocol property, but is actually only effective if the web site wouldn't store any of the sensitive transaction data. In this post we ignore any issues due to storage and focus on the protocol property. So does TLS have this property? That is a yes and a no.

TLS provides the option to use ciphersuites offering perfect forward secrecy. Those are the ciphersuites that use ephemeral Diffie-Hellman and elliptic curve Diffie-Hellman authentication. However those come at a cost very few web sites are willing to pay for. They involve expensive calculations that put some burden on the server and delay the transaction for few milliseconds. Why would a server bother to protect your data anyway? And that seems to be the current attitude. Most servers totally prohibit the usage of ciphersuites that offer PFS. Try for example to connect to amazon.com using:
$ ./gnutls-cli www.amazon.com --priority NORMAL:-KX-ALL:+DHE-RSA:+ECDHE-RSA
Resolving 'www.amazon.com'...
Connecting to '72.21.214.128:443'...
*** Fatal error: A TLS fatal alert has been received.
*** Received alert [40]: Handshake failed

So let's try to evaluate the cost of PFS versus the plain RSA ciphersuites that do not offer PFS, using a simple approach initially. For that we use Diffie-Hellman group parameters of 1024 bits, a 192-bit elliptic curve and a 1024-bit RSA key and a modified gnutls-cli for its benchmark-tls option.


Key exchangeParametersTransactions/sec
DHE-RSA1024-bit RSA key,
1024-bit DH parameters
345.53
ECDHE-RSA1024-bit RSA key,
192-bit ECDH parameters
604.92
ECDHE-ECDSA192-bit ECDSA key,
192-bit ECDH parameters
595.84
RSA1024-bit RSA key994.59

In this test the ciphersuites DHE-RSA, ECDHE-RSA and ECDHE-ECDSA provide perfect forward secrecy, and use a signed ephemeral (Diffie-Hellman) key exchange. The signature in the key exchange is an RSA or ECDSA one, determined by the ciphersuite and the certificate. The plain RSA ciphersuite doesn't offer PFS and utilizes RSA's encryption capability (instead of signing). We can see that the non-PFS ciphersuite is a clear winner with 3x factor from DHE_RSA ciphersuite. The elliptic curve equivalents of DHE also fail to reach plain RSA's performance.

However, our security levels are not consistent. An elliptic curve of 192-bits provides 96-bits of security  according to ECRYPT (don't confuse the bits of an algorithm such as RSA with the bits of the security level, a security level of 96-bits is a good level of security for today's standards). That means it is equivalent to roughly 1776-bits RSA and DH parameters. So let's do the same experiment with more consistent parameters.


Key exchangeParametersTransactions/sec
DHE-RSA1776-bit RSA key,
1776-bit DH parameters
98.26
ECDHE-RSA1776-bit RSA key,
192-bit ECDH parameters
352.41
ECDHE-ECDSA192-bit ECDSA key,
192-bit ECDH parameters
595.84
RSA1776-bit RSA key460.08

So it is obvious that increasing the security level to 96-bits degrades performance for all ciphersuites (except ECDHE-ECDSA which already offered that level of security). The Diffie-Hellman key exchange is very slow over a 1776-bit group, and in addition it includes an RSA signature of 1776-bits. The Elliptic curve equivalent (ECDHE-RSA) is much more efficient, even though it also includes a 1776-bit RSA signature. The clear winner though is the ECDHE-ECDSA variant which uses an ECDSA public key and outperforms even plain RSA.

Nevertheless, if we restrict to RSA keys that are supported by almost every implementation, the winner is plain RSA, with a small but not insignificant difference from the second.

So although I'd suggest ciphersuites offering PFS for almost everything, in practice, when ECDSA keys are not option, they degrade performance, and this should not be underestimated. When it is desired to have PFS, then its shortcomings have to be evaluated against the benefits.

On the other hand if a decent security level is required (such as the 96-bit one used in our example), we see that the switch from plain RSA to ECDHE-ECDSA provides both efficiency and forward secrecy.

So given the fact that any web server may at some point be compromised, my suggestion would be, that if you value the exchanged data in the transactions, only PFS ciphersuites should be used. Otherwise, using the plain RSA ciphersuite may suit better.

PS. Note that protecting the server's private key on the disk using PIN or passwords is futile as the key resides unencrypted in the server's memory.