Thursday, December 15, 2011

Self-modifying code using GCC

One of my research topics last year was self-modifying code, mainly for obfuscation. Having seen how self-modification is implemented in a variety of programs, I can say that most existing techniques implement it in assembly or, in the most high-level case, in C with inline assembly.

I'm not a good assembly programmer, so I always try to move things to "high-level" C. Unfortunately I haven't managed to implement self-modification in standard C, but it can be done with an interesting GCC extension: the label-to-pointer or '&&' extension, which the GCC manual documents as "Labels as Values". It provides the address of a C label as a pointer of type void *. Using that we can implement self-modification without any assembler. The idea is demonstrated in the code snippet below.
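Before the full listing, here is a minimal sketch of the extension on its own, with no self-modification involved. The function name pick and the returned constants are made up for illustration. It takes the addresses of three labels with && and jumps to one of them with GCC's computed goto (goto *), the companion to &&.

```c
/* Hypothetical helper: returns a different constant depending on
 * which label the computed goto jumps to. Requires GCC (or a
 * compiler that supports the same extension). */
int pick(int which)
{
    /* && yields the address of a label as a void * (GCC extension). */
    void *targets[] = { &&first, &&second, &&third };

    if (which < 0 || which > 2)
        return -1;

    /* goto * jumps to an address that was obtained with &&. */
    goto *targets[which];

first:
    return 10;
second:
    return 20;
third:
    return 30;
}
```

Because every label here is the target of a possible jump, the compiler has to keep all three, which is the same reason the full listing routes its labels through a switch.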

One problem with this extension is that labels that are not otherwise used in the code are removed by GCC (even with -O0), and '&&' then just returns a dummy value. The labels therefore have to be used in dummy code. The better the GCC optimizer becomes, the harder the workaround gets. In this test we switch on the value of argc (any other external value would do) to ensure the labels stay put.

The code was inspired by a self-modifying code example using inline assembly, but unfortunately I can no longer find it in order to provide proper references.

/* 
 * A self-modifying code snippet that uses C
 * and the GCC label to pointer (&&) extension.
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdint.h>

int main(int argc, char **argv)
{
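 int (*my_printf) (const char *format, ...);
 void (*my_exit) (int);
 uintptr_t pagesize = (uintptr_t) getpagesize();
 /* number of bytes that will be copied over checkpoint */
 size_t len = (size_t) (&&newcode_end - &&newcode);
 /* start of the page that contains checkpoint */
 uintptr_t page = (uintptr_t) (&&checkpoint) & ~(pagesize - 1);

 /* Mark the code section we are going to overwrite
  * as writable. The range runs from the page start to
  * the last overwritten byte, so it also covers a copy
  * that crosses a page boundary.
  */
 if (mprotect((void *) page, (uintptr_t) (&&checkpoint) + len - page,
       PROT_READ | PROT_WRITE | PROT_EXEC) != 0) {
  perror("mprotect");
  return 3;
 }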

 /* Use the labels to avoid having GCC
  * optimize them out */
 switch (argc) {
   case 33:
     goto checkpoint;
   case 44:
     goto newcode;
   case 55:
     goto newcode_end;
   default:
     break;
 }

 /* Replace code in checkpoint with code from
  * newcode.
  */
 memcpy(&&checkpoint, &&newcode, &&newcode_end - &&newcode);

checkpoint:
 printf("Good morning!\n");
 return 1;

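newcode:
 /* Call through function pointers: on x86 a direct call
  * is encoded relative to its own address and would miss
  * its target once the bytes are copied to checkpoint.
  */
 my_printf = &printf;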
 (*(my_printf)) ("Good evening\n");

 my_exit = &exit;
 (*(my_exit)) (0);

newcode_end:
 return 2;
}
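The listing derives a page-aligned address for mprotect() from the address of checkpoint. That arithmetic can be sketched in isolation, generalized to a region that may cross a page boundary. The struct prot_range and the functions covering_range and pages_touched are hypothetical names introduced only for this illustration, and the sketch assumes the page size is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the range that would be handed to
 * mprotect() to make [addr, addr + len) writable. */
struct prot_range {
    uintptr_t start;  /* page-aligned start address */
    size_t length;    /* bytes from start to the end of the region */
};

/* Round addr down to its page start and extend the length by the
 * offset into that page. pagesize is assumed to be a power of two. */
struct prot_range covering_range(uintptr_t addr, size_t len,
                                 uintptr_t pagesize)
{
    struct prot_range r;

    r.start = addr & ~(pagesize - 1);
    r.length = (size_t) (addr - r.start) + len;
    return r;
}

/* Number of pages the range touches (rounded up). */
size_t pages_touched(struct prot_range r, uintptr_t pagesize)
{
    return (r.length + pagesize - 1) / pagesize;
}
```

With 4096-byte pages, a 32-byte region starting 16 bytes before a page boundary touches two pages, so changing the protection of only the first page would leave the tail of the copy unwritable.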

Dedicated to the anonymous referee who insisted on adding this example to our article.

2 comments:

  1. Hi --can you explain why in the code following 'newcode:' you used function pointers -- was that necessary or
    could you have done:

    newcode:
    printf ("Good evening\n");
    exit(0);
    newcode_end:
    ...

    thanks
    (actually -- is it to prevent gcc from using a relative jump which would cause problems after the memcopy -- I guess that must be it...)
