EipLoader

Malware Technology Research

Monthly Archives: September 2011

Analysing Compiler-Optimised RC4

The same C/C++ source code can always be compiled differently by using different compilers or selecting different optimization mode. This is because the “meaning of the codes” can be retained even after rewriting it differently. For example, a “for-loop” which iterates 3 times can be replaced by repeating the block of statements three times; and redundant codes can also be removed without changing the effect of the program.

This implies that compiling a source code to machine code is a One-To-Many relationship. With that, it is difficult or almost impossible to reverse-engineer compiled codes back to its original state (original source code).

In this article, I would like to illustrate the difference between the source and the decompiled codes of the RC4 cryptographic algorithm implementation. To achieve this, I have written a simple console application which utilizes RC4 to perform simple encryption and decryption. (See source code below.)

Before Compilation

#include <stdio.h>
#include <string.h>
#include <windows.h>
#include "rc4.h"

#define BUF_SIZE 128
void main (void)
{
	unsigned char key_pass [] = "This is a RC4key";
	unsigned char bufPlain[BUF_SIZE];
	unsigned char bufCipher[BUF_SIZE];
        RC4_KEY key;
	int  i;

	//Prepare the buffers for encryption
	memset(bufPlain,0,BUF_SIZE);
	memset(bufCipher,0,BUF_SIZE);
	strncpy ((char *)bufPlain ,
		"This is a secret text to be encrypted using RC4.\n"
		"It will then be decrypted later using RC4 for testing.",
		BUF_SIZE);

	//Encrypt the plain text
	RC4_set_key(&key, strlen((char *)key_pass) ,key_pass);
	RC4(&key, BUF_SIZE,(unsigned char *) bufPlain, (unsigned char *)bufCipher);

	//Print the cipher in hex string
	printf("The encrypted msg\n");
	for (i = 0;i <BUF_SIZE; i++) printf ("%02X",bufCipher[i]);
	printf ("\n\n");

	//Decrypt the cipher to verify
	printf("Decrypting\n\n");
	memset(bufPlain,0,BUF_SIZE);//clear the plain text buffer
	RC4_set_key(&key, strlen((char *)key_pass) ,key_pass);
	RC4(&key, BUF_SIZE,(unsigned char *) bufCipher, (unsigned char *)bufPlain);

	//Print the decrypted message
	printf("The decrypted msg\n%s",(char *)bufPlain);
	getchar();
}

The RC4 implementation is provided by Kungliga Tekniska. See 2 important functions extracted below.

void RC4_set_key(RC4_KEY *key, const int len, unsigned char *data){
    int i, j;

    for (i = 0; i < 256; i++)
        key->state[i] = i;

    for (i = 0, j = 0; i < 256; i++) {
	j = (j + key->state[i] + data[i % len]) % 256;
	SWAP(key, i, j);
    }

    key->x = key->y = 0;
}

void RC4(RC4_KEY *key, const int len, const unsigned char *in, unsigned char *out){
    int i, t;
    unsigned x, y;

    x = key->x;
    y = key->y;
    for (i = 0; i < len; i++) {
	x = (x + 1) % 256;
	y = (y + key->state[x]) % 256;
	SWAP(key, x, y);
	t = (key->state[x] + key->state[y]) % 256;
	*out++ = key->state[t] ^ *in++;
    }
    key->x = x;
    key->y = y;
}

Observations

  1. RC4_set_key function:  It is obvious see that it contains two loops, where the first loop is simply used to initialize the key state, while the second loop is used the schedule the keys using the input RC4 secret key. In this case, the RC4 secret key is “This is a RC4key”.
  2. RC4 function: There is one loop which iterates through plain text byte array and scramble the key state at the same time. And also, a selected key state element is used to “pad” the plain text byte to produce a cipher byte in each iteration.

After Compilation

The codes above are compiled with full optimization (/Ox), the “look” of the assembly codes appears terribly different from how it was written in C for some parts of the codes.

Analysis of RC4 _Set_Key function

Well, let’s examine the RC4_set_key function first. Below is the decompiled code of the first loop inside the SetKey function with the help of HexRay. There is no significant change in this simple loop except that the for loop is replaced with a do-while loop.

However, the “simple” second loop inside RC4_Set_Key has become almost beyond recognition compared to the source.

From the figure above, I have identified 4 repeating blocks within the loop and have boxed in red and green outlines.

The next thing that caught my eye is the loop counter. It is interesting to see that the loop counter is initialized with the value 2 before the do-while loop. For the first “iteration” (first block boxed in red), the loopCounter would take the value of (two minus two) which is zero; and in the second “iteration” (second block boxed in green), the loopCounter would take the value of (two minus one) which is one, and so on so forth. Before the end of the do-while loop, this loopCounter is incremented by the value of four. This is another tell tale sign that this do-while loop has 4 repeating blocks of statement embedded within. As a result, this do-while loop iterates only 64 times instead of 256 times.

However, by comparing the first block of statements (boxed in red) against the source code, it doesn’t look obvious that they are performing the same purpose.

Well, let’s examine this block in detail. Analysis are commented in green.

//Analysis will be using the first loop as an example for easy understanding

//idx4 is a newly introduced variable that will be updated by the fourth repeating block;
//For this first loop it is initialised with the value zero.
LOBYTE(idx4) = 0;
loopCounter = 2;
//pKeyState_Element is used to point to an element in the key state array.
//It is initialised to point to the 2nd element in the key state array.
pKeyState_Element = (int *)&pKeyStateStruct->pKeyState[1];
do
{
	//Since pKeyState_Element is refencing to the 2nd element,
	//By minusing one from this pointer, it will reference to the previous element (pKeyState[0]).
	//Therefore tmp_element1 will store the value pKeyState[0]
        tmp_element1 = *(pKeyState_Element - 1);

	//As pKeyState_Element is currently refencing to the 2nd element,
	//By adding four, it will be referencing to the 6th element.
        pKeyState_Element += 4;

	//The casting of the overall result to unsigned __int8 would be effectively modulus the result by 256.
	//Since loopCounter is initialised as 2, the result of loopCounter - 2 is 0.
	//idx1 = idx4 + pKeyState[0] + key [0 % key_length]
        idx1 = (unsigned __int8)((_BYTE)idx4 + tmp_element1 + key[(loopCounter - 2) % key_length]);

	//pKeyState_Element - 5 will reference to the first element.
	//pKeyState[0] = pKeyState[idx1]
        *(pKeyState_Element - 5) = pKeyStateStruct->pKeyState[idx1];

	//pKeyState [idx1] = original pKeyState [0]
	pKeyStateStruct->pKeyState[idx1] = tmp_element1;

	//Hence it is easy to summarise the statements above as
	//Swap (pKeyState [0] , pKeyState[idx1]), 
	//Where idx1 = idx4 + pKeyState[0] + key [0 % key_length], and idx4 is the "previous idx".

From the analysis above, it is interesting to see that the Swap function is actually subsumed into this loop. Additionally, with the blocks being repeated four times, “redundant” variables are also introduced. The repeated variables are “temp” variables which are used to support the swap, and the derived index are repeated four times. It is also observed that these variables are used in a ring manner.

After breaking down all and analyze the statements, it is not hard to see that this the RC4 key scheduling function.

Analysis of RC4  function

Next, we will proceed to analyse the RC4 function. See decompiled codes below.

In this RC4 loop, it is not hard to see that this do-while loop iterates only 32 times and not 128 times. With the the optimization of the loop, the contents are repeated four times in each loop. As a result, the block of statements are repeated 4 x 32 times. In the figure above, I have tried to visually separate the “repeated” blocks using red and green boxes.

Just like the key scheduling algorithm, the swap function is subsumed within this function with the help of the four newly introduced temp variables.

Below details the analysis for the block using comments in green.

  outBuf = outputBuffer;
  y_4 = pKeyState_struct->y;
  inBuf = inputBuffer;
  x_4 = pKeyState_struct->x;
  loopCounter = 32;
  do
  {
	//x = x + 1
        x_1 = (unsigned __int8)(x_4 + 1);

	//tmp = pKeyState[x]
        keyState_tmp1 = pKeyState_struct->pKeyState[x_1];

	//y = pKeyState [x] + y
        y_1 = (unsigned __int8)(keyState_tmp1 + (_BYTE)y_4);

	//pKeyState[x] = pKeyState [y]
        pKeyState_struct->pKeyState[x_1] = pKeyState_struct->pKeyState[y_1];

	//pKeyState [y] = tmp; 
        pKeyState_struct->pKeyState[y_1] = keyState_tmp1;

	//The statments above suggests Swap(keyState[x],keyState[y]);

	//*outBuf = *inBuf XOR keyState [ (tmp + keyState [x]) % 256]
	//which can be interpreted as: *outBuf = *inBuf XOR keyState [ (keyState[x] + keyState [y]) % 256]
	//since tmp is the old keyState [x] and keyState [x] is keyState [y] after the swap
        *outBuf = *inBuf ^ LOBYTE(pKeyState_struct->pKeyState[
			(unsigned __int8)(keyState_tmp1 + (unsigned __int8)pKeyState_struct->pKeyState[x_1])]);

Again, after breaking them down, it is not hard to see that it is the same implementation as RC4 encryption algorithm.