At the time I wrote this, I validated it against the test cases which accompanied the reference implementation.
A cursory look at the code shows it to be superficially similar to the reference "ref21" implementation from Vincent Rijmen and Joan Daemen. However, there is a key difference (pun intended). Look at the basic cipher block encryption function prototype of the reference implementation:
/* Reference implementation */
int rijndaelEncrypt (word8 a[4][MAXBC], int keyBits, int blockBits,
                     word8 rk[MAXROUNDS+1][4][MAXBC]);
The block to be encrypted is the a parameter. But this is not simple linear byte data; it may contain gaps, as is easily evident if we look, for instance, at the KeyAddition function:
/* Reference implementation */
void KeyAddition(word8 a[4][MAXBC], word8 rk[4][MAXBC], word8 BC) {
    /* Exor corresponding text input and round key input bytes */
    int i, j;

    for(i = 0; i < 4; i++)
        for(j = 0; j < BC; j++) a[i][j] ^= rk[i][j];
}
The parameter BC depends on the block size (the blockBits parameter in rijndaelEncrypt). But note that it is the last array dimension whose index varies, from zero up to BC - 1. The plaintext blocks, cipher blocks and round keys reserve column space for the maximum block size, so that when the actual block is smaller, unused positions are left in the interior of the arrays. What's worse, the block data is not in the native, external order suitable for interfacing with other software.
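To make that concrete, here is an illustration of my own (not taken from the reference sources) of where the bytes of a 128-bit block land in the a array, assuming MAXBC is sized for the largest 256-bit block, i.e. eight columns. The scattering shown is the one performed by the reference wrapper code quoted further below:

/* Illustration only: a 128-bit plaintext in[0..15] parsed into
 * word8 a[4][MAXBC], with MAXBC == 8 and BC == 4.
 *
 *   a[0]:  in[0]  in[4]  in[8]   in[12]  --  --  --  --
 *   a[1]:  in[1]  in[5]  in[9]   in[13]  --  --  --  --
 *   a[2]:  in[2]  in[6]  in[10]  in[14]  --  --  --  --
 *   a[3]:  in[3]  in[7]  in[11]  in[15]  --  --  --  --
 *
 * Each row holds every fourth byte of the input, and half of each row
 * is unused whenever the block is smaller than the maximum.
 */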
Looking at this, I instantly realized that if we transpose these matrices (turn rows into columns and vice versa), we can achieve a representation with no gaps, and in the native byte ordering that applications expect to use.
The block size is converted into a number of rows of the C array, and to do the key addition, we can simply operate on the data as if it were flat memory. (The rows are four-byte arrays, and so the numrows_to_blocksize function merely multiplies by four.)
/* My implementation */
static void
key_add(rijn_block_t *out, rijn_block_t *in, rijn_block_t *roundkey, int numrows)
{
    unsigned char *flat_out = (unsigned char *) out;
    const unsigned char *flat_in = (unsigned char *) in;
    const unsigned char *flat_rk = (unsigned char *) roundkey;
    int numbytes = numrows_to_blocksize(numrows);

    while (numbytes-- > 0)
        *flat_out++ = *flat_in++ ^ *flat_rk++;
}
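For completeness, here is a plausible definition of numrows_to_blocksize, inferred only from the description above (the rows being four-byte arrays); the module's actual definition may differ in detail:

/* Sketch inferred from the text: each row of a block is four bytes,
 * so the block size in bytes is simply four times the row count. */
static int numrows_to_blocksize(int numrows)
{
    return numrows * 4;
}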
Not only does this mean we can operate on raw input blocks, but it also means we could do this exclusive-oring word-wise rather than byte-wise. This could be hidden behind some generic memxor function which is optimized in architecture-specific ways.
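To illustrate, here is a minimal, portable sketch of such a memxor; the name comes from the remark above, but the interface is my own, and a tuned version would substitute architecture-specific word or vector operations for the memcpy-based inner loop:

#include <stddef.h>
#include <string.h>

/* Sketch of a generic memxor: out[i] = a[i] ^ b[i] for n bytes.
 * The bulk is processed an unsigned long at a time through memcpy,
 * which stays portable (no alignment or strict-aliasing tricks);
 * a real implementation might use intrinsics or inline assembly. */
static void memxor(unsigned char *out, const unsigned char *a,
                   const unsigned char *b, size_t n)
{
    while (n >= sizeof (unsigned long)) {
        unsigned long wa, wb;

        memcpy(&wa, a, sizeof wa);
        memcpy(&wb, b, sizeof wb);
        wa ^= wb;
        memcpy(out, &wa, sizeof wa);

        out += sizeof wa;
        a += sizeof wa;
        b += sizeof wa;
        n -= sizeof wa;
    }

    while (n-- > 0)
        *out++ = *a++ ^ *b++;
}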
This simplification means that we are able to give the Rijndael module a production-ready API, without any additional layer to move the bytes around. In my implementation, the basic block cipher function is simply:
/* My implementation */
void rijn_encrypt(rijn_keysched_t *, unsigned char *, const unsigned char *);
The arguments are the scheduled key, a pointer to the output buffer which receives the cipher block, and a pointer to the memory where the plaintext block is stored: simple, contiguous bytes in both cases.
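As a usage sketch only: the key-scheduling call below, rijn_setkey, is a hypothetical stand-in for whatever setup function the module actually provides (it is not named in this article), and the 128-bit block and key sizes are just an example.

/* Hypothetical usage sketch; only rijn_encrypt and rijn_keysched_t
 * appear in the text above, the rest is illustrative. */
void encrypt_one_block(const unsigned char key[16],
                       const unsigned char plain[16],
                       unsigned char cipher[16])
{
    rijn_keysched_t ks;

    rijn_setkey(&ks, key, 16);         /* hypothetical key setup */
    rijn_encrypt(&ks, cipher, plain);  /* output, then input */
}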
By contrast, the reference Rijndael code (at least the snapshot of the "ref21" version that I took in 2001) has a separate wrapper API module whose responsibility it is to perform this conversion between the transposed internal representation and the external representation, around the calls to the algorithm module proper.
Here is a snippet of the code from the blockEncrypt function:
/* Reference implementation */
switch (cipher->mode) {
case MODE_ECB:
    for (i = 0; i < numBlocks; i++) {
        for (j = 0; j < cipher->blockLen/32; j++) {
            for(t = 0; t < 4; t++)
                /* parse input stream into rectangular array */
                block[t][j] = input[cipher->blockLen/8*i+4*j+t] & 0xFF;
        }
        rijndaelEncrypt (block, key->keyLen, cipher->blockLen, key->keySched);
        for (j = 0; j < cipher->blockLen/32; j++) {
            /* parse rectangular array into output ciphertext bytes */
            for(t = 0; t < 4; t++)
                outBuffer[cipher->blockLen/8*i+4*j+t] = (BYTE) block[t][j];
        }
    }
    break;
See how this does a "parse input stream into rectangular array" before calling the encryption function, and then "parse rectangular array into output ciphertext bytes". Ouch!
I suspect the authors may have described the algorithm and implemented it in this obtuse transposed way on purpose: to encourage users to roll their own implementations based on the underlying concept, rather than rely on the reference implementation. Reference implementations can be dangerous. If everyone uses a reference implementation, that creates a monoculture whereby a single defect will contaminate the entire field of deployment.