At the time I wrote this, I validated it against the test cases which accompanied the reference implementation.
A cursory look at the code shows it to be superficially similar to the reference "ref21" implementation from Vincent Rijmen and Joan Daemen. However, there is a key difference (pun intended). Look at the basic cipher block encryption function prototype of the reference implementation:
/* Reference implementation */
int rijndaelEncrypt (word8 a[4][MAXBC], int keyBits, int blockBits,
                     word8 rk[MAXROUNDS+1][4][MAXBC]);
The block to be encrypted is the a parameter. But this is not simple linear byte data; it may contain gaps, as is easily evident if we look, for instance, at the KeyAddition function:
/* Reference implementation */
void KeyAddition(word8 a[4][MAXBC], word8 rk[4][MAXBC], word8 BC) {
    /* Exor corresponding text input and round key input bytes */
    int i, j;

    for(i = 0; i < 4; i++)
        for(j = 0; j < BC; j++) a[i][j] ^= rk[i][j];
}
The parameter BC depends on the block size (the blockBits parameter in rijndaelEncrypt). But note that it is the last array dimension whose index varies, from zero up to BC - 1. The plaintext blocks, cipher blocks and round keys reserve column space for the maximum block size, so that when the actual block is smaller, unused positions are left in the interior of the arrays. What's worse, the block data is not in the native, external order suitable for interfacing with other software.
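To make that concrete, here is an illustration of my own (not taken from the reference sources) of where the bytes of a 128-bit block land in the a array, assuming MAXBC is sized for the largest 256-bit block, i.e. eight columns. The scattering shown is the one performed by the reference wrapper code quoted further below:

/* Illustration only: a 128-bit plaintext in[0..15] parsed into
 * word8 a[4][MAXBC], with MAXBC == 8 and BC == 4.
 *
 *   a[0]:  in[0]  in[4]  in[8]   in[12]  --  --  --  --
 *   a[1]:  in[1]  in[5]  in[9]   in[13]  --  --  --  --
 *   a[2]:  in[2]  in[6]  in[10]  in[14]  --  --  --  --
 *   a[3]:  in[3]  in[7]  in[11]  in[15]  --  --  --  --
 *
 * Each row holds every fourth byte of the input, and half of each row
 * is unused whenever the block is smaller than the maximum.
 */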
Looking at this, I instantly realized that if we transpose these matrices (turn rows into columns and vice versa), we can achieve a representation with no gaps, and in the native byte ordering that applications expect to use.
The block size is converted into a number of rows of the C array, and to do the key addition, we can simply operate on the data as if it were flat memory. (The rows are four-byte arrays, and so the numrows_to_blocksize function merely multiplies by four.)
/* My implementation */
static void
key_add(rijn_block_t *out, rijn_block_t *in, rijn_block_t *roundkey, int numrows)
{
    unsigned char *flat_out = (unsigned char *) out;
    const unsigned char *flat_in = (unsigned char *) in;
    const unsigned char *flat_rk = (unsigned char *) roundkey;
    int numbytes = numrows_to_blocksize(numrows);

    while (numbytes-- > 0)
        *flat_out++ = *flat_in++ ^ *flat_rk++;
}
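For completeness, here is a plausible definition of numrows_to_blocksize, inferred only from the description above (the rows being four-byte arrays); the module's actual definition may differ in detail:

/* Sketch inferred from the text: each row of a block is four bytes,
 * so the block size in bytes is simply four times the row count. */
static int numrows_to_blocksize(int numrows)
{
    return numrows * 4;
}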
Not only does this mean we can operate on raw input blocks, but it also means we could do this exclusive-oring word-wise rather than byte-wise. This could be hidden behind some generic memxor function which is optimized in architecture-specific ways.
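To illustrate, here is a minimal, portable sketch of such a memxor; the name comes from the remark above, but the interface is my own, and a tuned version would substitute architecture-specific word or vector operations for the memcpy-based inner loop:

#include <stddef.h>
#include <string.h>

/* Sketch of a generic memxor: out[i] = a[i] ^ b[i] for n bytes.
 * The bulk is processed an unsigned long at a time through memcpy,
 * which stays portable (no alignment or strict-aliasing tricks);
 * a real implementation might use intrinsics or inline assembly. */
static void memxor(unsigned char *out, const unsigned char *a,
                   const unsigned char *b, size_t n)
{
    while (n >= sizeof (unsigned long)) {
        unsigned long wa, wb;

        memcpy(&wa, a, sizeof wa);
        memcpy(&wb, b, sizeof wb);
        wa ^= wb;
        memcpy(out, &wa, sizeof wa);

        out += sizeof wa;
        a += sizeof wa;
        b += sizeof wa;
        n -= sizeof wa;
    }

    while (n-- > 0)
        *out++ = *a++ ^ *b++;
}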
This simplification means that we are able to give the Rijndael module a production-ready API, without any additional layer to move the bytes around. In my implementation, the basic block cipher function is simply:
/* My implementation */
void rijn_encrypt(rijn_keysched_t *, unsigned char *, const unsigned char *);
The arguments are the scheduled key, a pointer to the output buffer which receives the cipher block, and a pointer to the memory where the plaintext block is stored: simple, contiguous bytes in both cases.
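As a usage sketch only: the key-scheduling call below, rijn_setkey, is a hypothetical stand-in for whatever setup function the module actually provides (it is not named in this article), and the 128-bit block and key sizes are just an example.

/* Hypothetical usage sketch; only rijn_encrypt and rijn_keysched_t
 * appear in the text above, the rest is illustrative. */
void encrypt_one_block(const unsigned char key[16],
                       const unsigned char plain[16],
                       unsigned char cipher[16])
{
    rijn_keysched_t ks;

    rijn_setkey(&ks, key, 16);         /* hypothetical key setup */
    rijn_encrypt(&ks, cipher, plain);  /* output, then input */
}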
By contrast, the reference Rijndael code (at least the snapshot of the "ref21" version that I took in 2001) has a separate wrapper API module whose responsibility it is to perform this conversion between the transposed internal representation and the external representation, around the calls to the algorithm module proper.
Here is a snippet of the code from the blockEncrypt function:
/* Reference implementation */
switch (cipher->mode) {
case MODE_ECB:
    for (i = 0; i < numBlocks; i++) {
        for (j = 0; j < cipher->blockLen/32; j++) {
            for(t = 0; t < 4; t++)
                /* parse input stream into rectangular array */
                block[t][j] = input[cipher->blockLen/8*i+4*j+t] & 0xFF;
        }
        rijndaelEncrypt (block, key->keyLen, cipher->blockLen, key->keySched);
        for (j = 0; j < cipher->blockLen/32; j++) {
            /* parse rectangular array into output ciphertext bytes */
            for(t = 0; t < 4; t++)
                outBuffer[cipher->blockLen/8*i+4*j+t] = (BYTE) block[t][j];
        }
    }
    break;
See how this does a "parse input stream into rectangular array" before calling the encryption function, and then "parse rectangular array into output ciphertext bytes". Ouch!
I suspect the authors may have described the algorithm and implemented it in this obtuse transposed way on purpose: to encourage users to roll their own implementations based on the underlying concept, rather than rely on the reference implementation. Reference implementations can be dangerous. If everyone uses a reference implementation, that creates a monoculture whereby a single defect will contaminate the entire field of deployment.