I took a course in Configurable Computing that was very enlightening as to the different types of configurable and reconfigurable hardware out there. I've worked with FPGAs before, of course, but I did not know about the other types of reconfigurable chips out there in use and in development. There is truly a lot of potential for this technology to take off!
For one of the projects in this class, we were asked to implement an algorithm in the Handel-C that lends itself well to use on a configurable platform. I chose to implement the Data Encryption Standard (DES). In fact, my goal was to gain enough of a performance improvement over software to make cracking DES (using a brute force key attack) feasible. I chose DES because it was a good match for using Handel-C, as the algorithm can be easily described in the C-like syntax and DES has great potential for being accelerated with hardware.
I began by searching for the details of implementing DES encryption/decryption on the Internet. I found some sites with the DES details as well as some sample code. I build my code in C from the ground up, using the examples I had to guide me. My software took me few days to write and I tested it to make sure it matched the results of the OpenSSH open source DES package.
I then created a program that used my DES encrypter/decrypter that would search through the entire keyspace (2^56 keys) and try to find the correct key for a given ciphertext. This was the software version of my DES Cracker. The basic algorithm for my cracker (which was based on the EFF DES Cracker Project) is as follows:
- Accept two 64-bit (8-byte) ciphertext blocks (encrypted with the same key)
- Choose a starting 56-bit key (i.e. 0x00000000000000)
- Decrypt ciphertext #1 with the chosen key
- Check the decrypted text for “normal” ASCII characters (‘a’-‘z’, ‘A’-‘Z’, ‘0’-‘9’, ‘.’, ‘,’, ‘?’)
- This set of characters can be easily changed by altering the contents of a look-up table
- If all eight characters are part of the chosen set of “normal” characters, go to step 6, otherwise, go to step 9
- Decrypt ciphertext #2 with the chosen key
- Check the decrypted text for “normal” ASCII characters
- If all eight characters are “normal,” print out the key and both plaintexts
- Increment the key and return to step 2
A follow-up program would then take all of the keys and plaintexts printed in step 8 and examine them further.
This algorithm will produce many false positives, but will eliminate most of the keys from the search space. This is done this way because nobody has found a way to mathematically crack DES using some type of algorithm for reducing the search space (without having many ciphertext/plaintext pairs to begin with). The probability of getting a false positive using two ciphertexts and the above character set is (65/256)^16 (69 out of the 256 possible ASCII codes are valid characters and 8 characters in each ciphertext). Thus there will be 2^55 * (65/256)^16 keys that need to be followed up on. This is about 10.7 million keys, which is quite feasible for a software implementation of DES.
I timed my software DES Cracker and calculated that it would search the entire keyspace in about 180,000 years on a 3 Ghz Pentium 4 processor. On average, it would finish in 90,000 years. I was pretty sure I could reduce this time in hardware.
Porting to Handel-C
In order to port my code from standard C to Handel-C, I pasted all of my code into a Handel-C file and kept making changes until it compiled. This took a few days to do. The major problem I had was in making my array look-ups work. I wanted these (which were mainly bit permutations) to be done as wires in hardware, so Handel-C needed to treat them as compile-time constants. I didn’t realize that I could do this with macro expressions would do this and ended up writing a C program to create the Handel-C code I wanted.
My first version of my hardware DES Cracker ran at 30 Mhz (limited by my VGA output circuits) and would crack DES in an average of 13,000 years—a speedup of 7X over my software implementation.
I optimized my hardware to parallelize many of the operations, such as the subkey generation step (previously done in 16 clock cycles, I shrunk it to a single clock cycle) and doing one DES round in single cycle instead of 8. With these and other optimizations, I measured my hardware at cracking DES in 945 years—a speedup of 95.2X over software.
Next, I replicated my DES Cracker circuit and placed 4 of them on the RC-10 board running in parallel. This didn’t slow down the circuit at all, so that cuts execution time by another 75%, a 380X speedup over software.
Next, I removed my clock-rate limiter—the VGA console controller—and saw that my circuit could be clocked higher-at about 60Mhz, for a total of 760X speedup over software. This translates to cracking DES in an average of 118 years. I had hoped to get closer to a measurement in weeks (or at least months!), but for a single FPGA, I guess that is pretty good.