Optimized Hardware Implementation of The Advanced Encryption Standard Algorithm

Ahmed Fathy Abd Elfatah
Ibrahim F. Tarrad
Ali Ismail Awad
Hesham F. A. Hamed
Faculty of Engineering
Minia University
Al Azhar University
Al Azhar University
Al Azhar University
Minia, Egypt
Qena, Egypt
Cairo, Egypt
Qena, Egypt
Email: ahmedf.abdelfatah@gmail.com Email: tarradif@gmail.com Email: aawad@ieee.org Email: hfah66@yahoo.com
Abstract: Data encryption has become vital for protecting user data in
most areas of communication. The Advanced Encryption Standard (AES)
algorithm has become a preferred choice for security services in
numerous applications due to its reliability and flexibility. Hardware
implementations of the AES algorithm face two main challenges: the
encryption/decryption speed and the consumed implementation area. This
paper presents an implementation of the AES algorithm that is optimized
with respect to the consumed implementation area by combining the data
expansion and key expansion approaches. The optimized implementation
increases the applicability of AES in small devices such as mobile
phones and smart cards. The experimental results show that the proposed
approach consumes less area than the available approaches in the
literature, with a frequency and throughput that are acceptable for low
throughput applications.
I. INTRODUCTION
Data encryption is an important process in almost all data transaction
applications, such as e-commerce, electronic banking, and even simple
web-based applications. Data encryption transforms data into a scrambled
format while still allowing the intended recipient to restore the
original data by using a secret key. Data encryption and decryption are
the two major functions in any cryptography system. The encryption
process transforms data into an unintelligible format using a secret key
to guarantee the user's privacy. Decryption is the opposite process,
which restores the scrambled data to its original format using the same
secret key [1], [2].
The Advanced Encryption Standard (AES) algorithm is a symmetric key
block cipher with a fixed block size of 128 bit and key lengths of 128
bit, 192 bit, or 256 bit [3], [4]. Although AES provides a moderate
security level with a 256 bit key length, some AES applications, such as
smart cards and cellular phones, still struggle to fit the
implementation into a small area. Therefore, the implementation area is
an important factor for the real time deployment of the AES algorithm,
and reducing it is a necessary requirement for such devices.
A Field Programmable Gate Array (FPGA) [5] is an Integrated Circuit (IC)
that can be repeatedly reconfigured according to the needs of the
implemented application. It produces different behaviors through simple
configuration changes [6]. Because of this property and its low cost,
the FPGA is considered an appropriate environment for simulating the
hardware implementation of the AES encryption algorithm [7].
978-1-4799-0080-0/13/$31.00 2013 IEEE
This paper presents an optimized hardware implementation of the AES
algorithm on FPGA hardware. The optimization emphasizes the reduction of
the consumed area, measured as the number of used gates in Complementary
Metal-Oxide Semiconductor (CMOS) technology or the number of used slices
in FPGA technology. The optimization approach depends on the expansion
of both the input data block and the key. The novelty of the presented
approach lies in dividing the 128 bit input block into 16 parts of 1
byte each and passing them in series to a single S-box. The 128 bit key
is also divided into 4 words of 32 bit each, and the last word (31:0) is
further divided into 4 bytes that are passed serially, byte by byte, to
a single S-box. The presented approach optimizes the AES implementation
area by reducing the number of used S-boxes from 20 to 2 for both
expansions. Although this approach reduces the implementation area, it
consumes more clock cycles to accomplish the data encryption process.
The contribution of this paper is twofold. First, it presents a new
optimization methodology for the AES algorithm that works well in
hardware deployment and is suitable for small applications with low
throughput and low frequency. Second, the optimization of the consumed
area opens the door to deploying the AES algorithm on small chips such
as smart cards. Based on the reported results, balancing the tradeoff
between the consumed area and the speed of AES-based encryption and
decryption becomes an achievable goal.
The remainder of this paper is organized as follows: Section II covers
the background of the AES algorithm and the way it works. The related
work and the tackled research problem are documented in Section III.
Section IV theoretically explains the proposed optimization approach.
Section V illustrates the FPGA implementation and the evaluation of the
proposed AES area optimization approach. Finally, conclusions and future
work are reported in Section VI.
II. ADVANCED ENCRYPTION STANDARD
The Data Encryption Standard (DES) [8] was considered a model for
symmetric key encryption, and it has a key length of 56 bit. This key
length is considered small, and DES can easily be broken [9]. The
National Institute of Standards and Technology (NIST) [8] therefore
announced a contest to choose a new symmetric cryptography algorithm
that would replace the DES algorithm. Five algorithms were shortlisted:
Mars, RC6, Rijndael, Serpent, and Twofish. Later, after a detailed
evaluation, NIST announced Rijndael as the proposed AES [10], [11]. The
AES algorithm has a fixed input data block size of 128 bit. It can
encrypt and decrypt with three different key lengths of 128 bit, 192
bit, and 256 bit, which are defined as AES128, AES192, and AES256 [12].
The key length, the input block size, and the number of rounds for each
AES mode (128, 192, and 256) are reported in Table I. The difference
between the Rijndael algorithm and the AES algorithm is that Rijndael
has a variable input block size with a minimum of 128 bit, whereas the
AES algorithm supports only a 128 bit input block size [3].
Fig. 1. A block diagram of the complete AES encryption and decryption
modules.
The data encryption operation is carried out on a two dimensional array
of bytes called the State matrix. The State matrix is formed of 4 rows
of bytes, and each row has Nb bytes [4]. In the AES128 algorithm, as the
principal AES algorithm, a regular round consists of four main
operations, which are called SubBytes, ShiftRows, MixColumns, and
AddRoundKey. The last round has only three operations because the
MixColumns operation is omitted [13]. The block diagram of the modules
used in the encryption process of the AES algorithm is shown in Fig. 1.
SubBytes is a nonlinear transformation that uses 16 identical 256 byte
substitution tables (S-boxes) for independently mapping each byte of the
State matrix into another byte. The S-box entries are generated by
computing multiplicative inverses in the Galois Field GF(2^8) and then
applying an affine transformation. SubBytes can be implemented either by
computing the substitution or by using a lookup table. ShiftRows
cyclically shifts the bytes in the second, third, and fourth rows by
one, two, and three positions, respectively. It is worth mentioning that
this function requires no hardware resources, and it can be realized on
the FPGA as plain wiring [10]. MixColumns is a linear transformation
that is applied to the State matrix column by column. The key schedule
operation of the AES algorithm generates a total of Nb (Nr + 1) words in
order to accomplish the encryption and the decryption operations [14],
[15], [16].
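The S-box construction, the row shifts, and the key schedule word count
described above can be sketched in a few lines of Python. This is only
an illustrative software model, not the hardware design of this paper.
The helper names (gf_mul, gf_inv, affine, shift_rows,
key_schedule_words) are hypothetical, and the reduction polynomial
x^8 + x^4 + x^3 + x + 1 and the affine constant 0x63 are taken from
FIPS 197 [4].

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        high_bit = a & 0x80
        a = (a << 1) & 0xFF
        if high_bit:
            a ^= 0x1B  # reduce by the AES polynomial
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 == a^-1 because a^255 == 1
        result = gf_mul(result, a)
    return result


def affine(x):
    """Affine transformation of FIPS 197, applied after the inversion."""
    def rotl(v, n):
        return ((v << n) | (v >> (8 - n))) & 0xFF
    return x ^ rotl(x, 1) ^ rotl(x, 2) ^ rotl(x, 3) ^ rotl(x, 4) ^ 0x63


# 256-entry substitution table: inverse in GF(2^8), then affine transform.
SBOX = [affine(gf_inv(x)) for x in range(256)]


def shift_rows(state):
    """Rotate row r of a 4x4 State (given as a list of rows) left by r bytes."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def key_schedule_words(nb, nr):
    """Number of 32 bit words produced by the key schedule: Nb (Nr + 1)."""
    return nb * (nr + 1)
```

For AES128 (Nb = 4, Nr = 10), key_schedule_words(4, 10) gives 44 words,
which corresponds to 11 round keys of 128 bit each.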
The decryption process is the inverse of the encryption process: it
inverts the round transformations in order to restore the original plain
data. The round transformations of the decryption process consist of
four functions, namely AddRoundKey, InvMixColumns, InvShiftRows, and
InvSubBytes, respectively. AddRoundKey is an XOR function. InvShiftRows
performs the same function as ShiftRows but in the inverse direction.
Thus, the first row is not changed, the second row is shifted by one,
the third by two, and the last by three. The InvSubBytes transformation
is performed using a permutation table called InvS-box that has 256
numbers (from 0 to 255) [9]. Fig. 1 also shows the block diagram of the
modules used in the decryption process. It is worth noting that the AES
algorithm can operate in four modes: Cipher Block Chaining (CBC), Cipher
Feedback (CFB), Output Feedback (OFB), and Electronic Codebook (ECB)
[1], [17].
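The relation between the forward and the inverse transformations can be
illustrated with a short Python sketch. It assumes that the State is
stored as a list of four rows, uses hypothetical helper names, and
covers only ShiftRows/InvShiftRows and AddRoundKey rather than a full
decryption round. The round key bytes are arbitrary illustrative values.

```python
def shift_rows(state):
    """Rotate row r of the State left by r bytes (encryption)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Rotate row r of the State right by r bytes (decryption)."""
    return [row[len(row) - r:] + row[:len(row) - r]
            for r, row in enumerate(state)]


def add_round_key(state, round_key):
    """XOR the State with the round key; applying it twice restores the State."""
    return [[s ^ k for s, k in zip(s_row, k_row)]
            for s_row, k_row in zip(state, round_key)]


state = [[0x00, 0x01, 0x02, 0x03],
         [0x10, 0x11, 0x12, 0x13],
         [0x20, 0x21, 0x22, 0x23],
         [0x30, 0x31, 0x32, 0x33]]
round_key = [[0xA5] * 4 for _ in range(4)]  # arbitrary illustrative key bytes

# InvShiftRows undoes ShiftRows, and AddRoundKey is its own inverse.
restored = inv_shift_rows(shift_rows(state))
unmasked = add_round_key(add_round_key(state, round_key), round_key)
```

Both restored and unmasked equal the original state, which is why the
decryption path can reuse the XOR logic of AddRoundKey unchanged and
only needs the row rotation in the opposite direction.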
III. RELATED WORK
Optimizing the implementation area of the AES algorithm is an
interesting problem, and many attempts to tackle it are found in the
literature. Hamalainen et al. [13] reduced the implementation area
(number of gates) by parallelizing the AES operations on the FPGA. The
high level architecture consists of byte permutation, a MixColumns
multiplier, a parallel to serial converter, an S-box, and a key
scheduler. The proposed design was implemented in a 0.13 µm CMOS
technology with a total of 3100 consumed gates and a maximum throughput
of 121 Mbps. Therefore, this approach is suitable for low cost and low
power applications.
Fig. 2. The proposed key expansion approach. The 128 bit key is divided
into 4 words of 32 bit each and processed using a single S-box.
Fig. 3. The Register Transfer Level (RTL) block diagram of the proposed
area optimization approach with key expansion and data expansion
modules.
Rady et al. [18] built an AES core architecture that consists of three
units: a controller unit, an interfaces unit, and a main
encryption/decryption unit with key expansion and storage. The proposed
architecture introduced two ways to improve the AES area. The first is
iterating the key expansion and the ordinary round. The second is
sharing specific resources between the ordinary round and the key
expansion. The maximum consumed area was 2699 slices. However, the
throughput was only 10 Mbps, and the frequency was 45 MHz.
Granado-Criado et al. [19] used a combination of three hardware
languages (Handel-C, VHDL, and JBit) with partial and dynamic
reconfiguration. The VHDL element is synthesized in advance, and an
interface with Handel-C is provided. This approach achieved an area of
3576 slices on FPGA, a frequency of 194.7 MHz, and a throughput of
24.922 Gbps.
Rachh et al. [20] proposed two approaches for efficient AES
implementation. The first approach is based on an optimized S-box
followed by a bit-wise implementation of MixColumns and AddRoundKey for
encryption, and an optimized inverse S-box followed by a bit-wise
implementation of InvMixColumns and AddRoundKey for decryption. The
second approach is based on combining the S-box with the MixColumns and
AddRoundKey modules to build a complete and integrated encryption
module. The best achieved results for the FPGA implementation were
reported as 1838 slices for the area, 50.191 MHz for the frequency, and
0.642 Gbps for the throughput.
Admittedly, the contributions reported in Table II provide AES
implementations that are optimized in terms of consumed area. However,
the consumed implementation area is still large in each of them, which
makes these approaches unsuitable for small scale applications such as
access cards. Therefore, this research focuses on optimizing the area
required for the AES implementation. The optimized AES area presented in
this research works well in small and low throughput applications.
IV. PROPOSED OPTIMIZATION SCHEME
The idea behind the proposed optimization approach is to apply the
expansion technique not only to the AES input block but also to the AES
key in order to reduce the number of used S-box modules. For the data, a
normal data expansion approach is used: the 128 bit input data block is
divided into 16 parts of 1 byte each. The 128 bit key is also divided
into 4 words of 32 bit each, and the last word (31:0) is further divided
into 4 bytes. This word of the key is passed sequentially, byte by byte,
to a single S-box. The combination of data expansion and key expansion
reduces the AES implementation area because both the data path and the
key path are optimized, reducing the number of S-boxes from 20 to 2 for
both phases.
The presented approach focuses on expanding the 128 bit AES key by
dividing it into 4 words of 32 bit each. The divided words are defined
as W(127 to 96), W(95 to 64), W(63 to 32), and W(31 to 0). Since the
S-box is a lookup table that can process a large amount of data when
bytes are fed to it one at a time, a single S-box is used for key
expansion instead of the 4 S-boxes used in the traditional key
management. The architecture of the proposed approach is shown in
Fig. 2.
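The sharing of one S-box per path can be modelled in software as
follows. This is a minimal behavioral sketch in Python, not the VHDL
design of this paper. The names SharedSBox, split_key, and
first_round_key are hypothetical, the cost of one clock cycle per lookup
is an assumption, and the test block and key are example vectors from
FIPS 197 [4] that are used here only for illustration. Only the first
round key is derived.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) with the AES polynomial 0x11B."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a = ((a << 1) ^ (0x1B if a & 0x80 else 0)) & 0xFF
        b >>= 1
    return result


def build_sbox():
    """Build the AES S-box: inverse in GF(2^8) followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 is the inverse of x
                inv = gf_mul(inv, x)
        y = inv
        for n in range(1, 5):  # XOR with the four left rotations
            y ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox


class SharedSBox:
    """One S-box instance; each lookup is assumed to cost one clock cycle."""

    def __init__(self):
        self.table = build_sbox()
        self.cycles = 0

    def substitute(self, data):
        """Substitute the bytes of `data` in series, one byte per cycle."""
        out = bytearray()
        for byte in data:
            out.append(self.table[byte])
            self.cycles += 1
        return bytes(out)


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key):
    """Split a 128 bit key into W(127:96), W(95:64), W(63:32), W(31:0)."""
    return [key[i:i + 4] for i in range(0, 16, 4)]


def first_round_key(key, key_sbox):
    """Derive round key 1 (FIPS 197), using one S-box for the last word."""
    w = split_key(key)
    rotated = w[3][1:] + w[3][:1]  # RotWord on W(31:0)
    temp = xor_bytes(key_sbox.substitute(rotated), bytes([0x01, 0, 0, 0]))
    for i in range(4):
        temp = xor_bytes(w[i], temp)
        w.append(temp)
    return b"".join(w[4:])


# Two S-box instances in total instead of 16 (data) + 4 (key) = 20.
data_sbox, key_sbox = SharedSBox(), SharedSBox()
block = bytes.fromhex("00102030405060708090a0b0c0d0e0f0")
key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
substituted = data_sbox.substitute(block)
round_key_1 = first_round_key(key, key_sbox)
```

In this model the data path spends 16 lookups on one SubBytes step and
the key path spends 4 lookups per round key, which illustrates the
tradeoff stated above: fewer S-boxes in exchange for more clock cycles.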
TABLE I
THE SPECIFICATIONS OF AES WITH DIFFERENT KEY LENGTHS. Nk AND Nb ARE
MEASURED IN WORDS OF 32 BITS [4].

                      AES128    AES192    AES256
Key length (Nk)          4         6         8
Block size (Nb)          4         4         4
# of rounds (Nr)        10        12        14

TABLE II
A COMPARISON BETWEEN THE PERFORMANCE OF THE PROPOSED APPROACH AND THE
OTHER APPROACHES IN THE LITERATURE.

Approach    Area (slices; gates for [13])    Frequency (MHz)    Throughput (Gbps)
[13]        3100                             152                0.121
[18]        2699                             45                 0.010
[19]        3576                             194.7              24.922
[20]        1838                             50.2               0.642
Proposed    1226                             157                0.050
Fig. 4. The timing diagram of the encryption process produced from the
simulation of the optimized AES algorithm. The encryption process
consumes 169 clock cycles with a clock period of 6.37 ns. The total
consumed time is 169 × 6.37 ns ≈ 1076.5 ns (about 1.08 µs). The figure
shows the input data, the key, and the cipher text.
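The timing figures quoted for Fig. 4 follow from two short calculations:
the latency is the cycle count multiplied by the clock period, and the
clock frequency is the reciprocal of the period. A minimal Python
sketch, with variable names chosen here only for illustration:

```python
CYCLES = 169             # clock cycles for one encryption (Fig. 4)
CLOCK_PERIOD_NS = 6.37   # clock period in nanoseconds (Fig. 4)

latency_ns = CYCLES * CLOCK_PERIOD_NS      # total encryption time in ns
latency_us = latency_ns / 1000.0           # the same value in microseconds
frequency_mhz = 1000.0 / CLOCK_PERIOD_NS   # 1 / (6.37 ns) expressed in MHz

print(f"latency   = {latency_ns:.2f} ns = {latency_us:.3f} us")
print(f"frequency = {frequency_mhz:.1f} MHz")
```

Evaluating the product directly gives about 1076.5 ns, and the
reciprocal of the period gives about 157 MHz, which matches the maximum
frequency reported in Section V.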
V. PERFORMANCE EVALUATION
The proposed algorithm is synthesized and implemented using Xilinx ISE
9.2i. The experiments in this paper have been conducted using a PC with
an Intel Core i5-M480 processor running at 2.67 GHz and 4 GB of RAM. The
PC runs 64 bit Windows.
A. Evaluation Results
The proposed approach has been implemented in the VHDL programming
language on an FPGA Spartan3 (XC3s400-4pq208) board. The Register
Transfer Level (RTL) block diagram of the implemented approach is shown
in Fig. 3. The figure shows the three major functional blocks: the key
expansion, the input data expansion, and the main data encryption block.
The key expansion and the input data expansion blocks operate as
supporting blocks ahead of the main encryption block.
The simulation of the implementation shown in Fig. 3 produces the timing
diagram in Fig. 4. The timing diagram shows the input data in
hexadecimal as {32 43 F6 A8 88 5A 30 8D 31 31 98 D2 E0 37 07 34}, the
key as {2B 7E 15 16 28 AE D2 A6 AB F7 15 88 09 CF 4F}, and the output
cipher text as {39 25 84 1D 02 DC 09 FB DC 11 85 97 19 6A 0B 32}.
Additionally, it shows that the encryption process consumes 169 clock
cycles with a 6.37 ns clock period. Therefore, the total consumed time
can be calculated as 169 × 6.37 ns ≈ 1076.5 ns (about 1.08 µs). The
maximum achieved frequency is the reciprocal of the clock period, which
is equal to 157 MHz.
A comparison between the performance of the proposed optimization
approach and the other approaches in the literature, in terms of
consumed area, achieved frequency, and achieved throughput, is shown in
Table II. The table confirms that the proposed AES optimization approach
consumes the lowest area, 1226 slices, with an achieved frequency higher
than that of the approach in [20], which consumes 1838 slices. The
drawback of the proposed optimization is its low throughput, 0.05 Gbps,
but this value is acceptable for low throughput applications.
VI. CONCLUSIONS AND FUTURE WORK
This paper has presented an optimization of the AES algorithm in terms
of the consumed implementation area. The presented optimization reduces
the consumed implementation area by combining the input block expansion
with the key expansion. The combination of the two expansions reduces
the total number of used S-boxes from 20 to 2 for both the input block
and the key management. The simulation of the proposed approach on FPGA
hardware shows that it consumes less area than the other approaches.
Although the presented approach optimizes the consumed area, it takes
many clock cycles to accomplish the encryption process. Therefore, it is
suitable for applications with low throughput. Balancing the tradeoff
between the AES area and the encryption speed, as a complete AES
optimization, will be tackled in future work.
REFERENCES
[1] H. C. v. Tilborg, Encyclopedia of Cryptography and Security.
Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2005.
[2] C. Paar and J. Pelzl, Understanding Cryptography: A Textbook for
Students and Practitioners, 1st ed. Springer Publishing Company,
Incorporated, 2009.
[3] J. Daemen and V. Rijmen, The Design of Rijndael: AES - The Advanced
Encryption Standard. Springer-Verlag, 2002.
[4] National Institute of Standards and Technology, "Advanced encryption
standard (AES)," FIPS Publication 197, pp. 1-51, 2001, last visited on
24/08/2013. [Online]. Available:
http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
[5] S. Kilts, Advanced FPGA Design: Architecture, Implementation, and
Optimization. Wiley-IEEE Press, 2007.
[6] O. Gomes, R. Moreno, and T. Pimenta, "A fast cryptography pipelined
hardware developed in FPGA with VHDL," in the 3rd International Congress
on Ultra Modern Telecommunications and Control Systems and Workshops
(ICUMT), October 2011, pp. 1-6.
[7] M. H. A. Mijalli, "Efficient realization of S-Box based reduced
residue of prime numbers using Virtex-5 and Virtex-6 FPGAs," American
Journal of Applied Sciences, vol. 8, no. 8, pp. 754-757, 2011.
[8] National Institute of Standards and Technology, last visited on
24/08/2013. [Online]. Available: http://www.nist.gov/index.html
[9] T. Hoang and V. L. Nguyen, "An efficient FPGA implementation of the
advanced encryption standard algorithm," in the IEEE RIVF International
Conference on Computing and Communication Technologies, Research,
Innovation, and Vision for the Future (RIVF), March 2012, pp. 1-4.
[10] J. Zambreno, D. Nguyen, and A. Choudhary, "Exploring area/delay
tradeoffs in an AES FPGA implementation," in the 14th Annual
International Conference on Field-Programmable Logic and Applications
(FPL04). Springer, 2004, pp. 575-585.
[11] M. Wali and M. Rehan, "Effective coding and performance evaluation
of the Rijndael algorithm (AES)," in the Student Conference on
Engineering Sciences and Technology (SCONEST 2005), 2005, pp. 1-7.
[12] S. Tillich, M. Feldhofer, T. Popp, and J. Groschadl, "Area, delay,
and power characteristics of standard-cell implementations of the AES
S-Box," Journal of Signal Processing Systems, vol. 50, no. 2,
pp. 251-261, February 2008.
[13] P. Hamalainen, T. Alho, M. Hannikainen, and T. Hamalainen, "Design
and implementation of low-area and low-power AES encryption hardware
core," in the 9th EUROMICRO Conference on Digital System Design:
Architectures, Methods and Tools (DSD 2006), 2006, pp. 577-583.
[14] R. Elumalai and A. R. Reddy, "Improving diffusion power of AES
Rijndael with 8 × 8 MDS Matrix," International Journal of Scientific and
Engineering Research, vol. 2, no. 3, pp. 251-261, March 2011.
[15] M. H. Rais and S. M. Qasim, "Efficient FPGA realization of S-Box
using reduced residue of prime numbers," International Journal of
Computer Science and Network Security (IJCSNS), vol. 10, no. 1,
pp. 9674, 2010.
[16] J. Huang, J. Seberry, and W. Susilo, "A five-round algebraic
property of the advanced encryption standard," in the 11th International
Conference on Information Security, ser. ISC08. Springer-Verlag, 2008,
pp. 316-330.
[17] K. Gaj and P. Chodowiec, "Hardware performance of the AES
finalists - survey and analysis results," Tech. Rep., 2000, last visited
on 24/08/2013. [Online]. Available:
http://teal.gmu.edu/crypto/AES survey.pdf
[18] A. Rady, E. El Sehely, and A. El Hennawy, "Design and
implementation of area optimized AES algorithm on reconfigurable FPGA,"
in the International Conference on Microelectronics (ICM 2007), December
2007, pp. 35-38.
[19] J. M. Granado-Criado, M. A. Vega-Rodríguez, J. M. Sánchez-Pérez,
and J. A. Gómez-Pulido, "A new methodology to implement the AES
algorithm using partial and dynamic reconfiguration," Integration, the
VLSI Journal, vol. 43, no. 1, pp. 72-80, 2010.
[20] R. R. Rachh, P. Mohan, and B. Anami, "Efficient implementations for
AES encryption and decryption," Circuits, Systems, and Signal
Processing, vol. 31, no. 5, pp. 1765-1785, 2012.
