
Modern Optimisation Algorithms for Cryptanalysis

Andrew Clark *
Information Security Research Centre & Distributed Systems Technology Centre
Queensland University of Technology
Email: aclark@fit.qut.edu.au

Abstract

In recent years a number of optimisation algorithms have emerged which have proven to be effective in solving a variety of NP-complete problems. Examples of such methods include simulated annealing, genetic algorithms and the tabu search. This paper will describe each of these three algorithms and overview their use in the field of cryptology. In particular, the application to cryptanalysis of simple substitution and transposition ciphers is considered.

* The work reported in this paper has been funded in part by the Cooperative Research Centres Program through the Department of the Prime Minister and Cabinet of the Commonwealth Government of Australia.

1 Introduction

Many researchers in the field of cryptanalysis are interested in developing automated attacks on encryption algorithms (ciphers). When analysing ciphers it is advantageous that a proposed attack will run without human intervention, finishing either when the message has been successfully decrypted or the key has been determined (this is what is meant by an automated attack). Previous work in this area ([1], [4], [6]) has shown that simulated annealing and genetic algorithms can provide successful automated attacks on both substitution and transposition ciphers. The purpose of this paper is to summarise the previous work and to present new work by applying the tabu search in cryptanalysing these ciphers. The tabu search ([2]) is a new and innovative technique which warrants further investigation and for this reason is included in the following discussion.

Section 2 briefly describes the two cipher types being considered and gives a short summary of their properties. Section 3 discusses suitability assessment, which is an issue pertinent to each of the algorithms described in the following three sections (simulated annealing, genetic algorithms and the tabu search, respectively). Suitability assessment provides a means for determining the suitability of an arbitrary solution. Finally a comparison of these techniques is presented. The comparison is not intended to highlight one particular method but rather to illustrate positive and negative aspects of each of the algorithms. In each case experimental results will be given to substantiate any claims.

2 Simple Ciphers

This section gives a brief description of substitution and transposition ciphers. Although the ciphers on their own are relatively simple, they form the building blocks for many popular and more complex ciphers - for example DES, the Data Encryption Standard. For the purposes of this paper it is assumed that the alphabet consists of 27 characters - A, B, ..., Z, and the space character (denoted -).

2.1 Substitution Ciphers

The substitution cipher simply involves substituting each letter in the message for another. Typically a key is represented as a permutation of the characters in the alphabet. For example the key

    QWERTYUIOPASDFGHJKL-ZXCVBNM

would encrypt the message

    I-THINK-THEREFORE-I-AM

to

    OM-IOFAM-ITKTYGKTMOMQD

The key above works in the following way: plaintext A becomes ciphertext Q, B becomes W, C becomes E, etc.

An important property of the substitution cipher is that the n-gram (n consecutive letters in the message or the cipher) statistics are maintained. For example, if an I occurs three times in the message, then the corresponding ciphertext letter (O in the example above) will occur the same number of times in the encrypted message. This property is important from the point of view of a cryptanalyst and is the basis of each of the attacks on the substitution cipher described below.
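The substitution example of Section 2.1 can be checked with a short Python sketch. It assumes the 27-character alphabet ordering A, ..., Z, - and reads the key as the keyboard-order permutation QWERTYUIOPASDFGHJKL-ZXCVBNM, which is the reading that reproduces the ciphertext quoted in the text. The function names `substitute` and `invert_key` are illustrative and do not come from the paper.

```python
# Alphabet assumed by the paper: A-Z plus the space character, written "-".
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ-"

# Key from the example: the i-th alphabet character is replaced by KEY[i].
KEY = "QWERTYUIOPASDFGHJKL-ZXCVBNM"


def substitute(text: str, key: str, alphabet: str = ALPHABET) -> str:
    """Encrypt text by replacing each character with its image under key."""
    if sorted(key) != sorted(alphabet):
        raise ValueError("key must be a permutation of the alphabet")
    table = dict(zip(alphabet, key))
    return "".join(table[ch] for ch in text)


def invert_key(key: str, alphabet: str = ALPHABET) -> str:
    """Return the key that undoes `key` when passed to substitute()."""
    inverse = dict(zip(key, alphabet))
    return "".join(inverse[ch] for ch in alphabet)


plaintext = "I-THINK-THEREFORE-I-AM"
ciphertext = substitute(plaintext, KEY)
print(ciphertext)  # OM-IOFAM-ITKTYGKTMOMQD

# Single-character counts survive encryption: I (plain) and O (cipher) both occur 3 times.
print(plaintext.count("I"), ciphertext.count("O"))  # 3 3
```

Decryption is the same operation with the inverted key, so `substitute(ciphertext, invert_key(KEY))` returns the original message.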

0-7803-2404-8/94/$4.00 © 1994 IEEE

2.2 Transposition Ciphers

The transposition (or permutation) cipher works on blocks of the message at the same time. The size of the block is constant and is usually chosen by the user. Let M denote the length of a block. Each of the M letters in the block is rearranged according to some fixed permutation (the key). An example of a transposition cipher key (where M = 6) is

    4 1 6 3 2 5

which transforms the message used in the example above to

    HINT-IHKRT-ERE-OFEMITA-R

where the random characters RT have been appended to the original message to increase the length to be a multiple of M.

The attack methodology used on the transposition cipher is as follows:

1. Propose a key, Kp.

2. Decrypt the ciphertext using Kp.

3. Compare the n-gram (n > 1) statistics of the decrypted message with the known language statistics to evaluate a given key.

3 Suitability Assessment

In each of the algorithms to be described a method of assessing each proposed solution is required. The standard assessment methods used by cryptanalysts when investigating substitution and permutation ciphers are discussed in this section. The assessment methods (sometimes known as fitness functions) have proven to be effective in attacking these cipher types.

When determining the fitness of a key to a substitution cipher, the single character and digram frequencies are usually compared. A general formula for the fitness of key k is as follows (A denotes the set of characters in the alphabet, i.e. A, B, ..., Z, -)

    [Equation (1)]

where SF and SDF denote the relative frequencies of single characters and digrams in the English language (respectively), and DF and DDF denote the relative frequencies of single characters and digrams in the message decrypted using k. Varying α and β allows the weighting in favour of either the single character frequencies or the digram frequencies.

Variations of (1) are possible. For example, Forsyth and Safavi-Naini [1] choose α = 0. In his genetic algorithm attack Spillman [6] uses a far more complicated function which normalises the result to a value between zero and one.

The fitness of the key of a transposition cipher is more difficult to determine. This is because the relative frequencies of the single characters do not change upon encryption. Thus any fitness assessment can not involve comparison of single character frequencies. Furthermore, comparison of n-gram statistics (where n > 1) is less powerful since the characters are already there in their natural frequencies. For example, a digram common in the plaintext language (say TH) may also occur frequently in the encrypted message (or in a message decrypted with an incorrect key).

Despite these properties of the ciphertext, it is still possible to mount a successful attack by comparing digram and trigram frequencies using an equation similar to (1).

An alternative approach was used by Matthews [4] who calculated a fitness by assigning scores to a small list of the possible digrams and trigrams. The score assigned to a particular digram or trigram reflected its desirability in the decrypted message. As an example, consider the score table proposed by Matthews.

    [Score table]

The negative score assigned to three consecutive spaces has a very powerful effect. Since the space is the most commonly occurring character in plain English text it is highly likely that this configuration will occur in a message encrypted using the transposition cipher or in an unsuccessful attempt to decrypt it. The fitness function obtained using the score table might look as in (2).

    [Equation (2)]

where S denotes the set of di/trigrams in the score table, f_i is the relative frequency of the ith di/trigram in the decrypted message and s_i is the corresponding score.
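The transposition example of Section 2.2 and a score-table fitness in the spirit of Matthews' approach can be sketched together in Python. Two assumptions are made here. First, the key is read as "ciphertext position i takes the plaintext letter at position key[i]", which is the reading that reproduces the ciphertext quoted in the text. Second, the fitness is taken to be a plain sum of score times relative frequency over the table entries; the scores in `HYPOTHETICAL_SCORES` are invented for illustration and are not Matthews' values.

```python
def transpose_encrypt(text: str, key: list[int]) -> str:
    """Encrypt block by block: output position i takes plaintext position key[i]."""
    m = len(key)
    if len(text) % m != 0:
        raise ValueError("pad the message to a multiple of the block length first")
    blocks = (text[i:i + m] for i in range(0, len(text), m))
    return "".join("".join(block[k - 1] for k in key) for block in blocks)


def transpose_decrypt(text: str, key: list[int]) -> str:
    """Invert transpose_encrypt by applying the inverse permutation."""
    inverse = [0] * len(key)
    for i, k in enumerate(key):
        inverse[k - 1] = i + 1
    return transpose_encrypt(text, inverse)


def score_fitness(text: str, scores: dict[str, float]) -> float:
    """Sum of score * relative frequency over the n-grams in the score table."""
    total = 0.0
    for gram, score in scores.items():
        positions = len(text) - len(gram) + 1
        if positions <= 0:
            continue
        count = sum(text.startswith(gram, i) for i in range(positions))
        total += score * count / positions
    return total


# Illustrative scores only (assumption): reward common English digrams,
# penalise three consecutive spaces.
HYPOTHETICAL_SCORES = {"TH": 2.0, "HE": 1.0, "---": -5.0}

KEY = [4, 1, 6, 3, 2, 5]
padded = "I-THINK-THEREFORE-I-AM" + "RT"   # padding characters from the example
cipher = transpose_encrypt(padded, KEY)
print(cipher)  # HINT-IHKRT-ERE-OFEMITA-R

# Steps 1-3 of the attack methodology for one proposed key: decrypt, then score.
candidate = transpose_decrypt(cipher, KEY)
print(score_fitness(candidate, HYPOTHETICAL_SCORES) > score_fitness(cipher, HYPOTHETICAL_SCORES))  # True
```

With the correct key the decrypted text contains TH twice and HE once, while the ciphertext contains none of the table entries, so the correct key scores higher under this toy table.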

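The frequency-based assessment of Section 3 for substitution keys can be sketched as follows. The exact form of equation (1) is not reproduced here; the sketch assumes one plausible form, a weighted sum of absolute differences between reference and observed frequencies, with weights `alpha` (single characters) and `beta` (digrams). Under this assumed form a smaller value means a closer match to the language statistics. The reference statistics are computed from a short sample string purely for illustration; a real attack would use statistics from a large body of English text.

```python
from collections import Counter


def ngram_frequencies(text: str, n: int) -> dict[str, float]:
    """Relative frequency of every n-gram that occurs in text."""
    total = len(text) - n + 1
    if total <= 0:
        return {}
    counts = Counter(text[i:i + n] for i in range(total))
    return {gram: c / total for gram, c in counts.items()}


def distance(reference: dict[str, float], observed: dict[str, float]) -> float:
    """Sum of absolute differences over every n-gram seen in either table.

    N-grams absent from both tables contribute zero, so this equals the sum
    over the whole alphabet (or all alphabet pairs).
    """
    grams = set(reference) | set(observed)
    return sum(abs(reference.get(g, 0.0) - observed.get(g, 0.0)) for g in grams)


def frequency_mismatch(decrypted: str,
                       ref_single: dict[str, float],
                       ref_digram: dict[str, float],
                       alpha: float = 1.0,
                       beta: float = 1.0) -> float:
    """Assumed form: alpha * single-character term + beta * digram term."""
    single = distance(ref_single, ngram_frequencies(decrypted, 1))
    digram = distance(ref_digram, ngram_frequencies(decrypted, 2))
    return alpha * single + beta * digram


# Stand-in "language" statistics taken from a sample string (assumption).
sample = "THE-CAT-SAT-ON-THE-MAT"
SF = ngram_frequencies(sample, 1)
SDF = ngram_frequencies(sample, 2)

perfect = frequency_mismatch(sample, SF, SDF)
wrong = frequency_mismatch("QQQQQQQQ", SF, SDF)
print(perfect, wrong > perfect)  # 0.0 True
```

Setting `alpha=0.0` gives a digram-only measure, corresponding to the variation attributed to Forsyth and Safavi-Naini in the text.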

4 Simulated Annealing

As its name suggests, the simulated annealing algorithm is modeled on the process of annealing in a metal. Metals are annealed (heated to a high temperature and then slowly cooled) to produce a molecular structure which is crystalline. The movement of atoms in a metal (between different energy levels) is governed by the Metropolis criterion [5] which says that a particle moves from energy level E_1 to E_2 with probability P given by (3),

    P = e^{-(E_2 - E_1)/(kT)}    (3)

where k is the Boltzmann constant (ignored in the following discussion) and T is the temperature.

The optimisation algorithm ([1], [3]) mimics this process by starting with a (usually) random solution (which is analogous to the structure of a metal at high temperature) and then iterating according to the Metropolis criterion and reducing the temperature according to a cooling schedule to arrive at a near-optimal solution.

An implementation of the simulated annealing algorithm follows these steps:

1. Generate an initial solution to the problem (usually random).

2. Calculate the fitness of the initial solution.

3. Set the initial temperature T = T0.

4. For temperature, T, do many times:

   • Generate a new solution - this involves perturbing the current solution in some manner.

   • Calculate the cost of the modified solution.

   • Determine the difference in cost between the current solution and the proposed solution.

   • Consult the Metropolis criterion to decide if the proposed solution should be accepted.

   • If the proposed solution is accepted, the required changes are made to the current solution.

5. If the stopping criterion is satisfied the algorithm ceases with the current solution, otherwise decrement the temperature, T, and return to Step 4.

In this case perturbing the current solution simply involved swapping two randomly chosen elements of the key. In the case of the transposition cipher a rotation of the key by a random amount is also used as a perturbation mechanism.

5 Genetic Algorithms

The second of the three algorithms to emerge was the genetic algorithm ([4], [6]). It uses ideas from the evolutionary process to iteratively breed superior solutions to the problem at hand. Unlike simulated annealing which is only concerned with one solution (or key) at a time, the genetic algorithm maintains a pool of keys. This gene pool is manipulated using functions such as selection, mating and mutation to evolve a near-optimal solution.

The selection process typically involves choosing random pairs from the current solution pool which will mate to produce offspring for the next generation. Although the selection is random it is biased towards the fittest of the current pool.

Mating is designed so that two parents will produce two children. Each child should have characteristics of both its parents. As an example consider the transposition cipher with a key of length M = 7. Firstly generate a random bit string of length M.

    Parent 1:   4 5 1 7 3 2 6
    Parent 2:   4 3 1 2 7 6 5
    Bit String: 1 0 0 1 1 0 1

Child 1 is created in two steps:

1. Take the elements in Parent 1 corresponding to a 1 in the bit string.

    Child 1: 4 * * 7 3 * 6

2. Place the missing elements (denoted by *) in the order they appear in Parent 2.

    Child 1: 4 1 2 7 3 5 6

Child 2 is found in a similar manner.

    Child 2: 4 3 1 5 7 6 2

Mutation is performed simply by swapping certain elements within a key, or rotating the elements in the key. Mutation usually occurs with a low probability. The overall genetic algorithm can be summarised as follows:

1. Generate a random, initial gene pool. Here a gene represents a possible solution to the problem at hand - for example a possible permutation which decrypts a transposition cipher.

2. A fitness is allotted to each gene in the pool.

3. A mating pool is then generated by selecting parents from the current gene population. This selection process, although random, is biased towards the fittest of the current solutions.

4. The genes in the mating pool are combined to produce a set of "children".

5. Each child then undergoes a mutation process with small probability (predetermined). The fitness is then calculated for each child.

6. The best of the new generation and the old generation are then combined in some manner and the algorithm returns to Step 3.

7. Stop after a certain number of iterations, or when the message decrypts.

6 Tabu Search

The most recent of the three techniques is the tabu search [2] which has proven to be very effective in solving many optimisation problems. The main aim of the tabu search is to provide a heuristic for finding a good solution to the problem at hand without becoming trapped in a local minimum. The algorithm has the concept of a short term memory in the form of a tabu list. At each iteration the current key is added to the tabu list. This key will remain 'tabu' for a fixed number of iterations. The algorithm is:

1. Generate a random initial solution and calculate its fitness. Record this as the best solution found so far.

2. Create a list of possible moves: Here a 'move' consists of swapping two randomly chosen elements of the current key. The size of this list is a parameter of the algorithm and is not necessarily fixed. The fitness of the solution obtained by making each of the moves is calculated.

3. Choose the best admissible candidate. Of the candidate moves, which one is not tabu and yields the best improvement in the fitness of the current solution? A move which is tabu may be accepted if it satisfies the aspiration criterion. In this case the aspiration criterion was that the fitness be at least as high as the best solution found so far.

4. Update the tabu list and the 'best so far' (if necessary).

5. Repeat Steps 2, 3 and 4 until a fixed number of iterations have been performed, or there has been no improvement in the best solution for a number of iterations.

One difference of this algorithm from the others discussed in this paper is that in each iteration only variations on one key are considered. This makes the algorithm powerful since although the keys can change dramatically over a number of iterations, the difference between successive iterations is small but ever improving. The advantage of this becomes obvious when one considers what happens when a solution of the current configuration is close to the required answer. Both simulated annealing and the genetic algorithm can be led away from the optimum (perhaps with small probability), however the tabu search is more likely to find the optimum since most steps are upwards (towards greater fitness). The tabu search achieves a good balance between the ability to jump out of regions of local minima and iterative improvement.

7 Results Compared

Lack of space prevents further details of the algorithms being given. In this section a review of the three algorithms is presented. The results in Figure 1 and Table 1 were obtained by running attacks on the substitution cipher using each of the three methods. Attacks on the transposition cipher were also implemented, with similar results.

For each algorithm there are a number of different parameters which need to be varied to "fine tune" the optimisation process. Determining the optimal values for these parameters is a non-trivial task and is usually performed experimentally. In some cases guidelines are given. For example, the initial temperature in the simulated annealing algorithm should be chosen high enough such that every proposed move is accepted by the Metropolis criterion. Readers should refer to the references section for more detailed reports of implementing these algorithms.

It is unreasonable to expect these methods to arrive at the correct solution on their first run. It is usually necessary to perform a number of runs, each starting from a different region (randomly chosen) of the solution space. Each algorithm was run with initial keys being chosen randomly.

Figure 1 gives an indication of the convergence rates of each of the algorithms as a function of the number of iterations. Of course, this result can not be used as the only indicator of the superiority of any algorithm since the complexity of one iteration for each of the algorithms varies. It does, however, give a comparison of the complexities of the various algorithms. Simulated annealing is expected to require fewer iterations since each iteration is very computationally intensive - many keys are considered and the fitness of each key must be calculated. As can be seen from Table 1, the computation time required by the simulated annealing algorithm is the greatest, reinforcing the fact that each iteration considers many keys. The times presented in Table 1 were averaged over a number of runs of each of the algorithms.

Figure 1: Comparison of Algorithms (curves for Tabu Search, Genetic Algorithm and Simulated Annealing; horizontal axis: Number of Iterations, 0 to 350)

    Method                 Average Time
    Simulated Annealing    242 s
    Genetic Algorithm      220 s
    Tabu Search             94 s

Table 1: Time comparison

The genetic algorithm appears to "learn" more slowly although the convergence rate improves as the gene pool collects more fit solutions. The results in Figure 1 show that the tabu search required [...]

[...] the application of techniques such as these to more complex ciphers is envisaged. The different properties of each algorithm mean that the use of one algorithm over another may be preferred for particular applications. In any case, these methods provide a reliable problem solving technique with many useful properties.

References

[1] W. S. Forsyth and R. Safavi-Naini. Automated cryptanalysis of substitution ciphers. Cryptologia, 17(4):407-418, October 1993.

[2] Fred Glover. Tabu search: A tutorial. Interfaces, 20(4):74-94, July 1990.

[3] S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi. [...] Science, 220(4598):671-680, 1983.

[4] Robert A. J. Matthews. The use of genetic algorithms in cryptanalysis. Cryptologia, 17(2):187-201, April 1993.

[5] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. [...]
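The mating example of Section 5 can be reproduced with a short Python sketch. The paper states only that Child 2 is found "in a similar manner"; the sketch assumes this means keeping Parent 2's elements where the bit string has a 0 and filling the gaps in the order of Parent 1, which is the reading that yields the printed Child 2. The function names and the mutation probability of 0.05 are illustrative assumptions.

```python
import random


def mate(parent_a: list[int], parent_b: list[int], bits: list[int]) -> list[int]:
    """Keep parent_a where bits is 1; fill the gaps in parent_b's order."""
    kept = {gene for gene, bit in zip(parent_a, bits) if bit}
    fill = iter([gene for gene in parent_b if gene not in kept])
    return [gene if bit else next(fill) for gene, bit in zip(parent_a, bits)]


def mutate(key: list[int], rng: random.Random, probability: float = 0.05) -> list[int]:
    """With low probability, swap two elements or rotate the key."""
    child = list(key)
    if rng.random() < probability:
        if rng.random() < 0.5:
            i, j = rng.sample(range(len(child)), 2)
            child[i], child[j] = child[j], child[i]
        else:
            shift = rng.randrange(1, len(child))
            child = child[shift:] + child[:shift]
    return child


parent_1 = [4, 5, 1, 7, 3, 2, 6]
parent_2 = [4, 3, 1, 2, 7, 6, 5]
bit_string = [1, 0, 0, 1, 1, 0, 1]
flipped = [1 - b for b in bit_string]

child_1 = mate(parent_1, parent_2, bit_string)
child_2 = mate(parent_2, parent_1, flipped)
print(child_1)  # [4, 1, 2, 7, 3, 5, 6]
print(child_2)  # [4, 3, 1, 5, 7, 6, 2]
```

Because each child keeps some positions from one parent and takes the remaining elements in the other parent's order, both children are always valid permutations, so no repair step is needed after mating or mutation.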

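The two single-key searches, simulated annealing (Section 4) and the tabu search (Section 6), can be sketched side by side in Python. This is a minimal illustration under stated assumptions, not the implementation used for the reported results: the fitness is maximised, the Metropolis test is written for fitness rather than cost, the cooling schedule is geometric, and every parameter value (`t0`, `cooling`, `levels`, `tries`, `moves`, `tenure`, `iterations`) is hypothetical. The toy fitness, which counts positions that agree with a hidden target key, stands in for the n-gram assessment of Section 3. The tabu sketch stops after a fixed number of iterations and omits the no-improvement stopping rule.

```python
import math
import random


def swap_two(key, rng):
    """Perturbation used in the paper: swap two randomly chosen key elements."""
    i, j = rng.sample(range(len(key)), 2)
    new_key = list(key)
    new_key[i], new_key[j] = new_key[j], new_key[i]
    return new_key


def simulated_annealing(fitness, key, rng, t0=2.0, cooling=0.9, levels=40, tries=50):
    current, current_fit = list(key), fitness(key)
    best, best_fit = current, current_fit
    temperature = t0                                  # Step 3
    for _ in range(levels):                           # Step 5: cooling schedule
        for _ in range(tries):                        # Step 4: many tries per temperature
            candidate = swap_two(current, rng)
            cand_fit = fitness(candidate)
            delta = cand_fit - current_fit
            # Metropolis criterion, written for maximising fitness:
            # always accept improvements, accept a worse key with prob. exp(delta / T).
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                current, current_fit = candidate, cand_fit
                if current_fit > best_fit:
                    best, best_fit = current, current_fit
        temperature *= cooling
    return best, best_fit


def tabu_search(fitness, key, rng, moves=20, tenure=10, iterations=100):
    current = list(key)
    best, best_fit = current, fitness(current)        # Step 1
    tabu = []                                         # short-term memory of recent keys
    for _ in range(iterations):                       # Step 5: fixed iteration count
        candidates = [swap_two(current, rng) for _ in range(moves)]      # Step 2
        scored = sorted(((fitness(c), c) for c in candidates), reverse=True)
        for fit, cand in scored:                      # Step 3: best admissible candidate
            if cand not in tabu or fit >= best_fit:   # aspiration criterion
                break
        else:
            continue                                  # all candidates tabu: draw new moves
        tabu.append(current)                          # Step 4: current key becomes tabu
        tabu = tabu[-tenure:]                         # keys stay tabu for `tenure` iterations
        current = cand
        if fit > best_fit:
            best, best_fit = cand, fit
    return best, best_fit


rng = random.Random(1)
target = list(range(10))                              # hidden key for the toy problem


def toy_fitness(key):
    """Number of positions where key agrees with the hidden target."""
    return sum(a == b for a, b in zip(key, target))


start = target[:]
rng.shuffle(start)
sa_key, sa_fit = simulated_annealing(toy_fitness, start, rng)
tabu_key, tabu_fit = tabu_search(toy_fitness, start, rng)
print(toy_fitness(start), sa_fit, tabu_fit)
```

Both functions return the best key seen rather than the final key, so the reported fitness can never be lower than that of the starting key. The sketch also shows the structural difference noted in Section 6: the tabu search evaluates a whole list of moves per iteration and takes the best admissible one, whereas simulated annealing evaluates one move at a time and may accept a worse key.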