Documentos de Académico
Documentos de Profesional
Documentos de Cultura
Mike Knee
Snell & Wilcox, UK
Abstract: This tutorial paper gives a fairly gentle introduction to the basic mathematics required
to understand encryption systems. Three main areas are covered: modular arithmetic, prime
numbers and probability theory. These should help the reader to understand how conventional and
public-key encryption systems work, how they might be attacked and how to improve their security.
TM
Modular division
Look at figure 4 , which shows modulo- 10 multiplication by 2 .
If we trace the arrows in the opposite direction, we have
Modulo- 10 arithmetic would delight schoolchildren, because the inverse operation, which is division by 2 . But we have
it’s the same as ordinar y arithmetic but without having to a problem. Dividing an even number by 2 has two possible
remember to carr y anything over. results, while dividing an odd number by 2 isn’t possible.
This does not fit our need for a one-to-one mapping. We need
Modular arithmetic as a one-to-one mapping a unique result for ever y division, and we don’t want to start
The “XYZ” problem above gives us a hint as to why modular introducing fractions to deal with the odd numbers. It turns out
arithmetic is so important in encr yption. As a general rule, that we don’t get this problem if the divisor and the modulus
we do not want encr yption to have a big effect on the size are relatively prime (this is defined later), for example in
of a message. Making a message smaller is the domain of figure 3. It’s even better if the modulus is prime, because then
compression, while making it bigger is the domain of error the condition will be met for all divisors. When we meet this
protection. In encr yption, we would like a reversible mapping condition, we can do modular division. This is one reason why
between the message (plaintext) and its encr ypted version modular arithmetic for encr yption is done using prime or
(ciphertext) and we don’t want to add too much redundant relatively prime numbers; more on this in the next chapter.
information. When we come to multiplication and
exponentiation using integers, the problem with ordinar y Figure 4: Multiplying by 2 modulo-10
arithmetic is that the range of outputs from fairly small input
numbers can be huge. The nice thing about modular arithmetic
is that ever ything is kept within the range of the chosen
modulus.
Modular multiplication
Again, modular multiplication is just like ordinar y
multiplication with the added step of dividing the result by the
modulus and taking the remainder. For example, in modulo- 7
arithmetic, 3 x 6 = 4 , because 4 is the remainder on dividing
18 by 7 . Modular exponentiation
Modular exponentiation - taking a number to a power - is
Modular multiplication has a pleasing symmetr y about it. For done by repeated multiplication, just as in normal arithmetic.
example, figure 3 illustrates multiplication by 3 in modulo- 10 Modular exponentiation is important in encr yption because in
arithmetic. the right conditions it is a good example of a supposed one-
way function: fairly easy to do, but ver y hard to undo. There
Figure 3: Multiplying by 3 modulo-10 are a couple of tricks which make modular exponentiation
“fairly easy”. These are best described by an example.
Suppose we wish to calculate 7 18 in modulo- 31 arithmetic.
A slightly better approach would be to take the remainders at Figure 5: Linear feedback shift register
intermediate stages in the repeated multiplication. The first few
steps would be:
1x7=7
7 x 7 = 49, remainder = 18
18 x 7 = 126, remainder = 2
2 x 7 = 14 Careful choice of the arrangement of adders will ensure that
14 x 7 = 98, remainder = 5 the machine generates a maximal-length sequence, in this case
5 x 7 = 34, remainder = 3 2 7 - 1 = 127 bits. Each time the shift register is clocked within
etc. this period, it will contain a unique seven-bit word. (Exercise
for the reader: can you design a shift register to give a
We have removed the need to handle large numbers but there maximal-length sequence as above but using only one adder?)
are still many multiplications to do. Here we are in the realm of Galois field theory. The modulo- 2
arithmetic system is known as a Galois field of order 2 , or
GF ( 2 ). We can define polynomials with this arithmetic, and
An even better approach is to recognize that 7 18 = 7 16+2 = 7 16 x 7 2.
these can be mapped onto the shift register architecture. In
These partial powers of 16 and 2 can be calculated easily
the example of figure 5, the equivalent generator polynomial
by repeated squaring, and as usual we can deal with the
is 1 + X 2 + X 3 + X 4 + X 7 , and the shift register is actually
remainder whenever we wish. 7 2 = 49 which has a remainder
doing arithmetic with polynomials of order 7, known as
of 18. To calculate 716, we square this result three times more, so:
GF (2 7 ), modulo the generator polynomial. It turns out that if
the generator polynomial is irreducible (the equivalent of
18 2 = 324 , remainder = 14
“prime” for polynomials) then the resulting sequence will be
14 2 = 196 , remainder = 10
maximal-length.
10 2 = 100 , remainder = 7
Fermat and Euler ver y difficult if only the product pq is known; this number
would first have to be factorized and we are working on the
Between them and unwittingly, these two gentlemen laid the
assumption that it is ver y difficult to factorize large numbers.
foundations for modern public-key encr yption. A ver y
So we have met our first condition. It remains to show that
important theorem is Euler ’s theorem, which is a bit difficult
m ed = m. Choosing d as the reciprocal of e means that there
to explain here. I shall just give two corollaries of the theorem,
must be an integer t with ed = 1 + t(p - 1 )(q - 1 ) (in normal
one of which was actually proved a centur y or so earlier than
arithmetic). So med = m 1+t(p-1)(q-1) = m(m t ) (p-1)(q-1)
Euler’s theorem.
Letter distributions How can the cr yptographer get over this problem and
transmit messages more securely? There are two (overlapping)
You don’t need to look at much English text to see that the
approaches: design more secure algorithms, or change the
letter e occurs often and the letter z occurs far less often.
message.
The following histogram shows the approximate probabilities
of the 26 letters of the alphabet.
More secure algorithms
Figure 8: Histogram of English letters One simple approach to improving security is to make sure
the eavesdropper doesn’t receive enough information to make
a reliable histogram. Unfortunately, this means that the key
has to be changed more often, which brings about its own
security implications. Nevertheless, this is the principle behind
the one-time pad, which we shall come back to in a moment.
Polyalphabetic cipher
A slight variation on the above, which overcomes the key-
changing problem, is to use a polyalphabetic cipher, in which
several substitution ciphers are used in a predetermined,
repeating sequence (which actually becomes par t of the key).
This has the advantage that there is no longer a one-to-one
correspondence between plaintext and ciphertext letters.
But with a long enough piece of ciphertext, it is still possible
using statistical tests to determine the repeat period of a
It is useful to order this histogram by probability, starting with polyalphabetic cipher. To give a flavour of how such tests work,
the most common letter, as shown here. we will now take a quick look at an example, the Friedman
Test, which uses the Index of Coincidence.
Figure 9: Ordered histogram of English letters
Index of Coincidence
The Index of Coincidence (IC) is defined as the probability
that two letters randomly selected from the text are equal.
This is something that we can measure on plaintext, and for
normal English text the answer is about 0.065 . If our language
was such that all the letters had the same probability, the
Index of Coincidence would be about 1/26 = 0.038 (this can
be calculated using the basic probability arithmetic introduced
above). Now let us see what happens if we measure the IC of
some ciphertext. With a simple (monoalphabetic) substitution
cipher, we expect the IC to be close to 0.065 , since all we have
done is relabelled the letters. The effect of a polyalphabetic
cipher is to even out the probabilities of the letters, so the IC
would get closer to 0.038 . It is even possible to estimate the
repeat period of the polyalphabetic cipher from the IC ; the
Now suppose we receive a message that has been encr ypted longer the period, the closer the IC will approach 0.038 .
using a substitution cipher, in which ever y letter has been
replaced by another letter, the same one each time it occurs.
So we can see how probabilistic techniques can help the
The Caesar cipher is a simple example of a substitution cipher,
cr yptanalyst to break a cipher.
but any one-to-one mapping of the alphabet can be used
provided information about the mapping can be transmitted to
Polygraphic ciphers
the intended recipient (this becomes the key). An eavesdropper
could make an ordered histogram of the letters in the One early approach to overcoming the letter distribution
ciphertext. If she is lucky, she should obtain something with problem was to group letters together to make a polygraphic
a similar shape to Figure 9, but with different letters. To some substitution cipher. For example, the letters could be grouped
degree of accuracy, she should be able to match the letters one in pairs to make an “alphabet” with 26 x 26 = 676 symbols.
by one according to their relative frequency of occurrence,
and break the cipher.
Clearly, handling a histogram with more symbols is more The message is expressed as a sequence of bits and the key is
difficult, but the fundamental problem that the symbols have an a random sequence of the same number of bits. The encr yption
uneven distribution does not go away. For example, the pair algorithm is simply the modulo-2 addition (exclusive- OR ) of the
“th” is ver y common and the pair “zy” ver y rare. Nevertheless, message with the key. It is possible to prove that the one-time
the bigger the group of letters chosen as the basis for the pad is completely secure; without knowledge of the key, the
algorithm, the more difficult the cr yptanalyst’s task will be. ciphertext gives no information whatsoever about the plaintext.
Any probabilistic attack on the ciphertext will show that ever y
Transposition ciphers bit will be independent of ever y other (provided the key is
Another approach is to combine substitution with transposition indeed random.)
- changing the position of the ciphertext as well as its
representation. Here, attacks based simply on the distribution There are two problems with the one-time pad. The obvious
of letters are of little help because they say nothing about one is that huge amounts of key information have to be
where the letters are. But more sophisticated use of probability exchanged before communication can take place, and this
theor y can still help the cr yptanalyst. For example, we know key information must be random. This problem can be
that adjacent letters in English text are not independent. overstated. For example, it is possible to generate a matched
The probability of qu is not the probability of q multiplied by pair of DVD s containing random data and give one to the
that of u, but is much closer to the probability of q alone, sender and one to the receiver. If the messages are limited to
because q is almost always followed by u (apologies to any text only, it will be possible for that pair of people to encr ypt
Iraqis here!) This fact can be used when measuring and decr ypt around a million pages of information before
probabilities of pairs of letters in the ciphertext spaced by exhausting the one-time pad.
different inter vals, to get some idea of the pattern that has
been used to rearrange the letters. The other problem with the one-time pad, which should be
noted in passing, is that it is not secure against the kind of
The well-known Data Encryption Standard (DES) and the attack where the eavesdropper knows some plaintext, intercepts
newer Advanced Encryption Standard (AES, but nothing the corresponding ciphertext, recovers that part of the key and
to do with audio!) are essentially combinations of multiple uses it to generate alternative “fake” ciphertext.
polygraphic ciphers (with a block size of 64 bits or 8 letters)
with transposition ciphers. It is a testimony to the security of Changing the message
the DES that up to now the only way to break that cipher has The other set of approaches to improving security is to change
been an exhaustive search over all possible keys. Techniques the message so that there is less useful information about the
based on probability theor y have not worked. probability distribution of letters or groups of letters in the
plaintext. This can happen inadvertently, for example if the
The one-time pad text is actually in Polish while the cr yptanalyst thinks it is in
An important class of encr yption algorithm is the one-time English, or if we are transmitting the famous novel “Gadsby”
pad, where the key is as long as the message and any part by Ernest Vincent Wright, which does not contain the letter e.
of the key is only used once. A simple example of a one-time A more generally applicable technique is to tr y to hide the
pad, but one that is as secure as any other, is shown in letter distribution.
figure 10.
One way to do this with English language plaintext is to give
Figure 10: Simple one-time pad the more frequent letters a series of symbols, one of which is
chosen at random each time the letter is used. For example,
we could assign three symbols to e, two to a, two to o, and so
on. This would have the effect of flattening out the histograms
shown in Figures 8 and 9, making an attack based on letter
distributions much more difficult. However, this approach
would still be susceptible to attacks based on the probabilities
of groupings of letters, and it also has the disadvantage that it
increases the size of the message because there are more
symbols in the “alphabet”.
snellwilcox.com
Bibliography
As this is a tutorial paper, explicit references to books and
papers are not made within the paper. However, the following
short list of useful books and Internet resources is offered:
Lewand, R . E .
Cr yptological mathematics.
Mathematical Association of America, 2000 .
http://mathworld.wolfram.com
snellwilcox.com