In this paper the properties of linear-feedback shift-register sequences are briefly reviewed and a sequence of order p = 1279 is proposed as a source of pseudo-random numbers. The production rule, by itself recursive, is efficiently vectorized on existing pipelined vector processors, due to the large distance between the feedback positions and the positions to be generated. The algorithm may be implemented in FORTRAN and may be easily ported to computers with different word-lengths. Using a vectorized FORTRAN implementation of the algorithm on the IBM 3090 model E Vector Facility, a rate of about 6.5 million normalized random numbers per second (Mrands) is achieved. A rate of about 3 Mrands is obtained by an implementation of the algorithm on the IBM RISC/6000 model 530. Simple statistical tests have shown the superiority of the proposed generator with respect to the widely used multiplicative-congruential generator with modulus $M = 2^{31} - 1$ and multiplier $A = 7^5$.

Different sequences of integer random numbers are generated by starting with different values of the seed. The value of the modulus, M, is an upper limit for the period P of the sequence, i.e. the number of consecutive positions after which the sequence is repeated. If the constant C is zero the generating recurrence is called "multiplicative-congruential." In this case the null sequence is produced when starting with a zero seed, while sequences of maximal length $P = M - 1$ are produced when starting with any seed in the interval $[1, M-1]$ if M is a prime number and A is a primitive root of M [2].

Implementing an efficient pseudo-random number generator with a very large period using the recurrence (1), hence employing a large modulus M, is a difficult task, at least on computers using less than 64 bits for integer arithmetic, and this motivates the search for alternative production rules. In the following we try to illustrate some of the reasons why a very large period P is a desirable property for a random number generator.
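As a point of reference, the multiplicative-congruential recurrence described above can be sketched in a few lines. The sketch assumes the parameters quoted in the abstract ($M = 2^{31} - 1$, $A = 7^5$); the function names `mcg_next` and `mcg_stream` are illustrative and do not come from the paper.

```python
M = 2**31 - 1   # prime modulus quoted in the abstract
A = 7**5        # multiplier 16807


def mcg_next(x):
    """One step of the recurrence x -> A*x mod M (constant term equal to zero)."""
    return (A * x) % M


def mcg_stream(seed, n):
    """Return the n values that follow `seed`, for a seed in [1, M-1]."""
    if not 1 <= seed <= M - 1:
        raise ValueError("seed must lie in [1, M-1]; a zero seed gives the null sequence")
    values = []
    x = seed
    for _ in range(n):
        x = mcg_next(x)
        values.append(x)
    return values
```

Python integers are unbounded, so the product `A * x` needs no special handling here. On a machine with 32-bit integer arithmetic the same product overflows, which is the implementation difficulty the text refers to.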
In the following section some arguments are given which suggest the use of a high-order linear-feedback shift-register as the base of an algorithm for generating pseudo-random floating-point numbers. Some performance data obtained with a FORTRAN implementation of the algorithm on the IBM 3090 Vector Facility and on the IBM RISC/6000 are discussed in the third section. The results of some statistical tests, which demonstrate the quality of the generator, are reported in the last section.

All the linear-feedback shift-registers $x_k = x_{k-p} \oplus x_{k-q}$, associated with trinomials listed in the above table, produce maximal-length sequences. The conjugate sequence $x_k = x_{k-p} \oplus x_{k-p+q}$ associated to a shift-register also has maximal length. This is easily checked by observing that the production rule may be reverted and that the conjugate sequence is the reverse of the original sequence if the seed is also reverted. Indeed, from $x_k = x_{k-p} \oplus x_{k-q}$ it follows that $x_{k-p} = x_k \oplus x_{k-q}$, since $(x \oplus y) \oplus y = x$. The conjugate recurrence law then holds for the elements $x'_k = x_{-k+p}$ if the sign of all indices is reversed.

The p-bit seed $w^{(0)}$ of a maximal-length sequence may be represented as a binary row vector and the effect of advancing k steps in the shift-register sequence may be described as the result of applying k times a linear $p \times p$ matrix operator C, from the right, to $w^{(0)}$, i.e.:

$$w^{(k)} = w^{(0)} C^k . \qquad (6)$$

Let {x} be the shift-register sequence of p-bit column vectors associated to a primitive trinomial of degree p. Let $W^{(0)}$ be the seed matrix. If $W^{(0)}$ is nonsingular, each of the $2^p - 1$ nonzero column vectors occurs exactly once during one period of the sequence, and conversely. Hence, if {y} is a shift-register sequence of m-bit column vectors, m < p, and the m p-bit rows of the seed matrix are linearly independent, then all nonzero column vectors occur equally often and the all-zero column vector occurs one time less.
From the maximal-length property of the sequence it follows that all the nonzero p-bit binary configurations occur once and only once along the entire sequence {w} because $C^{\lambda} \neq C^{\mu}$ for any pair of integers $\lambda$, $\mu$ falling in the interval $[1, 2^p - 1]$ and such that $\lambda \neq \mu$.

Different recipes have been proposed by several authors to enforce the linear independence of the seed matrix rows without introducing strong disuniformities in the seed binary pattern. Lewis and Payne [6] suggested the use of delayed replicas of the basic recurrence (4), starting from a p-bit seed of all ones, to initialize the m columns of the seed matrix $W^{(0)}$. In this case, the "damping" of the strong seed regularity, which is inherent in having a first column of all ones, is obtained by performing a large number of steps as part of the initialization procedure. Another simple procedure, suggested by Kirkpatrick and Stoll [3], employs a standard multiplicative-congruential generator for the random initialization of the seed matrix, followed by a regularization that guarantees the linear independence of the rows. Also in this case the regularization requires some care to avoid disuniformities of the seed binary pattern.

A great advantage of the procedure proposed by Kirkpatrick and Stoll is that many different sequences may be generated by specifying different values of a scalar seed. It is important that this initialization procedure be clearly and fully specified in order to compare the results obtained with different implementations of a shift-register algorithm. Indeed, the algorithm itself is suitable to produce identical numbers on different computers using the same number of bits for the floating-point mantissa. In the general case of different computers using mantissas of different length, the differences could still be forced to appear only in the least significant bits of the larger floating-point word when consistent procedures are employed for the initialization of the seed. Therefore, a standardization of the initialization procedure may offer great advantages in terms of portability, making it possible to reproduce the results of a statistical simulation on different computers.

Compagner and Hoogland [8] have recently analyzed the properties of the correlation functions of any order for maximal-length shift-register binary sequences. Their findings support the somewhat intuitive statement that longer sequences, which are associated to high degree generating trinomials, have better randomness properties. An additional reason for using high degree generating trinomials is that an efficient implementation of the shift-register algorithm on vector computers becomes possible when the shifts relative to both the feedback positions are larger than the computer vector register size. For example, the shift-register sequence $x_k = x_{k-1279} \oplus x_{k-1063}$, which is the conjugate sequence associated to the trinomial $x^{1279} + x^{216} + 1$, may be implemented to fully exploit the vector hardware of supercomputers with up to 1024 vector register elements.

All the burden associated to the regularization of the seed may be eliminated when using a generating trinomial with a large degree, since the probability of generating a pseudo-random seed matrix with linearly dependent rows becomes negligible. In fact, it may be observed that when m is greater than one the number of different maximal-length sequences {x} is much larger than $2^p - 1$, being equal to the number of all possible linearly independent sets of m p-bit vectors which can be used as the initial seed $W^{(0)}$. When $m \ll p$ this number is of the order of $2^{m \times p}$. The initialization procedure need not enforce the linear independence of the seed rows, because the probability that a linear dependence would occur when generating an $m \times 1279$ binary seed matrix for the shift-register sequence $x_k = x_{k-1279} \oplus x_{k-1063}$ is easily estimated to be less than $2^{m-1279}$, and therefore it is absolutely negligible for any reasonable value of the word size m.

A very simple procedure for the initialization of the seed matrix may be implemented in FORTRAN and it may be easily adapted to any computer in which integer arithmetic operations are performed on words of at least 32 bits and in which the floating-point mantissa contains up to 64 binary digits. The procedure employs the multiplicative-congruential generator with modulus $2^{31} - 1$ and multiplier $A = 7^5$ to simulate $p \times m$ "coin-tossing" events which determine the random initialization of the $p \times m$ seed matrix. The $m \times p$ bits of the seed matrix are initialized, proceeding in column-major order, to 0 or 1 depending on the value of the random number produced by the auxiliary generator. This initialization procedure guarantees that the most significant binary digits of the mantissa representing the k-th random number in the sequences obtained on two different computers with different mantissa lengths are identical if both sequences are initialized with the same scalar seed. Using the above initialization strategy, identical results, except for the last bit, are obtained in single precision on the IBM S/370 with a 24-bit mantissa and on the IBM RISC/6000 with a 23-bit mantissa (conforming to the IEEE 754 standard, where the most significant bit is hidden).
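The coin-tossing initialization of the seed matrix can be sketched as follows. Two details are assumptions made for this illustration, since the text does not spell them out: a toss counts as 1 when the auxiliary value exceeds M/2, and the columns are taken to be bit positions filled from the most significant one downwards. The name `init_seed_words` is hypothetical.

```python
M = 2**31 - 1   # modulus of the auxiliary multiplicative-congruential generator
A = 7**5        # its multiplier


def init_seed_words(scalar_seed, p, m):
    """Return p seed words of m bits each, filled one bit column at a time."""
    if not 1 <= scalar_seed <= M - 1:
        raise ValueError("scalar seed must lie in [1, M-1]")
    x = scalar_seed
    words = [0] * p
    for _ in range(m):                  # bit columns, most significant first (assumed)
        for row in range(p):            # one coin toss per word within the column
            x = (A * x) % M
            bit = 1 if x > M // 2 else 0    # assumed coin-toss rule
            words[row] = (words[row] << 1) | bit
    return words


# The same scalar seed with two word sizes: the 24 leading bits coincide.
w24 = init_seed_words(12345, 1279, 24)
w32 = init_seed_words(12345, 1279, 32)
same_leading_bits = all((b >> 8) == a for a, b in zip(w24, w32))
```

Because every column consumes exactly p auxiliary values, a longer word only appends further columns, which is one way to obtain the agreement of the most significant digits described in the text.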
Vectorization of the shift-register algorithm

The shift-register generator with p = 1279 and q = 216 was implemented in FORTRAN on the IBM 3090 Vector Facility. The conjugate production rule has been adopted in order to get the maximum efficiency on pipelined vector computers with very long vector registers. Apart from details concerning the exponent setting and normalization of the floating-point words receiving the generated random numbers, the FORTRAN code kernel is very simple:

      .....
      DO 1 K= . . . .
      .....
      X(K) = IEOR(X(K-P),X(K-P+Q))
      .....
    1 CONTINUE

Although the above FORTRAN loop defines a recursion it is easy to recognize that l consecutive iterations could be safely executed in parallel, between two synchronization steps, provided that the offsets p and p - q of the feedback positions, with respect to the position being generated, are larger than or equal to l. Therefore, the loop can be vectorized on pipelined vector-register computers with vector-size up to p - q and also on pipelined architectures without vector-registers, provided that the maximum length of the pipeline, i.e. the maximum number of different consecutive elements of the sequence being temporarily allocated within the pipeline, does not exceed p - q. The choice that has been made for p and q is such that up to 1063 new elements could be generated in parallel between two synchronization steps (it may be observed that on a vector processor this synchronization does not require any programming effort, as this is inherent in the completion of the vector instruction).

Different versions of the code may be used to produce normalized and unnormalized numbers. The use of unnormalized floating-point operands is not allowed in S/370 multiply and divide vector instructions. Nonetheless in many statistical simulations the random data are not involved in multiplications but rather in compare type instructions for which normalized and unnormalized formats can be equivalently used. A random number generator which produces normalized floating-point numbers is usually slower because the normalization is an extra operation performed on data which have already been generated in the unnormalized format. On the IBM 3090 Vector Facility the normalization takes about 30% of the total time required to generate a normalized random number when using a highly tuned assembly language implementation of the algorithm, while it takes about one half of the total time when using the vectorized FORTRAN code. As a result, the normalized FORTRAN code produces about 6.5 million 32-bit normalized floating-point numbers per second, while the production rate of unnormalized numbers is about twice as large. Table 2 illustrates the peak performance of the algorithm, measured in Mrands (millions of pseudo-random values generated per second), for single and double precision, on the IBM 3090 model 600E with a clock-period of 17.2 nanoseconds and on the IBM RISC/6000 model 530 with a clock-period of 40 nanoseconds. The performance actually depends on the number N of pseudo-random values to be generated in a single subroutine call, and the peak performance is approached when N becomes larger than the seed length p = 1279. However, with a careful implementation it is possible to achieve a very good performance, on average, also when the subroutine is consecutively called several times with relatively small values of N.

Table 2. Performance of the FORTRAN subroutines.
Precision   Computer   Mrands

Statistical tests

The ability to reproduce the results of a statistical simulation is the main advantage of using deterministic computer algorithms for generating pseudo-random numbers. Pseudo-random number generators are therefore extremely useful in many different application areas like optimization theory and statistical physics. At the same time, the pseudo-random numbers generated by such algorithms may show departures from true randomness due to the regularity intrinsic in the generating procedure.

Assessing the statistical properties of a pseudo-random source is usually a difficult task which may be achieved by two complementary approaches. The first approach uses empirical tests to compare the frequency of occurrence of some pseudo-random events with the probability of their occurrence computed on the basis of an assumed theoretical distribution. In the second approach one tries to derive the more relevant parameters of the statistical distribution of the pseudo-random values based on the actual form of the generating algorithm. An example of the latter approach in the investigation of maximal-length shift-register sequences is found in the previously mentioned work of Compagner and Hoogland [8]. Their analysis shows that the relative number of correlation functions of any order of a maximal-length shift-register sequence of order p that show a nonrandom behavior decreases as $2^{-p}$. They also report the results of some empirical tests for the sequence with (p, q) = (1279, 216), upon which the present generator is based, which show only very small deviations from statistical uniformity in the observed distribution of the binary digits.

overlapping segments of length d of the sequence, i.e. $s_k = (x_{(k-1) \times d + 1}, \ldots, x_{k \times d})$, thus requiring $d \times n$ numbers for sampling n positions.
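The vectorization argument given in the section on the shift-register algorithm (blocks of up to p - q new elements depend only on elements that already exist) can be verified with a short sketch. It compares an element-by-element evaluation of the conjugate rule with a block-wise one; the function names, the use of Python lists in place of vector registers, and the random test seed are assumptions made for illustration.

```python
import random


def generate_sequential(seed_words, p, q, n):
    """Append n words using x[k] = x[k-p] XOR x[k-p+q], one element at a time."""
    x = list(seed_words)
    for k in range(len(x), len(x) + n):
        x.append(x[k - p] ^ x[k - p + q])
    return x


def generate_chunked(seed_words, p, q, n, chunk):
    """Append n words in blocks of at most `chunk` elements, chunk <= p - q."""
    if not 1 <= chunk <= p - q:
        raise ValueError("chunk must not exceed p - q")
    x = list(seed_words)
    done = 0
    while done < n:
        step = min(chunk, n - done)
        k0 = len(x)
        # Both operand blocks end at or before index k0 - 1, so they are
        # complete before any element of the new block is produced.
        a = x[k0 - p : k0 - p + step]
        b = x[k0 - p + q : k0 - p + q + step]
        x.extend(u ^ v for u, v in zip(a, b))
        done += step
    return x


p, q = 1279, 216
rng = random.Random(2024)                        # arbitrary test seed words
seed_words = [rng.getrandbits(32) for _ in range(p)]
same = generate_chunked(seed_words, p, q, 5000, p - q) == \
    generate_sequential(seed_words, p, q, 5000)
```

With p = 1279 and q = 216 the largest admissible block is 1063 elements, the figure quoted in the text; a larger block would read elements that have not been generated yet.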
Table 4. N-dimensional equidistribution test. Consecutive, partially overlapped, d-tuples. Values of the control variable b.

A) Multiplicative-congruential generator.

Sample size     Lattice dimension
                1       2       3       4       5
1 x 2^20        1.4     2.3     0.2    -1.4     1.9
8 x 2^20       -4.9     4.3    -2.9    -4.8    -4.1
64 x 2^20      -35.     24.    -23.    -37.    -24.

B) Shift-register generator.

Sample sizes 1 x 2^20, 8 x 2^20 and 64 x 2^20; lattice dimensions 1 to 5. Values in the order in which they appear in the source: 0.0, 1.7, 0.4, -0.4, -0.2, 0.2, -0.2, -3.3, -1.1, -0.1, -0.9, -0.9, 1.1, -0.2, -0.7.

Table 5. N-dimensional equidistribution test. Consecutive, partially overlapped, d-tuples. Values of the control variable b.

A) Multiplicative-congruential generator.

Sample size          Lattice dimension
(N0 = 5^d x 16)      1       2       3
N0 x 1               1.0    -1.1     0.4
N0 x 8              -2.5     3.1    -2.7
N0 x 64             -1.5     1.2    -0.2

B) Shift-register generator.

Sample size          Lattice dimension
(N0 = 5^d x 16)      1       2       3
N0 x 1               1.5     1.8     3.6

Rows N0 x 8 and N0 x 64, values in the order in which they appear in the source: -1.0, 3.0, -0.7, -1.5, -1.9, 2.4.
Table 6. N-dimensional equidistribution test. Consecutive, partially overlapped, d-tuples. Values of the control variable b.

A) Multiplicative-congruential generator.

Sample size          Lattice dimension
(N0 = 7^d x 16)      1       2       3
N0 x 1              -3.5    -3.8    -6.6
N0 x 8              -4.3    -9.1    -12.
N0 x 64             -22.    -25.    -30.

B) Shift-register generator.

Sample size          Lattice dimension
(N0 = 7^d x 16)      1       2       3
N0 x 1               0.5    -5.2     2.0
N0 x 8               2.7    -2.9     2.5

and finally followed by a last run of length 3 (1, 4, 5). These are called runs down and similarly one could analyse runs up, which are simply obtained by changing the inequality sign. It is not possible to apply directly the $\chi^2$ test to run lengths because of the dependency of the consecutive runs, and more elaborate tests should be applied (see, e.g. [2]). On the other hand Knuth [2] suggests to simplify the problem, again using the $\chi^2$ test, but after throwing away the element which immediately follows a run, so that when $X_j$ is greater than $X_{j+1}$ the new test is started with $X_{j+2}$ and the run lengths are independent. Now the $\chi^2$ test can be applied and the probability of a run of length r is easily shown to be $1/r! - 1/(r+1)!$ [2]. This test has been applied to three large ($\sim 10^6$, $\sim 8 \times 10^6$, and $\sim 64 \times 10^6$) vectors of random numbers by allowing the recognition of runs down, up to the improbable length 8 plus the tail of the distribution function. As in all previous cases all tests have been repeated 40 times to strengthen the statistics. The run length tests were passed well by both generators.
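The simplified run test described above can be sketched as follows. The sketch follows the rule as stated in the text (a run ends when $X_j > X_{j+1}$, and the next run starts at $X_{j+2}$), so it counts ascending runs; the other kind of run is obtained by flipping the inequality. The function names and the handling of the last, incomplete run (it is dropped) are assumptions of this illustration.

```python
import math


def independent_run_lengths(xs):
    """Lengths of ascending runs, skipping the element that follows each run."""
    lengths = []
    i, n = 0, len(xs)
    while i < n:
        j = i
        while j + 1 < n and xs[j] < xs[j + 1]:
            j += 1
        if j + 1 >= n:
            break                    # run cut off by the end of the data: dropped
        lengths.append(j - i + 1)
        i = j + 2                    # xs[j + 1] ended the run and is thrown away
    return lengths


def run_length_probability(r):
    """Probability 1/r! - 1/(r+1)! of an independent run of length r."""
    return 1 / math.factorial(r) - 1 / math.factorial(r + 1)


def chi_square_runs(xs, max_len=8):
    """Chi-square statistic over lengths 1..max_len plus one tail class."""
    lengths = independent_run_lengths(xs)
    counts = [0] * (max_len + 1)     # last slot collects lengths above max_len
    for r in lengths:
        counts[min(r, max_len + 1) - 1] += 1
    probs = [run_length_probability(r) for r in range(1, max_len + 1)]
    probs.append(1 / math.factorial(max_len + 1))    # P(length > max_len)
    total = len(lengths)
    return sum((c - total * pr) ** 2 / (total * pr) for c, pr in zip(counts, probs))
```

The returned statistic would then be compared with a $\chi^2$ distribution with `max_len` degrees of freedom; repeating the test over many samples, as done in the paper, is left out of the sketch.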
able 0. The results for intermediate values of t are not shown because they only refine a tendency already shown by the table; the same results are obtained for the minimum of t.

3 4 8
-0.1 2.7 -0.4 1.1 1.1
-2.0 0.8 1.3 0.8 1.5