one test in two hundred, and 'Failed' pvalues should occur one
test in a million with the default thresholds - that's what p
MEANS. Use them at your Own Risk! Be Warned!
Or better yet, use the new -Y 1 and -Y 2 resolve ambiguity or
test to destruction modes above, comparing to similar runs on
one of the as-good-as-it-gets cryptographic generators, AES or
threefish.
DESCRIPTION
dieharder
Welcome to the current snapshot of the dieharder random number tester.
It encapsulates all of the Gnu Scientific Library (GSL) random number
generators (rngs) as well as a number of generators from the R
statistical library, hardware sources such as /dev/*random, "gold
standard" cryptographic quality generators (useful for testing
dieharder and for purposes of comparison to new generators) as well as
generators contributed by users or found in the literature into a
single harness that can time them and subject them to various tests for
randomness. These tests are variously drawn from George Marsaglia's
"Diehard battery of random number tests", the NIST Statistical Test
Suite, and again from other sources such as personal invention, user
contribution, other (open source) test suites, or the literature.
The primary point of dieharder is to make it easy to time and test
(pseudo)random number generators, including both software and hardware
rngs, with a fully open source tool. In addition to providing
"instant" access to testing of all built-in generators, users can
choose one of three ways to test their own random number generators or
sources: a unix pipe of a raw binary (presumed random) bitstream; a
file containing a (presumed random) raw binary bitstream or formatted
ascii uints or floats; and embedding your generator in dieharder's
GSL-compatible rng harness and adding it to the list of built-in
generators. The stdin and file input methods are described below in
their own section, as is suggested "best practice" for newbies to
random number generator testing.
An important motivation for using dieharder is that the entire test
suite is fully Gnu Public License (GPL) open source code and hence
rather than being prohibited from "looking underneath the hood" all
users are openly encouraged to critically examine the dieharder code
for errors, add new tests or generators or user interfaces, or use it
freely as is to test their own favorite candidate rngs subject only to
the constraints of the GPL. As a result of its openness, literally
hundreds of improvements and bug fixes have been contributed by users
to date, resulting in a far stronger and more reliable test suite than
would have been possible with closed and locked down sources or even
open sources (such as STS) that lack the dynamical feedback mechanism
permitting corrections to be shared.
To apply only the diehard opso test to the AES_OFB generator, specify
the test by name or number:
dieharder -g 205 -d 5
or
dieharder -g 205 -d diehard_opso
Nearly every aspect or field in dieharder's output report format is
user-selectable by means of display option flags. In addition, the
field separator character can be selected by the user to make the
output particularly easy for them to parse (-c ' ') or import into a
spreadsheet (-c ','). Try:
dieharder -g 205 -d diehard_opso -c ',' -D test_name -D pvalues
to see an extremely terse, easy to import report or
dieharder -g 205 -d diehard_opso -c ' ' -D default -D histogram -D description
to see a verbose report good for a "beginner" that includes a full
description of each test itself.
Finally, the dieharder binary is remarkably autodocumenting even if the
man page is not available. All users should try the following commands
to see what they do:
dieharder -h
(prints the command synopsis like the one above).
dieharder -a -h
dieharder -d 6 -h
(prints the test descriptions only for -a(ll) tests or for the specific
test indicated).
dieharder -l
(lists all known tests, including how reliable rgb thinks that they are
as things stand).
dieharder -g -1
(lists all known rngs).
dieharder -F
(lists all the currently known display/output control flags used with
-D).
Both beginners and experts should be aware that the assessment provided
by dieharder in its standard report should be regarded with great
suspicion. It is entirely possible for a generator to "pass" all tests
as far as their individual p-values are concerned and yet to fail
utterly when considering them all together. Similarly, it is probable
that a rng will at the very least show up as "weak" on 0, 1 or 2 tests
in a typical -a(ll) run, and may even "fail" 1 test in one such run in 10
or so. To understand why this is so, it is necessary to understand
something of rng testing, p-values, and the null hypothesis!
P-VALUES AND THE NULL HYPOTHESIS
dieharder returns "p-values". To understand what a p-value is and how
to use it, it is essential to understand the null hypothesis, H0.
The null hypothesis for random number generator testing is "This
generator is a perfect random number generator, and for any choice of
seed produces an infinitely long, unique sequence of numbers that have
all the expected statistical properties of random numbers, to all
orders". Note well that we know that this hypothesis is technically
false for all software generators as they are periodic and do not have
the correct entropy content for this statement to ever be true.
However, many hardware generators fail a priori as well, as they
contain subtle bias or correlations due to the deterministic physics
that underlies them. Nature is often unpredictable but it is rarely
random and the two words don't (quite) mean the same thing!
The null hypothesis can be practically true, however. Both software
and hardware generators can be "random" enough that their sequences
cannot be distinguished from random ones, at least not easily or with
the available tools (including dieharder!). Hence the null hypothesis is
a practical, not a theoretically pure, statement.
To test H0, one uses the rng in question to generate a sequence of
presumably random numbers. Using these numbers one can generate any
one of a wide range of test statistics -- empirically computed numbers
that, subject to H0, are considered random samples drawn from a known
distribution. These samples may or may not be covariant, depending on
whether overlapping sequences of random numbers are used to generate
successive samples while generating the statistic(s). From a knowledge of the
target distribution of the statistic(s) and the associated cumulative
distribution function (CDF) and the empirical value of the randomly
generated statistic(s), one can read off the probability of obtaining
the empirical result if the sequence was truly random, that is, if the
null hypothesis is true and the generator in question is a "good"
random number generator! This probability is the "p-value" for the
particular test run.
For example, to test a coin (or a sequence of bits) we might simply
count the number of heads and tails in a very long string of flips. If
we assume that the coin is a "perfect coin", we expect the number of
heads and tails to be binomially distributed and can easily compute the
probability of getting any particular number of heads and tails. If we
compare our recorded number of heads and tails from the test series to
this distribution and find that the probability of getting the count we
obtained is very low (with, say, far more heads than tails), we'd suspect
the coin wasn't a perfect coin. dieharder applies this very test (made
mathematically precise) and many others that operate on this same
principle to the string of random bits produced by the rng being tested
to provide a picture of how "random" the rng is.
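The coin example can be sketched in a few lines of Python. This is an
illustration of the principle only, not dieharder's own code: the
function name coin_p_value is hypothetical, and it computes an exact
two-sided binomial p-value by summing the probability of every outcome
that is no more likely than the one observed.

```python
import math


def coin_p_value(heads, flips):
    """Two-sided exact binomial p-value under H0: P(heads) = 0.5."""
    if not 0 <= heads <= flips:
        raise ValueError("heads must lie between 0 and flips")
    # Probability of each possible head count for a perfect coin.
    # Integer division keeps this accurate even for large flip counts.
    probs = [math.comb(flips, k) / 2 ** flips for k in range(flips + 1)]
    observed = probs[heads]
    # Add up every outcome that is at least as unlikely as the observed one.
    total = sum(p for p in probs if p <= observed * (1.0 + 1e-12))
    return min(1.0, total)


if __name__ == "__main__":
    print(coin_p_value(50, 100))  # perfectly balanced count
    print(coin_p_value(70, 100))  # far more heads than tails
```

A balanced count gives a p-value near 1, while 70 heads in 100 flips
gives a very small one, which is the "very low probability" case
described above.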
Note that the usual dogma is that if the p-value is low -- typically
less than 0.05 -- one "rejects" the null hypothesis. In a word, it is
improbable that one would get the result obtained if the generator is a
good one. If it is any other value, one does not "accept" the
generator as good, one "fails to reject" the generator as bad for this
particular test. A "good random number generator" is hence one that we
haven't been able to make fail yet!
This criterion is, of course, naive in the extreme and cannot be used
with dieharder! It makes just as much sense to reject a generator that
has p-values of 0.95 or more! Both of these p-value ranges are equally
unlikely on any given test run, and should be returned for (on average)
5% of all test runs by a perfect random number generator. A generator
that fails to produce p-values less than 0.05 5% of the time it is
tested with different seeds is a bad random number generator, one that
fails the test of the null hypothesis. Since dieharder returns over
100 p-values by default per test, one would expect any perfectly good
rng to "fail" such a naive test around five times by this criterion in
a single dieharder run!
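The arithmetic behind that expectation is easy to check. The sketch
below assumes the p-values are independent and uniformly distributed
(that is, that H0 holds); the function name naive_failures is
hypothetical.

```python
def naive_failures(n_pvalues, alpha):
    """Return (expected count, chance of at least one) of p-values below alpha.

    Assumes independent p-values that are uniform on [0, 1] under H0.
    """
    expected = n_pvalues * alpha
    p_at_least_one = 1.0 - (1.0 - alpha) ** n_pvalues
    return expected, p_at_least_one


if __name__ == "__main__":
    expected, p_any = naive_failures(100, 0.05)
    print("expected naive failures:", expected)
    print("chance of at least one:", p_any)
```

With 100 p-values and a 0.05 cutoff the expected count is five, and a
run with no "failure" at all would itself be the unusual outcome.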
The p-values themselves, as it turns out, are test statistics! By
their nature, p-values should be uniformly distributed on the range
0-1. In 100+ test runs with independent seeds, one should not be
surprised to obtain 0, 1, 2, or even (rarely) 3 p-values less than
0.01. On the other hand obtaining 7 p-values in the range 0.24-0.25,
or seeing that 70 of the p-values are greater than 0.5 should make the
generator highly suspect! How can a user determine when a test is
producing "too many" of any particular value range for p? Or too few?
Dieharder does it for you, automatically. One can in fact convert a
set of p-values into a p-value by comparing their distribution to the
expected one, using a Kolmogorov-Smirnov test against the expected
uniform distribution of p.
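As a rough illustration of turning a set of p-values into a single
p-value, the following Python sketch computes the Kolmogorov-Smirnov
statistic against the uniform distribution and converts it with the
asymptotic Kolmogorov series (using the Stephens small-sample
correction). It is not dieharder's implementation, which may differ in
detail; the function name ks_uniform is hypothetical.

```python
import math


def ks_uniform(pvalues):
    """One-sample KS test of pvalues against Uniform(0, 1); returns (D, p)."""
    xs = sorted(pvalues)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one p-value")
    # D is the largest gap between the empirical CDF and F(x) = x.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    # Effective argument of the asymptotic Kolmogorov distribution.
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        # The tail probability is 1 to double precision in this region.
        return d, 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(1.0, max(0.0, 2.0 * series))


if __name__ == "__main__":
    # 30 p-values below 0.5 and 70 above it, as in the text's example.
    low = [0.5 * (i + 0.5) / 30 for i in range(30)]
    high = [0.5 + 0.5 * (i + 0.5) / 70 for i in range(70)]
    print(ks_uniform(low + high))
```

The skewed set in the usage example yields a small aggregate p-value,
flagging a distribution that no single p-value would expose.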
These p-values obtained from looking at the distribution of p-values
should in turn be uniformly distributed and could in principle be
subjected to still more KS tests in aggregate. The distribution of
p-values for a good generator should be idempotent, even across different
test statistics and multiple runs.
FILE INPUT
The simplest way to use dieharder with an external generator that
produces raw binary (presumed random) bits is to pipe the raw binary
output from this generator (presumed to be a binary stream of 32 bit
unsigned integers) directly into dieharder, e.g.:
cat /dev/urandom | ./dieharder -a -g 200
Go ahead and try this example. It will run the entire dieharder suite
of tests on the stream produced by the linux built-in generator
/dev/urandom (using /dev/random is not recommended as it is too slow to
test in a reasonable amount of time).
Alternatively, dieharder can be used to test files of numbers produced
by a candidate random number generator:
dieharder -a -g 201 -f random.org_bin
for raw binary input or
dieharder -a -g 202 -f random.org.txt
for formatted ascii input.
A formatted ascii input file can accept either uints (integers in the
range 0 to 2^32-1, one per line) or decimal uniform deviates with at
least ten significant digits (that can be multiplied by UINT_MAX = 2^32-1
to produce a uint without dropping precision), also one per line.
Floats with fewer digits will almost certainly fail bitlevel tests,
although they may pass some of the tests that act on uniform deviates.
Finally, one can fairly easily wrap any generator in the same (GSL)
random number harness used internally by dieharder and simply test it
the same way one would any other internal generator recognized by
dieharder. This is strongly recommended where it is possible, because
dieharder needs to use a lot of random numbers to thoroughly test a
generator. A built in generator can simply let dieharder determine how
many it needs and generate them on demand, where a file that is too
small will "rewind" and render the test results where a rewind occurs
suspect.
Note well that file input rands are delivered to the tests on demand,
but if the test needs more than are available it simply rewinds the
file and cycles through it again, and again, and again as needed.
Obviously this significantly reduces the sample space and can lead to
completely incorrect results for the p-value histograms unless there
are enough rands to run EACH test without repetition (it is harmless to
reuse the sequence for different tests). Let the user beware!
BEST PRACTICE
A frequently asked question from new users wishing to test a generator
they are working on for fun or profit (or both) is "How should I get
its output into dieharder?" This is a nontrivial question, as
dieharder consumes enormous numbers of random numbers in a full test
cycle, and then there are features like -m 10 or -m 100 that let one
effortlessly demand 10 or 100 times as many to stress a new generator
even more.
Even with large file support in dieharder, it is difficult to provide
enough random numbers in a file to really make dieharder happy. It is
therefore strongly suggested that you either:
a) Edit the output stage of your random number generator and get it to
write its production to stdout as a random bit stream -- basically
create 32 bit unsigned random integers and write them directly to
stdout as e.g. char data or raw binary. Note that this is not the same
as writing raw floating point numbers (that will not be random at all
as a bitstream) and that "endianness" of the uints should not matter
for the null hypothesis of a "good" generator, as random bytes are
random in any order. Crank the generator and feed this stream to
dieharder in a pipe as described above.
b) Use the samples of GSL-wrapped dieharder rngs to similarly wrap your
generator (or calls to your generator's hardware interface). Follow
the examples in the ./dieharder source directory to add it as a "user"
generator in the command line interface, rebuild, and invoke the
generator as a "native" dieharder generator (it should appear in the
list produced by -g -1 when done correctly). The advantage of doing it
this way is that you can then (if your new generator is highly
successful) contribute it back to the dieharder project if you wish!
Not to mention the fact that it makes testing it very easy.
Most users will probably go with option a) at least initially, but be
aware that b) is probably easier than you think. The dieharder
maintainers may be able to give you a hand with it if you get into
trouble, but no promises.
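A minimal sketch of option a) in Python follows. The function name
write_uint32_stream is hypothetical, and Python's random.Random is
only a stand-in for the generator under test; any object providing
getrandbits(32) would do.

```python
import random
import struct


def write_uint32_stream(out, rng, count, chunk=4096):
    """Write count raw 32-bit unsigned integers from rng to binary stream out.

    rng is a stand-in: any object with a getrandbits(32) method works.
    """
    remaining = count
    while remaining > 0:
        n = min(chunk, remaining)
        values = [rng.getrandbits(32) for _ in range(n)]
        # Pack as raw binary uints, NOT as text and NOT as floats.
        out.write(struct.pack("<%dI" % n, *values))
        remaining -= n


# Typical use from a script whose output is piped to dieharder:
#   import sys
#   write_uint32_stream(sys.stdout.buffer, random.Random(12345), 10**8)
```

A script built around this function (call it mygen.py, a hypothetical
name) could then be tested with:
python3 mygen.py | dieharder -g 200 -a
Make the count large enough that the stream does not run dry before
the tests finish.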
WARNING!
A warning for those who are testing files of random numbers. dieharder
is a tool that tests random number generators, not files of random
numbers! It is extremely inappropriate to try to "certify" a file of
random numbers as being random just because it fails to "fail" any of
the dieharder tests in e.g. a dieharder -a run. To put it bluntly, if
one rejects all such files that fail any test at the 0.05 level (or any
other), the one thing one can be certain of is that the files in
question are not random, as a truly random sequence would fail any
given test at the 0.05 level 5% of the time!
To put it another way, any file of numbers produced by a generator that
"fails to fail" the dieharder suite should be considered "random", even
if it contains sequences that might well "fail" any given test at some
specific cutoff. One has to presume that, in passing the broader tests of
the generator itself, it was determined that the p-values for the tests
involved were globally correctly distributed, so that e.g. failure at
the 0.01 level occurs neither more nor less than 1% of the time, on
average, over many many tests. If one particular file generates a
failure at this level, one can therefore safely presume that it is a
random file pulled from many thousands of similar files the generator
might create that have the correct distribution of p-values at all
levels of testing and aggregation.
To sum up, use dieharder to validate your generator (via input from
files or an embedded stream). Then by all means use your generator to
produce files or streams of random numbers. Do not use dieharder as an
accept/reject tool to validate the files themselves!
EXAMPLES
To demonstrate all tests, run on the default GSL rng, enter:
dieharder -a
To demonstrate a test of an external generator of a raw binary stream
of bits, use the stdin (raw) interface:
cat /dev/urandom | dieharder -g 200 -a
To use it with an ascii formatted file:
dieharder -g 202 -f testrands.txt -a
(testrands.txt should consist of a header such as:
#==================================================================
# generator mt19937_1999 seed = 1274511046
#==================================================================
type: d
count: 100000
numbit: 32
3129711816
85411969
2545911541
etc.).
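A file in this layout can be produced with a short Python sketch such
as the one below. It copies the header fields from the example above;
the function name write_ascii_rands is hypothetical, and how strictly
dieharder parses the header spacing is not confirmed here.

```python
import random


def write_ascii_rands(path, values, generator_name, seed):
    """Write 32-bit uints to path in the ascii layout shown above."""
    bar = "#" + "=" * 66
    with open(path, "w") as f:
        f.write(bar + "\n")
        f.write("# generator %s  seed = %d\n" % (generator_name, seed))
        f.write(bar + "\n")
        f.write("type: d\n")                   # d: decimal unsigned integers
        f.write("count: %d\n" % len(values))   # number of values that follow
        f.write("numbit: 32\n")                # bits per value
        for v in values:
            f.write("%d\n" % v)


if __name__ == "__main__":
    seed = 12345
    rng = random.Random(seed)  # stand-in for the generator under test
    rands = [rng.getrandbits(32) for _ in range(100000)]
    write_ascii_rands("testrands.txt", rands, "python_random", seed)
```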
To use it with a binary file
dieharder -g 201 -f testrands.bin -a
or
cat testrands.bin | dieharder -g 200 -a
suites. This is especially true where he has seen fit to modify those
tests from their strict original descriptions.
COPYRIGHT
GPL 2b; see the file COPYING that accompanies the source of this
program. This is the "standard Gnu General Public License version 2 or
any later version", with the one minor (humorous) "Beverage"
modification listed below. Note that this modification is probably not
legally defensible and can be followed really pretty much according to
the honor rule.
As to my personal preferences in beverages, red wine is great, beer is
delightful, and Coca Cola or coffee or tea or even milk acceptable to
those who for religious or personal reasons wish to avoid stressing my
liver.
The Beverage Modification to the GPL:
Any satisfied user of this software shall, upon meeting the primary
author(s) of this software for the first time under the appropriate
circumstances, offer to buy him or her or them a beverage. This
beverage may or may not be alcoholic, depending on the personal ethical
and moral views of the offerer. The beverage cost need not exceed one
U.S. dollar (although it certainly may at the whim of the offerer:-)
and may be accepted or declined with no further obligation on the part
of the offerer. It is not necessary to repeat the offer after the
first meeting, but it can't hurt...