
SOURCE: http://exploringdata.net/

Welcome to the Exploring Data website. This website provides curriculum support materials for teachers of
introductory statistics.

TABLE OF CONTENTS
Read Me First!
What's here, and how to find it. Also copyright information.
Introduction to Exploring Data
Statisticians do it! And so should you.
Looking for Patterns
The most valuable feature of a dataset may be that which is unexpected.
Looking at the data in a variety of ways may reveal interesting and surprising
patterns.
Stemplots
All you need to know about this useful graphical display. Activities, worksheets
and extension material are available from this page.
Dotplots
Dotplots are often the neglected cousin in the family of graphical displays of
data. But they are easy to construct and can tell us much about a dataset.
Histograms
Histograms are very useful, but care needs to be taken in constructing and
interpreting them. Remarkably, research about histograms is still being
conducted.
Measures of Location
So you think the mean, median and mode are boring? Well, maybe, but there
are some interesting little side alleys to this topic that are worth exploring.
Measures of Spread
Visit here, and you will learn about some vary useful statistics.
Boxplots
Visit this page and you may possibly learn more than you ever wanted to know
about boxplots.
Normal Plots
Not included in Queensland syllabuses, a normal plot of a dataset shows at a
glance if a dataset is approximately normally distributed. A very useful display
to put in your display cabinet.
Scatterplots
When working with bi-variate data, the absolutely first thing a statistician does
is construct a scatterplot. So the absolutely first thing a student working with bi-
variate data should do is construct a scatterplot.
Assessment
Some nice assignments and test questions are available here.
Datasets
Students should work with real data. So here is some, available in tab-delimited,
Excel 4.0 and NCSS 6.0 Jr formats.
Linear Regression
Linear regression has the potential to integrate the topics of Introduction to
Function and Applied Statistical Analysis in Maths B. This page explains how it
can be done.
Normal Distribution
Discover the link between the 1.5*IQR Rule and the normal distribution, and
why you should use light bulbs to burn those traditional statistics textbooks.
Probability
Probability is a wonderful subject to teach! There are so many activities for
teaching concepts, puzzles and problems with non-intuitive answers and a
variety of contexts for the exercises. This page contains a small collection.
Sampling
Contains a nice little activity - the JellyBlubbers - which was modified from a
problem in a leading textbook which was modified from an activity in Activity
Based Statistics. And the original activity started as a bucket of rocks on the
desk of a statistics teacher. An activity with impeccable lineage.
Confidence Intervals
Are you 95% confident that you can correctly teach your students the correct
meaning of confidence intervals?
Hypothesis Testing
Teaching students to understand hypothesis testing is a difficult business indeed.
This page contains some activities that will give students some hands-on
experiences with the underlying concepts.
Curve Fitting
Contains Anscombe's famous dataset, and a comprehensive manual on using
technology to fit functions to data.
Read This First!

This website is an outcome of the 1997 Raybould Tutorial Fellowship. Each year the Raybould Fellow spends
the second semester on a project to support senior secondary mathematics in Queensland. For my project I have
chosen to provide curriculum support for the topic of Exploring Data and have chosen a website rather than a
booklet as my mode of publication.

Resources

This website contains activities, worksheets, overhead transparency masters, datasets and assessment to support
data exploration. It also contains an extensive collection of articles designed to enhance the statistics knowledge
of the teacher. There is a resources page that gives a select list of the finest resources available to support
introductory statistics, including texts, websites, datasets, java applets and mailing lists.

To make the resources accessible to a wide range of people, the majority of the resources are available as web
pages and as Word 2.0 documents. Each resource is marked with two images: pointing to one opens the web page,
while the other leads to a Word 2.0 document.

FYI

Many secondary mathematics teachers in Queensland completed their formal study of statistics years ago, or
have never studied statistics. Their knowledge of modern statistics may not extend much beyond what is in the
high school textbooks and thus they may feel uncomfortable when a more inquisitive student wants to know
more about a topic than what is in the text.

To assist these teachers, many pages of this website include a section headed FYI (For Your Information) that
contains articles that discuss topics beyond those listed in the syllabuses. Many of the articles originated from
discussions between the knowledgeable statistics educators who populate two Internet mailing lists devoted to
statistics education.

Datasets

Datasets will be available in three formats:

Tab delimited
This is a format that any spreadsheet or statistics program should be able to read. This format is also suitable for
uploading datasets to the TI graphics calculators.

Excel 4.0
Excel is widely used in Queensland high schools.

NCSS Jr. 6.0


NCSS is a professional statistics program written by Jerry Hintze. NCSS Jr. 6.0 is a 'light' version of the full
NCSS 6.0 program. It provides excellent software support for all of the statistics in Mathematics A, B and C. It
has an Excel-like interface, is easy to use and is absolutely free! Click here and follow the link to the downloads
page to download it now.
Copyright

Most of the resources on this site are the property of Education Queensland. You are welcome to use the
resources on this site freely for educational purposes if you acknowledge Education Queensland as the owner of
the resource. You are not permitted to sell these materials for commercial gain without the express written
consent of Education Queensland.

If the resource you wish to use contains an acknowledgement (e.g. a dataset from another source), then you
should acknowledge that source also.

Exceptions to this are material that is owned by another person or organisation and for which permission has
been granted for use on this site. If you wish to use these resources you must seek permission of the owner of
the copyright.

Introduction to Exploring Data
Statistics is a fascinating subject, to both learn and teach! It is also an important one, as we are bombarded with
statistics every day of our lives. Knowledge in the subject allows us to make informed judgements about the
statistics presented by others to persuade us.

As teachers we need to give our students an understanding of the place of statistics in modern society, an
interest in the subject and a solid grounding for further study. It is worth noting that at the tertiary level more
students study statistics than study calculus subjects.

Setting the Scene


(from How to Make Statistics Boring)

Ah! I’ll bet some of you thought, ‘Hey, statistics is already boring, it doesn’t need to be made boring.’ Given
the sort of statistics to which we’ve been subjecting ourselves and our students over the years, such an attitude
would not be surprising. Consider the following exercise on constructing a boxplot, which is from a popular
Math A text.

Construct a box-and-whisker graph for the following data which are the masses in kilograms of nine Year-11
girls: 
35 47 48 50 51 53 54 70 75

This was chosen only because it is a typical example of the statistics that many of us are teaching our students. I
am not picking on this particular textbook. All of the Maths A and B texts that I have examined are loaded with
similar examples. If this exercise doesn’t convince you that statistics can be boring, there are many more where
this came from.

Actually, boring is not the most important issue, despite the title. There are other things wrong with this
exercise, other than the fact that it is boring. It is trivial. It is pointless. The data are fake.
Source: Boggs, R., (1996). How to Make Statistics Boring, Teaching Mathematics, QAMT, Brisbane.

Themes

One of the first tasks of the statistician when analysing a set of data is viewing the data in a variety of ways,
both graphically and numerically, looking for intriguing patterns, unusual observations and the general
characteristics of the dataset. This aspect of statistics is a focus of this website.

This website has four underlying themes:

• Data should be central to the study of statistics, and our students should study real problems with real
data.

• Computers and graphing calculators have an essential role in statistics as they excel at drawing graphs
and doing calculations. Students should be concentrating on the underlying concepts, looking for
patterns and notable features in data, learning to make appropriate decisions on the choice of summary
statistics and analyses, and justifying these decisions.

• Students need to be actively involved in the study of statistics. Whenever possible, a collaborative,
activity-based approach to statistics should be used.

• Teachers must know more statistics than that outlined in the syllabus or contained in the textbook. This
website has extensive extension material which can give the teacher a broader background to the subject.

Statistics and Chocolate

One of my favourite Internet publications is ZiMaths, which is published by the University of Zimbabwe and
which is meant to be 'a device that would educate Zimbabwean schoolkids about the virtues of real maths, as
opposed to the spiceless variety taught in most schools'. An article that threads its way through the first three
issues follows a bar of chocolate from its raw materials to its marketing, noting how statistics is used in
assisting the process. It gives a nice real-world example of queueing, process control, sampling, calibrating
machinery, surveying and forecasting.

And I'm the Point 3!

Jane Watson from the University of Tasmania talks on the Australian Broadcasting Corporation's Radio
National program about the need for statistical literacy in the Australian community.

From the AP-Statistics Guidebook

I thought this was nicely written, so I will share it with you.

Exploratory analysis of data makes use of graphical and numerical techniques to study patterns and departures
from patterns. In examining distributions of data, students should be able to detect important characteristics,
such as shape, location, variability, and unusual values. From careful observations of patterns in data, students
can generate conjectures about relationships among variables. The notion of how one variable may be
associated with another permeates almost all of statistics, from simple comparisons of proportions through
linear regression. The difference between association and causation must accompany this conceptual
development throughout.
Looking For Patterns
On January 28, 1986 the space shuttle Challenger exploded. Seven astronauts died because two rubber O-rings
leaked during takeoff. These rings had lost their resiliency because of the low temperature at the time of the
flight. The air temperature was about 0° Celsius, and the temperature of the O-rings about 6 degrees below that.

The link between O-ring damage and ambient temperature had been established prior to the flight. The
engineers at Morton Thiokol, Inc had recommended that the flight be delayed, but they failed to display the link
between ambient temperature and O-ring damage in a clear and unambiguous fashion to those making the final
decision. The launch proceeded with disastrous consequences.

A simple scatterplot showing the link between O-ring damage and ambient temperature during previous
launches may have changed the decision about launching. How much damage would you have expected at
0° Celsius?

adapted from: Tufte, E.R., Visual Explanations

An Unusual Incident

High school textbooks often start a unit on statistics with the calculation of the mean of a set of numbers
(usually made up, and put into a trivial setting, but that is a different issue). For a number of reasons, that is the
wrong place to start an analysis of data. And it certainly isn’t what a statistician does.

After gathering data, statisticians like to look at the data in as many ways as possible. Any unusual or
interesting patterns in the data should be flagged for further investigation. An Unusual Incident is an engaging
activity to introduce a unit on statistics. Students are asked to discover the many intriguing patterns in the data
and hence to deduce what was the incident.

Weather Data for Capital Cities


Weather Data for Queensland Towns

These are follow-up activities to An Unusual Incident. Students are asked to match each city or town with its
climate chart. For each location the mean monthly maximum and minimum temperatures and the median
monthly rainfall are given in graphical form. The Queensland Towns activity also contains the original data in
table format. Due to the large number of graphics these activities are only available as Word 2.0 documents.

The Six Characteristics of a Dataset


From the Exploring Data website - http://curriculum.qed.qld.gov.au/kla/eda/
© Education Queensland, 1997

Once some data have been gathered, the first step in working with the data is to look at it in a
variety of ways. These six characteristics of a dataset are a good starting point in analysing a
dataset, although a fuller analysis extends to looking for unexpected anomalies and patterns in the
data. For example, in the Metric dataset which consists of forty-four estimates of the width of a
lecture hall, there are a large number of estimates of 10 m and 15 m. This is almost certainly due to
subjects rounding off their estimate and is a feature more of our number system than the size of the
hall.

The Old Faithful Dataset

The Old Faithful dataset has some interesting features and hence will be the example used in this
article. Old Faithful is a geyser in Yellowstone National Park in Wyoming. The graphical displays
below are based on 222 measurements of the duration of the geyser, in minutes.

Shape

The shape of a dataset will be the main factor in determining which set of summary statistics best summarises
the dataset, so it should be the first characteristic to be noted. Shape is commonly categorised as symmetric,
left-skewed or right-skewed, and as uni-modal, bi-modal or multi-modal.
The shape of the Old Faithful dataset is bi-modal. Note that both the histogram and the dotplot do a good job of
showing this while the boxplot doesn’t indicate this at all.

Location

Statisticians often use the term 'location' for what Queensland texts often call the ‘measure of central tendency’.
'Location' is both simpler and more descriptive than 'measure of central tendency', so it is the term I've adopted
for this website.

When initially examining a dataset only an approximate location is needed, often just estimated by eye. After
further analysis the choice of measure of location should become clearer. Common measures of location are the
mean and median. Less common measures of location are the mode (the most frequent value), the mid-range
(the value midway between the minimum and maximum values) and the truncated mean (where a fixed
percentage of the largest and smallest scores are deleted from the dataset and the mean of the remaining data is
calculated).
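
As a rough sketch of how these measures can disagree, the Python snippet below computes each of them for the 44 lecture-hall guesses of the Metric dataset, which is listed in full in the Advanced Stemplots article. The helper names `mid_range` and `truncated_mean`, and the 10% trimming fraction, are assumptions made for illustration and are not prescribed anywhere on this website.

```python
import statistics

# The 44 guesses (in metres) from the Metric dataset.
guesses = [
    8, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 13, 13, 13,
    14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 17, 17,
    17, 17, 18, 18, 20, 22, 25, 27, 35, 38, 40,
]


def mid_range(data):
    """Value midway between the minimum and maximum values."""
    return (min(data) + max(data)) / 2


def truncated_mean(data, fraction=0.10):
    """Mean after deleting `fraction` of the scores from each end.

    The 10% default is an assumption chosen for illustration.
    """
    ordered = sorted(data)
    k = int(len(ordered) * fraction)      # scores dropped from each end
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)


print("mean          ", round(statistics.mean(guesses), 2))
print("median        ", statistics.median(guesses))
print("mode          ", statistics.mode(guesses))
print("mid-range     ", mid_range(guesses))
print("truncated mean", round(truncated_mean(guesses), 2))
```

For these guesses the median and mode are both 15, while the mean (about 16.0) and especially the mid-range (24) are pulled upward by the few very large guesses. The truncated mean (about 14.7) sits closer to the bulk of the data.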

For the Old Faithful dataset, I would say none of these are a good measure of location! The mean is 3.6 and the
median is 4, and it is fairly obvious that these values tell us very little about the data. A more sophisticated
description of location would be to say that the data is bi-modal with one peak about 2 and the other about 4.5.
If more accurate values are wanted then the dataset could be broken into two sections and the mean or median
of each section calculated independently.

This example illustrates an important point - blindly following a procedure will not always give the best results.
Looking at the data and using judgement about how to describe the location of the data are needed.

Spread

This is a measure of the amount of variation in the data. Again, an approximate value is sufficient initially, with
the choice of measure of spread being informed by the shape of the data, and its intended use. Common
measures of spread are variance, standard deviation and the interquartile range. Less commonly used is the
range, as it is not very robust.
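
To illustrate what 'not very robust' means, here is a minimal sketch using Cavendish's 29 measurements from the Density of the Earth dataset given in the Advanced Stemplots article. It computes each measure of spread twice, with and without the single low reading of 4.07. The function name `spread_summary` is hypothetical. Quartiles are calculated with the default convention of Python's `statistics.quantiles`, which is only one of several conventions in use, so other software may give slightly different interquartile ranges.

```python
import statistics

# Cavendish's 29 density measurements (grams per cubic centimetre).
density = [
    5.50, 5.57, 5.42, 5.61, 5.53, 5.47, 4.88, 5.62, 5.63, 4.07, 5.29, 5.34,
    5.26, 5.44, 5.46, 5.55, 5.34, 5.30, 5.36, 5.79, 5.75, 5.29, 5.10, 5.86,
    5.58, 5.27, 5.85, 5.65, 5.39,
]


def spread_summary(data):
    """Return the common measures of spread for a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # default 'exclusive' method
    return {
        "variance": statistics.variance(data),    # sample variance, divisor n - 1
        "std_dev": statistics.stdev(data),        # square root of the variance
        "iqr": q3 - q1,
        "range": max(data) - min(data),
    }


with_low = spread_summary(density)
without_low = spread_summary([x for x in density if x != 4.07])

# One row per measure: the value with the low reading, then without it.
for name in with_low:
    print(f"{name:<9}{with_low[name]:8.4f}{without_low[name]:8.4f}")
```

Dropping the one low reading cuts the range from 1.79 to 0.98, while the interquartile range moves by only about 0.01.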

For the Old Faithful dataset the standard deviation doesn’t give a good picture of the spread of the data, as it
is usually used when the data are approximately normally distributed, or at least uni-modal and reasonably
symmetric. The interquartile range again is unsatisfactory as it doesn’t give a true picture of how the data is
distributed. Probably the best description of the spread would be found by dividing this dataset in two sections,
and discussing the spread of each section. Either the standard deviation or the interquartile range could be used,
depending on which measure of location was chosen.

Outliers

Outliers are data values that lie away from the general cluster of other data values. Each outlier needs to be
examined to determine if it represents a possible value from the population being studied, in which case it
should be retained, or if it is non-representative (or an error) in which case it can be excluded. It may be that an
outlier is the most important feature of a dataset. There is a true story that the ozone hole above the South Pole
had been detected by a satellite years before it was detected by ground-based observations, but the values were
tossed out by a computer program because they were smaller than thought possible. Read the Ozone and
Outliers article at this website to learn more about this fascinating story.
The best choice of display when looking for outliers is the boxplot. A glance at the boxplot of the Old Faithful
dataset shows that this dataset contains no outliers. Note that the three displays complement each other in the
information they provide about the data. One strong argument for the need to use computers and graphing
calculators when studying statistics is the necessity of viewing the data in a variety of ways. Without technology
to draw the graphs this would be impossible to do efficiently.
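
This section does not spell out how a boxplot decides which values to flag. One common convention, and the one named in the table of contents, is the 1.5*IQR rule: values more than 1.5 interquartile ranges beyond the quartiles are marked as potential outliers. The sketch below applies that rule to the Metric dataset from the Advanced Stemplots article. The function name `iqr_fences` is hypothetical, and the quartiles follow the default convention of Python's `statistics.quantiles`.

```python
import statistics

# The 44 guesses (in metres) from the Metric dataset.
guesses = [
    8, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 13, 13, 13,
    14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 17, 17,
    17, 17, 18, 18, 20, 22, 25, 27, 35, 38, 40,
]


def iqr_fences(data, k=1.5):
    """Return the (lower, upper) fences of the k*IQR rule."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


low, high = iqr_fences(guesses)
outliers = [x for x in guesses if x < low or x > high]
print("fences:", low, high)
print("potential outliers:", outliers)
```

With quartiles of 11 and 17 the fences fall at 2 and 26, so the rule flags 27, 35, 38 and 40. These are the same four values that appear in the 'High' row of the stemplot of this dataset. Whether each one is retained or excluded is still a matter of judgement, as described above.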

Clustering

Clustering implies that the data tends to bunch up around certain values, e.g. annual wages for a factory may
cluster around $20 000 for unskilled factory workers, $35 000 for tradespersons and $50 000 for management.
Clustering shows up most clearly on a dotplot.

The Old Faithful dataset shows two clusters centred around 2 minutes and 4.5 minutes.

Granularity

Granularity implies that only certain discrete values are allowed, eg a company may only pay salaries in
multiples of $1,000. A dotplot shows granularity as stacks of dots separated by gaps. By default, discrete data
has some granularity as only certain values are possible. Continuous data can show granularity if the data is
rounded.

The Old Faithful dataset shows evidence of granularity. By examining the original data it becomes clear that
this is the result of the data being rounded to one decimal place and is not a feature of the data itself.

Other Features

With the availability of computers and low cost statistics software it is possible to calculate summary statistics
and generate graphical displays very rapidly. The choice of bin width of a histogram can markedly alter the
apparent shape of the data, especially if the data is not uni-modal. As they are so quick to generate, it may be
worth our while looking at some alternative histograms to see what they show.
The choice of bin width (and hence the number of bins) does change the appearance of the histogram. Which
one ‘best’ gives a true picture of the data is subjective. It is a worthwhile exercise to give students a dataset that
is not unimodal and ask them to choose the best histogram and then defend their decision.
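
The effect of bin width can be tried out without any special software. The Old Faithful data are not reproduced on this page, so the sketch below instead uses the 27 daily egg counts from the Fleas dataset in the Advanced Stemplots article. The function name `histogram_counts` and the three bin widths are assumptions chosen for illustration.

```python
# Daily egg counts for days 1 to 27, from the Fleas dataset.
eggs = [
    436, 495, 575, 444, 754, 915, 945, 655, 782, 704, 590, 411, 547, 584,
    550, 487, 585, 549, 475, 435, 523, 390, 425, 415, 450, 395, 405,
]


def histogram_counts(data, bin_width):
    """Count integer data in bins [lo, lo + bin_width), keeping empty bins."""
    first = (min(data) // bin_width) * bin_width
    last = (max(data) // bin_width) * bin_width
    counts = {lo: 0 for lo in range(first, last + bin_width, bin_width)}
    for x in data:
        counts[(x // bin_width) * bin_width] += 1
    return counts


# Print a rough text histogram for each candidate bin width.
for width in (50, 100, 200):
    print(f"bin width {width}")
    for lo, n in histogram_counts(eggs, width).items():
        print(f"{lo:>4}-{lo + width - 1:<4} {'*' * n}")
    print()
```

Students can compare the three printouts and argue which bin width gives the fairest picture of the data.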

Stemplots
The purpose of displaying data graphically is to give a visual display of the interesting and important features of
the dataset. Which particular displays are best is not a question that can necessarily be answered before the data
is viewed, hence a statistician will view the data in different ways.

A stemplot shows the shape of the distribution and indicates whether there are potential outliers. Constructing a
stemplot is often the first step in analysing a dataset, and helps to determine what analysis is appropriate.

Stemplots are useful for displaying small datasets with only positive values. They are also appropriate if it is
important to retain the original data. They are quicker and easier to construct by hand than histograms.

The choices of ‘bin width’ are limited, so for some datasets that otherwise meet the criteria, a stemplot may not
be very useful. The bins may be too large or too small to properly display the distribution of the data. For these
datasets, a histogram is preferable.

How to display a particular dataset with a stemplot often requires judgement. How to split the stems, how to
represent outliers and whether to truncate the data are decisions that often have to be made. The main point is
that the plot should quickly inform us about the salient features of the dataset.

Once a stemplot is constructed, students should consider these questions:

What is the location of the data?

How much is the data spread out?

What is its overall shape; is it symmetric, skew or bi-modal?

Are there any unusual data values such as outliers?


Is there evidence of clustering?

Students should also note any unusual features of the dataset not highlighted by the above questions. For
example one row may have many more elements in it than the other rows. The student might ask, ‘Is this a
random occurrence or is it a relevant feature of this dataset?’ Often answering such questions isn’t easy.

Constructing a Stemplot

If you have never constructed a stemplot, visit the webpage Greed!. It is a nice activity where amongst other
things students will learn how to construct a stemplot.

Advanced Stemplots

Some might say 'advanced stemplots' is an oxymoron. However with some datasets it may be necessary to split
the stems, and with others to truncate the data. There is also the decision on what to do with outliers - do we
include a large number of empty rows between an outlier and its nearest value? Finally how do we handle very
large and very small numbers? The worked solutions to the worksheet Advanced Stemplots illustrate some
common practices.

Advanced Stemplots
‘Advanced stemplots’ is really a contradiction - stemplots by their nature should be simple to
construct. Nonetheless, there may be times when a stemplot is desired and constructing it involves
a greater effort than usual.

An advanced stemplot includes one or more of these features:

Split stems
One purpose of a stemplot is to display the shape of the distribution. To achieve a satisfactory display of
some datasets, the stem is best split into two parts, e.g. with one part containing unit values from 0 to 4
and the other part from 5 to 9. Other datasets may benefit if the stem is split into five parts: 0-1, 2-3,
4-5, 6-7, 8-9.

Truncated data values
For data with a large number of significant digits, it is common to decide how many digits are needed and
then truncate the data. Not much is lost by doing this, as the essence of the original data is still
retained. Data is truncated rather than rounded as it is easier to do.

Outliers
Imagine a dataset that contains an extreme outlier. It isn't sensible to extend the stem to include the
outlier, as this means including row after row of empty stems. Most computer-generated stemplots display the
outlier as a data value outside of the stemplot proper, at the top or bottom of the stemplot as appropriate,
and labelled as HIGH or LOW. It is a matter of judgement when to adopt this approach.

Scaling
If the values to be plotted are extremely large or extremely small the data has to be scaled, by multiplying
or dividing by a power of 10. For example NCSS Jr. scales the data to remove decimal points.
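
The features above can be combined in a few lines of code. The following is a minimal sketch, not a description of how NCSS or any other package builds its stemplots. The function `stemplot` and its parameters `leaf_unit`, `parts` and `high` are hypothetical names. It assumes positive integer data, so decimal data would first need to be scaled to integers, and it labels split stems by repeating the stem digit.

```python
from collections import defaultdict


def stemplot(values, leaf_unit=1, parts=1, high=None):
    """Build a text stemplot for positive integer data.

    leaf_unit -- place value of the leaf digit (10 truncates 436 to 43)
    parts     -- rows per stem: 1, 2 or 5 (split stems)
    high      -- values at or above this cut-off are listed in a 'High' row
    """
    per_row = 10 // parts                     # leaf digits covered by each row
    body = sorted(v for v in values if high is None or v < high)
    extremes = sorted(v for v in values if high is not None and v >= high)

    rows = defaultdict(list)
    for v in body:
        stem, leaf = divmod(v // leaf_unit, 10)   # integer division truncates
        rows[(stem, leaf // per_row)].append(leaf)

    lines = []
    stem, part = min(rows)                    # assumes at least one body value
    last = max(rows)
    while (stem, part) <= last:               # empty rows are kept
        leaves = "".join(str(d) for d in rows.get((stem, part), []))
        lines.append(f"{stem:>4} | {leaves}")
        part += 1
        if part == parts:
            stem, part = stem + 1, 0
    if extremes:
        lines.append("High | " + ", ".join(str(v) for v in extremes))
    return lines


# The 44 guesses (in metres) from the Metric dataset.
guesses = [
    8, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 13, 13, 13,
    14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 17, 17,
    17, 17, 18, 18, 20, 22, 25, 27, 35, 38, 40,
]

# Five rows per stem, with guesses of 26 m or more moved to the High row.
for line in stemplot(guesses, parts=5, high=26):
    print(line)
```

The cut-off of 26 for the High row is a judgement call made for this example. Calling `stemplot(data, leaf_unit=10, parts=2)` on three-digit counts would truncate the last digit and split each stem in two.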

Three Examples
The Density of the Earth Dataset
In 1798, Henry Cavendish measured the density of the earth using an instrument called a torsion
balance. While the density of the earth is obviously not uniform, the value of the mean density is
important in determining the earth’s composition. The units are grams/cm³.

Density Measurements 

5.5 5.57 5.42 5.61 5.53 5.47 4.88 5.62 5.63 4.07 5.29 5.34
5.26 5.44 5.46 5.55 5.34 5.3 5.36 5.79 5.75 5.29 5.1 5.86
5.58 5.27 5.85 5.65 5.39              

Here is the NCSS 6.0 Jr stemplot of this dataset along with some comments.

Stem-Leaf Plot Section of Density

Depth Stem | Leaves
       Low | 407
   2    48 | 8
   2    49 |
   2    50 |
   3    51 | 0
   7    52 | 6 7 9 9
  12    53 | 0 4 4 6 9
 (4)    54 | 2 4 6 7
  13    55 | 0 3 5 7 8
   8    56 | 1 2 3 5
   4    57 | 5 9
   2    58 | 5 6
Unit = .01 Example: 1 |2 Represents 0.12

Comments

The Depth column records the cumulative number of data values, counting in from each end. The entry in
brackets locates the row that contains the median, and gives the number of entries in that row.

NCSS has chosen a two-digit stem with single digit leaves.

The outlier is labelled as ‘Low’ and the entire value (407, representing 4.07) is given in the 'Leaves'
column.

The scale is given at the bottom. For this dataset NCSS multiplies each value by 100 to remove the decimal
point. For example, 54 | 2 represents a value of 542. Multiply this by the unit (.01) to return the original
value:
542 x .01 = 5.42
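
The Depth column can be reproduced from the number of leaves in each row. The sketch below is one way of doing so, under the assumption that the Low row counts as a row holding a single value. The function name `depth_column` is hypothetical and this is not a description of the NCSS implementation. NCSS leaves the Depth entry for the Low row blank, whereas this sketch simply reports its count.

```python
def depth_column(row_counts):
    """Return the Depth entry for each stemplot row, from top to bottom."""
    n = sum(row_counts)
    middle = (n + 1) / 2              # position of the median
    entries = []
    above = 0                         # number of values in the rows above
    for count in row_counts:
        from_top = above + count      # cumulative count from the top
        from_bottom = n - above       # cumulative count from the bottom
        if from_top >= middle and from_bottom >= middle:
            entries.append(f"({count})")      # the row containing the median
        else:
            entries.append(str(min(from_top, from_bottom)))
        above += count
    return entries


# Leaves per row of the Density stemplot: Low, 48, 49, ..., 58.
counts = [1, 1, 0, 0, 1, 4, 5, 4, 5, 4, 2, 2]
print(depth_column(counts))
```

Apart from the entry for the Low row, the result agrees with the Depth column shown in the stemplot.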
The Metric Dataset

Shortly after metric units were introduced in Australia, a group of 44 students was asked to guess,
to the nearest metre, the width of the lecture hall in which they were sitting. The true width of the
hall was 13.1 metres.

Guesses (Metres) 

8 9 10 10 10 10 10 10 11 11 11 11 12 12 13 13 13
14 14 14 15 15 15 15 15 15 15 15 16 16 16 17 17  
17 17 18 18 20 22 25 27 35 38 40            

Stem-Leaf Plot Section of Guess

Depth Stem | Leaves
   2     . | 89
  12    1* | 0 0 0 0 0 0 1 1 1 1
  17     T | 22333
 (11)    F | 44455555555
  16     S | 6667777
   9     . | 88
   7    2* | 0
   6     T | 2
   5     F | 5
      High | 27, 35, 38, 40
Unit = 1 Example: 1 |2 Represents 12

Comments

To achieve the best display, NCSS Jr has split the stems into five parts. The labels used are as follows:
‘*’ represents 0-1
‘T’ represents 2-3
‘F’ represents 4-5
‘S’ represents 6-7, and
‘.’ represents 8-9.
This is a common method of splitting stems.

The data exhibits two peaks, which are due to students choosing 10 and 15 more often than numbers near to
those. It is a reflection of our number system and the rounding inherent in estimation. Since the original
data are retained, the reason for the two peaks can be determined from the stemplot.

There are four high outliers which are given in the 'High' row at the end of the stemplot.

The Fleas Dataset 

Researchers at the Purdue University School of Veterinary Medicine deposited 25 female and 10
male fleas in the fur of a cat, in order to study the egg production of the flea. The number of eggs
produced by the fleas over 27 consecutive days is given below.

Source: Introduction to the Practice of Statistics, David Moore and George McCabe, p.27
Day No. of eggs Day No. of eggs
1 436 15 550
2 495 16 487
3 575 17 585
4 444 18 549
5 754 19 475
6 915 20 435
7 945 21 523
8 655 22 390
9 782 23 425
10 704 24 415
11 590 25 450
12 411 26 395
13 547 27 405
14 584    

Stem-Leaf Plot Section of No_Fleas

Depth Stem | Leaves
   2     3 | 99
   9     4 | 0112334
  13     4 | 5789
 (3)     5 | 244
  11     5 | 57899
   6     6 |
   6     6 | 5
   5     7 | 0
   4     7 | 58
      High | 91, 94
Unit = 10 Example: 1 |2 Represents 120

Comments

The stemplot shows the data has two clusters.

The last digit of the data was truncated. While some detail from the original data is lost, the stemplot is
easier to read.

This stemplot shows an alternative method of displaying split stems. Instead of using the symbols '*' and
'.', the leading digit was repeated. Both methods are common, though the '*' and '.' are traditional.

The two high outliers are listed at the bottom of the stemplot to two significant figures.

As the data was gathered over time a timeplot should also be constructed as this shows how the number of
eggs changed over time.

STEPS

The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics,
including the graphical display of data. Visit the STEPS page for further information and a list of the modules
available.

Dotplots
A traditional dotplot resembles a stemplot lying on its back, with dots replacing the values on the leaves. It does
a good job of displaying the shape, location and spread of the distribution, as well as showing evidence of
clusters, granularity and outliers. For smallish datasets a dotplot is easy to construct, so the dotplot is a
particularly valuable tool for the statistics student who is working without technology.

Here is an assessment item from a test on Al Coons' website to illustrate these features. His website supports AP
(Advanced Placement) Statistics, a course designed to give successful high school students university credit for
introductory statistics.

Two machines, C1 and C2, are making pins which must have a diameter of 8 cm ± .01 cm or they are rejected.
Dotplots of 50 pins from each machine are displayed below. They are both on the same scale.

1. By simply looking at the dotplots, i.e. without doing any calculations or counting, compare C1 and C2 in
light of "the six features that are often of interest when analyzing a distribution of data": centre,
variation, symmetry, outliers, clustering and granularity.
2. In what sense is machine C1 ‘better’ at producing pins? Justify your argument.
3. In what sense is machine C2 ‘better’ at producing pins? Justify your argument.

An Alternative Method of Constructing a Dotplot

Here is a dotplot from NCSS 97 of the time between eruptions from the Old Faithful dataset. As there are over
two hundred data values it would not have been feasible to use a more traditional dotplot.

This plot displays the scale along a vertical axis. The value of each dot is given by its vertical component. The
horizontal component is randomised so that not all points are plotted at exactly the same location. The darker
points represent two or more values plotted at the same location.
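
The randomised horizontal component is often called jitter. Here is a minimal sketch of the idea, assuming a uniform random offset. How NCSS actually places the points is not documented here, and the function name `jittered_points` and its `width` and `seed` parameters are hypothetical. The Old Faithful data are not reproduced on this page, so the density measurements from the Advanced Stemplots article are used instead.

```python
import random


def jittered_points(values, width=0.4, seed=0):
    """Pair each value (vertical position) with a random horizontal offset."""
    rng = random.Random(seed)         # fixed seed so the plot is reproducible
    return [(rng.uniform(-width, width), v) for v in values]


# Cavendish's 29 density measurements (grams per cubic centimetre).
density = [
    5.50, 5.57, 5.42, 5.61, 5.53, 5.47, 4.88, 5.62, 5.63, 4.07, 5.29, 5.34,
    5.26, 5.44, 5.46, 5.55, 5.34, 5.30, 5.36, 5.79, 5.75, 5.29, 5.10, 5.86,
    5.58, 5.27, 5.85, 5.65, 5.39,
]

points = jittered_points(density)
for x, y in points[:5]:
    print(f"x = {x:+.2f}, y = {y}")
```

The resulting (x, y) pairs can be passed to any scatterplot routine. Only the vertical position carries information, so the horizontal offsets should be kept small.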

Which characteristics of the dataset does this dotplot highlight? This dotplot shows that the data is bimodal, and
gives a good feel for the spread of the data. There is some granularity evident, and there are no outliers. This
type of dotplot doesn’t give a good feel for the shape of the distribution of the data or allow the student to
accurately estimate the location of the centre.

For many real datasets a single type of display doesn’t suffice, but each display adds to the overall picture that
we are trying to form. Access to statistics software is vital if the student is to generate these displays without
getting bogged down in this stage of the analysis.

Histograms

As a teacher of junior maths and Maths in Society, I used to think that a histogram was a rather trivial statistical
object, sort of a bar graph with the gaps removed to save space. I never realised that statisticians actually find
histograms to be useful!
A modern data-centred approach to statistics starts with viewing the data in a variety of ways. What is meant by
viewing the data? Features of interest to a statistician are the overall shape of the data, symmetry, the location
and the spread, existence of outliers and evidence of clusters or gaps. A histogram with a scale on the horizontal
axis is generally useful for showing all of these features, though for some distributions the features of a dataset
can be disguised or distorted due to a particular choice of bin width.

One application of the humble histogram is determining if a set of data is approximately normally distributed,
though a histogram is most effective for this purpose if the dataset is large. Normality is a pre-condition for
certain analyses of data, including many hypothesis tests. While there are formal tests of normality, often a
quick look at a histogram of the data is sufficient. And no statistician would rely strictly on formal tests without
viewing the data also.

STEPS

The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics,
including the graphical display of data. Visit the STEPS page for further information and a list of the modules
available.

Histograms and Stemplots Compared

A histogram shows much the same information as a stemplot, though for a given dataset one or the other of
these methods of displaying the data may be preferable. Some points to note:

• Histograms are preferable for larger datasets as stemplots become unwieldy;
• With histograms, the original data are usually lost;
• The choice of bin size or number of bins is not restricted, unlike the stemplot;
• Histograms take more time than a stemplot to construct by hand; therefore stemplots are preferable for a
small dataset.

Matching Histograms and Boxplots

Students will improve their ability to interpret the information given in a boxplot by matching boxplots of
sample data drawn from different distributions with their associated histograms.

Matching Histograms and Summary Statistics

Students will improve their ability to visualise the shape of a distribution given the summary statistics.

Bin Width

Statistics computer programs and graphical calculators will generate a default histogram if bin width or the
number of bins is not specified. It is interesting that there is no clear winner in the choice of algorithm used for
choosing the number of bins or the bin width. The article How Wide Is Your Bin? contains an interesting thread
(i.e. a discussion topic) from the Ed-Stats mailing list.
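To see why there is no clear winner, it may help to compare a few rules of thumb for the number of bins. The three rules below (Sturges, square root and Rice) are commonly quoted ones chosen for illustration; they are not taken from the article or from any particular program's default.

```python
import math

def sturges_bins(n):
    """Sturges' rule: ceil(log2(n)) + 1 bins."""
    return math.ceil(math.log2(n)) + 1

def square_root_bins(n):
    """Square-root rule: ceil(sqrt(n)) bins."""
    return math.ceil(math.sqrt(n))

def rice_bins(n):
    """Rice rule: ceil(2 * n^(1/3)) bins."""
    return math.ceil(2 * n ** (1 / 3))

for n in (100, 200):
    print(n, sturges_bins(n), square_root_bins(n), rice_bins(n))
```

For the same number of data values the rules disagree, so two programs can quite reasonably draw different default histograms of the same dataset.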

The Density Trace


With the widespread use of computers in modern statistics, new
methods of displaying a dataset have been invented. NCSS 6.0
Jr allows the user to add a display called a density trace to a
histogram. The density trace can be thought of as a smoothed
histogram in which the problems caused by fixed bin widths are
obviated. It is displayed as the curved line in the diagram. The
article The Density Trace (which is the NCSS Jr help file on
this topic) discusses this plot further.

The Histogram and Stemplot Compared

A histogram is an alternative to a stemplot for displaying data. A stemplot is restricted by our number system
to certain bin widths; a histogram is under no such restriction. However, you usually lose the actual data
values, and constructing a histogram by hand is a tedious process.

When constructing a histogram by hand, a decision about bin sizes and the number of bins has to be made when
tabulating the data. A poor decision can result in a histogram that either gives misleading information about the
data or fails to inform the viewer about some aspect of the data. A computer is of value here, as a variety of
histograms, each with a different bin width, can be constructed. Which histogram is preferred depends upon
which aspects of a dataset are to be featured.

Beware the Humble Histogram!

Ideally a histogram should show the shape of the distribution of the data. For some datasets, the choice of bin
width can have a profound effect on how the histogram displays the data. To see this for yourself, have a look at
the Histogram Applet, from R. Webster West, Dept. of Statistics, Univ. of South Carolina (you will need a
Java-enabled browser to see the applet). It is a histogram of the interruption time (i.e. time between eruptions)
of the Old Faithful Geyser in Wyoming, USA. Slide the bar to change bin widths, and watch how that affects the
shape of the histogram. Will you ever trust a histogram again?

As most classrooms don’t have Internet access on tap, the Word document Old Unfaithful contains a series of
histograms of the interruption time of the Old Faithful geyser. The series nicely shows the effect of bin size on
the appearance of the histogram.
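If neither the applet nor the Word document is to hand, the same effect can be sketched with a short Python script. The data below are invented to have two clusters with a gap between them; they are not the Old Faithful values, and the function name and starting point of the bins are assumptions for the example.

```python
def histogram_counts(values, bin_width, start):
    """Count values in consecutive bins [start + k*w, start + (k+1)*w)."""
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    return counts

# Hypothetical two-cluster data, made up for illustration.
times = [45, 48, 50, 51, 52, 54, 55, 58, 70, 74,
         76, 77, 78, 79, 80, 81, 82, 84, 86, 90]

for width in (5, 10, 30):
    counts = histogram_counts(times, width, start=40)
    print(f"bin width {width}:")
    for k, count in enumerate(counts):
        left = 40 + k * width
        print(f"  {left:3d}-{left + width:3d} | {'*' * count}")
```

With the narrower bins the gap between the two clusters is visible as empty bins; with a bin width of 30 it disappears altogether.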
From the Exploring Data website - http://curriculum.qed.qld.gov.au/kla/eda/
© Education Queensland, 1997

Histograms Worksheet
Datasets and Stories for Histograms

There is benefit in students using the same datasets for different analyses. It is efficient, as students
don’t need to acquaint themselves with a new story for each display. If they are using a graphing
calculator the students don’t need to enter a new set of data into the lists. (I recommend you read
the article by Al Coons, Efficient Storing of Data on the TI-83, about storing data in a program for
later use.) Another benefit is the opportunity to contrast the features of the data highlighted by each
display. For these reasons I suggest the students use the datasets and stories on the stemplots
worksheet when learning about histograms as well as the data generated when the students
played Greed!.

Other datasets that are appropriate for histograms include Air Pollution, Bradmanesque, Oscar
Winners, Speed of Light and Wild Horses. Follow the links to the datasets and from there to their
stories.

Introducing Histograms

Give students a set of data and the accompanying story. Students should realise that a dataset
doesn't have a single histogram but many histograms, one for each choice of bin width. For 'nice'
data, i.e. data that is symmetric and with no clustering or outliers, the set of histograms may all give
the same general picture so the choice of histogram is not critical. With data that isn't so nice,
different choices of bin widths may give histograms that look markedly different. For such datasets
students will need to produce a variety of histograms, and then make and defend their choice as to
which is 'best'.

Note that graphical calculators and computer statistics programs don't necessarily choose the best
display by default and hence it is an unwise student who doesn't construct a few histograms of
varying bin widths as part of their analysis.

Analysing a Histogram

After the histogram is drawn, students should:

• locate the approximate centre of the distribution by eye;
• determine the spread of the data, and look for potential outliers;
• note the overall shape of the distribution;
• look for any other features of interest such as clustering or gaps.

Students need to practice writing a short report on the interesting features brought out by a
graphical display. One approach would be to give each small group a different dataset and story
and have them produce the display (say on a graphical calculator), discuss within the group the
characteristics of the data brought out by the display, and then report to the class.

Using Technology

As noted elsewhere, it is quite time-consuming to draw a histogram, especially to match the quality
and accuracy of a histogram drawn by even the simplest computer statistics program. Drawing a
single histogram by hand should be sufficient for the students to get a feel for the mechanics of
drawing a histogram, so additional histograms should be constructed using a computer or a
graphical calculator. Let the technology shine in its sphere (repetitive algorithmic processes) and let
the students shine in their sphere (looking for patterns, and deciding what the data say).

These remarks apply equally to other graphical displays, of course. It follows that students
shouldn’t be required to construct any of these displays by hand for assessment purposes - it is a
trivial exercise, and possibly the least important task you could ask a student to do in statistics.

Matching Histograms and Boxplots


Objective: Students will improve their ability to interpret the information given in a boxplot, by
matching boxplots of sample data drawn from different distributions with their associated
histograms.

Materials: One worksheet per student or small group.

Time: 20 minutes.

Instructions: Students are given a worksheet which contains a series of histograms in the left column
and a series of boxplots in the right column. They are to match each boxplot with its
associated histogram. They need to be able to defend their decisions in a subsequent
whole class discussion.

The distribution from which each sample was drawn is given in the solutions, for your
interest.

Note: the sample data for this worksheet was generated with a nifty little freeware
Windows program called PQRS, which is available from the Resources page of the
Exploring Data website.


Matching Histograms and Boxplots


Match each histogram with its corresponding boxplot, by writing the letter of the boxplot in the
space provided.

1._______ A.
2._______ B.

3._______ C.

4._______ D.

5._______ E.


Matching Histograms and Boxplots - Solutions

Histogram   Boxplot   Distribution
1           D         Normal(0,1)
2           A         Geometric(1)
3           C         Weibull(4,9)
4           E         Uniform(1,4)
5           B         Weibull(4,1.5)

Boxplot     Distribution
A           Geometric(1)
B           Weibull(4,1.5)
C           Weibull(4,9)
D           Normal(0,1)
E           Uniform(1,4)


Measures of Location
So you think teaching about the mean, median and mode is boring? Well, maybe, but there are some interesting
little side alleys to this topic that are worth exploring.

Before I go into detail on these, I must say I was intrigued to see that the topic of finding an average
in Statistics: Concepts and Controversies by David S. Moore is delayed until page 237! This illustrates two very
important points - calculating summary statistics is a waste of time until the user decides what is important
about the data and which summary statistics may be useful. Well maybe that is only one important point.

STEPS
The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics,
including summary statistics. Visit the STEPS page for further information and a list of the modules available.

Which Average?

The choice of measure of location requires understanding of the properties of each measure. A Rather
Average Worksheet contains three nice problems on this topic.

Simpson's Paradox

Have you ever noticed that a government can give tax cuts to the population and still earn more money than
ever before? Did you realise that it is possible for Steve Waugh to have a better batting average than his brother
Mark in each of two Ashes Series and yet have a worse average overall? Read the article Simpson's Paradox to
learn about these and other intriguing examples of this phenomenon.
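A small numerical sketch shows how such a reversal can happen. The runs and dismissals below are invented purely for illustration and are not the Waugh brothers' real figures; the player labels and function name are likewise hypothetical.

```python
def batting_average(runs, dismissals):
    """Batting average = runs scored / number of times dismissed."""
    return runs / dismissals

# Hypothetical (runs, dismissals) for two series, made up for illustration.
player_a = {"series 1": (100, 2), "series 2": (240, 8)}
player_b = {"series 1": (360, 8), "series 2": (50, 2)}

def overall_average(player):
    total_runs = sum(runs for runs, _ in player.values())
    total_outs = sum(outs for _, outs in player.values())
    return batting_average(total_runs, total_outs)

for series in ("series 1", "series 2"):
    a = batting_average(*player_a[series])
    b = batting_average(*player_b[series])
    print(f"{series}: A = {a:.1f}, B = {b:.1f}")

print(f"overall:  A = {overall_average(player_a):.1f}, "
      f"B = {overall_average(player_b):.1f}")
```

Player A has the higher average in each series, yet player B has the higher average overall, because most of B's innings came in the series where both players scored heavily.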

Sex and Dating

The results of a sex survey conducted in the Chicago area gave the average number of lifetime sex partners for
men as 6, and for women as 2. This statistic wasn't questioned until someone posting to the rec.puzzles
newsgroup asked, 'Hey, is this possible?' Read the article Sex Survey to find out more. Note: you may need to
change the context before you introduce this little puzzle to students!

Abolish the Mean!

I once had a clever idea - we can ignore the mean as a descriptor of a dataset and put the entire burden of
locating a dataset onto the median. So I told some statisticians about it. If you are interested in their responses
read The Mean? Who Needs It!

Finally, did you know that the great majority of people have more than the average number of legs? Amongst
the 19 million people in Australia there are probably 2 000 people who have only one leg and no one has three
or more legs. Therefore the average number of legs is: (2 000 x 1 + 18 998 000 x 2) / 19 000 000 = 1.999895.
Since most people have two legs...
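The arithmetic is easy to check with a few lines of Python, using the same assumed figures as the paragraph above (2 000 one-legged people in a population of 19 million). The variable names are just for this sketch.

```python
one_legged = 2_000
two_legged = 18_998_000
population = one_legged + two_legged

# Mean number of legs per person.
mean_legs = (one_legged * 1 + two_legged * 2) / population

# More than half of the population has two legs, so the median is 2.
median_legs = 2 if two_legged > population / 2 else 1

# Proportion of people with more legs than the mean.
share_above_mean = two_legged / population

print(f"mean   = {mean_legs:.6f}")
print(f"median = {median_legs}")
print(f"share of people above the mean = {share_above_mean:.4%}")
```

The median is not pulled down by the small number of one-legged people, which is the usual argument for preferring it when a distribution is skewed.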

Vary Useful Statistics
The measure of spread of a dataset is a vary useful statistic!

When summarising a dataset, at least two measures are needed - one to locate the dataset and another to indicate
the spread of the data. The mean and the median are the common measures of location, while the standard
deviation and interquartile range are commonly used to summarise the spread of the data.

Range

One measure of the usefulness of a statistic is its robustness. A robust measure is one that is little affected by
outliers. On this basis the range, which is simply calculated as

Range = largest data value - smallest data value


is obviously not very robust and hence is not particularly useful.
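A quick sketch makes the lack of robustness concrete: add one wild value to a small dataset and compare what happens to the range and to the interquartile range. The data are hypothetical, and the quartiles come from Python's statistics.quantiles with its default method, which is only one of several definitions in use.

```python
from statistics import quantiles

def data_range(values):
    """Range = largest data value - smallest data value."""
    return max(values) - min(values)

def iqr(values):
    """Interquartile range, using the default method of statistics.quantiles."""
    q1, _median, q3 = quantiles(values, n=4)
    return q3 - q1

# Hypothetical data, made up for illustration.
clean = [12, 14, 15, 15, 16, 17, 18, 19, 20, 21]
with_outlier = clean + [95]

print("range:", data_range(clean), "->", data_range(with_outlier))
print("IQR:  ", iqr(clean), "->", iqr(with_outlier))
```

A single outlier changes the range enormously, while the interquartile range hardly moves.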

Mean Deviation

Until recently I was never able to satisfactorily answer the question, "The mean deviation is simple. Why is the
standard deviation used rather than the mean deviation?" An email by Paul Gardner from Monash University
gives a clear explanation, and is the basis of the article, I'm Not Mad about MAD.
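For readers who want to compare the two measures on the same numbers, here is a minimal sketch. It does not attempt to reproduce the argument in the article; it simply computes the mean (absolute) deviation and the population standard deviation for a small hypothetical dataset.

```python
from statistics import mean, pstdev

def mean_deviation(values):
    """Average absolute distance of the values from their mean."""
    centre = mean(values)
    return sum(abs(v - centre) for v in values) / len(values)

# Hypothetical data, made up for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print("mean               =", mean(data))
print("mean deviation     =", mean_deviation(data))
print("standard deviation =", pstdev(data))  # population standard deviation
```

The standard deviation squares each deviation before averaging, so values far from the mean count for more than they do in the mean deviation.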

Standard Deviation

The Measures of Spread worksheet contains some lovely questions on standard deviation, courtesy of Pat
Ballew. In fact I would rate these questions as being at least 1.5 standard deviations above the average question.

Interquartile Range

The interquartile range, while simple in concept, has caused much grief to introductory statistics teachers since
different respectable sources define it in different respectable ways! The article Ticky-Tacky Boxes discusses the
different methods of finding Q1 and Q3, in the context of constructing a boxplot, and makes a recommendation
as to which is 'best'.
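The disagreement is easy to demonstrate. Python's statistics.quantiles offers two of the many definitions, called "exclusive" and "inclusive"; these are used here only as examples and are not necessarily the methods discussed in the article. The data are hypothetical.

```python
from statistics import quantiles

def quartiles_and_iqr(values, method):
    """Return (Q1, Q3, IQR) using the named method of statistics.quantiles."""
    q1, _median, q3 = quantiles(values, n=4, method=method)
    return q1, q3, q3 - q1

# Hypothetical data, made up for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for method in ("exclusive", "inclusive"):
    q1, q3, iqr = quartiles_and_iqr(data, method)
    print(f"{method:9s}: Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

Both answers are defensible, which is exactly why students comparing a textbook, a calculator and a computer program can end up with three different boxes.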

STEPS

The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics,
including summary statistics. Visit the STEPS page for further information and a list of the modules available.

Boxplots
The treatment of boxplots in current senior secondary textbooks highlights the need for Queensland high school
teachers to use resources other than the textbook when teaching introductory statistics.

There are two basic flavours of boxplot. The 'simple' boxplot has the whiskers drawn out to the maximum and
minimum values. While this is suitable for a quick analysis, it doesn't give as much information about the data
as the standard boxplot, which draws the whiskers no longer than 1.5 IQRs from the box and locates points
beyond that individually. Such points are called outliers.
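The construction of a standard boxplot can be sketched as follows. The quartile method used here (the "inclusive" method of statistics.quantiles) is an assumption made for the example and may differ from the method recommended in the article; the data are hypothetical.

```python
from statistics import quantiles

def boxplot_summary(values):
    """Box, whisker ends and outliers for a standard (1.5 * IQR) boxplot."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data values inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

# Hypothetical data, made up for illustration.
summary = boxplot_summary([2, 5, 6, 7, 7, 8, 8, 9, 10, 25])
print(summary)
```

A simple boxplot of the same data would run its whiskers all the way out to 2 and 25, hiding the fact that those two values sit well apart from the rest.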

Unfortunately boxplots in our texts tend to only be simple boxplots. This shouldn't be surprising as the
Mathematics A syllabus only makes mention of simple boxplots. It could have been worse - an early model of a
graphical calculator available in Queensland used the mean rather than the median to mark the centre of the
dataset. Even the TI-83 graphical calculator, the calculator of choice for AP-Statistics students in the US, draws
boxplots two ways, and the default boxplot ignores outliers.

Nonetheless statisticians almost exclusively draw boxplots with outliers, if they exist. The process isn't difficult,
even if done by hand, so I recommend that your students learn to draw standard boxplots as well as simple ones.
The article How to Construct a Boxplot explains how to do it.

When to Choose the Boxplot


Boxplots are most useful when comparing two or more sets of sample data. Differences in the centres and
spread of the datasets are clearly visible with a boxplot.

A boxplot also gives a picture of the symmetry of a dataset, and shows outliers very clearly. Both of these
features are important when deciding which summary statistics would best describe the dataset. A condition of
many hypothesis tests is that the data is approximately normally distributed and a boxplot can assist in
determining this. Prior to conducting a hypothesis test, a statistician looks at the data, and histograms and a
boxplot would be the displays most often chosen.

The Boxplots worksheet contains data drawn from physics, cricket and biology. The Codeine
Concentrations worksheet has some data suitable for displaying using boxplots. There is assessment available
from the Assessment page, where a variety of graphical displays, including boxplots, may be needed for a
solution.

STEPS

The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics,
including the graphical display of data. Visit the STEPS page for further information and a list of the modules
available.

Matching Histograms and Boxplots

Students will improve their ability to interpret the information given in a boxplot by matching boxplots of
sample data drawn from different distributions with their associated histograms.

Constructing Boxplots

It is interesting that there is general agreement among statisticians about how to construct the whiskers and
determine outliers (which is where the problem lies with our texts) but very little agreement on how to construct
the box. Using the KISS principle, I teach students a method that is easy to remember and easy to do. Read the
article How to Construct a Boxplot for details.

Ticky-Tacky Boxes

If you are interested in learning about different methods of calculating the 1st and 3rd quartiles (and the angst this
has caused among AP-Stats teachers), you may find the article Ticky-Tacky Boxes interesting. The article is
based on emails from the AP-Stat and Edstat mailing lists. Warning - whatever you do, please don't try to tell
your students about all of this. You will only confuse the cherubs.

Thanks to Bob Hayden, who has provided much of the information for this article.

Ozone and Outliers

The 'ozone hole' above Antarctica provides the setting for one of the most infamous outliers in recent history. It
is a great story to tell students who wantonly delete outliers from a dataset merely because they are outliers.
Visit the Ozone and Outliers page for all of the fascinating details.

The 1970 Draft Lottery


No discussion of boxplots should leave out the story of the 1970 Draft Lottery, the first lottery held to select
those chosen to serve in Vietnam, which gave rise to possibly the single most famous set of boxplots in
existence. Yours truly was given a free ticket in the lottery, so the story on this page is of uncommon interest to
me. Based on what the boxplots show, it turns out that this October-born lad was even luckier than was thought
at the time.

Normal Plots
The normal probability plot (sometimes called a quantile plot) is a useful tool for determining the normality of a
dataset. The article Normally I Wouldn't Reveal the Plot ... introduces the normal plot, discusses why it is a
valuable tool in your arsenal of graphical displays and gives you a recipe for making your very own plot.

Normal plots are not mentioned in the Mathematics A, B or C syllabus. However they are a valuable tool for
determining if a dataset is normal, which is one of the assumptions on which the t-test is based. At a minimum a
student studying the Probability and Statistics optional unit in Mathematics C should be able to interpret a
normal plot, and preferably understand how to construct one.
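One common way of producing the points for a normal plot is sketched below. It is not necessarily the recipe given in the article: the plotting position (i - 0.5) / n is an assumption, and other conventions exist. The data are hypothetical.

```python
from statistics import NormalDist

def normal_plot_points(values):
    """Pair each sorted data value with the standard normal quantile for its rank."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # (i - 0.5) / n is one common plotting position; it is an assumption here.
    return [(std_normal.inv_cdf((i - 0.5) / n), v)
            for i, v in enumerate(sorted(values), start=1)]

# Hypothetical data, made up for illustration.
points = normal_plot_points([4.1, 5.3, 4.8, 6.0, 5.1])
for z, value in points:
    print(f"z = {z:+.3f}   value = {value}")
```

Plotting the values against the z scores gives the normal plot; points lying close to a straight line suggest the data are approximately normally distributed.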

Scatterplots
The Challenger Disaster

A worksheet that gives a brief background to the Challenger disaster and the dataset that gave warning of the
disaster. A nice introduction to scatterplots and the importance of displaying data clearly.

Anscombe's Datasets

Two overhead transparencies - the first has F.J. Anscombe's famous quartet of datasets and the second has the
scatterplots of these datasets. Absolutely convincing proof of the need to look at the data first. It convinced me
anyway. The scatterplots were produced using NCSS 6.0 Jr.

Using a Scatterplot to Find a Friend

Peter Smith from Mechanicsburg High School in Pennsylvania shares this great introductory activity that helps
students learn about scatterplots and correlation.

3-D Scatterplot Java Applet

Given three columns of data, this site generates a VRML file for viewing the data in 3-dimensions, including
'flying' right in the middle of the data. Great fun. You will need a VRML browser to view the scatterplot.

Assessment
As one person's worksheet is another's assignment, it is often rather difficult to classify a particular document as
one or the other. Nonetheless the collection here consists of documents that were specifically intended to be
assessment.

Bradmanesque (available in Word 2.0 only)


A lovely assignment that requires students to draw from their pool of knowledge about descriptive statistics.
Topics: graphical display of data, summary statistics.

The Age of Female Actor Oscar Winners

This dataset has some intriguing patterns. Students are asked to determine if the average age of female actor
Oscar winners is increasing. Topics: graphical display of data, summary statistics.

Pecking Order in Chickens

A researcher on animal behavior wants to study the relationship between pecking order and weight. He places
four chickens in each of seven pens and observes the pecking order that emerges in each pen.

As the researcher’s assistant, you have been asked to analyse this data (and possibly generate some graphical
displays) and write a report on the relationship, if any, between pecking order and weight. Topics: summary
statistics, graphical displays.

Galileo's Gravity and Motion Experiments

This dataset may need some dusting off as it is over 400 years old! Galileo produced this data when he was
studying motion under gravity. The assignment includes a graphic of some of Galileo's original notes. Topic:
curve fitting.

Pricing Diamond Rings

Pricing diamond rings in Singapore can be viewed as an interesting exercise in statistical modelling. The price
equals the sum of the current market value of the gold content of the ring, a craftsmanship fee and the cost of
the diamond. In this assignment the student tries to find a mathematical function which allows the price of a
diamond ring to be determined from the size of the diamond. Topics: linear regression, curve fitting.

More Stories

Here are some more stories, with their associated datasets available from the Datasets page. These can be turned
into assessment items or further examples or exercises, as you desire. Topics: various

Introduction to Business Statistics at Georgia State University

An absolutely enormous item bank of statistics questions, many of them multiple choice, from Georgia State
University. They are categorised for convenience. As an indication of the size of the item bank the Descriptive
Statistics section (one section of eight sections) has a filesize of about 150K.

AP Stats Assessment

A number of teachers of Advanced Placement Statistics (a first year tertiary statistics course taught in high
schools, mainly in the US) maintain websites with worksheets, datasets and assessment. The AP-Statistics
course covers all of the statistics in Maths A, B and C and more.

Al Coons at Buckingham Browne & Nichols School


Al's website is a real treasure for anyone teaching AP Statistics for the first time. Follow the link to
Projects/Student Papers. Note that some of the projects are outside of our syllabus areas (e.g. Chi-Square).

Paul Myers at Woodward Academy

Follow the link to Assessment. Paul is posting his tests in html format. His complete set of tests from 1997 is
currently available.

Datasets
These datasets support the activities, worksheets, assessment and articles in the Exploring Data website.
Datasets are available in three formats - Excel 4.0, NCSS Jr. 6.0 and Tab Delimited.

Notes:
Clicking on the name of the dataset will give you the story behind the dataset.
NCSS Jr 6.0 datasets require that you download two files, with extensions .s0 and .s1.
Students should work with real data gathered by others for purposes of solving problems. But they should also
gather data themselves. The article Student-Generated Data discusses a few ways that this can be done.
Dataset                   Topics
1970 US Draft Lottery     boxplot
1971 US Draft Lottery     boxplot
AIDS / HIV                scatterplot, nonlinear regression
Air Pollution             boxplot, summary statistics
Alligator!                nonlinear regression
Anscombe's Dataset        data display, scatterplot
Bradman - an Outlier?     boxplot, graphical display
Bradmanesque              graphical display, summary statistics
Carbon Dioxide            curve fitting
Carbon Emissions          curve fitting
Challenger                scatterplot
Cloud Seeding             boxplots, dotplot
Codeine Concentration     boxplots, t-test
Cricket (the Insect)      linear regression
Density of the Earth      graphical display, summary statistics
Density of Nitrogen       boxplot
Diamond Rings             linear regression, curve fitting
Fleas                     scatterplot, time series
Galileo's Experiments     polynomial regression
Global Temperature 1      scatterplot, regression
Global Temperature 2      scatterplot, regression
Metric Estimates          graphical display, summary statistics
Oil Production            exponential regression
Old Faithful              histogram
Olympic Gold              linear regression
Oscar Winners             graphical display, summary statistics
Pecking Order             graphical display, summary statistics
Pottery                   boxplots
Smoking and Cancer        scatterplot, linear regression
Speed of Light            graphical display, summary statistics
Stride Rate               linear regression, curve fitting
Wild Horse                graphical display, summary statistics
World Population          exponential regression
Year 10 Certificates      graphical display, summary statistics

Linear Regression
The study of functions in Maths B can be enriched by including authentic applications which illustrate how
mathematics can model aspects of the world. In real life functions often arise from data gathered from
experiments or observations, and such data rarely falls neatly into a straight line or along a curve. There is
variability in real data that needs to be explained and measured, and it is the task of the student to find the
function that best 'fits' the data in some sense.

The first functions we study in Maths B are linear, so it makes sense to start with problems whose data
are linear in nature.

Looking at the Data

When fitting a function to data, the student MUST first plot the data, and this activity shows why. F.J.
Anscombe invented these datasets to demonstrate the importance of graphing the data before finding the
correlation and line of regression. They present a very striking picture.
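The point can be checked numerically with a short least-squares sketch. The values below are the four datasets as they are usually published; the function name is just for this example.

```python
def fit_line(xs, ys):
    """Least-squares slope, intercept and correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / (sxx * syy) ** 0.5

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  (x4,   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: fit_line(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (slope, intercept, r) in results.items():
    print(f"{name:3s}: y = {intercept:.2f} + {slope:.3f}x   r = {r:.3f}")
```

The four fitted lines and correlations are practically identical, so the numbers alone give no hint that the scatterplots look completely different.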

Using Statistics in Human Movements

One measure of form for a runner is stride rate, defined as the number of steps per second. A runner is
considered to be efficient if the stride rate is close to optimum. The stride rate is related to speed; the greater the
speed, the greater the stride rate. This article gives a fully-worked solution to finding the stride rate as a function
of speed using the statistics functions of the TI-83 graphics calculator.

Student Generated Linear Data

At times we should use real data, gathered to give insight into real problems, as this illustrates how fitting a
function to data may be done in real-life. But we should also get our students to generate their own data, which
gives them ownership of the data and an understanding of the process (often difficult) of collecting reliable and
valid data. This article contains examples from high schools in the U.S.

Olympic Gold Medal Performances

With the Olympics coming to Australia in the year 2000, datasets about the Olympics are worth their weight in
gold medals. In this worksheet the data for the gold medal performances in long jump, high jump and discus throw
since 1896 are supplied. Students are asked to find a linear model for each set of data, and predict the gold
medal performance in Sydney in the year 2000.

Linear Regression Java Applet

This applet teaches students the effect on a regression line of adding an additional point.

Normal Distribution
Why 1.5?

Many students are curious about the ‘1.5*IQR Rule’, i.e. why do we use Q1 - 1.5*IQR (or Q3 + 1.5*IQR) as
the value for deciding if a data value is classified as an outlier? Paul Velleman, a statistician at Cornell
University, was a student of John Tukey, who invented the boxplot and the 1.5*IQR Rule. When he asked
Tukey, ‘Why 1.5?’, Tukey answered, ‘Because 1 is too small and 2 is too large.’

It has been shown that this is a reasonable rule for determining if a point is an outlier, for a variety of
distributions. This worksheet asks the student to demonstrate this for the normal distribution.
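For the normal distribution the calculation can be sketched in Python using the standard library's NormalDist. This is only one way of doing it, and the function name is an assumption for the example. The script also tries Tukey's rejected multipliers of 1 and 2 for comparison.

```python
from statistics import NormalDist

z = NormalDist()          # standard normal distribution
q1 = z.inv_cdf(0.25)      # first quartile
q3 = z.inv_cdf(0.75)      # third quartile
iqr = q3 - q1

def fraction_flagged(multiplier):
    """Proportion of a normal population lying beyond the outlier fences."""
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr
    return z.cdf(lower_fence) + (1 - z.cdf(upper_fence))

for multiplier in (1, 1.5, 2):
    print(f"{multiplier} x IQR: {fraction_flagged(multiplier):.4%} flagged as outliers")
```

With a multiplier of 1.5 a little under 1% of values from a normal population are flagged; a multiplier of 1 flags several percent, and a multiplier of 2 flags almost nothing.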

Light Bulbs and Dead Batteries

Don Kerr, of Brisbane-based Zeno Educational Consultants, once told me that he believed that the lifetime of
light bulbs and car batteries both have a decaying exponential distribution. I was intrigued by this, as every
statistics textbook I have ever used always had a question that started, ‘Assume the lifetime of light bulbs is
normally distributed, with mean....’. So I decided to ask my colleagues, which resulted in this interesting
exchange.

Normal Approximation to the Binomial

A Java applet that visually demonstrates how accurately the normal distribution approximates the binomial
distribution for given values of n and p.
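Without the applet, the same comparison can be sketched numerically. The code below compares the exact binomial cumulative probability with the normal approximation using a continuity correction; the function names and the example values of n, p and k are assumptions chosen for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with a continuity correction."""
    return NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(k + 0.5)

def approximation_error(k, n, p):
    return abs(binomial_cdf(k, n, p) - normal_approx_cdf(k, n, p))

for k, n, p in [(25, 50, 0.5), (0, 10, 0.1)]:
    print(f"n = {n}, p = {p}, k = {k}: "
          f"exact = {binomial_cdf(k, n, p):.4f}, "
          f"normal = {normal_approx_cdf(k, n, p):.4f}")
```

The approximation is very close for n = 50 and p = 0.5, and noticeably worse for the small, skewed case n = 10 and p = 0.1.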

Probability
Probability is a wonderful subject to teach! There are so many activities for teaching concepts, puzzles and
problems with non-intuitive answers and a variety of contexts for the exercises. This page contains a small
collection.

Unders and Overs


Now having thrown out that challenge, the first activity I am going to suggest to you involves dice! But here the
dice are being used in the context of a once popular gambling game and not as a dry as toast exercise with little
relevance.

Unders and Overs was once a popular game at school fetes in Queensland. It was illegal, as was all gambling,
but people turned a blind eye as the money raised went to a school. I have used this activity in the past, but not
with the flair that Bill Simpson demonstrated at a Fun of Mathematics night at the University of Queensland.

A Special Set of Dice

Here is a neat little trick to play on your students, based on a special set of dice. Bring into your class a special
set of dice, and then explain the rules -

'I have this set of four dice. We will each choose a die; in fact since I am such a generous person, I'll let you
choose first. We'll roll the two dice and the winner is the person whose die has the highest number. The first
person to record five wins is the champion. Now if I am the champion, you have to do an extra hour of
homework tonight. If you are the champion, then you are excused from homework for one week. Any takers?
After all, you've got to be in it to win it.'

Of course, the student will be excused from doing the extra homework if the class can figure out why the
teacher wins almost all of the time.

The Monty Hall Problem

The rec.puzzles newsgroup regulars get very annoyed when a newbie finds the newsgroup and immediately
posts his favourite puzzle, which usually turns out to be a puzzle that was posted last week and the week before
that and.... The puzzle most commonly posted by newbies is the 'Three Men and the Bellboy' puzzle, but
breathing down the neck of the bellboy is the 'Monty Hall' puzzle, which deals with elementary probability.
Now the reason these puzzles are so popular is because they are great puzzles! The Monty Hall puzzle can even
boast about a little collection of websites devoted to it.
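For anyone who has not met the puzzle, here is a sketch that enumerates every case exactly. It assumes the usual statement of the problem, which is not spelled out on this page: three doors, one car, the host always opens a door hiding a goat that the contestant did not pick, and the contestant may then stay or switch.

```python
from fractions import Fraction

DOORS = (0, 1, 2)

def win_probability(switch):
    """Exact probability of winning the car when staying or switching."""
    total = Fraction(0)
    for car in DOORS:
        for pick in DOORS:
            # The host opens a door that is neither the pick nor the car.
            openable = [d for d in DOORS if d != pick and d != car]
            for opened in openable:
                weight = Fraction(1, 9) * Fraction(1, len(openable))
                if switch:
                    final = next(d for d in DOORS if d != pick and d != opened)
                else:
                    final = pick
                if final == car:
                    total += weight
    return total

print("stay:  ", win_probability(switch=False))
print("switch:", win_probability(switch=True))
```

Working with fractions rather than a simulation gives the exact probabilities, which can be a useful check after students have played the game a few dozen times.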

STEPS

The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics,
including the binomial distribution and conditional probability. Visit the STEPS page for further information
and a list of the modules available.

Dice Difference

Dice Difference is really a dice game with a difference! Rather than add the numbers on the two dice together,
subtract the smaller from the larger. Person A wins on a difference of 0, 1 or 2, while person B wins on a
difference of 3, 4 or 5. Is this game fair?
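
One way to check an answer is to list all 36 equally likely rolls. The sketch below assumes two fair six-sided
dice and counts how often each difference occurs.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count each difference (larger minus smaller) over the 36 equally likely
# rolls of two fair six-sided dice.
counts = Counter(abs(a - b) for a, b in product(range(1, 7), repeat=2))

p_a = Fraction(sum(counts[d] for d in (0, 1, 2)), 36)  # person A: differences 0, 1, 2
p_b = Fraction(sum(counts[d] for d in (3, 4, 5)), 36)  # person B: differences 3, 4, 5

for d in range(6):
    print(f"difference {d}: {counts[d]}/36")
print(f"P(A wins) = {p_a}, P(B wins) = {p_b}")
```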

A Dice-Free Worksheet

What message is being given to students about the importance of understanding probability when a large
proportion of the exercises in our texts are based on coins, cards, dice, marbles and urns? A Dice-Free
Worksheet gives examples of realistic applications of probability that are suitable for Maths A and Maths B
students. While such exercises take some effort to create, I think the effort is necessary if we are to help
students realise why we study this topic.

Chance and Basic Probability

A website containing a collection of articles from the Hobart Mercury newspaper that illustrate various aspects
of probability in the news.

Sampling

JellyBlubbers

A hands-on introduction to simple random samples and the importance of sample size. The worksheet
containing the JellyBlubbers population may be useful for hypothesis testing as well.

Note that this worksheet is in Word 6.0 format as I haven't been able to convert the graphic to Word 2.0 format
(yet).

STEPS

The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics,
including simple random samples and the distribution of sample means. Visit the STEPS page for further
information and a list of the modules available.

Data Collection and Sampling

A website containing a collection of articles from the Hobart Mercury newspaper that illustrate both good and
poor methods of sampling from a population.

Confidence Intervals
The concept of a confidence interval is quite difficult for beginning statistics students, and sometimes for
beginning statistics teachers! For example, assume that our population parameter of interest is the population
mean. What is the meaning of a 95% confidence interval in this situation?

Many students want to say that a 95% confidence interval means that there is a 95% chance that the confidence
interval contains the population mean. But any particular confidence interval either contains the population
mean, or it doesn’t. The 95% is not a probability statement about any one calculated interval.

The correct interpretation is based on repeated sampling. If samples of the same size are drawn repeatedly from
a population, and a confidence interval is calculated from each sample, then in the long run about 95% of these
intervals will contain the population mean.
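
The repeated-sampling idea can be tried out with a short simulation. This is only a sketch: the population
(normal, mean 55, standard deviation 12), the sample size and the use of a z-interval with known sigma are all
assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, mean

def coverage(pop_mean, pop_sd, n, repeats, level=0.95, seed=1):
    """Fraction of z-intervals (sigma treated as known) that contain pop_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level is 0.95
    half_width = z * pop_sd / n ** 0.5
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        xbar = mean(sample)
        if xbar - half_width <= pop_mean <= xbar + half_width:
            hits += 1
    return hits / repeats

# Hypothetical population: normal with mean 55 and standard deviation 12.
print(coverage(pop_mean=55, pop_sd=12, n=25, repeats=2000))
```

Each individual interval either contains 55 or it doesn't; the 95% shows up only as the proportion of hits over
many repetitions.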

Understanding Confidence Intervals is an activity which helps students understand confidence intervals. The
activity requires a TI-83 graphing calculator.

STEPS
The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics,
including confidence intervals. Visit the STEPS page for further information and a list of the modules available.

Confidence Intervals Java Applet

The applet helps students understand confidence intervals. Each of the 50 lines on the graph represents one
confidence interval for the mean.

Hypothesis Testing
Introduce the Concept Early!

The concept of hypothesis testing should be introduced informally when first constructing stemplots and
histograms. The terminology relating to populations and samples can be introduced early in the study of
statistics and used consistently throughout the unit.

For example, after students have constructed a stemplot of Henry Cavendish's data on the density of the earth,
the following issues can be discussed:

• The data is only a sample of all possible measurements.
• Why measurements, even of the same quantity, are not identical.
• The true value of the density can be estimated. The idea of a point estimate and an interval estimate
arises quite naturally.
• The likelihood of the true density being as low as 5.3. A conclusion might be, 'It is highly unlikely that
the actual density of the earth is 5.3 or lower.' Students who understand this conclusion are well on their
way towards understanding hypothesis testing.

It is worth mentioning at this early stage that there are statistical procedures that allow us to make precise
statements about likelihood and that the students will meet these later in the unit.

STEPS

The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics,
including hypothesis testing. Visit the STEPS page for further information and a list of the modules available.

The 'Barramundi' Dataset

This dataset contains 1000 integers having a normal distribution with mean=55 and sigma=12. The data fit
nicely onto both sides of an A4 sheet of paper. I have used this data to simulate a population of barramundi in
the Fitzroy River, but you can of course make up your own scenario.

N.B. The HTML version of the dataset contains 800 integers as that was as many as would fit onto two pages.
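
If the file is not to hand, a stand-in population with the same stated description can be generated. The sketch
below is not the author's dataset; the seed and the sample size of 30 are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(2024)  # fixed seed so the same sheet can be regenerated

# Stand-in population: 1000 integers, roughly normal with mean 55 and sigma 12.
# This is NOT the author's dataset, only data matching its stated description.
population = [round(rng.gauss(55, 12)) for _ in range(1000)]

# Simple random sample without replacement (sample size is an arbitrary choice).
sample = rng.sample(population, 30)

print(f"population: mean = {mean(population):.1f}, sd = {stdev(population):.1f}")
print(f"sample (n = 30): mean = {mean(sample):.1f}")
```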

Inference in the News

A website containing a collection of articles from the Hobart Mercury newspaper that illustrate various aspects
of drawing conclusions from data.

Central Limit Theorem

This applet demonstrates the central limit theorem using simulated dice-rolling experiments.
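
For classes without access to the applet, a similar experiment can be sketched in a few lines. This is not the
applet's code; it assumes fair six-sided dice and simply compares the simulated spread of the sample means with
the theoretical value sqrt(35/12)/sqrt(n).

```python
import random
from collections import Counter
from statistics import mean, pstdev

def dice_means(n_dice, repeats, seed=0):
    """Roll `n_dice` fair dice and record their mean, `repeats` times."""
    rng = random.Random(seed)
    return [sum(rng.randint(1, 6) for _ in range(n_dice)) / n_dice
            for _ in range(repeats)]

def text_histogram(values, width=40):
    """Print a rough histogram, one row per value rounded to 1 decimal place."""
    counts = Counter(round(v, 1) for v in values)
    biggest = max(counts.values())
    for value in sorted(counts):
        print(f"{value:4.1f} {'*' * (counts[value] * width // biggest)}")

SD_ONE_DIE = (35 / 12) ** 0.5  # standard deviation of a single fair die

for n in (1, 2, 10):
    means = dice_means(n, 5000)
    print(f"n = {n:2d}: mean of means = {mean(means):.2f}, "
          f"sd of means = {pstdev(means):.2f} "
          f"(theory {SD_ONE_DIE / n ** 0.5:.2f})")

text_histogram(dice_means(10, 5000))  # roughly bell-shaped for 10 dice
```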

Hypothesis Testing Joke

This joke is from Mark Eakin (eakin@omega.uta.edu), who has kindly given permission for me to include it on
the website.

Most of you do not know that when Santa was a young man he had to take a statistics course. When the class
started covering two-sided hypothesis tests, he had a lot of trouble remembering where to put the equal sign. He
started repeating to himself "The equal sign goes in the null hypothesis. The equal sign goes in the null
hypothesis. The equal sign goes in the null hypothesis."

Eventually Santa had to shorten this phrase to make it easier to remember. In fact to this day you can still hear
him say "Ho, Ho, Ho."

Curve Fitting
Nonlinear regression, also known as curve fitting, nicely integrates statistics and the study of functions. And if,
in the study of polynomial, exponential, logarithmic or periodic functions, students in your classroom are
working with real problems containing real data (and I hope they are), then there is no choice about including
this topic. It's already there.

Curve Fitting

The paper Curve Fitting, available as a Word 2 document only, discusses the mathematics needed to understand
nonlinear regression. It includes some fully worked examples of how to determine which nonlinear function
best fits a set of data as well as a sample assignment. Note that the file is quite large (888K) as it contains
numerous screen graphics from a graphics calculator and statistics software.
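
The paper's own worked examples are not reproduced here. As a rough illustration of the kind of calculation
involved, the sketch below fits y = a·exp(b·x) by running ordinary least squares on ln(y); this linearising
shortcut is my choice for the example and is not necessarily the method used in the paper or in curve fitting
software. The data are synthetic, generated from a = 2 and b = 0.5, not real measurements.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by regressing ln(y) on x; every y must be positive."""
    b, ln_a = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b

# Synthetic illustration data generated from y = 2 * exp(0.5 * x); not real measurements.
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * math.exp(0.5 * x) for x in xs]

a, b = fit_exponential(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Because the fit minimises errors on the log scale, with noisy data it will generally give slightly different
coefficients from a true nonlinear least-squares fit.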

Anscombe's Dataset

F.J. Anscombe was a pioneer in demonstrating the importance of looking at a set of data before choosing which
analyses were appropriate. He created a quartet of paired datasets that wonderfully illustrate this. Anscombe's
Datasets contain masters of two overhead transparencies - the first has F.J. Anscombe's famous quartet of
datasets with some summary statistics and the second has the scatterplots of these datasets.
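
The point of the quartet can be checked numerically. The values below are the quartet as it is commonly
reproduced from Anscombe's 1973 paper; they were typed in for this sketch rather than taken from the overhead
masters, so check them against the masters before handing anything to students.

```python
from statistics import mean

# Anscombe's quartet as commonly reproduced (verify against the overhead masters).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I":   (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summary(xs, ys):
    """Return (mean of y, slope, intercept, r) for the least-squares line of y on x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my, slope, my - slope * mx, sxy / (sxx * syy) ** 0.5

for name, (xs, ys) in QUARTET.items():
    my, slope, intercept, r = summary(xs, ys)
    print(f"{name:>3}: mean y = {my:.2f}, line y = {intercept:.2f} + {slope:.3f}x, r = {r:.3f}")
```

The four sets of summary statistics are nearly indistinguishable, yet the scatterplots look nothing alike, which
is exactly why the plot should come first.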

Pricing Diamond Rings Assignment

This document is an assignment on finding a regression equation relating the price of a diamond ring with the
size of the diamond. It looks at both linear and nonlinear models. Students will need to be familiar with both
linear and nonlinear regression, including using a curve fitting software program such as CurveExpert.

The data and the idea for this assignment came from the article Diamond Ring Pricing Using Linear
Regression in the Journal of Statistics Education v.4, n.3 (1996) by Singfat Chu.

Future Developments
There has been an interesting discussion on the ap-stat mailing list about the interpretation of r and r² when
dealing with nonlinear data. I haven't absorbed it all yet, but when I do I will make a document that discusses
this issue available from this page.
