CHAPTER 19
Statistics

Australian Curriculum content descriptions (Statistics and Probability):
ACMSP248
ACMSP249
ACMSP250
ACMSP251
ACMSP252
ACMSP278
ACMSP279

In previous books in this series we have looked at the measures of central tendency, such as the mean and the median.
In this chapter we discuss two measures of spread: the interquartile range and the standard deviation. The representation of numerical data by boxplots is also introduced.
In our study of statistics up to now, we have often associated one measurement with an item: for example, the height of each person in a class, the number of possessions obtained by a player in a football match or the number of marks obtained by a student in a test.
In the last two sections of this chapter we look at associating a pair of numbers with an item; for example, the height and weight of a person or the age and salary of an employee. This is called bivariate data.
When a measurement is collected or recorded at successive intervals of time, it is referred to as time series data. This type of bivariate data is also introduced in this chapter.
Suggestions for statistical projects and references to other suitable information
can be found at www.cambridge.edu.au/GO.

CHAPTER 19 STATISTICS 2 87
ISBN 978-1-107-64845-6 The University of Melbourne / AMSI 2011 Cambridge University Press
Photocopying is restricted under law and this material must not be transferred to another party.
19A The median and the
interquartile range
The median has been introduced and discussed in earlier books in this series. We review it
here, because it is the measure of central tendency used when working with the interquartile
range as a measurement of spread.

Median
We often see the median value being used to describe the housing market in a city. The
median is the middle value when all values are arranged in numerical order.
Here are 13 numbers in numerical order:
2, 2, 3, 3, 3, 4, 5, 11, 13, 18, 18, 19, 21
This data set has an odd number of values. The middle value is 5, since it has the same
number of values on either side of it. Hence, the median of this data set is 5.
Here is a set of 12 numbers, arranged in numerical order:
1, 3, 4, 4, 5, 7, 9, 11, 13, 13, 19, 21
This data set has an even number of values. The middle values are 7 and 9. We take the
average of 7 and 9 to calculate the median.
$$\text{median} = \frac{7 + 9}{2} = 8$$
Hence, the median of this data set is 8, even though this value does not occur in the data set.

Median
When a data set has an odd number of values and they are arranged in numerical
order, the median is the middle value.
When a data set has an even number of values and they are arranged in numerical
order, the median is the average of the two middle values.
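The rule in the box above translates directly into a short function. This is a minimal Python sketch, and the name `median` is our own choice. Python's standard `statistics.median` applies the same odd/even rule.

```python
def median(values):
    """Return the median of a collection of numbers using the odd/even rule."""
    ordered = sorted(values)  # arrange the values in numerical order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: the median is the middle value.
        return ordered[mid]
    # Even number of values: the median is the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 2, 3, 3, 3, 4, 5, 11, 13, 18, 18, 19, 21]))  # 5
print(median([1, 3, 4, 4, 5, 7, 9, 11, 13, 13, 19, 21]))      # 8.0
```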

Example 1

Calculate the median of the data sets.


a 33 35 43 29 53 39 45
b 5 7 9 5 12 10

Solution

a To locate the median, first put the values in numerical order. This gives:
29 33 35 39 43 45 53
Median = 39

288 ICE-EM MATHEMATICS YEAR 10 BOOK 2


b Again, the values are placed in numerical order.
5 5 7 9 10 12
$$\text{Median} = \frac{7 + 9}{2} = 8$$

Quartiles and the interquartile range


The interquartile range (IQR) measures the spread of the middle 50% of the data in an
ordered data set.
We use the interquartile range to see how closely the data are grouped around the median.
When we calculate the interquartile range, we organise the data into quartiles, each
containing 25% of the data. The word quartile is related to quarter.
Olivia has been playing Sudoku on the internet. Her last 11 games were all rated diabolical,
and her times, correct to the nearest minute and arranged in ascending order, were
8, 12, 14, 14, 16, 18, 19, 19, 25, 78, 523
The range of these times is 523 − 8 = 515.
The range does not give a clear picture of Olivia's considerable skills, because the last
two times, 78 and 523, are outliers. An outlier is a single data value far away from the rest of
the data. That is, it is much larger or much smaller than all of the other values. Outliers have
a huge influence on the value of both the mean and the range. (In fact, the time of 78 minutes
occurred when Olivia left the game running over dinner, and the time of 523 minutes occurred
when Olivia left the game running overnight.)
Because of situations like this, the interquartile range is often a better measure of the spread
of the data than the range. Here is the procedure for finding it.
Step 1: Find the median. Divide the data into two equal groups. Omit the median (middle
value) if there is an odd number of values. In Olivia's case, there are 11 values so,
omitting the median 18, the two groups of 5 are
8, 12, 14, 14, 16 and 19, 19, 25, 78, 523
Step 2: The lower quartile is the median of the lower set of values. In Olivia's case, the lower
quartile is 14.
Step 3: The upper quartile is the median of the upper set of values. In Olivia's case, the
upper quartile is 25.
Step 4: The interquartile range is the difference between the two quartiles. In Olivia's case,
interquartile range = 25 − 14 = 11
Thus, the middle 50% of Olivia's times have a spread of 11 minutes.

Notice that the interquartile range is unaffected by the lower quarter and the upper quarter of
the values. Hence, the large sizes of two of Olivia's times, when she left the game running to
eat dinner and to sleep, do not affect the interquartile range.
The calculations begin slightly differently when there is an even number of results. For
example, suppose that Olivia played one more game, which she solved in 22 minutes.
There are now 12 results to arrange in ascending order:
8, 12, 14, 14, 16, 18, 19, 19, 22, 25, 78, 523
Step 1: Since there is an even number of results, we divide them into two equal groups of 6.
8, 12, 14, 14, 16, 18 and 19, 19, 22, 25, 78, 523
Step 2: The lower quartile is now $\dfrac{14 + 14}{2} = 14$.
Step 3: The upper quartile is now $\dfrac{22 + 25}{2} = 23\tfrac{1}{2}$.
Step 4: The interquartile range is now $23\tfrac{1}{2} - 14 = 9\tfrac{1}{2}$.
In this case, the middle 50% of Olivia's times have a spread of $9\tfrac{1}{2}$ minutes.
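The four steps can be written as a short Python sketch that handles both the odd and the even cases. The function names `quartiles` and `interquartile_range` are our own. Library routines such as `statistics.quantiles` or NumPy's `percentile` use other conventions for quartiles, so they can give slightly different answers for small data sets. The sketch below follows the method of this section.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the method in Steps 1 to 4 above.

    The ordered data are split into a lower and an upper group of equal size.
    When the number of values is odd, the median itself is left out of both groups.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values to find quartiles")
    half = n // 2
    lower = ordered[:half]        # lower group
    upper = ordered[n - half:]    # upper group (skips the middle value when n is odd)
    return median(lower), median(ordered), median(upper)


def interquartile_range(values):
    q1, _, q3 = quartiles(values)
    return q3 - q1                # difference between the two quartiles


times = [8, 12, 14, 14, 16, 18, 19, 19, 25, 78, 523]
print(quartiles(times), interquartile_range(times))         # (14, 18, 25) 11

times_12 = sorted(times + [22])                              # Olivia's extra 22-minute game
print(quartiles(times_12), interquartile_range(times_12))   # (14.0, 18.5, 23.5) 9.5
```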
The minimum, maximum, median and the two quartiles are sometimes called the five-number
summary. Sometimes the lower quartile is called the first quartile, because it marks
the first quarter of the ordered data. The median is then the second quartile, although this
term is seldom used. The upper quartile is called the third quartile.
We denote the lower quartile by Q1 and the upper quartile by Q3. We sometimes use the
abbreviation IQR for the interquartile range.
Example 2

Find the interquartile range of each data set.


a 26 19 25 13 24 23 23 25 20 28 23
b 7 9 13 14 10 15

Solution

a First arrange in order and locate the median.


13 19 20 23 23 23 24 25 25 26 28

median = 23
There are 11 data values. The 6th value is 23, so the median is 23.
The lower group contains 5 values. The 3rd value is 20. So the lower quartile is 20.
Similarly, the upper quartile is 25.
Thus, interquartile range = 25 − 20 = 5
That is, the middle 50% of data values have a spread of 5.

b First arrange in order and locate the median.
7 9 10 13 14 15
$$\text{median} = \frac{10 + 13}{2} = 11\tfrac{1}{2}$$
There are 6 data values. We divide them into two equal groups of 3. The lower
quartile is 9 and the upper quartile is 14.
Thus, interquartile range = 14 − 9 = 5

Example 3

For the stem-and-leaf plot below, find the median and the quartiles.
2 | 4 6 7 8 9
3 | 0 1 1 3 4 6 7
4 | 1 4 5 5 7 8 9
5 | 0 1 2
3 | 4 means 34

Solution

There are 22 data values. First locate the median to divide the data into two equal
groups. The 11th value is 36 and the 12th value is 37, so the median is 36.5.
The lower group contains 11 values. The 6th value is 30. So the lower quartile is 30.
Similarly, the upper quartile is 47.
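A stem-and-leaf plot can be turned back into an ordered list before the quartile method is applied. In this sketch the function names are our own, and a plot is stored as a dictionary from each stem to its leaves. The key "3 | 4 means 34" corresponds to 10 × stem + leaf. Plots with a different key, such as "9 | 4 means 9.4", would need a different conversion.

```python
from statistics import median


def expand_stem_and_leaf(plot):
    """Turn {stem: [leaves]} into a sorted list, reading stem 3 and leaf 4 as 34."""
    return sorted(10 * stem + leaf for stem, leaves in plot.items() for leaf in leaves)


def quartiles(ordered):
    """(Q1, median, Q3) of a sorted list, omitting the median when the count is odd."""
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[len(ordered) - half:]
    return median(lower), median(ordered), median(upper)


plot = {2: [4, 6, 7, 8, 9], 3: [0, 1, 1, 3, 4, 6, 7], 4: [1, 4, 5, 5, 7, 8, 9], 5: [0, 1, 2]}
data = expand_stem_and_leaf(plot)
print(len(data), quartiles(data))   # 22 (30, 36.5, 47)
```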

Measures of spread
The range is the difference between the highest and lowest values in a data set.
The interquartile range measures the spread of the middle 50% of the data in an
ordered data set.
To calculate the interquartile range, find the difference between the upper quartile Q3
and the lower quartile Q1.

Exercise 19A
1 Find the range and interquartile range of each data set. (See Example 2.)
a 7 5 15 10 13 3 20 7 15
b 8 5 1 7 5 7 8 10 5 7
c 4 0 6 4 6 7 9 4
d 3 13 8 11 1 18 5 13
2 Locate the median and the quartiles for each of the following stem-and-leaf plots. State
the interquartile range for each data set. (See Example 3.)
a
2 | 0 1 2 4 4 7 7 9
3 | 1 1 1 2 2 4 6 6 7 8 9
4 | 0 1 2 2 4
3 | 2 means 32
b
5 | 4 4 6 7 7 9
6 | 1 4 4 4 6 7 8
7 | 1 5 7 8 9 9
8 | 0 1 1 2 3 4 6
9 | 1 3 4 5
6 | 1 means 61
3 Find the mean, the mode, the median and the interquartile range of this data set.

Value 0 1 2 3 4 5 6 7 8 9 10
Frequency 5 2 0 7 1 8 4 6 0 2 11

4 Copy and complete the following table for the positions of the median and the quartiles
for data sets of 100 and 101 items. (Note: A position of 8.5 means it is between the
eighth and ninth data values.)

Number of data items | Lower quartile position | Median position | Upper quartile position
a 100 | | |
b 101 | | |

5 The stem-and-leaf plot below gives the height in centimetres of 20 students in a class.
14 | 4 5 6
15 | 0 1 2 8
16 | 0 0 1 2 4 5 7
17 | 2 6 7 8
18 | 0 2
15 | 1 means 151
a What is the range of the height of students in the class?
b What is the median height of students in the class?
c What is the interquartile range?

6 The stem-and-leaf plot below gives the lengths in centimetres of 15 leaves that have
fallen from a tree. The values are given correct to one decimal place. Find the
interquartile range of the leaf lengths.
4 | 4
5 | 5 1 8 4 4
6 | 3 1 2 4
7 | 7 2 7
8 |
9 | 4 3
9 | 4 means 9.4

7 The following figures are the amounts a family spent on food each week for 13 weeks.
$148 $143 $152 $149 $158
$155 $147 $152 $158 $139
$143 $150 $141
a Find the median, upper quartile and lower quartile.
b Find the interquartile range of the amounts spent.
8 Write down two sets of seven whole numbers with minimum data value 3, lower
quartile 5, median 10, upper quartile 12 and maximum data value 13.
9 The median is always between the two quartiles. Is the mean always between the two
quartiles? If not, give an example of seven whole numbers where the mean is above the
upper quartile and an example where the mean is below the lower quartile.
10 a For a data set, the minimum value is 8 and the range is 27. Find the maximum value.
b For a data set, the maximum value is 106 and the range is 52. Find the minimum value.
11 a For a particular data set, the lower quartile is 7.9, and the interquartile range is 11.6.
Find the upper quartile.
b For a particular data set, the upper quartile is 25.6, and the interquartile range is 11.9.
Find the lower quartile.

19B Boxplots

A useful way of displaying the maximum value and the minimum value, the upper and lower
quartiles and the median of a data set is a boxplot.

[Figure: a boxplot drawn above a scale, labelled with the minimum, lower quartile (Q1), median, upper quartile (Q3) and maximum.]

The rectangle is called the box.


The horizontal lines from the lower and upper quartiles to the minimum and maximum are
called the whiskers. In a boxplot, the box itself indicates the location of the middle 50% of
the data.
Boxplots are especially useful for large data sets. A boxplot is a visual summary of some of
the main features of the data set. Boxplots are also useful for comparing related data sets
(see Questions 9, 10 and 11 in Exercise 19B).

Example 4

The weights of 20 students are recorded here. The weights are given to the
nearest kilogram.
48 52 54 54 55 58 58 61 62 63 63 64 65 66 66 67 69 70 72 79
a Find the median, upper quartile, lower quartile and interquartile range.
b Draw a boxplot for this data.

Solution

a There are 20 data values. Therefore,
$$\text{median} = \frac{63 + 63}{2} = 63 \text{ kg}$$
Divide the data into two equal groups of 10.
48 52 54 54 55 58 58 61 62 63 | 63 64 65 66 66 67 69 70 72 79
$$\text{lower quartile} = \frac{55 + 58}{2} = 56.5 \text{ kg} \qquad \text{upper quartile} = \frac{66 + 67}{2} = 66.5 \text{ kg}$$
The interquartile range = 66.5 − 56.5 = 10 kg

b [Figure: boxplot on a scale from 40 to 80 kg, showing the minimum 48 kg, lower quartile 56.5 kg, median 63 kg, upper quartile 66.5 kg and maximum 79 kg.]
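A five-number summary is all that is needed to draw a boxplot. This sketch computes it for the weights in Example 4, using the same quartile method as Section 19A. The function name is our own. Plotting libraries that draw boxplots may compute the quartiles by interpolation, so their boxes can differ slightly from a hand-drawn one.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the method of Section 19A."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])                  # median of the lower group
    q3 = median(ordered[len(ordered) - half:])   # median of the upper group
    return ordered[0], q1, median(ordered), q3, ordered[-1]


weights = [48, 52, 54, 54, 55, 58, 58, 61, 62, 63,
           63, 64, 65, 66, 66, 67, 69, 70, 72, 79]
summary = five_number_summary(weights)
print(summary)                   # (48, 56.5, 63.0, 66.5, 79)
print(summary[3] - summary[1])   # interquartile range: 10.0
```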

Exercise 19B
1 The boxplot below shows the price (in $) of 20 different brands of sports shirts.
[Figure: boxplot of the shirt prices on a scale from 10 to 50 dollars.]

a How much did the most expensive shirt cost?


b How much did the least expensive shirt cost?
2 The boxplot below gives information regarding the annual salaries (in thousands
of dollars) of employees in a large company.

[Figure: boxplot of the salaries on a scale from 40 to 180 (thousands of dollars).]

a What is the lowest salary?


b What is the range of the salaries?
c What is the median salary?
d What is the interquartile range?

3 The boxplot below gives information about the marks out of 100 obtained by a group
of 40 people on a general knowledge quiz.

[Figure: boxplot of the quiz marks on a scale from 40 to 100.]

a What was the lowest mark obtained on the quiz?


b What was the median mark obtained on the quiz?
c What was the range of marks?
d What was the interquartile range?
4 Construct a boxplot for the data set given in Exercise 19A, question 2b.
5 The pulse rates of 21 adult females are recorded. (See Example 4.)
60 61 67 68 69 70 70 70 73 74 75 75 76 77 77 78 79 80 81 89 90
a Find the median, upper quartile, lower quartile and interquartile range.
b Draw a boxplot for this data.
6 In a boxplot for a large data set, approximately what percentage of the data set is:
a below the median b below the lower quartile
c in the box d in each whisker?
7 In a boxplot, is one whisker always longer than the other?
8 In a boxplot, why is the median not always in the centre of the box?
9 Here are two boxplots drawn on the same scale.
[Figure: boxplots of Data set A and Data set B on a common scale from 10 to 50.]

Which data set has:


a the greater median
b the greater range
c the greater interquartile range
d the greater largest data value?

10 Students in two classes sat the same mathematics test. Their results are shown in the two
boxplots below.

[Figure: boxplots of the marks for Class A and Class B on a common scale from 10 to 50.]

a Which class had the higher median mark?


b Which class had the higher interquartile range?
c In which class was the highest mark for the test obtained?
d In which class was the lowest mark for the test obtained?
e Which class did better on the test? Give reasons for your choice. (Class discussion)
11 The ratings for a number of television programs on Channel A, Channel B and
Channel C were collated. The information is shown in the boxplots below. (If a
program has a rating of 14, it means that 14% of the viewing audience watched that
particular program.)
[Figure: boxplots of the ratings for Channel A, Channel B and Channel C on a common scale from 5 to 25.]

a Write down the approximate values of the median, quartiles and maximum and
minimum values for each channel.
b Which channel has the largest interquartile range?
c If the winning channel is the one with the highest rated program, which channel is
the winner? Which is second? Which is third?
d If the winning channel is the one with the largest median, rank the channels.
e Can you find a criterion that makes Channel C the winning channel?

19C Boxplots, histograms and outliers
It is common to use a form of the boxplot that is designed to illustrate any possible outliers
in the data. Outliers are unusual, or freak, values that differ greatly in magnitude from the
majority of data values.
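[Figure: a boxplot showing Q1, the median and Q3, with an outlier marked as a separate point beyond the end of a whisker.]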

Any point that is more than 1.5 IQRs away from the end of the box is classified as an outlier.
That is, if a data value is greater than Q3 + 1.5 × IQR or less than Q1 − 1.5 × IQR, it is
considered to be an outlier. An outlier is indicated by a marker, as shown in the diagram above.
The whiskers end at the highest and lowest data values that lie within 1.5 IQRs from the
ends of the box.
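The 1.5 × IQR rule is easy to automate. The sketch below finds the quartiles by the method of Section 19A and then lists every value outside the two fences. The function names are our own. Run on Olivia's Sudoku times, it confirms that 78 and 523 are outliers. Some plotting libraries use the same 1.5 as their default whisker length, for example the `whis` parameter of matplotlib's `boxplot`, but they may calculate the quartiles differently.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) using the method of Section 19A (median omitted when n is odd)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[len(ordered) - half:])


def find_outliers(values, k=1.5):
    """Return (lower fence, upper fence, outliers) for the k * IQR rule.

    A value is an outlier if it is below Q1 - k * IQR or above Q3 + k * IQR.
    """
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return lower_fence, upper_fence, outliers


times = [8, 12, 14, 14, 16, 18, 19, 19, 25, 78, 523]   # Olivia's Sudoku times (Section 19A)
print(find_outliers(times))   # (-2.5, 41.5, [78, 523])
```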

Comparing a boxplot to the histogram of the same data


In ICE-EM Mathematics Year 9 Book 2 we looked at different shapes of histograms and the
distributions of data, and in particular we used the terms symmetric, positively skewed and
negatively skewed to describe the shapes.

[Figure: histograms of a symmetric distribution, a negatively skewed distribution and a positively skewed distribution.]

The following examples look at representing data with histograms and boxplots.

Example 5

The house prices of 50 houses sold in a town over a period of two years are recorded.
The prices are in thousands of dollars.
110, 110, 120, 130, 140, 150, 150, 170, 170, 170, 180, 190, 200, 210, 210, 230, 270,
270, 290, 310, 340, 340, 340, 340, 350, 360, 360, 365, 365, 400, 400, 400, 400, 410,
430, 440, 450, 460, 460, 460, 460, 564, 678, 678, 750, 760, 904, 1320, 2350, 2350
a Find the quartiles, the median and the interquartile range.
b Calculate 1.5 × IQR.
c Name the outliers.
d Draw a histogram and boxplot of this information. The boxplot should show outliers.
e i Calculate the mean, including the outliers.
ii Calculate the mean, not including the outliers.

Solution

a The data has been given in ascending order. There are 50 data values. The median
is the mean of the 25th and 26th values.
Median = $355 000
Q1 is the median of the lower set of 25 values. This is the 13th value.
Q1 = $200 000
Q3 is the median of the upper set of 25 values.
Q3 = $460 000
IQR = $260 000
b 1.5 × IQR = 1.5 × (Q3 − Q1) = $390 000
Hence, a value is an outlier if it is greater than 460 000 + 390 000 = $850 000
or less than 200 000 − 1.5 × 260 000 = −$190 000. Since no price is negative, there are no outliers at the lower end.
c The outliers are $904 000, $1 320 000, $2 350 000 and $2 350 000.
d [Figure: a boxplot of the house prices on a scale from 0 to 2500 (thousands of dollars), with the outliers marked, and a histogram of the house prices (thousands of dollars) with frequency on the vertical axis.]

The classes are $100 000 to $199 000, $200 000 to $299 000 etc.
e i Mean with outliers = $449 300, to the nearest $100.
ii Mean without outliers = $337 800, to the nearest $100.
It could be said that the distribution has a positive skew. The left-hand whisker is
short. Most of the values lie in the interval from $100 000 to $500 000.
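Part e can be checked with Python's standard `statistics` module, and the same check shows how much more the outliers move the mean than the median. This sketch works in thousands of dollars, so 449.3 means $449 300. The variable names are our own.

```python
from statistics import mean, median

# House prices from Example 5, in thousands of dollars.
prices = [110, 110, 120, 130, 140, 150, 150, 170, 170, 170, 180, 190, 200, 210, 210, 230, 270,
          270, 290, 310, 340, 340, 340, 340, 350, 360, 360, 365, 365, 400, 400, 400, 400, 410,
          430, 440, 450, 460, 460, 460, 460, 564, 678, 678, 750, 760, 904, 1320, 2350, 2350]
outlier_values = {904, 1320, 2350}      # the outliers named in part c
kept = [p for p in prices if p not in outlier_values]

print(len(prices), len(kept))                         # 50 46
print(round(mean(prices), 1), round(mean(kept), 1))   # 449.3 337.8
print(median(prices), median(kept))                   # 355.0 340.0
```

Removing the four outliers lowers the mean by about $111 500 but lowers the median by only $15 000.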

Example 6

The waiting times in seconds at a ticket counter were as follows:


0, 0, 3, 5, 5, 5, 9, 10, 12, 13, 16, 17, 18, 18, 21, 22, 23, 23, 24, 24, 24, 24, 24, 25, 25,
25, 26, 26, 27, 28, 29, 28, 29, 29, 28, 30, 31, 31, 31, 32, 34, 34, 33, 33, 33, 34, 34, 33,
34, 35, 35, 35, 36, 36, 37, 38, 39, 38, 39, 39, 38, 40, 41, 41, 52
a Find Q1, the median, Q3 and the IQR.
b Draw a boxplot, showing outliers.
c Draw a histogram.
d Comment on the shape of the histogram and the boxplot.

Solution

a Q1 = 22.5, median = 29, Q3 = 34.5, IQR = Q3 − Q1 = 12
b [Figure: boxplot of the waiting times in seconds, with the outliers marked.]

Q3 + 1.5 × IQR = 34.5 + 1.5 × 12 = 52.5
Q1 − 1.5 × IQR = 22.5 − 1.5 × 12 = 4.5
Therefore, the values 0, 0 and 3 are considered to be outliers.
c [Figure: histogram of the waiting times in seconds, with classes 0-4, 5-9, 10-14, ..., 50-54 and frequency on the vertical axis.]

d There is a negative skew. The right-hand whisker is short. The left-hand whisker is
longer, indicating a tailing off of the data values. The values 0, 0 and 3 are outliers.

Example 7

Fifty-four lengths of wire are cut off by a machine. The resulting lengths measured in
cm are as shown:
103, 104, 105, 106, 106, 106, 107, 107, 107, 107, 107, 108, 108, 108, 108, 108, 108,
108, 108, 109, 109, 109, 109, 109, 109, 109, 109, 110, 110, 110, 110, 110, 110, 110,
110, 110, 111, 111, 111, 111, 111, 111, 111, 112, 112, 112, 112, 113, 113, 113, 113,
114, 115, 116
a Find Q1, the median, Q3 and the IQR.
b Draw a boxplot, showing outliers.
c Draw a histogram.
d Comment on the shape of the histogram and the boxplot.

Solution

a Q1 = 108 cm, median = 109.5 cm, Q3 = 111 cm and IQR = 3 cm


b [Figure: boxplot of the lengths of wire in cm on a scale from 102 to 116, with the outliers marked.]
c [Figure: histogram of the lengths of wire in cm, with one column for each length from 103 to 116.]

d The histogram is symmetric. The whiskers on the boxplot are of equal length.
The values 103 cm and 116 cm are outliers.

Exercise 19C
1 The heights, measured in centimetres, of 25 students in a class are: (See Examples 5 and 6.)
170 175 133 153 164 189 143 133 167 145
150 164 169 159 177 186 173 164 177 168
142 155 153 167 166
a Find Q1, the median and Q3.
b Find the interquartile range.
c Draw a boxplot, showing any outliers.
2 The annual incomes of 30 people, given correct to the nearest $1000, are: (See Example 7.)
54 000 67 000 92 000 78 000 54 000 87 000
102 000 112 000 132 000 45 000 256 000 89 000
78 000 98 000 34 000 75 000 65 000 100 000
34 000 68 000 79 000 81 000 82 000 103 000
21 000 345 000 98 000 67 000 105 000 98 000
a Find Q1, the median and Q3.
b Find the interquartile range.
c Draw a boxplot, showing any outliers.
3 For each histogram shown below, draw a possible boxplot and describe the shape of the
data distribution.
a [Figure: histogram with frequency (scale 0 to 16) for the classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109 and 110-119.]

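b [Figure: histogram with frequency (scale 0 to 16) for the classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109 and 110-119.]
c [Figure: histogram with frequency (scale 0 to 16) for the classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109 and 110-119.]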

4 Consider the data shown in the stem-and-leaf plot.

15 | 6 8
16 | 9 9
17 | 0 1 3 3 4 5 8 8 9 9
18 | 0 0 1 3 3 4 7 7 8 8
19 | 1 2 3

a Draw a histogram.
b Find Q1, the median, Q3 and the IQR.
c Draw the boxplot.
d Comment on the shape of the histogram and the boxplot.

5 The lower and upper quartiles for a data set are 116 and 134. Which of the following
data values would be classified as an outlier?
a 190 b 60 c 150
6 The speeds of 20 cars measured on a city street were recorded.
40 14 3 26 20 31 42 36 17 24 28 33 27 29 24 51 11 35 5 24
a Construct a stem-and-leaf diagram.
b Construct a boxplot.
c Comment on the shape of the distribution of data.
7 The reaction times (in milliseconds) of 20 people are listed here.
38 31 36 39 35 25 35 44 43 44 46 34 62 22 42 48 31 30 45 40
a Find the median, Q1, Q3 and the interquartile range.
b Construct a boxplot.
c Identify any outliers.
8 The weight loss (in kilograms) of 20 randomly selected people undertaking a special diet
over three weeks is:
8 5 10 6 6 12 4 5 5 6 8 13 7 7 7 6 6 4 5 5
a Construct a dotplot of the data.
b Construct a boxplot of the data.
c Comment on the shape.
9 The following data are the speeds of 45 semi-trailers passing a given point on an
interstate highway. The speeds are measured in km/h.
88 90 93 94 95 96 98 100 100 100 100 100
101 102 102 102 103 103 103 104 105 106 106 107
109 109 110 110 110 112 113 114 116 117 118 120
120 121 128 130 130 139 141 144 150
a Construct a dotplot of the data.
b Construct a boxplot of the data.
c Comment on the shape.

19D The mean and the standard deviation (10A)

Mean

The mean of a data set is a measure of its centre. The mean is calculated by adding together
all the data values and then dividing the resulting sum by the number of data values.
$$\text{Mean} = \frac{\text{sum of values}}{\text{number of values}}$$
A more common name for the mean is the average. We use the symbol $\bar{x}$ to denote the mean.
For a set of data $x_1, x_2, x_3, \dots, x_n$,
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$
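The formula for $\bar{x}$ translates directly into code. This is a minimal Python sketch, checked against the marks in Example 8. The name `mean` is our own; the standard library also provides `statistics.mean`.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


marks = [43, 35, 41, 29, 33, 39, 42]   # Example 8
print(round(mean(marks), 2))            # 37.43
```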

Example 8

A student obtained the following marks in seven tests:


43, 35, 41, 29, 33, 39 and 42
Calculate the mean mark correct to two decimal places.

Solution

$$\bar{x} = \frac{43 + 35 + 41 + 29 + 33 + 39 + 42}{7} = \frac{262}{7} \approx 37.43 \text{ (correct to two decimal places)}$$

For larger sets of data, a frequency table can be prepared. Let $f_1$ be the frequency of the data
item $x_1$, let $f_2$ be the frequency of the data item $x_2$ and so on. In this case we can write:
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_s x_s}{f_1 + f_2 + \dots + f_s}$$
The numerator is the sum of the data items and the denominator is the number of
data items.
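The frequency-table formula can be sketched in the same way. In this sketch the table is stored as a dictionary from each value $x_i$ to its frequency $f_i$, and the function name is our own. The data are those of Example 9.

```python
def frequency_mean(table):
    """Mean from a frequency table given as {value x_i: frequency f_i}."""
    number_of_items = sum(table.values())                  # f1 + f2 + ... + fs
    sum_of_items = sum(x * f for x, f in table.items())    # f1*x1 + f2*x2 + ... + fs*xs
    return sum_of_items / number_of_items


children = {0: 4, 1: 5, 2: 7, 3: 4}   # Example 9: number of children -> number of families
print(frequency_mean(children))         # 1.55
```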

Example 9

The following information gives the number of children in each of 20 families.


Calculate the mean number of children per family.
Number of children $x_i$ | Frequency $f_i$
0 | 4
1 | 5
2 | 7
3 | 4

Solution

Add in a column for $f_i x_i$.

Number of children $x_i$ | Frequency $f_i$ | $f_i x_i$
0 | 4 | 0
1 | 5 | 5
2 | 7 | 14
3 | 4 | 12
Total | 20 | 31

$$\bar{x} = \frac{31}{20} = 1.55$$
It is obviously impossible for a family to have 1.55 children. The mean is not
necessarily a member of the data set.

Standard deviation
The standard deviation of a set of data is a measure of how far the data values are spread out
from the mean. The difference between each data item and the mean is called the deviation of
the data value. The sum of the deviations is zero, which will be proved in Exercise 19H.
The standard deviation is calculated from the squares of the deviations.
Here are the steps in finding the standard deviation:
Calculate the mean.
Find the deviation of each data value by subtracting the mean from it.
Square each of the deviations.
Sum these squares.

Divide the sum of the squares by the number of data values.
Take the square root of the value obtained.
This is given by the formula
$$\sigma = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n}}$$
where the $x_i$ are the data values, $\bar{x}$ is the mean and $n$ is the number of data values.
We will use the Greek letter $\sigma$ (sigma) to denote the standard deviation of a data set.
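The steps above can be followed one at a time in a short Python sketch. The function name `std_dev` is our own. It divides by n, matching the definition of $\sigma$ used in this book, and it is checked against the data of Example 10.

```python
from math import sqrt


def std_dev(values):
    """Standard deviation sigma, dividing the sum of squared deviations by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the standard deviation of an empty data set is undefined")
    x_bar = sum(values) / n                      # the mean
    deviations = [x - x_bar for x in values]     # deviation of each value from the mean
    squares = [d ** 2 for d in deviations]       # square each deviation
    return sqrt(sum(squares) / n)                # sum, divide by n, take the square root


data = [5, 7, 11, 13, 14]        # Example 10
print(round(std_dev(data), 2))   # 3.46
```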

Example 10

Find the standard deviation, correct to two decimal places, for the data set.
5, 7, 11, 13, 14

Solution

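$$\bar{x} = \frac{5 + 7 + 11 + 13 + 14}{5} = 10$$
$$\begin{aligned}
\sigma^2 &= \frac{(5 - 10)^2 + (7 - 10)^2 + (11 - 10)^2 + (13 - 10)^2 + (14 - 10)^2}{5} \\
&= \frac{25 + 9 + 1 + 9 + 16}{5} \\
&= \frac{60}{5} \\
&= 12
\end{aligned}$$
Hence, $\sigma = \sqrt{12} \approx 3.46$ (correct to two decimal places).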

When calculating the standard deviation from a frequency table, we can use the
following formula:

$$\sigma = \sqrt{\frac{f_1(x_1 - \bar{x})^2 + f_2(x_2 - \bar{x})^2 + f_3(x_3 - \bar{x})^2 + \dots + f_s(x_s - \bar{x})^2}{f_1 + f_2 + \dots + f_s}}$$
When frequencies are taken into account, we can see that this is the same formula as above.
We can calculate the standard deviation with an extended frequency table with five columns. Fill
in the first three columns, then calculate $\bar{x}$. Fill in the other two columns and then calculate $\sigma$.
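The five-column table can also be mirrored in code. In this sketch the function name is our own, and the data are stored as a dictionary from each value $x_i$ to its frequency $f_i$. The standard `collections.Counter` builds such a table from a raw list. The data are those of Example 11.

```python
from collections import Counter
from math import sqrt


def frequency_mean_and_sd(table):
    """Return (mean, sigma) from {value x_i: frequency f_i}, column by column."""
    n = sum(table.values())                                          # total of the f_i column
    x_bar = sum(x * f for x, f in table.items()) / n                 # total of f_i * x_i, divided by n
    total_sq = sum(f * (x - x_bar) ** 2 for x, f in table.items())   # total of f_i * (x_i - x_bar)^2
    return x_bar, sqrt(total_sq / n)


values = [1, 3, 4, 5, 7, 3, 6, 9, 9, 4, 5, 2, 5, 7]   # Example 11
table = Counter(values)                                # value -> frequency, e.g. 5 -> 3
x_bar, sigma = frequency_mean_and_sd(table)
print(x_bar, round(sigma, 2))                          # 5.0 2.33
```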

Example 11

Calculate the mean and standard deviation of the set of values, correct to two
decimal places.
1, 3, 4, 5, 7, 3, 6, 9, 9, 4, 5, 2, 5, 7

Solution

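$x_i$ | $f_i$ | $f_i x_i$ | $x_i - \bar{x}$ | $f_i (x_i - \bar{x})^2$
1 | 1 | 1 | −4 | 16
2 | 1 | 2 | −3 | 9
3 | 2 | 6 | −2 | 8
4 | 2 | 8 | −1 | 2
5 | 3 | 15 | 0 | 0
6 | 1 | 6 | 1 | 1
7 | 2 | 14 | 2 | 8
9 | 2 | 18 | 4 | 32
Total | 14 | 70 | | 76

$$\bar{x} = \frac{70}{14} = 5 \qquad \sigma = \sqrt{\frac{76}{14}} \approx 2.33 \text{ (correct to two decimal places)}$$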

Note: The sum of the deviations $x_i - \bar{x}$ is zero. Hence, the average of the deviations is
not useful.

Mean and standard deviation

The mean of a set of data is denoted by $\bar{x}$.
The standard deviation of a data set is a measure of spread and is denoted by the
Greek letter $\sigma$.
There are two formulas for the standard deviation. When the data is in a list,
$$\sigma = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n}}$$
When the data is in a frequency table,
$$\sigma = \sqrt{\frac{f_1(x_1 - \bar{x})^2 + f_2(x_2 - \bar{x})^2 + f_3(x_3 - \bar{x})^2 + \dots + f_s(x_s - \bar{x})^2}{f_1 + f_2 + \dots + f_s}}$$

It is clear that the larger the standard deviation, the more spread out the data are about
the mean.
For example, here is a bar chart of the data in Example 11, and also another set of 14 data
items where the data are not as spread out but have the same mean.
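[Figure: two bar charts of frequency against the data values 1 to 9. Left: the data of Example 11, with $\bar{x} = 5$ and $\sigma \approx 2.33$. Right: another set of 14 data values, with $\bar{x} = 5$ and $\sigma \approx 1.25$.]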


In the following section we will see how the standard deviation may be used to make
comparisons between data sets.

Use of calculators
The calculation of the standard deviation of larger sets of data is quite time consuming. Many
calculators and spreadsheets have a built-in facility for calculating the standard deviation of a
set of data.
We recommend using this facility for all but the simplest data sets. In particular, if x is not an
integer, then calculating is very tedious.
It should be noted that in this book we calculate the standard deviation by dividing the sum of the squares of the deviations by $n$, the number of data items, and taking the square root. There is also another type of standard deviation that is obtained by dividing the sum of the squares of the deviations by $n - 1$ and taking the square root. Many calculators offer both versions, sometimes denoted by symbols such as $\sigma_n$ and $\sigma_{n-1}$. In this book, we use only $\sigma_n$.
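Software offers the same two versions. In Python's standard `statistics` module, `pstdev` divides by n and `stdev` divides by n − 1. The sketch below uses the first data set of Example 12; when checking answers in this book, the `pstdev` value is the one to compare with.

```python
import statistics

data = [4, 5, 6, 7, 8]
sigma_n = statistics.pstdev(data)         # divides by n: the version used in this book
sigma_n_minus_1 = statistics.stdev(data)  # divides by n - 1: the other calculator version
print(round(sigma_n, 2))          # 1.41
print(round(sigma_n_minus_1, 2))  # 1.58
```

Dividing by the smaller number n − 1 always gives a slightly larger result, so mixing up the two versions is a common source of small discrepancies.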

Exercise 19D
Give all answers correct to two decimal places unless otherwise specified.
1 (See Example 8.) Calculate the mean of each data set.
a 5, 9, 10, 23 and 37
b 1, 2, 6, 9, 13 and 23
c 6, 6, 8, 10, 10, 10 and 15

2 During a 13-week football season, the number of kicks obtained by a particular player
each week is:
18, 18, 20, 26, 10, 8, 21, 14, 16, 14, 12, 9 and 16
Calculate the mean number of kicks obtained by the player.
3 The daily maximum temperature was recorded in two different cities for a week. The
results are shown below.
City A: 28, 31, 34, 32, 31, 29, 28
City B: 26, 32, 36, 38, 37, 29, 25
Which city had the greater mean daily maximum temperature?
4 The average of 5 masses is 67 kg. If a mass of 25 kg is added, what is the average of
the 6 masses?
5 During a term, a student has an average of 46 marks after the first four tests and his average
for the next six tests is 38 marks. What is his average for the ten tests?
6 (See Example 10.) a Calculate, correct to two decimal places, the mean and standard deviation for each data set.
i 2, 4, 5, 3, 7, 4, 8, 9, 2, 6 ii 2, 4, 8, 10, 2, 9, 3, 8, 2, 2
iii 3, 6, 4, 5, 6, 7, 3, 4, 6, 6
b Comment on the results from part a.
7 (See Example 11.) Copy and complete the following extended frequency table to calculate the mean and standard deviation of the given data set.

$x_i$    $f_i$    $f_i x_i$    $x_i - \bar{x}$    $f_i(x_i - \bar{x})^2$
1        2
2        7
3        6
4        1
5        2
6        2
Total    ___      ___                               ___

8 Use a calculator to find, correct to two decimal places, the mean and standard deviation
for each data set.
a 3, 6, 7, 5, 8, 5, 10, 12, 13, 12, 6, 9, 12, 14, 15
b 4, 7, 8, 10, 14, 16, 18, 15, 16, 15, 19, 9
c 8, 10, 12, 14, 16, 17, 19, 12, 11, 10, 14, 16, 18, 19
9 Twenty students sat a test and their results are given in the stem-and-leaf plot below.
1 | 2 2 8 9
2 | 2 4 5 6 8
3 | 0 2 6 8 8 9
4 | 0 1 2 3 6
1 | 2 means 12
a Calculate their mean mark.
b How many students obtained a mark higher than the mean mark?
c Find the standard deviation of their marks.
10 Twenty people completed a test worth 10 marks. Their scores are shown in the frequency
table below.
Score 0 1 2 3 4 5 6 7 8 9 10
Number of people 0 2 0 1 1 2 4 6 0 2 2

a Calculate the mean mark.


b How many people obtained a mark lower than the mean mark?
c Find the standard deviation of their marks.

19E Interpreting the standard deviation
10A

Example 12

For the two sets of data, 4, 5, 6, 7, 8 and 2, 4, 6, 8, 10:


a find the mean
b find the standard deviation
c comment on the similarities and differences in the two data sets.

Solution

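a For the first data set:
$\bar{x} = \dfrac{4 + 5 + 6 + 7 + 8}{5} = 6$
For the second data set:
$\bar{x} = \dfrac{2 + 4 + 6 + 8 + 10}{5} = 6$
b For the first data set:
$\sigma = \sqrt{\dfrac{(4 - 6)^2 + (5 - 6)^2 + (6 - 6)^2 + (7 - 6)^2 + (8 - 6)^2}{5}} = \sqrt{\dfrac{10}{5}} = \sqrt{2}$
For the second data set:
$\sigma = \sqrt{\dfrac{(2 - 6)^2 + (4 - 6)^2 + (6 - 6)^2 + (8 - 6)^2 + (10 - 6)^2}{5}} = \sqrt{\dfrac{40}{5}} = \sqrt{8} = 2\sqrt{2}$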
c The mean of each data set is 6. The median of each data set is 6. The standard
deviation for the second set of data is twice the standard deviation of the first.
This is also evident by looking at the two data sets, since the first is spread evenly
from 4 to 8 and the second from 2 to 10.

Note: We can see from this example that if we double every deviation from the mean (so that the data are twice as spread out about the mean), then we double the standard deviation. Doubling just the range does not necessarily double the standard deviation.
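The difference between doubling the spread and doubling only the range can be seen numerically. In this Python sketch, `doubled_range` is our own hypothetical data set: it keeps the mean at 6 and has twice the range of 4, 5, 6, 7, 8, but only its two end values have moved.

```python
from statistics import pstdev

original = [4, 5, 6, 7, 8]
doubled_spread = [2, 4, 6, 8, 10]  # every deviation from the mean 6 is doubled
doubled_range = [2, 5, 6, 7, 10]   # only the end values move; the mean is still 6

print(round(pstdev(original), 4))        # 1.4142
print(round(pstdev(doubled_spread), 4))  # 2.8284
print(round(pstdev(doubled_range), 4))   # 2.6077
```

Only the second data set has exactly twice the standard deviation of the first.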

Intervals about the mean
We now look at a roughly symmetric set of data that tails off as you move away from the mean in either direction.
The stem-and-leaf plot below gives the incomes, in thousands of dollars, of 134 people.
0 889
1 00223
2 4444448888888
3 11122444466667777788888999
4 111112223334444455556677777788999999999
5 000001111111122233344444577
6 3333366669999
7 77899
8 666
7 | 7 means $77 000

The mean is 45.1 and the standard deviation is 16.1.


The median is 45.5.
The lower quartile is 36 and the upper quartile is 52.
We next consider intervals centred on the mean.
$\bar{x} + \sigma = 45.1 + 16.1 = 61.2$ and $\bar{x} - \sigma = 45.1 - 16.1 = 29.0$
There are 92 values between 29 and 61; hence, the percentage of values within one standard deviation of the mean is $\frac{92}{134} \approx 68.7\%$. Also,
$\bar{x} + 2\sigma = 45.1 + 2 \times 16.1 = 77.3$ and $\bar{x} - 2\sigma = 45.1 - 2 \times 16.1 = 12.9$
There are 121 values between 13 and 77.
Thus, the percentage of values within two standard deviations of the mean is $\frac{121}{134} \approx 90.3\%$.
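[Histogram: frequency (0 to 45) against income ($ thousand) in class intervals 0–9, 10–19, …, 80–89, with the intervals $\bar{x} - \sigma$ to $\bar{x} + \sigma$ and $\bar{x} - 2\sigma$ to $\bar{x} + 2\sigma$ marked.]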
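The counts above can be checked with a short Python sketch. It rebuilds the 134 incomes from the stem-and-leaf plot (reading the stem-7 leaves as 7, 7, 8, 9, 9), then counts the values inside each interval using the rounded mean and standard deviation quoted in the text. The function name `count_within` is our own.

```python
from statistics import mean, pstdev

# Stem-and-leaf plot of incomes in thousands of dollars: stem -> string of leaves.
plot = {
    0: "889",
    1: "00223",
    2: "4444448888888",
    3: "11122444466667777788888999",
    4: "111112223334444455556677777788999999999",
    5: "000001111111122233344444577",
    6: "3333366669999",
    7: "77899",
    8: "666",
}
incomes = [10 * stem + int(leaf) for stem, leaves in plot.items() for leaf in leaves]

def count_within(data, centre, half_width):
    """Number of values from centre - half_width to centre + half_width inclusive."""
    return sum(1 for x in data if centre - half_width <= x <= centre + half_width)

n = len(incomes)
print(n, round(mean(incomes), 1), round(pstdev(incomes), 1))  # 134 45.1 16.1
xbar, sigma = 45.1, 16.1  # rounded values quoted in the text
for k in (1, 2):
    inside = count_within(incomes, xbar, k * sigma)
    print(k, inside, round(100 * inside / n, 1))
# 1 92 68.7
# 2 121 90.3
```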

We have seen that about 69% of the data is within one standard deviation of the mean and
about 90% of the data is within two standard deviations of the mean.
Histograms similar to this one occur frequently. In most cases like these, the median and the mean are very close.
A remarkable result known as Chebyshev's inequality states that, for any set of data, if we take an interval between $\bar{x} - k\sigma$ and $\bar{x} + k\sigma$, then at most $\frac{1}{k^2}$ of the data can lie outside this interval.
So, for example, taking $k = 2$, not more than $\frac{1}{4}$ of the data can be outside this interval. So at least 75% of the data must lie inside this interval.
[Diagram: the interval from $\bar{x} - 2\sigma$ to $\bar{x} + 2\sigma$, centred on $\bar{x}$, contains at least 75% of the data.]
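Chebyshev's inequality can be tested on any data set, however lopsided. The Python sketch below uses invented data (nine 1s and one 50), measures the fraction of values lying more than k standard deviations from the mean, and compares it with the bound $\frac{1}{k^2}$. The function name `fraction_outside` is our own, and values exactly on the boundary are counted as inside.

```python
from statistics import mean, pstdev

def fraction_outside(data, k):
    """Fraction of values more than k standard deviations away from the mean."""
    m, s = mean(data), pstdev(data)
    return sum(1 for x in data if abs(x - m) > k * s) / len(data)

# A deliberately lopsided, made-up data set: nine 1s and one 50.
data = [1] * 9 + [50]
for k in (1.5, 2, 2.5):
    outside = fraction_outside(data, k)
    bound = 1 / k ** 2
    print(f"k = {k}: {outside:.2f} of the data outside, bound {bound:.2f}")
    assert outside <= bound  # Chebyshev's inequality guarantees this
```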

Example 13

David plays golf every Friday. He has recorded his score each Friday for five years,
and has found that his mean score for all his games is 85 and the standard deviation of
his scores is 5.2.
Find the range of scores that lie within:
a one standard deviation of the mean
b two standard deviations of the mean

Solution

a $\bar{x} + \sigma = 85 + 5.2 = 90.2$ and $\bar{x} - \sigma = 85 - 5.2 = 79.8$
Golf scores are whole numbers, so the range of scores within one standard deviation of the mean is 80 to 90.
b $\bar{x} + 2\sigma = 85 + 10.4 = 95.4$ and $\bar{x} - 2\sigma = 85 - 10.4 = 74.6$
So the range of scores within two standard deviations of the mean is 75 to 95.
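Because golf scores are whole numbers, the endpoints of each interval are rounded inwards: up at the lower end and down at the upper end. A small Python sketch of that step (the function name `whole_scores_within` is our own):

```python
import math

def whole_scores_within(mean, sd, k):
    """Smallest and largest whole-number scores within k standard deviations of the mean."""
    low, high = mean - k * sd, mean + k * sd
    return math.ceil(low), math.floor(high)

print(whole_scores_within(85, 5.2, 1))  # (80, 90)
print(whole_scores_within(85, 5.2, 2))  # (75, 95)
```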

Using the standard deviation to compare data
The following example shows how to use the standard deviation to compare data.

Example 14

Gus scored 14 in a maths test and 14 in an English test. The scores of each student in
the maths and English classes are listed below. In which test did Gus perform better,
relative to the class results?
Maths test: 10, 13, 18, 17, 12, 16, 9, 8, 7, 11, 10, 12
English test: 15, 17, 18, 19, 18, 17, 19, 16, 14, 15, 14, 12

Solution

Maths test: $\bar{x} = \frac{143}{12} \approx 11.92$, $\sigma \approx 3.38$
English test: $\bar{x} = \frac{194}{12} \approx 16.17$, $\sigma \approx 2.11$
In the maths test, Gus scored $(14 - 11.92) \div 3.38 \approx 0.6$ of a standard deviation above the mean. In the English test, he scored $(14 - 16.17) \div 2.11 \approx -1.0$, that is, about 1 standard deviation below the mean.
So, relative to the class, Gus did better in the maths test.
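The comparison in Example 14 divides each mark's distance from the mean by the standard deviation. This quantity is often called a standardised score (or z-score), a term not used in the text. The Python sketch below (the function name `standard_score` is our own) works from the raw class lists, so its results differ slightly from the hand calculation with rounded values.

```python
from statistics import mean, pstdev

def standard_score(mark, data):
    """Standard deviations `mark` lies above (+) or below (-) the mean of `data`.

    Uses pstdev, which divides by n as this book does.
    """
    return (mark - mean(data)) / pstdev(data)

maths = [10, 13, 18, 17, 12, 16, 9, 8, 7, 11, 10, 12]
english = [15, 17, 18, 19, 18, 17, 19, 16, 14, 15, 14, 12]
print(round(standard_score(14, maths), 2))    # 0.62
print(round(standard_score(14, english), 2))  # -1.02
```

The larger standardised score identifies the better performance relative to the class.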

Exercise 19E
1 (See Example 12.) Find the mean and standard deviation of each set of data.
a 5, 6, 6, 7, 8, 9, 22
b 11, 7, 8, 9, 8, 10, 10
c 1, 3, 7, 9, 11, 15, 17
Compare the sets of data using their means and standard deviations.
2 (See Example 13.) The mean and standard deviation of each set of data are given. Find the range of values that lies within:
i one standard deviation of the mean
ii two standard deviations of the mean
a $\bar{x} = 35$, $\sigma = 2.5$    b $\bar{x} = 40$, $\sigma = 5$    c $\bar{x} = 35$, $\sigma = 8$

3 (See Example 14.) John sits two state-wide English tests. He obtains a mark of 79 in the first test and a mark of 75 in the second.
The mean of all marks in each test is 65. The standard deviation of the marks is 15 in the first test and 5 in the second. Compare John's performance in the two tests.
4 The mathematics and English marks for a class of 15 students are given below.
Mathematics: 12, 16, 14, 19, 17, 18, 15, 15, 19, 20, 14, 18, 19, 15, 11
English: 10, 13, 16, 19, 20, 19, 18, 16, 15, 14, 17, 11, 15, 18, 17
a Calculate, correct to two decimal places, the mean and standard deviation for each set
of marks.
b If a student scored 16 for the mathematics test and 14 for the English test, which is the
better mark relative to the class results?
5 The following table lists the marks of several students on different tests in English and
mathematics. Compare the English and mathematics marks of each student.

Mark Mean Standard deviation


a David
English 15 17 2
Mathematics 13 17 3
b Akira
English 42 30 6
Mathematics 39 25 8
c Katherine
English 70 75 5
Mathematics 65 70 10
d Daniel
English 70 55 9
Mathematics 69 62 7

6 The bar charts of three sets of data are shown.
[Bar charts i, ii and iii: frequency (0 to 4) against data values 1 to 11.]

a For each set of data, calculate the mean and the standard deviation.
b Add 5 onto each data item in each of i, ii and iii and state the mean and standard
deviation of each new set of data.
c Multiply each data item in each of i, ii and iii by 2 and state the mean and standard
deviation of each new set of data.
7 (There is no arithmetic required in the following.)
Make up a list of 10 numbers so that the standard deviation is as large as possible and:
a every number is either 1 or 5
b every number is either 1 or 9
c every number is either 1 or 5 or 9, and at least two of them are 5
8 Repeat Question 7, but this time so the standard deviation is as small as possible.
9 An employer has 29 employees whose weekly salaries have a mean of $429 and a standard deviation of $1.53. The employer decides to give a flat $100 raise to every employee.
a What would be the change to the average annual salary paid by the employer?
b Would there be a change in the standard deviation?
c What would be the change in total weekly payments to employees?

19F Time-series data

A time series is a set of data that has been obtained by taking repeated measurements
over time.
10A

Maximum daily temperatures, average weekly wages, quarterly sales figures of a company and
annual population of a city are all examples of a time series.
To represent the information obtained in a time series pictorially, a graph is drawn in which:
• the horizontal axis represents time
• the vertical axis represents the quantity that is being measured at regular intervals
• adjacent plotted points are joined by line intervals.

Example 15

The mean daily maximum temperature was measured each month in a particular city.

Month                        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
Mean daily max. temp (°C)    29.2  28.9  28.1  26.4  23.5  21.2  20.6  21.7  23.8  25.7  27.4  28.7

a Represent this information on a time-series plot.


b Briefly comment on the annual variation in daily maximum temperature.

Solution

a To construct a time-series plot, the months are placed on the horizontal axis and the mean daily maximum temperature on the vertical axis. The points are plotted and joined by line intervals.
The following time-series plot is obtained.
[Time-series plot: temperature (°C), from 20 to 30, against month, January to December.]

b There is a gradual decrease in the mean daily maximum temperature over the
months January, February and March. During April, May and June, the mean
daily maximum temperature falls quite quickly to a minimum during July. For
the remainder of the year, there is a steady increase in the mean daily maximum
temperature each month.
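The description in part b can be backed up numerically. This Python sketch is an aid for checking a written description, not a replacement for the plot: it prints the change from each month to the next and finds the month with the lowest mean daily maximum.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
temps = [29.2, 28.9, 28.1, 26.4, 23.5, 21.2, 20.6, 21.7, 23.8, 25.7, 27.4, 28.7]

# Change from each month to the next, rounded to remove floating-point noise.
changes = [round(later - earlier, 1) for earlier, later in zip(temps, temps[1:])]
for start, end, change in zip(months, months[1:], changes):
    print(f"{start} -> {end}: {change:+.1f}")

coldest = months[temps.index(min(temps))]
print("Lowest mean daily maximum:", coldest)  # Jul
```

The small changes at the start of the year, the large falls from March to June and the rises every month after July match the written description.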

Exercise 19F
1 a Construct a time-series plot for the average rainfall (in cm) in a particular city, which
is given in the table below.

Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Rainfall (in cm) 16.2 17.5 14.2 9.1 9.6 7.1 6.2 4.1 3.3 9.3 9.6 12.6

b Use the time-series plot to write a brief description of how the rainfall varies in this particular city.
2 The table below gives the annual profit (in $ million) of a particular company over
a 10-year period. Construct a time-series plot of the information.

Year 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998
Profit ($ million) 1.2 1.8 2.4 2.2 2.6 3.1 3.2 3.4 3.6 4.0

3 The table below gives the number of births that occurred in a hospital each month
for a year.

Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Number of births 52 46 43 40 31 32 26 27 24 20 26 26

a Represent this information on a time-series plot.


b Briefly describe how the number of births recorded each month changed over the year.
4 The table below gives the position of a particular football team in a competition
of 12 teams at the completion of each round throughout the season.

Round 1 2 3 4 5 6 7 8 9 10 11
Position 10 12 11 9 8 6 5 5 4 5 5
Round 12 13 14 15 16 17 18 19 20 21 22
Position 6 4 4 3 4 3 5 7 6 9 8

a Represent this information on a time-series plot.
b Briefly describe the progress of the team throughout the season.
5 The data below shows the quarterly sales of a department store over a period of three years. The quarters are labelled 1 to 12 in the corresponding time-series graph.

Sales quarter    Sales ($000)
2009-1           45
2009-2           63
2009-3           67
2009-4           43
2010-1           51
2010-2           69
2010-3           75
2010-4           39
2011-1           55
2011-2           71
2011-3           79
2011-4           49

[Time-series graph: sales ($000), from 0 to 90, against quarter, 1 to 12.]

a In which quarter of each year are the sales figures the worst?
b In which quarter of each year are the sales figures the best?
c Are the sales figures improving? Compare the sales figures for the first quarter of each
year and do the same for the other quarters.
6 The table below gives the quarterly sales figures for a car dealer for the period 2009–2011.

Number of sales Q1 Q2 Q3 Q4
2009 72 62 90 98

2010 87 78 112 111

2011 90 84 132 117

a Represent this information on a time-series plot.


b Briefly describe how the car sales have altered over the given time period.
c Does it appear that the car dealer is able to sell more cars in a particular period
each year?

19G Bivariate data

10A
We often want to know if there is a relationship between the items in two different data sets. Examples of this are:
• Is there a relationship between children's ages and their heights?
• Is there a relationship between people's heights and weights?
• Is there a relationship between students' marks in an English examination and their marks in a mathematics examination?

In each of the above, two pieces of information are collected from each person in the investigation, and the two data sets are then compared. Data made up of two pieces of information collected from each subject in an investigation is called bivariate data.
A scatter graph, or scatter plot, uses coordinates to display the values of two variables for a set of data. Each subject is shown as a point: the value of one variable gives the horizontal coordinate, and the value of the other variable gives the vertical coordinate.

Example 16

The age (in years) and height (in cm) of a group of people were recorded. The data obtained is shown in the table below. Present the information in the table on a scatter plot.

Person Age (years) Height (cm)


Alan 12 145
Brianna 14 140
Chiyo 15 160
Danielle 14 150
Ezra 10 130
Frankie 11 135

Solution

The variables under consideration are age and height. The horizontal axis represents
the age and the vertical axis represents height. The axes are broken to allow us to focus
on the data points.
[Scatter plot: height (cm), from 125 to 160, against age (years), from 10 to 15, showing the points A (12, 145), B (14, 140), C (15, 160), D (14, 150), E (10, 130) and F (11, 135).]

In this scatter plot, points towards the top-right represent individuals who are older and taller. Points in the bottom-right represent individuals who are older but shorter than the rest of the group. The bottom-left of the plot represents people who are younger and shorter, while the top-left represents individuals who are younger but taller than the rest of the group.
The general trend of the points is upward as we move to the right. This shows that, for the children in this data set, the older children tend to be taller.
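One way to make "top-right" and "bottom-left" precise is to compare each point with the mean age and mean height of the group. Using the means as the dividing lines is our own assumption; the text describes the regions informally. A Python sketch (the function name `region` is ours):

```python
from statistics import mean

people = {
    "Alan": (12, 145), "Brianna": (14, 140), "Chiyo": (15, 160),
    "Danielle": (14, 150), "Ezra": (10, 130), "Frankie": (11, 135),
}
mean_age = mean(age for age, _ in people.values())            # about 12.67
mean_height = mean(height for _, height in people.values())   # about 143.33

def region(age, height):
    """Part of the scatter plot a point falls in, relative to the group means."""
    vertical = "top" if height > mean_height else "bottom"
    horizontal = "right" if age > mean_age else "left"
    return f"{vertical}-{horizontal}"

for name, (age, height) in people.items():
    print(name, region(age, height))
```

Four of the six people fall in the top-right or bottom-left regions, which is consistent with the upward trend.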

Example 17

The second-hand price and age of a particular model of car are recorded in the table below, and the points are plotted on a scatter plot.

Age of car (years)    Second-hand price ($)
1                     22 000
2                     19 500
2                     18 700
3                     16 400
3                     17 000
3                     16 800
4                     15 800
4                     15 950
5                     14 800
6                     12 500
6                     12 000
6                     12 800
7                     12 200
7                     11 580
8                     10 500
8                     9200
8                     8600
9                     5700
10                    4850
11                    4500

[Scatter plot: second-hand price ($), from 0 to 25 000, against age of car (years), from 0 to 12.]

a Describe the points in the top-left of the plot.


b Describe the points in the bottom-right of the plot.
c Describe the trend.

Solution

a The top-left of the scatter plot has points corresponding to relatively new second-
hand cars with higher prices.
b The bottom-right of the scatter plot has points corresponding to older second-hand
cars with lower prices.
c As the age of the car increases, its second-hand price decreases.
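The trend in part c can be summarised by averaging the prices of cars of the same age. This Python sketch is one simple way to do that (grouping by age is our choice, not the text's method); it then checks that the average price falls every time the age goes up by a year.

```python
from collections import defaultdict
from statistics import mean

cars = [(1, 22000), (2, 19500), (2, 18700), (3, 16400), (3, 17000), (3, 16800),
        (4, 15800), (4, 15950), (5, 14800), (6, 12500), (6, 12000), (6, 12800),
        (7, 12200), (7, 11580), (8, 10500), (8, 9200), (8, 8600), (9, 5700),
        (10, 4850), (11, 4500)]

prices_by_age = defaultdict(list)
for age, price in cars:
    prices_by_age[age].append(price)

averages = [(age, mean(prices_by_age[age])) for age in sorted(prices_by_age)]
for age, avg in averages:
    print(f"{age:>2} years: ${avg:,.0f}")

# True if every average is lower than the one for the previous age.
print(all(a[1] > b[1] for a, b in zip(averages, averages[1:])))  # True
```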

Exercise 19G
1 (See Example 16.) The table below gives the marks obtained by 10 students in a mathematics examination and an English examination.

Mathematics mark 72 50 96 58 86 94 78 66 85 78
English mark 78 64 70 46 88 72 70 62 72 74

Represent this information on a scatter plot, using the horizontal axis to represent
the mathematics marks and the vertical axis to represent the English marks.
Break the axes so that the vertical axis starts near 40 and the horizontal axis starts
near 50.

2 The table below gives the average monthly rainfall, in mm, and the average number of
rainy days per month for twelve different cities in Australia.

Average rainfall (in mm) 161 175 142 90 96 71 62 41 33 93 96 126


Average number of rainy days 13 14 14 11 10 7 7 6 7 10 10 12

a Represent this information on a scatter plot. Use the horizontal axis to represent
average monthly rainfall and the vertical axis to represent the average number of rainy
days per month.
b Give a brief description of the relationship between rainy days and average rainfall.
3 The table below gives the amount of carbohydrates, in grams, and the amount of fat, in
grams, in 100 g of a number of breakfast cereals.

Carbohydrates (in g) 88.7 67.0 77.5 61.7 86.8 32.4 72.4 77.1 86.5
Fat (in g) 0.3 1.3 2.8 7.6 1.2 5.7 9.4 10.0 0.7

a Represent this information on a scatter plot. Use the x-axis to represent the amount
of carbohydrates and the y-axis to represent the amount of fat.
b Does there appear to be any relationship between the carbohydrate content and the
fat content?
4 (See Example 17.) The table below gives the IQ of a number of adults and the time, in seconds, for them to complete a simple puzzle.

IQ 115 118 110 103 120 104 124 116 110

Time (in seconds) 14 15 21 27 11 25 9 16 18

a Represent this information on a scatter plot. Use the x-axis to represent IQ and the
y-axis to represent the time taken to complete the puzzle.
b Is there any trend in the data?
5 The table below gives the number of kicks and the number of handballs obtained by each
player in an AFL team in a particular match.

Player 1 2 3 4 5 6 7 8 9 10 11

Number of kicks 3 20 7 19 7 6 2 9 7 26 3

Number of handballs 8 11 11 6 4 6 3 1 3 3 8

Player 12 13 14 15 16 17 18 19 20 21 22

Number of kicks 12 17 6 11 14 5 1 21 6 13 4

Number of handballs 4 5 0 3 8 3 0 11 0 17 11

a Represent this information on a scatter plot. Use the x-axis to represent the number
of kicks and the y-axis to represent the number of handballs.
b Does your scatter plot support the claim "the more kicks a player obtains, the more handballs he gives"? Explain your answer.
6 The table below gives the number of goals for (scored by the team) and the number of
goals against (scored by the opposing team) for each team in a soccer competition.

Team A B C D E F G H I J K L

Goals for 36 45 22 26 20 59 24 41 23 43 32 41

Goals against 31 16 33 26 64 16 53 42 47 21 49 14

a Represent this information on a scatter plot. Use the x-axis to represent goals for and
the y-axis to represent goals against.
b Use your scatter plot to answer the following questions.
i Which team is the best team in the competition? Why?
ii Which team is the worst team in the competition? Why?
iii Which of team J and team H is better? Why?
iv Which of team A and team C is better? Why?
7 The body mass and heart mass, in grams, of 14 small marsupials are recorded in the table below, and the pairs are plotted on a scatter plot.

Heart mass (g)    Body mass (g)
27                118
30                136
37                156
38                150
32                140
36                155
32                157
32                114
38                144
42                149
36                159
44                149
33                131
38                160

[Scatter plot: body mass (grams), from 100 to 170, against heart mass (grams), from 15 to 55.]

Describe what you can see from the data.

8 The scatter plot below gives information about the height and weight of a number of people. Annabelle's height and weight are represented by the point A.
[Scatter plot: weight (kg) against height (cm), showing the point A and points labelled i to viii.]

Write down the point that represents each of the following people.
a Barry, who is heavier and taller than Annabelle
b Chandra, who is shorter but heavier than Annabelle
c Dario, who is the same height as Barry but a little heavier
d Edwina, who is shorter and lighter than Chandra
e Frederick, who is the same weight as Barry but a bit taller
f George, who is the same height as Annabelle but heavier
g Harriet, who is the same weight as Annabelle but shorter
h Ivan, who is the tallest person in the group
9 The scatter plot below gives the marks obtained by students in two tests. John's marks on the tests are represented by the point J.
[Scatter plot: Test 2 mark against Test 1 mark, showing the point J and points labelled i to viii.]

Which point represents each of the following students?


a Alex, who got the top mark in both tests
b Bao, who got the top mark in Test 1 but not in Test 2
c Charlene, who did better in Test 1 than Angela, but not as well on Test 2

d Drago, who did not do as well as Charlene on either test
e Eddie, who got the same mark as John for Test 2, but did not do as well as John on
Test 1
f Francis, who got the same mark as John for Test 1, but did better than John on Test 2
g Georgina, who got the lowest mark for Test 1
h Harvir, who had the greatest discrepancy between his two marks
10 The test results of a group of 9 students are recorded in the table and plotted on a scatter plot. A line has been drawn through the middle of the points.

Test 1    Test 2
53        54
70        67
53        55
81        81
85        82
51        51
52        53
76        78
75        77

[Scatter plot: Test 2 mark against Test 1 mark, both axes from 40 to 100, with a line drawn through the middle of the points.]

The equation for this line is Test 2 = 0.95 × Test 1 + 3.85.
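Rearranging the equation gives Test 1 = (Test 2 − 3.85) ÷ 0.95, which is what part b needs. A Python sketch of both directions (the function names are our own), tried on a Test 1 mark of 70, which is not one of the values in the question:

```python
def predict_test2(test1):
    """Test 2 mark predicted from a Test 1 mark using Test 2 = 0.95 * Test 1 + 3.85."""
    return 0.95 * test1 + 3.85

def predict_test1(test2):
    """Test 1 mark predicted by solving the same equation for Test 1."""
    return (test2 - 3.85) / 0.95

estimate = predict_test2(70)
print(round(estimate, 2))
print(round(predict_test1(estimate), 2))  # recovers the starting mark
```

Going there and back returns the original mark, which is a quick check that the rearrangement is correct.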


a Use this equation to predict the Test 2 mark of a student if their mark on Test 1 was:
i 53 ii 54 iii 34 iv 84 v 67
b Use this equation to predict the Test 1 mark of a student if their mark on Test 2 was:
i 53 ii 54 iii 34 iv 84 v 67

19H Miscellaneous exercises

In this section, additional questions involving the mean, the median, the mode and the
standard deviation are given.

Exercise 19H
Give your answers correct to two decimal places.
1 The numbers 20, w, 21, x, 6, y, 11 and z have a mean of 11.
Find the mean of w, x, y and z.
2 a, b, c, d, e and f are distinct numbers given in increasing order.
Find an expression for the median.
3 For the set of numbers $\frac{1}{x^2}, \frac{2}{x}, \frac{1}{x}, 1, \frac{1}{x}, \frac{1}{x^2}$:
a find an expression for the mean
b find an expression for the median, given that x > 2
4 a Prove that the sum of the deviations for the data set a, b, c is zero.
b Prove that the sum of the deviations of any data set is zero.
5 The mean of 10 numbers is 6 and the standard deviation is 1.
a If 15 is added to each of the numbers, what is the mean and standard deviation of this
new set of 10 numbers?
b If each of the 10 numbers is multiplied by 5, what is the mean and standard deviation
of this new set of 10 numbers?
6 The average weight of 5 boys is 72 kg and the average weight of 3 girls is 61 kg. What is
the average weight of the 8 children?
7 The numbers a, b, c, d, e and f have a mean of p and the numbers x, y and z have a mean
of q. What is the mean of the 9 numbers?
8 Five positive integers have mean 12 and range 18. The mode and median are both 8. Find
6 sets of five positive integers that satisfy these conditions.
9 Three integers, a, b and c, satisfy 0 a b c 20. The average of these three numbers
is 16. What is the smallest value that a can take?
