
Lecture 3

Data Description
Reference: Allan G. Bluman (2007). Elementary Statistics: A Step-by-Step Approach. New York: McGraw-Hill.
Objectives
Summarize data using the measures of
central tendency, such as the mean,
median, mode, and midrange.
Describe data using the measures of
variation, such as the range, variance, and
standard deviation.

Measures of Central Tendency
Do you remember:
A statistic is a characteristic or
measure obtained by using the data
values from a sample.
A parameter is a characteristic or
measure obtained by using the data
values from a specific population.
The Mean (arithmetic average)
The mean is defined to be the
sum of the data values divided by
the total number of values.
We will compute two means: one
for the sample and one for a
finite population of values.
The Sample Mean
The symbol X̄ represents the sample mean; X̄ is read as "X bar". The Greek symbol Σ is read as "sigma" and means "to sum".
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum X}{n}$$
The Sample Mean - Example
The ages in weeks of a random sample of six kittens at an animal shelter are 12, 14, 12, 5, 8, and 3. Find the average age of this sample.
The sample mean is
$$\bar{X} = \frac{\sum X}{n} = \frac{12 + 14 + 12 + 5 + 8 + 3}{6} = \frac{54}{6} = 9 \text{ weeks}.$$
The Population Mean


The Greek symbol μ represents the population mean; μ is read as "mu". N is the size of the finite population.
$$\mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum X}{N}$$
The Population Mean - Example


A small company consists of the owner, the manager, the salesperson, and two technicians. The salaries are listed as $50,000, $20,000, $12,000, $9,000, and $9,000, respectively. (Assume this is the population.) Then the population mean will be
$$\mu = \frac{\sum X}{N} = \frac{50{,}000 + 20{,}000 + 12{,}000 + 9{,}000 + 9{,}000}{5} = \frac{100{,}000}{5} = \$20{,}000.$$
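A minimal Python sketch of the mean formula applied to the two examples above. The helper name `mean` is hypothetical (not from the lecture); the same computation gives X̄ for a sample and μ for a finite population, only the interpretation of the divisor (n or N) differs.

```python
# Hypothetical helper (not from the lecture): arithmetic mean = sum / count.
def mean(values):
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


kitten_ages = [12, 14, 12, 5, 8, 3]                # sample, ages in weeks
salaries = [50_000, 20_000, 12_000, 9_000, 9_000]  # population, in dollars

print(mean(kitten_ages))  # 9.0   (sample mean X-bar, n = 6)
print(mean(salaries))     # 20000.0 (population mean mu, N = 5)
```

Python's standard `statistics.mean` returns the same values for these lists.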
The Sample Mean for an Ungrouped Frequency
Distribution
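The mean for an ungrouped frequency distribution is given by
$$\bar{X} = \frac{\sum (f \cdot X)}{n}$$
where f is the frequency of each value X and n = Σf is the total number of values.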
The Sample Mean for an Ungrouped
Frequency Distribution - Example
The scores for 25 students on a 4-point quiz are given in the table. Find the mean score.

Score, X    Frequency, f    f·X
0           2               0
1           4               4
2           12              24
3           4               12
4           3               12
Total       n = 25          Σf·X = 52

$$\bar{X} = \frac{\sum (f \cdot X)}{n} = \frac{52}{25} = 2.08$$
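The frequency-table formula can be sketched in Python as follows, assuming the table is stored as (value, frequency) pairs; the helper name `mean_from_frequencies` is hypothetical.

```python
# Hypothetical helper (not from the lecture): mean of an ungrouped
# frequency distribution, sum(f * X) / n with n = sum(f).
def mean_from_frequencies(table):
    n = sum(f for _, f in table)             # total number of values
    total = sum(f * x for x, f in table)     # sum of f * X
    return total / n


quiz_scores = [(0, 2), (1, 4), (2, 12), (3, 4), (4, 3)]  # (score X, frequency f)
print(mean_from_frequencies(quiz_scores))  # 2.08
```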
The Sample Mean for a Grouped Frequency
Distribution
The mean for a grouped frequency distribution is given by
$$\bar{X} = \frac{\sum (f \cdot X_m)}{n}$$
Here X_m is the corresponding class midpoint.
The Sample Mean for a Grouped Frequency
Distribution - Example
Given the table below, find the mean.

Class          Frequency, f
15.5 - 20.5    3
20.5 - 25.5    5
25.5 - 30.5    4
30.5 - 35.5    3
35.5 - 40.5    2

Table with class midpoints X_m:

Class          Frequency, f    X_m    f·X_m
15.5 - 20.5    3               18     54
20.5 - 25.5    5               23     115
25.5 - 30.5    4               28     112
30.5 - 35.5    3               33     99
35.5 - 40.5    2               38     76
Total          n = 17                 Σf·X_m = 456

Σf·X_m = 54 + 115 + 112 + 99 + 76 = 456 and n = 17, so
$$\bar{X} = \frac{\sum (f \cdot X_m)}{n} = \frac{456}{17} \approx 26.82.$$
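A minimal sketch of the grouped-data mean, assuming each class is stored as (lower boundary, upper boundary, frequency); the helper name `grouped_mean` is hypothetical. Because every value in a class is represented by the class midpoint, the result estimates the mean of the raw data.

```python
# Hypothetical helper (not from the lecture): grouped mean using class midpoints.
def grouped_mean(classes):
    n = sum(f for _, _, f in classes)
    # Midpoint X_m = (lower + upper) / 2; accumulate f * X_m.
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


classes = [(15.5, 20.5, 3), (20.5, 25.5, 5), (25.5, 30.5, 4),
           (30.5, 35.5, 3), (35.5, 40.5, 2)]
print(round(grouped_mean(classes), 2))  # 26.82
```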
The Median
When a data set is ordered, it is
called a data array.
The median is defined to be the
midpoint of the data array.
The symbol used to denote the
median is MD.
The Median - Example
The weights (in pounds) of seven
army recruits are 180, 201, 220,
191, 219, 209, and 186. Find the
median.
Arrange the data in order and select
the middle point.
Data array: 180, 186, 191, 201,
209, 219, 220.
The median, MD = 201.
The Median
In the previous example, there
was an odd number of values in
the data set. In this case it is
easy to select the middle number
in the data array.
When there is an even number of
values in the data set, the
median is obtained by taking the
average of the two middle
numbers.
The Median - Example
Six customers purchased the following
number of magazines: 1, 7, 3, 2, 3, 4.
Find the median.
Arrange the data in order and compute
the middle point.
Data array: 1, 2, 3, 3, 4, 7.
The median, MD = (3 + 3)/2 = 3.
The Median - Example
The ages of 10 college students
are: 18, 24, 20, 35, 19, 23, 26,
23, 19, 20. Find the median.
Arrange the data in order and
compute the middle point.
Data array: 18, 19, 19, 20, 20,
23, 23, 24, 26, 35.
The median, MD = (20 + 23)/2 =
21.5.
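The odd/even rule for the median can be sketched in Python as below; the helper name `median` is hypothetical, and it reproduces the three examples above.

```python
# Hypothetical helper (not from the lecture): median of a list of values.
def median(values):
    data = sorted(values)          # build the data array
    n = len(data)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:                 # odd count: the single middle value
        return data[mid]
    # even count: average of the two middle values
    return (data[mid - 1] + data[mid]) / 2


print(median([180, 201, 220, 191, 219, 209, 186]))       # 201
print(median([1, 7, 3, 2, 3, 4]))                        # 3.0
print(median([18, 24, 20, 35, 19, 23, 26, 23, 19, 20]))  # 21.5
```

Python's standard `statistics.median` follows the same rule.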
The Median-Ungrouped Frequency
Distribution
For an ungrouped frequency
distribution, find the median by
examining the cumulative
frequencies to locate the middle
value.
If n is the sample size, compute
n/2. Locate the data point where
n/2 values fall below and n/2
values fall above.
The Median-Ungrouped Frequency
Distribution - Example
LRJ Appliance recorded the number of
VCRs sold per week over a one-year
period. The data is given below.
No. Sets Sold    Frequency
1                4
2                9
3                6
4                2
5                3
To locate the middle point, divide n by 2: 24/2 = 12. Locate the point where 12 values fall below and 12 values fall above. Consider the cumulative distribution:

No. Sets Sold    Frequency    Cumulative Frequency
1                4            4
2                9            13
3                6            19
4                2            21
5                3            24

The class for 2 sets sold contains the 5th through the 13th values, so the 12th and 13th values both fall in that class. Hence MD = 2.
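A minimal sketch of the cumulative-frequency method, assuming the table is stored as (value, frequency) pairs sorted by value; the helper name `median_from_frequencies` is hypothetical. For even n it averages the values at positions n/2 and n/2 + 1, which matches "12 values below and 12 above" in the example.

```python
# Hypothetical helper (not from the lecture): median of an ungrouped
# frequency distribution, found by walking the cumulative frequencies.
def median_from_frequencies(table):
    n = sum(f for _, f in table)

    def value_at(position):
        # Return the value whose cumulative frequency first reaches `position` (1-based).
        cumulative = 0
        for value, f in table:
            cumulative += f
            if cumulative >= position:
                return value
        raise IndexError(position)

    if n % 2 == 1:
        return value_at((n + 1) // 2)
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2


vcr_sales = [(1, 4), (2, 9), (3, 6), (4, 2), (5, 3)]  # (sets sold, frequency)
print(median_from_frequencies(vcr_sales))  # 2.0
```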
The Median for a Grouped Frequency
Distribution



The median can be computed from
$$MD = \frac{(n/2) - cf}{f}(w) + L_m$$
where
n = sum of the frequencies
cf = cumulative frequency of the class immediately preceding the median class
f = frequency of the median class
w = width of the median class
L_m = lower boundary of the median class
The Median for a Grouped Frequency
Distribution - Example
Given the table below, find the median.

Class          Frequency, f
15.5 - 20.5    3
20.5 - 25.5    5
25.5 - 30.5    4
30.5 - 35.5    3
35.5 - 40.5    2

Table with cumulative frequencies:

Class          Frequency, f    Cumulative Frequency
15.5 - 20.5    3               3
20.5 - 25.5    5               8
25.5 - 30.5    4               12
30.5 - 35.5    3               15
35.5 - 40.5    2               17

To locate the halfway point, divide n by 2: 17/2 = 8.5 ≈ 9. Find the class that contains the 9th value; this will be the median class. From the cumulative distribution, the median class is 25.5 - 30.5.
With n = 17, cf = 8, f = 4, w = 5, and L_m = 25.5:
$$MD = \frac{(n/2) - cf}{f}(w) + L_m = \frac{(17/2) - 8}{4}(5) + 25.5 = 26.125.$$
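The grouped-median formula can be sketched in Python as follows. Classes are assumed to be (lower boundary, upper boundary, frequency) triples in increasing order, and the median class is taken as the first class whose cumulative frequency reaches n/2; the helper name `grouped_median` is hypothetical.

```python
# Hypothetical helper (not from the lecture):
# MD = ((n/2) - cf) / f * w + L_m for grouped data.
def grouped_median(classes):
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cf + f >= half:          # this is the median class
            w = upper - lower       # class width
            return (half - cf) / f * w + lower
        cf += f
    raise ValueError("no median class found")


classes = [(15.5, 20.5, 3), (20.5, 25.5, 5), (25.5, 30.5, 4),
           (30.5, 35.5, 3), (35.5, 40.5, 2)]
print(grouped_median(classes))  # 26.125
```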
The Mode
The mode is defined to be the value
that occurs most often in a data set.
A data set can have more than one
mode.
A data set is said to have no mode if
all values occur with equal frequency.
The Mode - Examples
The following data represent the duration (in
days) of U.S. space shuttle voyages for the
years 1992-94. Find the mode.
Data set: 8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8,
10, 14, 11, 8, 14, 11.
Ordered set: 6, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9,
10, 10, 11, 11, 14, 14, 14. Mode = 8.
The Mode - Examples
Six strains of bacteria were tested to see
how long they could remain alive outside
their normal environment. The time, in
minutes, is given below. Find the mode.
Data set: 2, 3, 5, 7, 8, 10.
There is no mode since each data value
occurs equally with a frequency of one.
The Mode - Examples
Eleven different automobiles were tested at
a speed of 15 mph for stopping distances.
The distance, in feet, is given below. Find
the mode.
Data set: 15, 18, 18, 18, 20, 22, 24, 24, 24,
26, 26.
There are two modes (bimodal): 18 and 24. Why? Each occurs three times, more often than any other value.
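A minimal sketch of the mode rules above, using `collections.Counter`; the helper name `modes` is hypothetical. It returns every most-frequent value, and an empty list when all values tie (the slide's "no mode" case). Note that Python's `statistics.multimode` would instead return all six bacteria values, which is why the sketch adds the explicit no-mode rule.

```python
from collections import Counter


# Hypothetical helper (not from the lecture): all modes of a data set.
def modes(values):
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # No mode: more than one distinct value and every value has the same frequency.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


shuttle = [8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11]
bacteria = [2, 3, 5, 7, 8, 10]
stopping = [15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26]

print(modes(shuttle))   # [8]
print(modes(bacteria))  # []
print(modes(stopping))  # [18, 24]
```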
The Mode for an Ungrouped Frequency
Distribution - Example
Given the table below, find the mode.

Values    Frequency, f
15        3
20        5
25        8
30        3
35        2

Mode = 25, the value with the largest frequency (8).
The Mode - Grouped Frequency
Distribution
The mode for grouped data is the
modal class.
The modal class is the class with
the largest frequency.
Sometimes the midpoint of the
class is used rather than the
boundaries.
The Mode for a Grouped Frequency
Distribution - Example
Given the table below, find the mode.

Class          Frequency, f
15.5 - 20.5    3
20.5 - 25.5    5
25.5 - 30.5    7
30.5 - 35.5    3
35.5 - 40.5    2

The modal class is 25.5 - 30.5, the class with the largest frequency (7).
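For frequency tables, the mode (or modal class) is simply the entry with the largest frequency. A minimal sketch, assuming (entry, frequency) pairs and ignoring ties; the helper name `modal_entry` is hypothetical.

```python
# Hypothetical helper (not from the lecture): entry with the largest frequency.
# max() returns the first such entry if several frequencies tie.
def modal_entry(table):
    return max(table, key=lambda pair: pair[1])[0]


values_table = [(15, 3), (20, 5), (25, 8), (30, 3), (35, 2)]
class_table = [((15.5, 20.5), 3), ((20.5, 25.5), 5), ((25.5, 30.5), 7),
               ((30.5, 35.5), 3), ((35.5, 40.5), 2)]

print(modal_entry(values_table))  # 25
modal_class = modal_entry(class_table)
print(modal_class)                # (25.5, 30.5)
print(sum(modal_class) / 2)       # 28.0, the class midpoint
```

The last line shows the midpoint alternative mentioned above, for when a single number is preferred over the class boundaries.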
The Midrange
The midrange is found by adding the
lowest and highest values in the data
set and dividing by 2.
The midrange is a rough estimate of
the middle value of the data.
The symbol that is used to represent
the midrange is MR.
The Midrange - Example
Last winter, the city of Brownsville,
Minnesota, reported the following number of
water-line breaks per month. The data is as
follows: 2, 3, 6, 8, 4, 1. Find the midrange.
MR = (1 + 8)/2 = 4.5.
Note: Extreme values influence the
midrange and thus may not be a typical
description of the middle.
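The midrange needs only the smallest and largest values. A minimal sketch with a hypothetical helper name `midrange`:

```python
# Hypothetical helper (not from the lecture): MR = (lowest + highest) / 2.
def midrange(values):
    return (min(values) + max(values)) / 2


breaks = [2, 3, 6, 8, 4, 1]  # water-line breaks per month
print(midrange(breaks))      # 4.5
```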
The Weighted Mean
The weighted mean is used when the
values in a data set are not all equally
represented.
The weighted mean of a variable X is
found by multiplying each value by its
corresponding weight and dividing the
sum of the products by the sum of the
weights.
The weighted mean:
$$\bar{X} = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum wX}{\sum w}$$
where w_1, w_2, ..., w_n are the weights for the values X_1, X_2, ..., X_n.
Basic example
Given two school classes, one with 20
students, and one with 30 students, the
grades in each class on a test were:
Morning class = 62, 67, 71, 74, 76, 77, 78,
79, 79, 80, 80, 81, 81, 82, 83, 84, 86, 89,
93, 98
Afternoon class = 81, 82, 83, 84, 85, 86,
87, 87, 88, 88, 89, 89, 89, 90, 90, 90, 90,
91, 91, 91, 92, 92, 93, 93, 94, 95, 96, 97,
98, 99
The straight average for the morning class is 80 and
the straight average of the afternoon class is 90. The
straight average of 80 and 90 is 85, the mean of the
two class means. However, this does not account for
the difference in number of students in each class,
hence the value of 85 does not reflect the average
student grade (independent of class).
The average student grade can be obtained by averaging all the grades, without regard to classes (add all the grades up and divide by the total number of students). The 50 grades sum to 4300, so
$$\bar{X} = \frac{4300}{50} = 86.$$
Or, this can be accomplished by weighting the class means by the number of students in each class (using a weighted mean of the class means):
$$\bar{X} = \frac{(20)(80) + (30)(90)}{20 + 30} = \frac{4300}{50} = 86.$$
Thus, the weighted mean makes it possible to find
the average student grade in the case where only
the class means and the number of students in each
class are available.
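A minimal sketch of the weighted mean of class means, assuming only the class sizes and class means are known; the helper name `weighted_mean` is hypothetical. The morning mean is recomputed from its listed grades, and the afternoon mean of 90 is taken from the text.

```python
# Hypothetical helper (not from the lecture): sum(w * X) / sum(w).
def weighted_mean(values, weights):
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


morning = [62, 67, 71, 74, 76, 77, 78, 79, 79, 80,
           80, 81, 81, 82, 83, 84, 86, 89, 93, 98]
morning_mean = sum(morning) / len(morning)   # 80.0
class_means = [morning_mean, 90]             # afternoon mean as stated in the text
class_sizes = [len(morning), 30]

print(weighted_mean(class_means, class_sizes))  # 86.0, the average student grade
print(sum(class_means) / 2)                     # 85.0, the unweighted mean of means
```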


Distribution Shapes
Frequency distributions can assume
many shapes.
The three most important shapes
are positively skewed, symmetrical,
and negatively skewed.
Positively Skewed
[Figure: positively skewed frequency curve, axes X and Y]
Mode < Median < Mean

Symmetrical
[Figure: symmetrical frequency curve, axes X and Y]
Mean = Median = Mode

Negatively Skewed
[Figure: negatively skewed frequency curve, axes X and Y]
Mean < Median < Mode
Measures of Variation - Range
The range is defined to be the highest
value minus the lowest value. The
symbol R is used for the range.
R = highest value - lowest value.
Extremely large or extremely small
data values can drastically affect the
range.
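A minimal sketch of the range and of its sensitivity to extreme values. The helper name `data_range` is hypothetical, the first list reuses the Brownsville water-line data from the midrange example, and the appended value 40 is a hypothetical outlier added for illustration.

```python
# Hypothetical helper (not from the lecture): R = highest value - lowest value.
def data_range(values):
    return max(values) - min(values)


breaks = [2, 3, 6, 8, 4, 1]
print(data_range(breaks))         # 7
print(data_range(breaks + [40]))  # 39: one hypothetical extreme value changes R drastically
```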
Measures of Variation - Population Variance
The variance is the average of the squares of the distance each value is from the mean. The symbol for the population variance is σ² (σ is the Greek lowercase letter sigma):
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
where
X = individual value
μ = population mean
N = population size
Measures of Variation - Population Standard
Deviation
The standard deviation is the square root of the variance:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}.$$
Measures of Variation - Example
Consider the following data to constitute the population: 10, 60, 50, 30, 40, 20. Find the mean and variance.
The mean μ = (10 + 60 + 50 + 30 + 40 + 20)/6 = 210/6 = 35.
The variance σ² = 1750/6 = 291.67. The computations are:

X           X - μ    (X - μ)²
10          -25      625
60          +25      625
50          +15      225
30          -5       25
40          +5       25
20          -15      225
ΣX = 210             Σ(X - μ)² = 1750
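A minimal sketch of the population variance and standard deviation (divide by N); the helper names are hypothetical. Python's standard `statistics.pvariance` and `statistics.pstdev` compute the same population quantities.

```python
import math


# Hypothetical helpers (not from the lecture).
def population_variance(values):
    # sigma^2 = sum((X - mu)^2) / N
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


def population_std(values):
    # sigma = sqrt(sigma^2)
    return math.sqrt(population_variance(values))


data = [10, 60, 50, 30, 40, 20]
print(round(population_variance(data), 2))  # 291.67
print(round(population_std(data), 2))       # 17.08
```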
Measures of Variation - Sample Variance
The sample variance, denoted by s², is a statistic used to estimate the population variance. It is an unbiased estimator: its expected value equals σ², which is why it divides by n - 1 rather than n.
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
where
X̄ = sample mean
n = sample size
Measures of Variation - Sample Standard
Deviation
The sample standard deviation is the square root of the sample variance:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}.$$
Shortcut Formula for the Sample Variance and the Standard Deviation
$$s^2 = \frac{\sum X^2 - \left[(\sum X)^2 / n\right]}{n - 1}$$
$$s = \sqrt{\frac{\sum X^2 - \left[(\sum X)^2 / n\right]}{n - 1}}$$
Sample Variance - Example
Find the variance and standard deviation for the following sample: 16, 19, 15, 15, 14.
ΣX = 16 + 19 + 15 + 15 + 14 = 79.
ΣX² = 16² + 19² + 15² + 15² + 14² = 1263.
$$s^2 = \frac{\sum X^2 - \left[(\sum X)^2 / n\right]}{n - 1} = \frac{1263 - (79)^2/5}{5 - 1} = \frac{1263 - 1248.2}{4} = 3.7$$
$$s = \sqrt{3.7} \approx 1.9$$
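A minimal sketch comparing the definitional and shortcut formulas on the sample above; the helper names are hypothetical. Both divide by n - 1, as do Python's standard `statistics.variance` and `statistics.stdev`.

```python
import math


# Hypothetical helpers (not from the lecture).
def sample_variance(values):
    # Definitional formula: s^2 = sum((X - X_bar)^2) / (n - 1)
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    # Shortcut formula: s^2 = (sum(X^2) - (sum(X))^2 / n) / (n - 1)
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


sample = [16, 19, 15, 15, 14]
s2 = sample_variance(sample)
print(round(s2, 2), round(sample_variance_shortcut(sample), 2))  # 3.7 3.7
print(round(math.sqrt(s2), 2))                                   # 1.92
```

The two formulas agree up to floating-point rounding; the shortcut avoids computing X̄ first, which is convenient by hand.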
Sample Variance for Grouped and Ungrouped Data
For grouped data, use the class midpoints as the observed values in the different classes. For ungrouped data, use the same formula with the class midpoints, X_m, replaced by the actual observed X values.
The sample variance for grouped data:
$$s^2 = \frac{\sum (f \cdot X_m^2) - \left[(\sum f \cdot X_m)^2 / n\right]}{n - 1}$$
For ungrouped data, replace X_m with the observed X value:
$$s^2 = \frac{\sum (f \cdot X^2) - \left[(\sum f \cdot X)^2 / n\right]}{n - 1}$$
Sample Variance for Ungrouped Data - Example

X      f         f·X           f·X²
5      2         10            50
6      3         18            108
7      8         56            392
8      1         8             64
9      6         54            486
10     4         40            400
Total  n = 24    Σf·X = 186    Σf·X² = 1500

The sample variance and standard deviation:
$$s^2 = \frac{\sum (f \cdot X^2) - \left[(\sum f \cdot X)^2 / n\right]}{n - 1} = \frac{1500 - (186)^2/24}{24 - 1} = \frac{1500 - 1441.5}{23} \approx 2.54$$
$$s = \sqrt{2.54} \approx 1.6$$
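A minimal sketch of the frequency-table variance formula, assuming (X, f) pairs; the helper name `sample_variance_from_frequencies` is hypothetical. For grouped data, the class midpoints X_m would be passed as X.

```python
import math


# Hypothetical helper (not from the lecture):
# s^2 = (sum(f * X^2) - (sum(f * X))^2 / n) / (n - 1), with n = sum(f).
def sample_variance_from_frequencies(table):
    n = sum(f for _, f in table)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    sum_fx = sum(f * x for x, f in table)
    sum_fx2 = sum(f * x * x for x, f in table)
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


table = [(5, 2), (6, 3), (7, 8), (8, 1), (9, 6), (10, 4)]  # (X, f)
s2 = sample_variance_from_frequencies(table)
print(round(s2, 2))             # 2.54
print(round(math.sqrt(s2), 1))  # 1.6
```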
Writing up your results
To do this, you need to identify your data
analysis technique, report your test
statistic, and provide some interpretation
of the results. Each analysis you run
should be related to your hypotheses.
Descriptive Statistics
Mean and Standard Deviation are most
clearly presented in parentheses:
The sample as a whole was relatively
young (M = 19.22, SD = 3.45).
The average age of students was 19.22
years (SD = 3.45).
Percentages are also most clearly
displayed in parentheses with no decimal
places:
Nearly half (49%) of the sample was
married.
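The parenthetical reporting style above can be produced programmatically. A minimal sketch, with hypothetical helper names, using Python's format-specification mini-language (the `%` type multiplies by 100 and appends a percent sign).

```python
# Hypothetical helpers (not from the lecture): format results in the
# parenthetical style shown above.
def format_mean_sd(mean, sd, decimals=2):
    return f"(M = {mean:.{decimals}f}, SD = {sd:.{decimals}f})"


def format_percent(proportion):
    # No decimal places, e.g. 0.49 -> "(49%)"
    return f"({proportion:.0%})"


print("The sample as a whole was relatively young " + format_mean_sd(19.22, 3.45) + ".")
print("Nearly half " + format_percent(0.49) + " of the sample was married.")
```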
Assignment 1
Open the adl.sav file.
Run a descriptive analysis of the first 5 variables.
Interpret the results.
The end
Greek Alphabet
Coefficient of Variation
The coefficient of variation is defined to be the standard deviation divided by the mean. The result is expressed as a percentage.
$$CVar = \frac{s}{\bar{X}} \cdot 100\% \quad \text{or} \quad CVar = \frac{\sigma}{\mu} \cdot 100\%$$
Example:
The mean of the number of sales of cars over a 3-month period is 87, and the standard deviation is 5. The mean of the commissions is $5225, and the standard deviation is $773. Compare the variations of the two.
The coefficients of variation are
$$CVar_{\text{sales}} = \frac{s}{\bar{X}} \cdot 100\% = \frac{5}{87} \cdot 100\% = 5.7\%$$
$$CVar_{\text{commissions}} = \frac{s}{\bar{X}} \cdot 100\% = \frac{773}{5225} \cdot 100\% = 14.8\%$$
Since the coefficient of variation is larger for commissions, the commissions are more variable than sales.
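A minimal sketch of the coefficient of variation applied to the sales and commissions example; the helper name `coefficient_of_variation` is hypothetical.

```python
# Hypothetical helper (not from the lecture): CVar = (sd / mean) * 100%.
def coefficient_of_variation(sd, mean):
    return sd / mean * 100


cv_sales = coefficient_of_variation(5, 87)
cv_commissions = coefficient_of_variation(773, 5225)
print(f"{cv_sales:.1f}% {cv_commissions:.1f}%")  # 5.7% 14.8%
```

Because the CVar is unit-free, it allows comparing the variability of counts (cars) with that of dollar amounts (commissions).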
