
Module 3a

Descriptive Statistics: Numerical Methods
Measures of Location
Percentiles and Quartiles
Measures of Variability

Learning Goals
1. Understand the purpose of measures of location.
2. Be able to compute the mean, median, mode, quartiles, and various percentiles.
3. Understand the purpose of measures of variability.
4. Be able to compute the range, interquartile range, variance, standard deviation, and coefficient of variation.

Measures of location
The table on the right contains an excerpt from a data set of salaries for 474 employees at a Midwestern bank.
We want to use measures of location to describe this data set.

Measures of Location
The following are measures of location:
Mean
Median
Mode
Percentiles
Quartiles

Mean
The mean of a data set is the average of all the
data values.
If the data are from a sample, the mean is denoted by x̄:
$$\bar{x} = \frac{\sum x_i}{n}$$
If the data are from a population, the mean is denoted by μ (mu):
$$\mu = \frac{\sum x_i}{N}$$

Mean
$$\bar{x} = \frac{\sum x_i}{n} = \frac{6{,}525{,}950}{474} = 13{,}767.83$$
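A minimal Python sketch of the mean formula, assuming the data are a plain list of numbers. The function name `mean` and the five salaries are hypothetical illustrations, not the bank data; the last line only re-checks the slide's division using the reported total.

```python
def mean(values):
    """Sum of the observations divided by how many there are.

    The arithmetic is the same for a sample (x-bar, divide by n) and for a
    population (mu, divide by N); only the notation differs.
    """
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical salaries, for illustration only (not the bank data).
salaries = [9_000, 11_550, 12_300, 14_820, 21_000]
print(mean(salaries))  # 13734.0

# Re-checking the slide's division: reported total / n.
print(round(6_525_950 / 474, 2))  # 13767.83
```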

Median
The median is the measure of location
most often reported for annual income and
property value data.
A few extremely large incomes or property
values can inflate the mean.

Median
The median of a data set is the value in
the middle when the data items are
arranged in ascending order.
For an odd number of observations, the
median is the middle value.
For an even number of observations, the
median is the average of the two middle
values.
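The odd/even rule above can be sketched in Python as follows; `median` is a hypothetical helper name and the data are made up.

```python
def median(values):
    """Middle value of the data arranged in ascending order.

    Odd n: the single middle value.
    Even n: the average of the two middle values.
    """
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([15, 3, 9]))      # 9 (odd n: middle of 3, 9, 15)
print(median([15, 3, 9, 11]))  # 10.0 (even n: (9 + 11) / 2)
```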

Median
Median = 50th percentile
i = (p/100)n = (50/100)474 = 237
Because n = 474 is even (so i is an integer), we average the 237th and 238th data values:
Median = (11,520 + 11,580)/2 = 11,550

Mean and Median Compared


Both the mean and the median are meant to be measures of central location for the data. In the case of this data set, notice that the mean is $2,217.83 more than the median (13,767.83 - 11,550).
Why is there such a large discrepancy?
Looking at the frequency distribution of current salaries helps to explain why this discrepancy exists.

When there are data values in a distribution that are much smaller or larger than the others, such that the distribution is skewed, the mean may not be a good measure of central tendency.
The histogram on the left shows the distribution of current salary.
Notice the two vertical lines that run from top to bottom with numbers attached. The line on the left is the median (11,550) and the line on the right is the mean (13,768).
[Figure: histogram of CURRENT SALARY (about 10,000 to 50,000) against Percent (10% to 40%), with vertical lines at the median, 11550, and the mean, 13768]

When the distribution, as in this example, has a long tail that extends to larger values (skewed right), the mean will generally be larger than the median. If the distribution has a long tail that extends to smaller values (skewed left), the mean will generally be smaller than the median. When the data are symmetric (not skewed), the mean and median will be equal.
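A small illustration of the mean-median relationship, using Python's standard `statistics` module on made-up data: one list with a single very large value (skewed right) and one list that is symmetric around its center.

```python
import statistics

# Hypothetical right-skewed salaries: one value far out in the upper tail.
skewed = [10_000, 11_000, 11_500, 12_000, 12_500, 13_000, 50_000]
# Hypothetical symmetric salaries: values mirror each other around 12,000.
symmetric = [9_000, 10_500, 12_000, 13_500, 15_000]

print(statistics.mean(skewed), statistics.median(skewed))        # mean well above the median of 12000
print(statistics.mean(symmetric), statistics.median(symmetric))  # 12000 12000
```

The single large value pulls the mean upward while the median, which depends only on the middle of the ordered data, stays put.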

Mode
The mode of a data set is the value that
occurs with greatest frequency.
The greatest frequency can occur at two
or more different values.
If the data have exactly two modes, the
data are bimodal.
If the data have more than two modes, the
data are multimodal.
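A sketch of finding every mode, assuming hashable values such as numbers. `modes` is a hypothetical helper built on `collections.Counter`, and the data are invented.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the greatest frequency, sorted.

    One value returned -> unimodal; two -> bimodal; more than two -> multimodal.
    If every value occurs exactly once, all values are returned; some texts
    instead say such data have no mode.
    """
    if not values:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(values)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


print(modes([12_300, 9_600, 12_300, 14_820]))  # [12300]
print(modes([1, 1, 2, 2, 3]))                  # [1, 2] (bimodal)
print(modes([1, 1, 2, 2, 3, 3, 4]))            # [1, 2, 3] (multimodal)
```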

Mode
Example: Salary
In the salary example, the modal salary was
$12,300. This was the current salary of 14 of
the 474 employees included in this sample.

Percentiles
Recall how the median divides the sample into two equal parts: half the observations are less than the median and half are greater than the median.
There are other ways to split the sample on a percentage basis, such as finding the value below which 10 percent of the observations fall and above which 90 percent fall.
Admission test scores for colleges and
universities are frequently reported in terms of
percentiles.

Percentiles
The pth percentile of a data set is a value such that at
least p percent of the items take on this value or less and
at least (100 - p) percent of the items take on this value
or more.
Arrange the data in ascending order.
Compute index i, the position of the pth percentile.
i = (p/100)n
If i is not an integer, round up. The pth percentile is the value in
the ith position.
If i is an integer, the pth percentile is the average of the values in
positions i and i +1.
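The book method above, written as a minimal Python sketch. `percentile_book` is a hypothetical name; positions are 1-based as on the slide, so the code subtracts 1 when indexing a Python list. `fractions.Fraction` is used so that the "is i an integer?" test is exact rather than subject to floating-point round-off.

```python
import math
from fractions import Fraction


def percentile_book(values, p):
    """pth percentile by the book method (hypothetical helper name).

    1. Arrange the data in ascending order.
    2. Compute the index i = (p/100) * n.
    3. If i is not an integer, round up: the percentile is the value in position i.
       If i is an integer, average the values in positions i and i + 1.
    """
    if not values:
        raise ValueError("percentiles of an empty data set are undefined")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    i = Fraction(p) * len(ordered) / 100  # exact arithmetic for the integer test
    if i.denominator != 1:
        # Not an integer: round up to the next position (1-based).
        return ordered[math.ceil(i) - 1]
    i = int(i)
    # Integer: average positions i and i + 1 (1-based), i.e. indices i - 1 and i.
    return (ordered[i - 1] + ordered[i]) / 2


data = [8, 3, 5, 1, 9, 7, 2, 6, 4, 10]  # hypothetical data, n = 10
print(percentile_book(data, 25))  # i = 2.5, round up to 3 -> 3
print(percentile_book(data, 50))  # i = 5, average positions 5 and 6 -> 5.5
```

With n = 474 and p = 10, this sketch computes i = 47.4 and returns the 48th ordered value, the same position used in the salary example.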

Note: There is no universally accepted method to calculate percentiles.
The method used in the book is not the same as the one used in SPSS. Further information is available at http://cnx.rice.edu/content/m10805/latest

Percentiles
Example: Salary (Book Method)
10th Percentile
i = (p/100)n = (10/100)474 = 47.4, which rounds up to 48
The 10th percentile is the 48th data value:
10th Percentile = 8,400

Quartiles

Quartiles are specific percentiles


First Quartile = 25th Percentile
Second Quartile = 50th Percentile = Median
Third Quartile = 75th Percentile

Quartiles
Example: Salaries (Book Method)
Third Quartile
Third quartile = 75th percentile
i = (p/100)n = (75/100)474 = 355.5, which rounds up to 356
Third quartile = 356th data value = 14,820
Using SPSS
Statistics: CURRENT SALARY
N              Valid      474
               Missing    0
Mode                      12300
Percentiles    10         8400.00
               25         9600.00
               50         11550.00
               75         14865.00

Notice that the 75th percentile calculated by SPSS (14,865.00) differs from the value obtained with the book method (14,820).
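Because percentile conventions differ, the same data can give different quartiles. The sketch below contrasts the book method with Python's `statistics.quantiles`, whose default `method="exclusive"` interpolates at position p(n + 1). SPSS's default is often described the same way, but we have not checked this against the SPSS output above, so treat any agreement with SPSS as an assumption. The helper name and the data are hypothetical.

```python
import math
import statistics
from fractions import Fraction


def percentile_book(values, p):
    """Book method: i = (p/100)n; round up if fractional, else average i and i + 1."""
    ordered = sorted(values)
    i = Fraction(p) * len(ordered) / 100
    if i.denominator != 1:
        return ordered[math.ceil(i) - 1]
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2


data = list(range(1, 11))  # hypothetical data: 1, 2, ..., 10

book = [percentile_book(data, p) for p in (25, 50, 75)]
# Default method="exclusive": linear interpolation at position p(n + 1).
interpolated = statistics.quantiles(data, n=4)

print(book)          # [3, 5.5, 8]
print(interpolated)  # [2.75, 5.5, 8.25]
```

The medians agree here, but the first and third quartiles do not, which mirrors the 14,820 vs. 14,865 difference seen for the salary data.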

Measures of Variability
Measures of location do not give us an idea of how
observations differ from each other.
Measures of variability quantify the spread or
dispersion of observations.
Choosing suppliers is an example of why this is important in business. When choosing between suppliers, we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Measures of Variability

Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation

Measures of Variability: the Range


The range of a data set is the difference
between the largest and smallest data
values.
It is the simplest measure of variability.
It is very sensitive to the smallest and
largest data values.
The value of the range does not tell us
anything about the variability of the values
between the largest and smallest values.
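A quick sketch of the range and its sensitivity to a single extreme value, using hypothetical supplier delivery times in days. `data_range` is an invented name chosen to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Largest data value minus smallest data value."""
    return max(values) - min(values)


delivery_days = [4, 5, 5, 6, 7]      # hypothetical delivery times
with_outlier = delivery_days + [20]  # one unusually late delivery

print(data_range(delivery_days))  # 3
print(data_range(with_outlier))   # 16
```

One late delivery more than quintuples the range, even though the other five values are unchanged.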

Measures of Variability: the Interquartile Range
The interquartile range of a data set is
the difference between the third
quartile and the first quartile.
It is the range for the middle 50% of
the data.
It overcomes the sensitivity to
extreme data values.
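To illustrate that the interquartile range resists extreme values while the range does not, here is a sketch that reuses the book percentile method. The helper names `percentile_book` and `iqr` and the delivery-time data are hypothetical.

```python
import math
from fractions import Fraction


def percentile_book(values, p):
    """Book method: i = (p/100)n; round up if fractional, else average i and i + 1."""
    ordered = sorted(values)
    i = Fraction(p) * len(ordered) / 100
    if i.denominator != 1:
        return ordered[math.ceil(i) - 1]
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2


def iqr(values):
    """Interquartile range: Q3 - Q1, the range of the middle 50% of the data."""
    return percentile_book(values, 75) - percentile_book(values, 25)


days = [4, 5, 5, 6, 6, 7, 7, 8]  # hypothetical delivery times, n = 8
extreme = days[:-1] + [40]       # replace the largest value with 40

print(iqr(days), iqr(extreme))                             # 2.0 2.0: Q1 and Q3 do not move
print(max(days) - min(days), max(extreme) - min(extreme))  # 4 36: the range does
```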

Measures of Variability: the Interquartile Range
Example: Salaries (Book Method)
Interquartile Range
3rd Quartile (Q3) = 14,820
1st Quartile (Q1) = 9,600
Interquartile Range = Q3 - Q1 = 14,820 - 9,600 = 5,220

Using SPSS
From the SPSS percentiles for CURRENT SALARY (25th = 9,600.00, 75th = 14,865.00):
Interquartile Range = Q3 - Q1 = 14,865 - 9,600 = 5,265

Measures of Variability: the Variance
The variance is a measure of variability that utilizes all the data.
It is based on the difference between the value of each observation (xᵢ) and the mean (x̄ for a sample, μ for a population).

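Measures of Variability: the Variance
The variance is the average of the squared differences between each data value and the mean. (For a sample, the sum of squared differences is divided by n - 1 rather than n.)
If the data set is a sample, the variance is denoted by s².
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
If the data set is a population, the variance is denoted by σ².
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$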
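Both variance formulas, sketched directly in Python and compared with the standard library's `statistics.variance` (divisor n - 1) and `statistics.pvariance` (divisor N). The function names and data are hypothetical.

```python
import statistics


def sample_variance(values):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """sigma^2: sum of squared deviations from the population mean, divided by N."""
    n = len(values)
    if n == 0:
        raise ValueError("the variance of an empty data set is undefined")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data; mean 5, squared deviations sum to 32
print(sample_variance(data), statistics.variance(data))        # both 32/7, about 4.571
print(population_variance(data), statistics.pvariance(data))  # both equal to 32/8 = 4
```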

Measures of Variability: the Standard Deviation
The standard deviation of a data set is the positive square root of the variance.
It is measured in the same units as the data, which makes it easier than the variance to compare with the mean.
If the data set is a sample, the standard deviation is denoted s.
$$s = \sqrt{s^2}$$
If the data set is a population, the standard deviation is denoted σ (sigma).
$$\sigma = \sqrt{\sigma^2}$$
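A short sketch showing that the standard deviation is the square root of the variance and is expressed in the data's own units. The data are hypothetical delivery times in days.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical delivery times, in days

s = math.sqrt(statistics.variance(data))       # sample: s = sqrt(s^2)
sigma = math.sqrt(statistics.pvariance(data))  # population: sigma = sqrt(sigma^2)

print(sigma)                                    # 2.0 (days), same units as the data
print(math.isclose(s, statistics.stdev(data)))  # True: stdev is the square root of variance
```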

Measures of Variability: the Coefficient of Variation
The coefficient of variation indicates how large the standard deviation is in relation to the mean. This enables the comparison of the variability of different variables.
If the data set is a sample, the coefficient of variation is computed as follows:
$$\left(\frac{s}{\bar{x}} \times 100\right)\%$$
If the data set is a population, the coefficient of variation is computed as follows:
$$\left(\frac{\sigma}{\mu} \times 100\right)\%$$
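A sketch of the sample coefficient of variation, used to compare the variability of two variables measured in different units. The function name and both data sets are hypothetical, and the sketch assumes a positive mean.

```python
import statistics


def coefficient_of_variation(values):
    """Sample CV in percent: (s / x-bar) * 100. Assumes a positive mean."""
    x_bar = statistics.mean(values)
    if x_bar <= 0:
        raise ValueError("this sketch only handles data with a positive mean")
    return statistics.stdev(values) / x_bar * 100


# Hypothetical data measured in different units.
salaries_usd = [9_000, 11_000, 13_000, 15_000, 17_000]
ages_years = [25, 30, 35, 40, 45]

print(round(coefficient_of_variation(salaries_usd), 2))  # 24.33
print(round(coefficient_of_variation(ages_years), 2))    # 22.59
```

The standard deviations (about 3,162 dollars vs. about 7.9 years) cannot be compared directly, but the unit-free percentages can.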

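Example: Salary
Variance
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 46{,}652{,}519.97$$
Standard Deviation
$$s = \sqrt{s^2} = \sqrt{46{,}652{,}519.97} = 6{,}830.265$$
Coefficient of Variation
$$\frac{s}{\bar{x}} \times 100 = \frac{6{,}830.265}{13{,}767.83} \times 100 = 49.61$$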
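The slide's arithmetic can be re-checked from the two reported summary values alone; the raw 474 salaries are not needed for this step.

```python
import math

# Summary values reported on the slide for the 474 salaries.
s_squared = 46_652_519.97  # sample variance
x_bar = 13_767.83          # sample mean

s = math.sqrt(s_squared)   # standard deviation = square root of the variance
cv = s / x_bar * 100       # coefficient of variation, in percent

print(round(s, 3))   # 6830.265
print(round(cv, 2))  # 49.61
```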
