Then, upon taking a random sample X1, X2, ..., Xn, we are interested in testing the null hypothesis:
H0 : m = m0
against any of the possible alternative hypotheses:
HA : m > m0 or HA : m < m0 or HA : m ≠ m0
As we often do, let's motivate the procedure by way of example.
Example
Solution. We are interested in testing the null hypothesis H0: m = 3.7 against the
alternative hypothesis HA: m ≠ 3.7. In general, the Wilcoxon signed rank test procedure
requires five steps. We'll introduce each of the steps as we apply them to the data in this
example.
1 de 10 20-12-2015 00:31
https://onlinecourses.science.psu.edu/stat414/prin...
Step #2. In general, calculate the absolute value of Xi − m0, that is, |Xi − m0| for i = 1, 2,
..., n. In this case, we have to calculate |Xi − 3.7| for i = 1, 2, ..., 10:
Step #3. Determine the rank Ri, i = 1, 2, ..., n, of the absolute values (in ascending order)
according to their magnitude. In this case, the value of 0.2 is the smallest, so it gets rank
1. The value of 0.6 is the next smallest, so it gets rank 2. We continue ranking the data in
this way until we have assigned a rank to each of the data values:
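Steps #2 and #3 can be sketched in Python as follows. This is a minimal illustration, not the course's own code: the function names are hypothetical, and the tie handling (tied absolute values share the average of the ranks they occupy) is an assumption, since this example does not discuss ties.

```python
def abs_deviations(xs, m0):
    """Step #2: the absolute deviations |X_i - m0|."""
    return [abs(x - m0) for x in xs]


def ranks_ascending(values):
    """Step #3: rank values in ascending order (smallest gets rank 1).

    Assumption: tied values each receive the average of the ranks they span.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end (0-based) correspond to ranks start+1..end+1.
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


print(ranks_ascending([0.6, 0.2, 1.5]))  # [2.0, 1.0, 3.0]
```

With real measurements such as |Xi − 3.7|, floating-point subtraction can make two equal deviations differ in the last digit, so rounding the deviations to the data's precision before ranking may be wise.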
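Step #4. Determine the value of W, the Wilcoxon signed rank test statistic:
$$W = \sum_{i=1}^{n} Z_i R_i$$
where Zi = 1 if Xi − m0 > 0 (that is, Xi falls above m0) and Zi = 0 if Xi − m0 < 0. In other
words, W is the sum of the positive signed ranks.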
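Putting Steps #2 to #4 together, W can be computed directly from the data and m0. The sketch below is hypothetical illustration code; it assumes no observation equals m0 exactly and that there are no ties among the |Xi − m0|, so plain ranks 1, ..., n suffice.

```python
def wilcoxon_w(xs, m0):
    """W = sum of Z_i * R_i (the sum of the positive signed ranks).

    Assumes no X_i equals m0 and no ties among |X_i - m0|.
    """
    devs = [x - m0 for x in xs]
    # R_i: rank of |X_i - m0| in ascending order (1 = smallest).
    order = sorted(range(len(devs)), key=lambda i: abs(devs[i]))
    ranks = [0] * len(devs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    # Z_i = 1 if X_i lies above m0, otherwise 0.
    z = [1 if d > 0 else 0 for d in devs]
    return sum(zi * ri for zi, ri in zip(z, ranks))


# Hypothetical data (not the sunfish measurements): deviations 2, -3, 6, 1.
print(wilcoxon_w([5, 0, 9, 4], 3))  # 7
```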
Step #5. Determine if the observed value of W is extreme in light of the assumed value of
the median under the null hypothesis. That is, calculate the P-value associated with W, and
make a decision about whether to reject or not to reject. Whoa, nellie! We're going to
have to take a break from this example before we can finish, as we first have to learn
something about the distribution of W.
The Distribution of W
As is always the case, in order to find the distribution of the discrete random variable W, we need:
(1) to find the range of possible values of W, that is, we need to specify the support of W
(2) to determine the probability that W takes on each of the values in the support
Let's tackle the support of W first. Well, the smallest that $W = \sum_{i=1}^{n} Z_i R_i$ could be is 0. That
would happen if each observation Xi fell below the value of the median m0 specified in the null
hypothesis, thereby causing Zi = 0 for i = 1, 2, ..., n.
The largest that $W = \sum_{i=1}^{n} Z_i R_i$ could be is $1 + 2 + \cdots + n = n(n+1)/2$. That would happen if each
observation fell above the value of the median m0 specified in the null hypothesis, thereby causing
Zi = 1 for i = 1, 2, ..., n.
So, in summary, W is a discrete random variable whose support ranges between 0 and n(n+1)/2.
Now, if we have a small sample size n, such as we do in the above example, we could use the
exact probability distribution of W to calculate the P-values for our hypothesis tests. Errr....
first we have to determine the exact probability distribution of W. Doing so is very doable. It just
takes some thinking and perhaps a bit of tedious work. Let's make our discussion concrete by
considering a very small sample size, n = 3, say. In that case, the possible values of W are the
integers 0, 1, 2, 3, 4, 5, 6. Now, each of the three data points would be assigned a rank Ri of either
1, 2, or 3, and depending on whether the data point fell above or below the hypothesized median
m0, each of the three possible ranks 1, 2, or 3 would remain either a positive signed rank or
become a negative signed rank. In this case, because we are considering such a small sample size,
we can easily enumerate each of the possible outcomes, as well as the sum W of the positive ranks,
to see how each arrangement results in one of the possible values of W:
There we have it. We're just about done with finding the exact probability distribution of W when n
= 3. All we have to do is recognize that under the null hypothesis, each of the above eight
arrangements (columns) is equally likely. Therefore, we can use the classical approach to assigning
the probabilities. That is:
And, just to make sure that we haven't made an error in our calculations, we can verify that the
sum of the probabilities over the support 0, 1, ..., 6 is indeed 1/8 + 1/8 + 1/8 + 2/8 + 1/8 + 1/8 + 1/8 = 1.
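The enumeration above can be reproduced by brute force. In this minimal sketch (the function name is hypothetical), each of the 2^n arrangements assigns Zi ∈ {0, 1} to rank i, and exact fractions keep the probabilities in the form 1/8, 2/8, and so on. The same function works unchanged for n = 4 and n = 5.

```python
from fractions import Fraction
from itertools import product


def exact_w_distribution(n):
    """Exact null distribution of W by listing all 2**n sign arrangements."""
    counts = {}
    # Each arrangement assigns Z_i in {0, 1} to rank i = 1..n.
    for z in product((0, 1), repeat=n):
        w = sum(zi * rank for zi, rank in zip(z, range(1, n + 1)))
        counts[w] = counts.get(w, 0) + 1
    total = 2 ** n
    # Under H0 every arrangement is equally likely: P(W = w) = count / 2**n.
    return {w: Fraction(c, total) for w, c in sorted(counts.items())}


dist3 = exact_w_distribution(3)
for w, p in dist3.items():
    print(w, p)
```

For n = 3 this prints 1/4 (that is, 2/8) for W = 3, because both {3} and {1, 2} sum to 3, and 1/8 for every other value in 0, ..., 6.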
Hmmm. That was easy enough. Let's do the same thing for a sample size of n = 4. Well, in that
case, the possible values of W are the integers 0, 1, 2, ..., 10. Now, each of the four data points
would be assigned a rank Ri of either 1, 2, 3, or 4, and depending on whether the data point fell
above or below the hypothesized median m0, each of the four possible ranks 1, 2, 3, or 4 would
remain either a positive signed rank or become a negative signed rank. Again, because we are
considering such a small sample size, we can easily enumerate each of the possible outcomes, as
well as the sum W of the positive ranks, to see how each arrangement results in one of the possible
values of W:
Again, under the null hypothesis, each of the above 16 arrangements is equally likely, so we can
use the classical approach to assigning the probabilities:
Do you want to do the calculation for the case where n = 5? Here's what the enumeration of
possible outcomes looks like:
After having worked through finding the exact probability distribution of W for the cases where n =
3, 4, and 5, we should be able to make some generalizations. First, note that, in general, there are
$2^n$ total ways to make signed rank sums, and therefore the probability that W takes on a
particular value w is:
$$P(W = w) = f(w) = \frac{c(w)}{2^n}$$
where c(w) = the number of possible ways to assign a + or a − to the first n positive integers so that
$\sum_{i=1}^{n} Z_i R_i = w$.
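Listing all 2^n arrangements gets slow as n grows. A common alternative, sketched below as an assumption rather than as how the published tables were built, is to count c(w) by dynamic programming: c(w) is the number of subsets of {1, ..., n} (the ranks that get a +) summing to w, and adding one rank at a time updates those counts. The function names are hypothetical.

```python
from fractions import Fraction


def count_signed_rank_sums(n):
    """c(w) for w = 0..n(n+1)/2: number of subsets of {1, ..., n} summing to w."""
    max_w = n * (n + 1) // 2
    c = [1] + [0] * max_w  # with no ranks yet, only the empty sum 0 is possible
    for rank in range(1, n + 1):
        # Sweep downward so each rank is used at most once (it gets a + or not).
        for w in range(max_w, rank - 1, -1):
            c[w] += c[w - rank]
    return c


def pmf_w(n):
    """f(w) = c(w) / 2**n for each w in the support 0..n(n+1)/2."""
    return [Fraction(cw, 2 ** n) for cw in count_signed_rank_sums(n)]


print(count_signed_rank_sums(4))  # [1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1]
```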
Okay, now that we have the general idea of how to determine the exact probability distribution of
W, we can breathe a sigh of relief when it comes to actually analyzing a set of data. That's because
someone else has done the dirty work for us for sample sizes n = 3, 4, ..., 12, and published the
relevant results in a statistical table of W [1]. (Our textbook authors chose not to include such a
table in our textbook.) By relevant, I mean the probabilities in the "tails" of the distribution of W.
After all, that's what P-values generally are, that is, probabilities in the tails of the distribution
under the null hypothesis.
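The tail probabilities in a table such as [1] can be recomputed from the counts c(w). A minimal sketch with hypothetical function names: the counts are built by adding one rank at a time, and each tail is a sum of counts divided by 2^n.

```python
def w_counts(n):
    """c(w): number of ways the positive ranks can sum to w, w = 0..n(n+1)/2."""
    max_w = n * (n + 1) // 2
    c = [1] + [0] * max_w
    for rank in range(1, n + 1):
        for w in range(max_w, rank - 1, -1):
            c[w] += c[w - rank]
    return c


def lower_tail(n, w):
    """P(W <= w) under H0."""
    c = w_counts(n)
    return sum(c[: max(w, -1) + 1]) / 2 ** n


def upper_tail(n, w):
    """P(W >= w) under H0; equals lower_tail(n, n(n+1)/2 - w) by symmetry."""
    c = w_counts(n)
    return sum(c[max(w, 0):]) / 2 ** n


print(upper_tail(10, 40))
```

For n = 10, upper_tail(10, 40) equals 119/1024 ≈ 0.116.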
Proof. Because the Central Limit Theorem is at work here, the approximate standard
normal distribution part of the theorem is trivial. Our proof therefore reduces to showing
that the mean and variance of W are:
$$E(W) = \frac{n(n+1)}{4} \quad \text{and} \quad \operatorname{Var}(W) = \frac{n(n+1)(2n+1)}{24},$$
respectively. To find E(W) and Var(W), note that $W = \sum_{i=1}^{n} Z_i R_i$ has the same distribution
as $U = \sum_{i=1}^{n} U_i$, where:
Ui = 0 with probability ½
Ui = i with probability ½
In case that claim was less than obvious, consider this intuitive, hand-waving kind of
argument:
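$$E(W) = E(U) = \sum_{i=1}^{n} E(U_i) = \sum_{i=1}^{n} \left[0\left(\tfrac{1}{2}\right) + i\left(\tfrac{1}{2}\right)\right] = \frac{1}{2}\sum_{i=1}^{n} i = \frac{1}{2} \times \frac{n(n+1)}{2} = \frac{n(n+1)}{4}$$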
and:
$$\operatorname{Var}(W) = \operatorname{Var}(U) = \sum_{i=1}^{n} \operatorname{Var}(U_i)$$
because the Ui's are independent under the null hypothesis. Now:
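Each Ui equals 0 or i with probability ½, so its variance follows from Var(Ui) = E(Ui²) − [E(Ui)]², and the standard identity 1² + 2² + ⋯ + n² = n(n+1)(2n+1)/6 completes the sum:

$$\begin{aligned}
\operatorname{Var}(U_i) &= E(U_i^2) - [E(U_i)]^2 = \left[0^2\left(\tfrac{1}{2}\right) + i^2\left(\tfrac{1}{2}\right)\right] - \left(\tfrac{i}{2}\right)^2 = \frac{i^2}{4},\\
\operatorname{Var}(W) &= \sum_{i=1}^{n} \frac{i^2}{4} = \frac{1}{4} \times \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(2n+1)}{24}.
\end{aligned}$$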
$$W' = \frac{\sum_{i=1}^{n} Z_i R_i - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$$
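As a numerical sanity check of the two moment formulas in the proof (not a substitute for it), the sketch below computes the exact null mean and variance of W from the counts c(w) using exact fractions and compares them with n(n+1)/4 and n(n+1)(2n+1)/24 for n = 1, ..., 12. The function name is hypothetical.

```python
from fractions import Fraction


def exact_mean_var(n):
    """Exact E(W) and Var(W) under H0 from the subset-sum counts c(w)."""
    max_w = n * (n + 1) // 2
    c = [1] + [0] * max_w
    for rank in range(1, n + 1):
        for w in range(max_w, rank - 1, -1):
            c[w] += c[w - rank]
    total = 2 ** n
    mean = Fraction(sum(w * cw for w, cw in enumerate(c)), total)
    second_moment = Fraction(sum(w * w * cw for w, cw in enumerate(c)), total)
    return mean, second_moment - mean ** 2


for n in range(1, 13):
    mean, var = exact_mean_var(n)
    assert mean == Fraction(n * (n + 1), 4)
    assert var == Fraction(n * (n + 1) * (2 * n + 1), 24)
print("moment formulas agree for n = 1, ..., 12")
```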
Example (continued)
Solution. Recall that we are interested in testing the null hypothesis H0: m = 3.7 against
the alternative hypothesis HA: m ≠ 3.7. The last time we worked on this example, we got
as far as determining that W = 40 for the given data set. Now, we just have to use what
we know about the distribution of W to complete our hypothesis test. Well, in this case,
with n = 10, our sample size is fairly small so we can use the exact distribution of W. The
upper and lower percentiles of the Wilcoxon signed rank statistic when n = 10 are:
Therefore, our P-value is 2 × 0.116 = 0.232. Because our P-value is large, we cannot reject
the null hypothesis. There is insufficient evidence at the 0.05 level to conclude that the
median length of pygmy sunfish differs significantly from 3.7 centimeters.
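The exact two-sided P-value can also be obtained by enumerating all 2^10 = 1024 equally likely sign arrangements instead of reading a table. In this sketch (hypothetical function name), the two-sided P-value is taken as twice the smaller tail probability, the same doubling used above, which is natural here because W is symmetric under H0.

```python
from itertools import product


def exact_two_sided_p(n, w_obs):
    """Two-sided exact P-value for W, the sum of the positive ranks."""
    ranks = range(1, n + 1)
    # W for every one of the 2**n equally likely sign arrangements under H0.
    ws = [sum(r for r, z in zip(ranks, signs) if z) for signs in product((0, 1), repeat=n)]
    lower = sum(w <= w_obs for w in ws) / len(ws)
    upper = sum(w >= w_obs for w in ws) / len(ws)
    # Double the smaller tail, capped at 1.
    return min(1.0, 2 * min(lower, upper))


print(round(exact_two_sided_p(10, 40), 3))  # 0.232
```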
Notes
A couple of notes are worth mentioning before we take a look at another example:
(1) Our textbook authors define W as the sum of all of the signed ranks, $W = \sum_{i=1}^{n} \operatorname{sgn}(X_i - m_0) R_i$,
as opposed to just the sum of the positive ranks. That is perfectly fine, but not the most typical way of
defining W.
(2) W is based on the ranks of the deviations from the hypothesized median m0, not on the
deviations themselves. In the above example, W = 40 even if x7 = 6.4 or 10000 (now that's
a pretty strange sunfish) because its rank would be unchanged. It is in this sense that W
protects against the effect of outliers.
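Both notes can be checked with a small sketch on hypothetical data (not the sunfish data). If the statistic in note (1) is read as the sum S of the signed ranks, then, because the positive and negative ranks together add up to n(n+1)/2, S = W − (n(n+1)/2 − W) = 2W − n(n+1)/2, so S and W carry the same information. For note (2), replacing the largest observation above m0 by an absurd value leaves its rank, and hence W, unchanged. The function name is hypothetical, and the code assumes no zero deviations and no tied |Xi − m0|.

```python
def signed_rank_sums(xs, m0):
    """Return (W, S): W = sum of positive ranks, S = sum of signed ranks.

    Assumes no X_i equals m0 and no ties among |X_i - m0|.
    """
    devs = [x - m0 for x in xs]
    order = sorted(range(len(devs)), key=lambda i: abs(devs[i]))
    ranks = [0] * len(devs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w = sum(r for d, r in zip(devs, ranks) if d > 0)
    s = sum(r if d > 0 else -r for d, r in zip(devs, ranks))
    return w, s


xs = [4.0, 7.0, 1.0, 9.0]  # hypothetical observations
m0 = 3.5
w, s = signed_rank_sums(xs, m0)
n = len(xs)
print(w, s, 2 * w - n * (n + 1) // 2)  # 8 6 6

# Note (2): make the largest observation absurdly large; its rank stays n.
w_outlier, _ = signed_rank_sums([4.0, 7.0, 1.0, 10000.0], m0)
print(w_outlier)  # 8
```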
Example
Assuming the distribution of the age of the onset of diabetes is symmetric, is there evidence to
conclude that the median age of the onset of diabetes differs significantly from 45 years?
Solution. We are interested in testing the null hypothesis H0: m = 45 against the
alternative hypothesis HA: m ≠ 45. We can use Minitab's calculator and statistical
functions to do the dirty work for us:
Because we have a large sample (n = 30), we can use the normal approximation to the
distribution of W. In this case, our P-value is defined as two times the probability that W ≤
200. Therefore, using a half-unit correction for continuity, our transformed signed rank
statistic is:
$$W' = \frac{200.5 - \frac{30(31)}{4}}{\sqrt{\frac{30(31)(61)}{24}}} = -0.6581$$
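The arithmetic above, and the two-sided P-value that follows from it, can be reproduced with the Python standard library alone. In this sketch the function names are hypothetical; the standard normal CDF is built from math.erf, and the 0.5 continuity correction is added because the P-value is based on P(W ≤ 200).

```python
import math


def standardized_w(w, n, correction=0.5):
    """W' = (w + correction - n(n+1)/4) / sqrt(n(n+1)(2n+1)/24).

    Use correction=+0.5 for a lower-tail probability P(W <= w).
    """
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w + correction - mean) / sd


def std_normal_cdf(z):
    """Phi(z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z = standardized_w(200, 30)
p_value = 2 * std_normal_cdf(z)  # two-sided: twice the lower tail
print(f"W' = {z:.4f}, P-value = {p_value:.4f}")
```

This gives W' ≈ −0.658, matching the hand calculation, and a two-sided P-value of roughly 0.51, well above 0.05.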
Therefore, upon using a normal probability calculator (or table), we get that our P-value is:
By the way, we can even be lazier and let Minitab do all of the calculation work for us.
Under the Stat menu, if we select Nonparametrics, and then 1-Sample Wilcoxon, we
get:
Links:
[1] https://onlinecourses.science.psu.edu/stat414/sites/onlinecourses.science.psu.edu.stat414/files/lesson48/ExactW_Table.pdf