
Kendall, M. G. (1970) Rank Correlation Methods, 4th ed. London: Griffin
Statistical Methods for Environmental Pollution Monitoring, Richard O. Gilbert (1987)
http://www.swrcb.ca.gov/water_issues/programs/tmdl/docs/303d_policydocs/205.pdf

(Good intro, but lacks look-up table, pdf image cuts off last sentence on each page)
Myles Hollander and Douglas A. Wolfe (1999)
Nonparametric Statistical Methods, 2nd Edition
Wiley-Interscience
ISBN-10: 0471190454
ISBN-13: 978-0471190455
A User-Written SAS Program for Estimating Temporal Trends and Their Magnitude
http://www.sjrwmd.com/technicalreports/pdfs/TP/SJ2004-4.pdf

Techniques of Water-Resources Investigations of the United States Geological Survey


Book 4, Hydrologic Analysis and Interpretation
Chapter A3 Statistical Methods in Water Resources
By D.R. Helsel and R.M. Hirsch
http://pubs.usgs.gov/twri/twri4a3/pdf/twri4a3-new.pdf

Detecting Trends of Annual Values of Atmospheric Pollutants by the Mann-Kendall Test and Sen's Slope Estimates - The Excel Template Application MAKESENS
http://www.fmi.fi/kuvat/MAKESENS_MANUAL.pdf

Statistical Sirens: The Allure of Nonparametrics, Ecology 76(6), 1995, Douglas H. Johnson, pp. 1998-2000
http://www.jstor.org/pss/1940733

Why Kendall tau?


http://rsscse.org.uk/ts/bts/noether/text.html

Kendall's tau and Spearman's rho


http://www.statisticssolutions.com/methods-chapter/statistical-tests/kendall-spearman-rank-correlation-coefficient/

Non-parametric Measures of Bivariate Relationships


http://www.unesco.org/webworld/idams/advguide/Chapt4_2.htm

Kendall's rank correlation


http://www.statsdirect.com/help/nonparametric_methods/kend.htm

(clearer description of how to handle ties)


Powerpoint on nonparametric time series
http://www.webs.uidaho.edu/envs541/Module_08/8_2.pdf


This Excel file has been designed to calculate a Mann-Kendall trend statistic for ten data points (i.e., ten years).
Instructions
Enter your data values into the green-highlighted cells C5:C14 of the sheet labeled "MannKendall"
Change the slide title (B1), Y-axis title (C4) and the year labels (if necessary).
If you have fewer than ten years of data, you must also
Enter the number of time periods (e.g., years) into cell C18.
Clear the contents of any irrelevant cells from D26 to L34.
That's it. The worksheet will calculate the Mann-Kendall S statistic (FYI, some authors refer to it as the K statistic).

Here is what the worksheet is doing:


In the n*n matrix of values, subtract the value in yearK from the value in yearJ in all n(n-1)/2 cases/cells where yearJ > yearK.
(Subtract the value on the left from the value on the top for all cells above the diagonal - top value minus left value for each cell.
Above the diagonal will be values for which the column value is from a later year than the row value, or yearJ > yearK.)

Count the number of the n(n-1)/2 cells that yielded a positive value (result > 0) and put the count value in the first column to the right.
Count the number of the n(n-1)/2 cells that yielded a negative value (result < 0) and put the count value in the first column to the right.
Count the number of the n(n-1)/2 cells that yielded a zero value (tied values) and put the count value in the row at the bottom.
Sum all the plusses and all the minuses and subtract the total of minuses from the total of pluses.
S=number of cells with positive values minus the number of cells with negative values.
The sign of S indicates the slope of the trend (positive=upward, negative=downward).
If n>=10, then calculate variance and use the formula for the normal approximation of the probability of S
There are two formulae, one if there are no tied values and another if there are tied values.
If n<10, then use the lookup table, below, for the critical value of S for various values of n.
Either way, note that n>=5 is required to reach p < .05.
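
The counting steps above can be sketched in a few lines of Python. This is only an illustration of what the worksheet does, not the worksheet's own formulas; the function name `mann_kendall_counts` is made up for this example, and the data are the 1999-2009 infant mortality rates from the "MannKendall" sheet.

```python
from itertools import combinations

def mann_kendall_counts(values):
    """Return (n_pos, n_neg, n_tied, S) for values listed in time order."""
    n_pos = n_neg = n_tied = 0
    # combinations() yields every (earlier, later) pair exactly once,
    # i.e. the n(n-1)/2 cells above the diagonal of the matrix.
    for earlier, later in combinations(values, 2):
        diff = later - earlier
        if diff > 0:
            n_pos += 1
        elif diff < 0:
            n_neg += 1
        else:
            n_tied += 1
    # S = number of positive differences minus number of negative differences
    return n_pos, n_neg, n_tied, n_pos - n_neg

# Deaths per 1000 live births, New Mexico, 1999-2009 (from the worksheet)
rates = [6.62, 6.42, 6.06, 5.40, 6.28, 6.07, 5.68, 6.14, 5.03, 4.74, 5.00]
print(mann_kendall_counts(rates))  # (10, 45, 0, -35)
```

The negative S matches the worksheet's result of a downward trend.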

Evaluation (11 years, 1999-2009)

# Positive diffs    10
# Negative diffs    45
S                   -35
Variance(S) *       165 = n(n-1)(2n+5)/18 (This formula may be conservative in the presence of tied values.)
ZS **               -2.65
Zcrit,.05           1.96 (positive or negative) is the critical value for Z, two-tailed, at p < .05. The one-tailed value is 1.65.
Interpretation      Sig. Decreasing

/~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~/

Note: This variance formula assumes there are no tied values (i.e., no differences=0).
Tied values may reduce the validity of the normal approximation when the number of data values is close to 10.
If there are tied values, then the following formula with the correction factor for the tied values should be used.

$$\mathrm{Var}(S) = \frac{1}{18}\left[\, n(n-1)(2n+5) - \sum_{p=1}^{q} t_p (t_p - 1)(2 t_p + 5) \right]$$

Where q is the number of tied groups and t_p is the number of data values in the p-th group.
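
As a rough sketch, the tie-corrected variance can be computed by grouping equal data values and subtracting the tie term inside the numerator before dividing by 18. The function name `mann_kendall_variance` is an assumption for illustration, and the example assumes ties are exact equality of the entered values (as in the rounded 2000-2009 series of the second worksheet).

```python
from collections import Counter

def mann_kendall_variance(values):
    """Var(S) with the correction for tied data values."""
    n = len(values)
    # t_p = size of each group of identical values; groups of size 1 are not ties
    tie_sizes = [t for t in Counter(values).values() if t > 1]
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)
    # The tie term is subtracted BEFORE dividing by 18
    return (n * (n - 1) * (2 * n + 5) - tie_term) / 18

# Rounded rates for 2000-2009; the value 6.1 occurs three times (q=1, t1=3)
rates_2000_2009 = [6.6, 6.4, 6.1, 5.4, 6.3, 6.1, 5.7, 6.1, 5.1, 5.0]
print(round(mann_kendall_variance(rates_2000_2009), 2))  # 121.33
```

With no ties the tie term is zero and the function reduces to n(n-1)(2n+5)/18, which gives 125 for n=10.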

I'm not ENTIRELY sure, but I think in this infant mortality example there is actually ONE tied "group." It looks like Gilbert defines a "group" as a group of years that share the same value. I think (according to how I am interpreting Gilbert's example) that our 6.1 value constitutes one "group."
The value 6.1 occurs in three different years.

Because we only had 10 years to begin with, and three are tied, we should use the lookup table to gauge the significance of S.
According to the table, the absolute value of S must be at least 30 for significance at the p < .05 level, and 30 is what we have (S = -30).
But I will also recalculate the variance with the correction factor.
So, based on Gilbert's example (see last sheet in this file), q=1 and t1=3.
Var(S) correction term for tied values: t1(t1-1)(2*t1+5) = 3(3-1)((2*3)+5) = 66
The correction term is subtracted in the numerator, before dividing by 18.
Uncorrected variance (n=10): 10(10-1)((2*10)+5)/18 = 2250/18 = 125
Our variance after correction: (2250-66)/18 = 2184/18 = 121.33

Evaluation of Tied Data

# Positive diffs    6
# Negative diffs    36
S                   -30
Variance(S)         121.33 using correction factor for tied data
ZS **               -2.63
Zcrit,.05           1.96 (positive or negative) is the critical value for Z, two-tailed, at p < .05. The one-tailed value is 1.65.
Interpretation      Sig. Decreasing

In any case, the correction factor decreases the variance, which increases the magnitude of the Z-score and the likelihood of significance.
So if there are tied values, and we do not use the variance formula with the correction factor, our test is conservative.
/~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~/

**

The sign of Z indicates the direction of the trend. A positive (negative) value of Z indicates an upward (downward) trend.
Formula for ZS:
if S > 0 then Z = (S-1)/SQRT(variance S)
if S = 0 then Z = 0
if S < 0 then Z = (S+1)/SQRT(variance S)
Some sources said the calculation for the normal approximation of the probability of S should only be used if n>=10,
but others said only when n>=40. But there was some ambiguity about the definition of n (#years versus
#values in the matrix). If nyears=10, then the number of values inside the matrix is n(n-1)/2, or 10*9/2 = 45.
So I'm thinking nyears>=10 is okay.
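
A minimal sketch of the Z calculation, with the continuity correction applied to S before dividing by the standard deviation. The helper names `mann_kendall_z` and `two_tailed_p` are assumptions for this example; the p-value uses the standard normal distribution via `math.erfc`, which is one common way to do it and not something the worksheet itself specifies.

```python
import math

def mann_kendall_z(s, var_s):
    """Normal-approximation Z for S, moving S one unit toward zero first."""
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0

def two_tailed_p(z):
    """P(|Z| >= |z|) for a standard normal variable."""
    return math.erfc(abs(z) / math.sqrt(2))

# 11-year example from the worksheet: S = -35, Var(S) = 165
z = mann_kendall_z(-35, 165.0)
print(round(z, 2))           # -2.65
print(two_tailed_p(z) < 0.05)  # True
```

Because |Z| exceeds 1.96, the two-tailed test at p < .05 rejects H0, in line with the worksheet's "Sig. Decreasing" interpretation.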
/~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~/
Lookup Table for Significance of S:

[Chart: Critical Values of Mann Kendall S Statistic for alpha=.05 and Varying Values of N. X-axis: Number of Years (data points), n = 5 to 40 in steps of 5. Y-axis: Critical Value of S, 0 to 180.]

/~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~/
Information on the Power of Mann-Kendall S Test:
For Mann-Kendall S to yield a significance level of:
p < .10     it requires 4 or more data points (e.g., years).
p < .05     it requires 5 or more data points (e.g., years).
p < .01     it requires 6 or more data points (e.g., years).
p < .001    it requires 7 or more data points (e.g., years).
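
These minimum sample sizes can be checked with a short calculation. The sketch below assumes no ties and a two-tailed test: under H0 all n! orderings are equally likely and only one of them makes every pair concordant, so the smallest attainable two-tailed p is 2/n!. The function name `min_n_for_significance` is hypothetical.

```python
import math

def min_n_for_significance(alpha, two_tailed=True):
    """Smallest n for which the most extreme S can reach p < alpha (no ties)."""
    tails = 2 if two_tailed else 1
    n = 2
    while True:
        # Exactly one of the n! orderings gives S = +n(n-1)/2 (one gives the minimum)
        smallest_p = tails / math.factorial(n)
        if smallest_p < alpha:
            return n
        n += 1

for alpha in (0.10, 0.05, 0.01, 0.001):
    print(alpha, min_n_for_significance(alpha))
```

Under these assumptions the loop reproduces the 4, 5, 6 and 7 data points listed above.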


Critical value of S for alpha=.05

n (years)    Critical value of S
5            11
10           30
15           40
20           62
25           85
30           111
35           139
40           169

Graph Title: Infant Mortality. New Mexico, 1999-2009
y-axis title: Deaths per 1000 Live Births

Year    Deaths per 1000 Live Births
1999    6.62
2000    6.42
2001    6.06
2002    5.40
2003    6.28
2004    6.07
2005    5.68
2006    6.14
2007    5.03
2008    4.74
2009    5.00

n=number of time periods
n= 11

Subtract each earlier year from each later year

Rows are year K (earlier), columns are year J (later); each cell is the year J value minus the year K value.

year K (value)   2000   2001   2002   2003   2004   2005   2006   2007   2008   2009     #+    #-
                  6.4    6.1    5.4    6.3    6.1    5.7    6.1    5.0    4.7    5.0
1999 (6.6)      -0.20  -0.55  -1.22  -0.34  -0.54  -0.93  -0.47  -1.58  -1.88  -1.62      0    10
2000 (6.4)             -0.36  -1.02  -0.14  -0.35  -0.74  -0.28  -1.39  -1.68  -1.42      0     9
2001 (6.1)                    -0.67   0.21   0.01  -0.38   0.08  -1.03  -1.33  -1.06      3     5
2002 (5.4)                            0.88   0.68   0.29   0.75  -0.36  -0.66  -0.40      4     3
2003 (6.3)                                  -0.21  -0.60  -0.13  -1.25  -1.54  -1.28      0     6
2004 (6.1)                                         -0.39   0.07  -1.04  -1.33  -1.07      1     4
2005 (5.7)                                                 0.46  -0.65  -0.94  -0.68      1     3
2006 (6.1)                                                       -1.11  -1.40  -1.14      0     3
2007 (5.0)                                                              -0.29  -0.03      0     2
2008 (4.7)                                                                      0.26      1     0
# ties (diff=0):    0      0      0      0      0      0      0      0      0      0
Total                                                                                    10    45

S = 10.00 minus 45.00 = -35.00

If n>=10, then use the variance calculation to estimate probability.
If n<10, then use the table, below. Note: a significant p value is not possible with fewer than 4 time periods.
n>=5 is required to reach p<.05.

# Positive diffs    10.00
# Negative diffs    45.00
S                   -35.00

Evaluation (Normal Approximation, N>=10)

Variance(S)       165 = n(n-1)(2n+5)/18. This formula may be conservative in the presence of tied values.
ZS                -2.65
Zcrit,.05         1.96 (two-tailed. For one-tailed test use 1.65)
Interpretation    Sig. Decreasing

Evaluation (Lookup Table for Fewer Than 10 Years)

If S>=S-crit, then reject H0

S-crit (p<.05)
# Years    1-tailed    2-tailed
4          6           -
5          8           10
6          11          13
7          13          15
8          16          18
9          18          20
10         21          23

[Chart: Infant Mortality. New Mexico, 1999-2009. X-axis: years 1999 to 2009. Y-axis: Deaths per 1000 Live Births, 0.0 to 7.0.]
Source: NM Death Certificate and Birth Certificate Data. NMDOH Bureau of Vital Records and Statistics.

Graph Title: New Mexico Infant Mortality Rate from 1999-2009
y-axis title: Inf Deaths per 1000 Live Births

Year    Inf Deaths per 1000 Live Births
2000    6.6
2001    6.4
2002    6.1
2003    5.4
2004    6.3
2005    6.1
2006    5.7
2007    6.1
2008    5.1
2009    5.0

n=number of time periods
n= 10

Subtract each earlier year from each later year

Rows are year K (earlier), columns are year J (later); each cell is the year J value minus the year K value.

year K (value)   2001   2002   2003   2004   2005   2006   2007   2008   2009     #+    #-
                  6.4    6.1    5.4    6.3    6.1    5.7    6.1    5.1    5.0
2000 (6.6)      -0.20  -0.50  -1.20  -0.30  -0.50  -0.90  -0.50  -1.50  -1.60      0     9
2001 (6.4)             -0.30  -1.00  -0.10  -0.30  -0.70  -0.30  -1.30  -1.40      0     8
2002 (6.1)                    -0.70   0.20   0.00  -0.40   0.00  -1.00  -1.10      1     4
2003 (5.4)                            0.90   0.70   0.30   0.70  -0.30  -0.40      4     2
2004 (6.3)                                  -0.20  -0.60  -0.20  -1.20  -1.30      0     5
2005 (6.1)                                         -0.40   0.00  -1.00  -1.10      0     3
2006 (5.7)                                                 0.40  -0.60  -0.70      1     2
2007 (6.1)                                                       -1.00  -1.10      0     2
2008 (5.1)                                                              -0.10      0     1
# ties (diff=0):    0      0      0      0      1      0      2      0      0
Total                                                                              6    36

S = 6.00 minus 36.00 = -30.00

If n>=10, then use the variance calculation to estimate probability.
If n<10, then use the table, below. Note: a significant p value is not possible with fewer than 4 time periods.
n>=5 is required to reach p<.05.

# Positive diffs    6.00
# Negative diffs    36.00
S                   -30.00

Evaluation (Normal Approximation, N>=10)

Variance(S)       125 = n(n-1)(2n+5)/18. This formula may be conservative in the presence of tied values.
ZS                -2.59
Zcrit,.05         1.96
Interpretation    Sig. Decreasing

Evaluation (Lookup Table for Fewer Than 10 Years)

If S>=S-crit, then reject H0

S-crit (p<.05)
# Years    1-tailed    2-tailed
4          6           -
5          8           10
6          11          13
7          13          15
8          16          18
9          18          20
10         21          23

[Chart: New Mexico Infant Mortality Rate from 1999-2009. X-axis: Year, 2000 to 2009. Y-axis: Inf Deaths per 1000 Live Births, 0 to 7.]

Table A 30, Upper-tail Probabilities for the Null Distribution of the Kendall K Statistic.
For N>10 use the approximation given in section 8.12 (of Hollander and Wolfe)
One-sided p = Prob [S >= x] = Prob [S <= -x]
N = Number of time periods

x    N=3      N=4      N=5      N=6      N=7      N=8      N=9      N=10
1    0.5                        0.5      0.5                        0.5
2    0.3335   0.375    0.408    0.43     0.443    0.452    0.46     0.4655
3    0.167    0.271    0.325    0.36     0.386    0.406    0.4205   0.431
4             0.167    0.242    0.2975   0.3335   0.36     0.381    0.3975
5             0.1045   0.1795   0.235    0.281    0.317    0.3435   0.364
6             0.042    0.117    0.1855   0.236    0.274    0.306    0.332
7                      0.0795   0.136    0.191    0.2365   0.272    0.3
8                      0.042    0.102    0.155    0.199    0.238    0.271
9                      0.02515  0.068    0.119    0.1685   0.2085   0.242
10                     0.0083   0.048    0.0935   0.138    0.179    0.216
11                              0.028    0.068    0.1135   0.1545   0.19
12                              0.01815  0.0515   0.089    0.13     0.168
13                              0.0083   0.035    0.0715   0.11     0.146
14                              0.00485  0.025    0.054    0.09     0.127
15                              0.0014   0.015    0.0425   0.075    0.108
16                                       0.0102   0.031    0.06     0.093
17                                       0.0054   0.0233   0.049    0.078
18                                       0.0034   0.0156   0.038    0.066
19                                       0.0014   0.01135  0.03     0.054
20                                       0.0008   0.0071   0.022    0.045
21                                       0.0002   0.00495  0.0172   0.036
22                                                0.0028   0.0124   0.0295
23                                                0.00185  0.00935  0.023
24                                                0.0009   0.0063   0.01865
25                                                0.00055  0.0046   0.0143
26                                                0.0002   0.0029   0.0113
27                                                0.0002   0.00205  0.0083
28                                                <0.0001  0.0012   0.00645
29                                                         0.0008   0.0046
30                                                         0.0004   0.00345
31                                                         0.00025  0.0023
32                                                         0.0001   0.0017
33                                                         0.0001   0.0011
34                                                         <0.0001  0.0008
35                                                         <0.0001  0.0005
36                                                         <0.0001  0.00035
37                                                                  0.0002
38                                                                  0.00015
39                                                                  <0.0001
Tot # cells
     3        6        10       15       21       28       36       45

Reference column p=0.05: constant 0.05 at every x.

The table was adapted from D. Helsel and R. M. Hirsch, Statistical Methods in Water Resources
Helsel and Hirsch cited Table A30 in Myles Hollander and Douglas A. Wolfe (1999)
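
For data without ties, the tabled upper-tail probabilities can be reproduced by counting permutations. The sketch below is my own illustration, not taken from Helsel and Hirsch or Hollander and Wolfe: it counts orderings by their number of discordant pairs D, then uses S = M - 2D with M = n(n-1)/2. The function names are hypothetical.

```python
from math import factorial

def discordant_pair_counts(n):
    """counts[d] = number of orderings of n distinct values with d discordant pairs."""
    counts = [1]  # a single value has no pairs
    for m in range(2, n + 1):
        new = [0] * (len(counts) + m - 1)
        for d, c in enumerate(counts):
            # Placing the m-th value can add 0 to m-1 discordant pairs
            for k in range(m):
                new[d + k] += c
        counts = new
    return counts

def upper_tail_p(n, x):
    """Exact one-sided P(S >= x) under H0, assuming no tied values."""
    m_pairs = n * (n - 1) // 2
    counts = discordant_pair_counts(n)
    favorable = sum(c for d, c in enumerate(counts) if m_pairs - 2 * d >= x)
    return favorable / factorial(n)

print(round(upper_tail_p(5, 8), 3))   # 0.042
print(round(upper_tail_p(10, 21), 3)) # 0.036
```

Only values of x with the same parity as n(n-1)/2 are attainable without ties, which is why the textbook table lists every second x for each N.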

Original Table from Helsel & Hirsch:

Table B8 -- Quantiles (p-values) for Kendall's S statistic and tau correlation coefficient
For N>10 use the approximation given in section 8.2.2
One-sided p = Prob [S >= x] = Prob [S <= -x]

N = Number of time periods

x     N=4      N=5      N=8      N=9
0     0.625    0.592    0.548    0.54
2     0.375    0.408    0.452    0.46
4     0.167    0.242    0.36     0.381
6     0.042    0.117    0.274    0.306
8              0.042    0.199    0.238
10             0.0083   0.138    0.179
12                      0.089    0.13
14                      0.054    0.09
16                      0.031    0.06
18                      0.0156   0.038
20                      0.0071   0.022
22                      0.0028   0.0124
24                      0.0009   0.0063
26                      0.0002   0.0029
28                      <0.0001  0.0012
30                               0.0004
32                               0.0001
34                               <0.0001
36                               <0.0001

N = Number of time periods

x     N=3      N=6      N=7      N=10
1     0.5      0.5      0.5      0.5
3     0.167    0.36     0.386    0.431
5              0.235    0.281    0.364
7              0.136    0.191    0.3
9              0.068    0.119    0.242
11             0.028    0.068    0.19
13             0.0083   0.035    0.146
15             0.0014   0.015    0.108
17                      0.0054   0.078
19                      0.0014   0.054
21                      0.0002   0.036
23                               0.023
25                               0.0143
27                               0.0083
29                               0.0046
31                               0.0023
33                               0.0011
35                               0.0005
37                               0.0002
39                               <0.0001
41                               <0.0001
43                               <0.0001
45                               <0.0001

Table from D. Helsel
http://pubs.usgs.gov/twri/twri4a3/pdf/endofreportnew.pdf
Statistical Methods in Water Resources
By D.R. Helsel and R.M. Hirsch
http://pubs.usgs.gov/twri/twri4a3/pdf/twri4a3-new-11.pdf
Probability of S
Reference column p=0.025 (displayed as 0.03): constant at every x.

Color key for the adapted table:
RED: These do not appear on the table in the textbook because they are impossible values - but they ARE possible if there are tied cells. We still need to figure out how to handle ties.
Significant at p<.05 (one-tailed test)
Significant at p<.05 (two-tailed test)

[Chart: The Probability of Mann-Kendall "S" for N-Years 3 through 10. X-axis: Value of S. Y-axis: probability, 0.00 to 0.50. Series: N=3, N=4, N=5, N=6, N=7, N=8, N=9, N=10, with reference lines at p=0.05 and p=0.025.]

This is from Helsel & Hirsch

S-crit (p<.05)
# Years    1-tailed    2-tailed
4          6           -
5          8           10
6          11          13
7          13          15
8          16          18
9          18          20
10         21          23
If S>=S-crit, then reject H0

This includes interpolated values (red text)

S-crit (p<.05)
# Years    1-tailed    2-tailed
4          6           -
5          8           9
6          10          12
7          12          14
8          15          17
9          17          20
10         20          23
If S>=S-crit, then reject H0
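
The small-sample decision rule can be sketched as a dictionary lookup. This is a hypothetical helper, not part of the workbook: `S_CRIT` holds the Helsel & Hirsch critical values (without the interpolated ones), `None` marks the case where no value of S reaches p<.05, and the comparison uses the absolute value of S, which for a one-tailed test assumes the trend is in the hypothesized direction.

```python
# n (years): (one-tailed S-crit, two-tailed S-crit) at p < .05
S_CRIT = {
    4: (6, None),
    5: (8, 10),
    6: (11, 13),
    7: (13, 15),
    8: (16, 18),
    9: (18, 20),
    10: (21, 23),
}

def reject_h0(s, n, two_tailed=True):
    """True if |S| reaches the tabled critical value for n years."""
    if n not in S_CRIT:
        raise ValueError("lookup table covers n = 4 to 10 only")
    crit = S_CRIT[n][1 if two_tailed else 0]
    # None means significance is not attainable for this n and test type
    return crit is not None and abs(s) >= crit

print(reject_h0(-30, 10))                  # True
print(reject_h0(6, 4))                     # False
print(reject_h0(6, 4, two_tailed=False))   # True
```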

Gilbert, 1987, on Tied Values in Mann-Kendall Test

Data values in time order: 23, 24, 0.1, 6, 0.1, 24, 24, 0.1, 23

Rows are the earlier value, columns are the later value; each cell is the column value minus the row value.

row value          24     0.1       6     0.1      24      24     0.1      23     #+    #-
23                  1   -22.9     -17   -22.9       1       1   -22.9       0      3     4
24                      -23.9     -18   -23.9       0       0   -23.9      -1      0     5
0.1                               5.9       0    23.9    23.9       0    22.9      4     0
6                                        -5.9      18      18    -5.9      17      3     2
0.1                                              23.9    23.9       0    22.9      3     0
24                                                          0   -23.9      -1      0     2
24                                                              -23.9      -1      0     2
0.1                                                                      22.9      1     0
# ties (diff=0):    0       0       0       1       1       2       2       1
Total                                                                             14    15

S = 14.00 minus 15.00 = -1.00

This is from the Gilbert (1987) article.

Gilbert says:
the number of tied groups=3 (!?)*
t1=2 for the tied value 23
t2=3 for the tied value 24
t3=3 for the tied value .1

*Does he mean the number of different/unique values with a tie?

There are five columns, above, with tied values. But the number of unique values that happen to have ties in the matrix = 3. How on Earth am I supposed to ask SAS to do that!

Sorted values: 0.1, 0.1, 0.1, 6, 23, 23, 24, 24, 24
The "value" of 23 happens twice
The "value" of 24 happens three times
The "value" of 0.1 happens three times
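
One way to read Gilbert's "tied groups" is as the distinct data values that occur more than once, counted on the raw data and not on the matrix columns. Under that reading (my interpretation, not a quote from Gilbert), the groups fall out of a simple frequency count. The sketch below is in Python, not SAS, and `tie_groups` is a made-up name.

```python
from collections import Counter

def tie_groups(values):
    """Map each value that occurs more than once to its count t_p."""
    return {v: t for v, t in Counter(values).items() if t > 1}

# Data values as they appear in the worksheet for Gilbert's example
gilbert = [23, 24, 0.1, 6, 0.1, 24, 24, 0.1, 23]

groups = tie_groups(gilbert)
q = len(groups)  # number of tied groups
n = len(gilbert)
tie_term = sum(t * (t - 1) * (2 * t + 5) for t in groups.values())
var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18

print(groups)  # {23: 2, 24: 3, 0.1: 3}
print(q)       # 3
print(round(var_s, 2))
```

The frequency count gives the same three groups Gilbert lists (t=2 for 23, t=3 for 24, t=3 for 0.1), so no inspection of the difference matrix is needed to find them.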
