PRIMER v6:
User Manual/Tutorial

K R Clarke & R N Gorley

PRIMER-E Ltd

Plymouth
Routines
In
Multivariate
Ecological
Research

PRIMER v6:
User Manual/Tutorial

K R Clarke & R N Gorley

PRIMER-E Ltd

Published 2006
by
PRIMER-E Ltd

Registered office: Plymouth Marine Laboratory
Prospect Place
West Hoe
Plymouth PL1 3DH
United Kingdom

Business Office: 6 Hedingham Gardens
Roborough
Plymouth PL6 7DX
United Kingdom

Directors: K Robert Clarke MSc PhD, Raymond N Gorley MA

Clarke, K.R., Gorley, R.N. 2006
PRIMER v6: User Manual/Tutorial
PRIMER-E: Plymouth

© Copyright 2006 PRIMER-E Ltd, all rights reserved

Original paintings for cover and CD by Greg Millwood, Blue Peace Gallery, Barbican,
Plymouth, UK (www.bluepeace.co.uk)

Contents

OVERVIEW
A. Contact details and installation of the PRIMER v6 software
    Getting in touch with us
    System requirements
    Installing PRIMER
    Getting help
B. Introduction to the methods of PRIMER
    Application areas
    Basic routines
C. Changes from PRIMER 5 to PRIMER 6
    Technical
    Interface
    Enhanced graphics
    Additional analyses
D. Typographic conventions for this manual
    Emphases and text symbols
    Finding your way around
E. A brief tour through the operation of PRIMER v6
    Reading in the examples
    Reading in non-v6 files
    Entering data directly in v6
    Adding factors or indicators
    Pre-treatment of data
    Working with resemblances
    Creating plots
    Exploring the workspace
    Selecting data subsets
    Tools menu
    Results windows
    Analyse menu

MANUAL/TUTORIAL
1. Opening, editing and saving data (File, Edit)
    Finding the examples
    PRIMER file types
    Opening the PRIMER 6 desktop
    Entering data directly
    Labelling samples & variables
    Deleting & inserting rows/cols
    Moving & sorting rows/cols
    Cut/copying & pasting
    Saving data & warning prompts
    Saving, closing & opening a workspace
    Setting the initial directory
    Opening PRIMER files
    (Ekofisk oil-field fauna)
    Properties
    Opening Excel files
    Missing or zero values?
    (Ekofisk abiotic data)
    (Tasmanian meiofauna)
    Opening several files at once
    Opening the same file twice
    Text-format input files
    Entering (& merging) large arrays
    Output data formats
    Handling PRIMER 5 & 4 files
2. Factors (and Indicators), identifying sample (and species) groups
    Active window
    Use of factors
    Creating factors (for samples)
    Multiple PRIMER sessions
    More Edit>Fill Down options
    Closing a PRIMER session
    Combining factors (e.g. to average)
    Factor keys
    Importing factors
    Label matching
    Importing/exporting factors for .xls/.txt files
    Importing factors from old v4 files
    Creating indicators on variables
    Indicators in selection
    Aggregation files (versus indicators)
3. Pre-treatment options
    Standardising
    (W Australia fish diets)
    Stats to worksheet
    Transforming (overall)
    Normalising variables
    Dispersion weighting of species
    (Fal estuary copepods)
    Rename Data
    Recent Items
    Other variable weighting
    Mixed data types
    Cumulating samples
    (Particle sizes for Danish sediments)
4. Resemblance: similarities, dissimilarities and distances
    Resemblance matrices
    Standard resemblance choices
    Bray-Curtis similarity
    Zero-adjusted Bray-Curtis
    Euclidean distances
    Accessing other resemblance measures
    Distance measures
    Similarity to dissimilarity
    Quantitative similarity measures
    Pres/Abs similarity measures
    Taxonomic distinctness/aggregation files
    Taxonomic dissimilarity measures
    (Groundfish of European shelf waters)
    Timing bar, interrupts, multi-tasking & multi-processors
    Analysing between variables
    Correlation between variables
    (Biomarkers from N Sea flounder)
    Saving & opening triangular matrices
    Other coefficients
    Between-curve distances
5. Hierarchical clustering (CLUSTER, SIMPROF)
    Clustering & linkage choices
    SIMPROF tests
    Modifying plots in PRIMER
    (Exe estuary nematodes)
    Data labels & symbols menu
    Symbol & text sizes
    Editing plot titles/scales
    General plot menu
    Edit/create new factors from plots
    Special menu for slicing & orientation of dendrograms
    Rotating & condensing dendrograms
    Ordering factor levels in keys
    Zooming dendrograms
    SIMPROF method
    (Bristol Channel zooplankton)
    CLUSTER results window
    SIMPROF direct run
    Histograms of null distributions
    SIMPROF on a subset of samples
6. Managing the workspace (Window, View, File)
    Explorer tree
    Closing, redisplaying & tiling windows
    Minimising windows
    View menu
    Understanding the Explorer tree
    Deleting items from a workspace
    Rolling up branches of the tree
    Renaming items
    Saving plots
    Vector vs. pixel plots
    Saving graph values
    Saving results
    Adding notes
    Printing results and graphs
    Workspace planning
7. Non-metric multi-dimensional scaling (MDS)
    MDS rationale
    Running an MDS
    MDS results window
    Shepard diagrams
    Accuracy and fit scheme
    Ordination plots in 2-d
    Rotating and flipping the ordination
    Linking MDS plots to cluster analysis
    Special menu for configuration plots
    Cluster overlays on MDS plots
    (Clyde dump-ground macrofauna)
    Overlay trajectory
    (Ekofisk oil-field study)
    Species bubble plots
    Duplicating graphs
    Environmental bubble plots
    Plot values on bubbles
    Changing bubble scale & colours
    3-d MDS graphs
    Zooming 3-d MDS plots
    Rectangular zoom for 2-d MDS plots
    MDS subset plots
8. Highlighting and selection (Select)
    Highlight and select
    Selection by highlighting
    Duplicating a selected worksheet
    Deselecting
    Selecting by factor levels
    Multiple selections
    Selecting by number
    Selecting variables
    Selecting by 'most important'
    Excluding missing data
    Selection in resemblance matrices
9. General data manipulation (Tools)
    Tools v Edit menu
    Average and Sum for samples
    Average and Sum for variables
    Aggregation
    Check on aggregation files
    Tree menu
    Check on datasheets and resemblances
    Undefined resemblances
    Duplicating resemblances
    Merge (/join) operations
    Combined cells in Merge
    Avoiding strict label matching
    Merging non-uniform species lists
    (Phuket data on coral transects)
    Transposing the datasheet
    Transform (individual)
    Transform expressions
    Expressions combining variables
    Expressions combining worksheets
    Average body mass matrix (B/A)
    Transform on resemblances
    Ranked variables
    Ranked resemblances
    Tools Options menu
10. Analysing environmental variables (Draftsman Plot, PCA)
    Environment-type data
    Draftsman plots
    Principal Components Analysis
    PCA eigenvector plot
    PC scores
    PCA plot options
    Multiple 2-d and 3-d plots
    Interpreting PCA v MDS pairwise plots
    PCA of data on biomarkers
    Missing data estimation
    EM algorithm assumptions
11. Linking assemblage to environment (BEST: Bio-Env, LINKTREE)
    BEST rationale
    Bio-Env v. BVStep
    General and BioEnv tab choices
    (Messolongi diatoms & abiotic data)
    Global BEST match test
    Linkage trees - rationale
    Non-metric, non-linear, non-additive
    LINKTREE on lagoon diatom study
    SIMPROF test in LINKTREE
    Missing data in LINKTREE
12. 'Analysis of Similarities' and species contributions (ANOSIM, SIMPER)
    ANOSIM introduction
    1-way layout (WA fish diet example)
    Pairwise comparisons
    Other 1-way ANOSIM options
    1-way layout (Biomarkers example)
    2-way crossed ANOSIM (Tasmanian crabs study)
    2-way crossed ANOSIM (Danish sediment data)
    2-way nested ANOSIM (Calafuria macroalgae)
    ANOSIM test for the unreplicated 2-way layout
    Contributions of variables to similarity (SIMPER)
    Species discriminating two groups
    Species typifying a group
    SIMPER for the 2-way crossed layout
13. Further matching of multivariate patterns (RELATE, 2STAGE, BEST, MVDISP)
    RELATE on resemblance matrices
    Model matrix construction
    RELATE hypothesis test
    Seriation on Phuket coral transects
    RELATE test on two biotic arrays
    'Seriation with replication' test
    Other Model Matrix options
    Cyclicity tests
    Rationale for 2nd stage MDS
    (Morlaix macrofauna, Amoco-Cadiz oil spill)
    2nd stage MDS plots
    2STAGE to compare resemblance coefficients
    Conclusions on comparing resemblance coefficients
    2STAGE from a single similarity matrix
    2STAGE for time series and repeated measures
    Ideas for other BEST applications
    BVStep stepwise selection
    Species sets 'explaining' the overall pattern
    BVStep on Morlaix oil-spill data
    BVStep starting and stopping options
    BVStep from random starts
    Multivariate dispersion MVDISP
    (Mesocosm experiment, Solbergstrand copepods)
14. Biodiversity measures and tests (DIVERSE, TAXDTEST)
    Input/output for diversity
    Presentation of diversity information
    Taxonomic distinctness
    Standard indices calculated
    Multivariate analysis of diversities
    (Bermuda macrofauna)
    Caswell's neutral model
    Range of relatedness indices calculated
    Specifying aggregation sheets
    Weighting of tree step lengths
    Taxonomic distinctness (groundfish surveys)
    Tests for taxonomic distinctness
    TAXDTEST (on European groundfish)
    Histograms for one sublist size
    'Funnels' for a range of sublist sizes
    Using taxon frequency in simulations
    'Ellipses' for joint values of (Δ+, Λ+)
15. Species curves (Geometric Class, Dominance and Species-Accumulation Plots)
    Range of diversity curves
    Geometric class plots
    Dominance curves
    (L. Linnhe macrofauna time series)
    k-dominance, ordinary & partial plots
    Abundance-Biomass Comparison curves
    Testing for k-dominance curves
    (Tikus Is coral cover)
    (Loch Creran contiguous macrofauna cores)
    Species accumulation plots
    S estimators
Acknowledgements
Index to data sets
General index

OVERVIEW
A. Contact details and installation of the PRIMER v6 software

Getting in touch with us
For any up-to-date news about PRIMER, including details of upcoming PRIMER workshops, see
our web site at: http://www.primer-e.com

Please report any bugs, technical problems, dislikes or suggestions for improvement to Ray Gorley
at: tech@primer-e.com

For licensing and other general enquiries, contact Cathy Clarke at: admin@primer-e.com
Use this latter e-mail address to contact Bob Clarke, for queries related to the scientific methods.

Our business postal address is:
PRIMER-E Ltd
6 Hedingham Gardens
Roborough
Plymouth PL6 7DX
United Kingdom
Tel/Fax: +44 (0)1752 783366
(or you may, if you prefer, use the registered address of the company: PRIMER-E Ltd, Plymouth
Marine Laboratory, Prospect Place, West Hoe, Plymouth PL1 3DH, United Kingdom)

System requirements
PC with Intel compatible processor
Windows 98/2000/Me/NT4/XP possible, but Windows 2000, XP or later recommended
Memory: for Windows 98 and Me, 64 Mb (but at least 128 Mb recommended); for Windows
2000, XP or later, 128 Mb (256 Mb recommended)
Internet Explorer 5.5 or later is needed to install the .Net environment
To be able to read and write Excel worksheets you need Excel 2000, or later, installed on the PC

Installing PRIMER
This is a stand-alone product for installation on an individual PC, not a network version. You need
to be logged on as Administrator for WinNT, 2000, XP or later. If you have an earlier PRIMER v6
installed (including a beta-test version), you first need to uninstall it, using Add or Remove
Programs from the Control Panel. You do not need to uninstall PRIMER v5 (or earlier versions).
Insert the PRIMER CD in your CD drive. The install program will automatically run unless you
have disabled autorun on your drive. (If it does not run, open Windows Explorer, select the CD
drive and double click on Setup.exe.) The install program will install the .Net environment on your
computer if it is not already up to date. You will be asked for your serial number and key, which are
on the front of the CD case. Setup may reboot your system during the installation and, as usual, it
is advisable to close down all other programs before commencing installation.

Getting help
General information about the techniques underlying the analyses is found in the accompanying
Methods manual: Clarke KR & Warwick RM, 2001, 'Change in Marine Communities', 2nd edn,
PRIMER-E Ltd, Plymouth, with the newer techniques covered in the tutorial components of this
User manual. Information specific to PRIMER 6 is contained in this manual and the software Help
system. The latter is context sensitive: if you click on the Help button in a dialog box then you will
get an appropriate help topic. You can also get into the help system by choosing the Help menu
option or clicking on the help button on the Tool bar. You can then browse this system via the
Contents or Index tabs in the Help window. If you are still having problems, contact the staff at
PRIMER-E (see the top of this page), who will be happy to help.

If you are familiar with PRIMER 5, it is suggested you read Section C, on the many enhancements
in PRIMER 6, and then try out the software on some of the Examples installed with the software.
New users should start with Sections B, D and E, and then work selectively through the detailed
material from page 17. This is written in the style of a continuous Tutorial but functions also as a
detailed Reference manual, utilising the Contents pages (1-6) and the General index (page 182
onwards). Analyses of specific examples can be traced via directions in the margins or the data
sets index.

B. Introduction to the methods of PRIMER

Application areas
PRIMER 6 (Plymouth Routines In Multivariate Ecological Research) consists primarily of a wide
range of univariate, graphical and multivariate routines for analysing arrays of species-by-samples
data from community ecology. Data are typically of abundance, biomass, % area (or line) cover,
presence/absence etc, and arise in biological monitoring of environmental impact and more
fundamental studies, e.g. of dietary composition. Also catered for are matrices of physical values
and chemical concentrations, which are analysed in their own right or in parallel with biological
assemblage data, 'explaining' community structure by physico-chemical conditions. The methods
of this package make few, if any, assumptions about the form of the data ('non-metric' ordination
and permutation tests are fundamental to the approach) and concentrate on approaches that are
straightforward to explain. This robustness makes them widely applicable, leading to greater
confidence in interpretation, and the transparency possibly explains why they have been adopted
worldwide, particularly in marine science but increasingly in terrestrial, freshwater, palaeontology
etc contexts. The statistical methods underlying the software are explained in non-mathematical
terms in the accompanying 'methods' manual, which also shows outcomes from many literature
studies, e.g. of environmental effects of oil spills, drilling mud disposal and sewage pollution on
soft-sediment benthic assemblages, disturbance or climatic effects on coral reef composition or fish
communities, more fundamental biodiversity and community ecology patterns, mesocosm studies
with multi-species outcomes etc. Many of the methods manual data sets, and all the user manual
data, are included with the package so that the user can replicate the analyses.

Though the analysis requirements for biological assemblage data are a principal focus, the package
is equally applicable (and increasingly being applied) to other data structures which are either
multivariate or can be treated as such. These include: multiple biomarkers in ecotoxicology, and
their relation to water or tissue concentrations of chemical contaminants; composition of substrate
in geology or materials science; morphometric measurements in taxonomy; genetic studies, e.g.
involving presence or absence of specific sets of alleles; signals at multiple wavelengths in remote
sensing, which characterise vegetation or water masses, etc. Univariate measurements which can
sometimes be treated more effectively in multivariate fashion include particle size analysis for
water or sediment samples and size frequency distributions of organisms in cohort studies (the
multivariate variables are the discrete particle or organism size classes). Sets of growth curves for
individual organisms, monitored through time ('repeated measures') are also readily analysed: the
unifying feature is that all data sets are reduced to an appropriate triangular matrix representing the
(dis)similarity of every pair of samples, in terms of their assemblages, suites of biomarkers, particle
size distributions, shape of growth curves, etc. Clustering and ordination techniques are then able to
display the relationships between the samples, and permutation tests impose a necessary hypothesis
testing structure. To demonstrate the range of application areas, references to ISI-listed publications
which cite earlier versions of the PRIMER methods manual (and one of the core methods papers,
Clarke, KR, 1993. Non-parametric multivariate analyses of changes in community structure. Aust.
J. Ecol. 18: 117-143), can be downloaded from the PRIMER-E website (www.primer-e.com).
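
As an illustration of the reduction to a triangular matrix described above, the following Python
sketch computes the similarity of every pair of samples and stores only the lower triangle. It
assumes the Bray-Curtis coefficient, which this manual names as a standard choice for species
abundances; the function names and the small data array are hypothetical and are not part of
PRIMER.

```python
def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity (0-100) between two samples of non-negative values."""
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("Bray-Curtis is undefined when both samples are empty")
    diff = sum(abs(x - y) for x, y in zip(a, b))
    return 100.0 * (1.0 - diff / total)


def triangular_matrix(samples):
    """Lower triangle only: entry [later][earlier] for every pair of samples."""
    names = list(samples)
    return {
        names[i]: {
            names[j]: bray_curtis_similarity(samples[names[i]], samples[names[j]])
            for j in range(i)
        }
        for i in range(1, len(names))
    }


# Hypothetical counts of three species in three samples (illustrative only).
samples = {
    "S1": [10, 0, 5],
    "S2": [10, 0, 5],
    "S3": [0, 8, 0],
}
matrix = triangular_matrix(samples)
```

Identical samples (S1 and S2) score 100 and samples with no species in common (S3 against either)
score 0; any clustering or ordination would then work from this matrix rather than the raw counts.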

Basic routines
The core routines of the package cover: hierarchical clustering into sample (or species) groups
(CLUSTER); ordination by non-metric multidimensional scaling (MDS) and principal components
(PCA) to summarise patterns in species composition and environmental variables; permutation-based
hypothesis testing (ANOSIM), an analogue of univariate ANOVA which tests for differences
between groups of (multivariate) samples from different times, locations, treatments etc; identifying
the species primarily discriminating between sample groups (SIMPER); the linking of multivariate
biotic patterns to suites of environmental variables or other biotic arrays (BEST and
LINKTREE); comparative (Mantel-type) tests on similarity matrices (RELATE); a second-stage
MDS routine in which relationships between a large set of ordinations can be visualised (2STAGE);
standard diversity indices; dominance plots; geometric abundance distributions; species accumulation
estimators; aggregation of arrays to allow data analysis at higher taxonomic levels, etc. A
further unique feature of PRIMER is the ability to calculate and test biodiversity indices based on the
taxonomic distinctness of the species making up a quantitative sample or species list, indices whose
statistical properties are robust to varying richness. These permit testing for change in biodiversity
(TAXDTEST), by comparison with a regional 'species pool', so that diversity patterns over wide
space/time scales are comparable when sampling effort is uncontrolled.
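
To make the idea of permutation-based testing for group differences more concrete, here is a
minimal Python sketch of a one-way test in the style of ANOSIM. It assumes the usual rank-based
statistic R = (mean between-group rank - mean within-group rank) / (M/2), where M is the number
of sample pairs, as described in Clarke (1993); the function names, data and number of
permutations are hypothetical, and this is not PRIMER's own implementation.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def anosim_r(dissim, groups):
    """R statistic from ranked dissimilarities and a list of group labels."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dissim[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_between = sum(between) / len(between)
    mean_within = sum(within) / len(within)
    return (mean_between - mean_within) / (len(pairs) / 2)


def anosim_test(dissim, groups, permutations=999, seed=0):
    """Observed R plus a p-value from randomly shuffled group labels."""
    rng = random.Random(seed)
    observed = anosim_r(dissim, groups)
    shuffled = list(groups)
    at_least = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(dissim, shuffled) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (permutations + 1)


# Hypothetical dissimilarities among four samples: two tight groups, far apart.
dissim = [
    [0, 10, 80, 90],
    [10, 0, 85, 95],
    [80, 85, 0, 20],
    [90, 95, 20, 0],
]
groups = ["A", "A", "B", "B"]
r_stat, p_value = anosim_test(dissim, groups)
```

Here every within-group dissimilarity is smaller than every between-group one, so R takes its
maximum value of 1. With only four samples there are just three distinct ways to split them into
two pairs, so the p-value cannot fall much below one third however clear the separation; real
designs need more replication for the test to have any power.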

C. Changes from PRIMER 5 to PRIMER 6

Technical
1) Moved entirely into the new Microsoft .NET environment, giving the software a fully modern
Windows appearance, safeguarding future growth paths and opening the way to diversification
onto different platforms (UNIX, Mac) in future. (Currently v6 is only available in a Windows
environment, though this could be a Virtual Windows environment on an Apple Mac, for example).
2) Replacement of inefficient and inflexible graphics and grid controls (e.g. Graphics Server), by
native code in .NET, has greatly improved the 'look and feel' of the graphics, which are now fully
tailored to the needs. This has also cut through speed bottlenecks in handling large spreadsheets.
3) No fixed size constraints on data matrices or group sizes for any analysis. The limitations are
imposed only by total available RAM. Windows 2000 or later, and 256Mb RAM, recommended.
No longer possible to run on Windows 95.
4) Speed gains of around a factor of 5 in most of the heavily-computational algorithms (MDS,
Bio-Env searches, permutation tests etc); larger speed gains in manipulating data windows, especially
for very large data sets (e.g. opening and closing factor windows).
5) Now fully multi-tasking. You can start several very long analyses and continue doing other
things in the workspace. This takes full advantage of multi-processor systems (different tasks will
automatically be allocated to the least loaded processor).

Interface
6) The workspace now displays an 'Explorer tree' which can be navigated to recall instantly any of
the derived worksheets, plots or results windows. This is not solely a display: the tree structure is
used internally to pass information between related data sheets or plots (e.g. new factors defined
from a CLUSTER dendrogram are back-propagated through the similarity matrix to the original
data matrix, and forward-propagated to an existing MDS plot from that same similarity sheet).
Results windows are now local to each analysis, making it easier to find results at a later stage.
7) Workspaces are now saveable in their entirety; PRIMER v6 can be shut down and re-opened at
a later date, recalling the workspace content in exactly the form it was left (i.e. irrespective of any
subsequent changes or deletions to the data files that were originally read-in to the workspace).
8) Improved data entry dialog and wider selection of input formats. Rectangular Excel sheets of
variables by samples (or samples by variables) are read in easily using a new entry 'Wizard', which
now allows choice of sheet within the Excel file by name. PRIMER does not have the Excel
constraint to 255 columns. Larger files can be read-in from multiple Excel sheets and Merged or, if
created by a database package in 3-column format ('sample number, variable number, value'), can
now be read directly into PRIMER - and will be automatically converted to rectangular format.
9) There is now label matching: samples or variables do not need to appear in the same order for
those analyses that match two different sheets (e.g. BioEnv, RELATE, ABC dominance plots,
Aggregation etc). You usually only need to worry about the selection in the active worksheet and
v6 will perform any selection or re-ordering needed for other sheets. One major advantage of label
matching is the automatic merging of two species-by-samples sheets in which species lists are only
partly overlapping (and in a different order). Consistent spelling of labels is required, naturally!
10) Worksheets are given an explicit data type (abundance, biomass, environment, other), which
permits sensible defaults and warnings (e.g. if ABC inputs are not of types abundance and biomass).
11) New data handling operations include: ranking of variables (e.g. an alternative to individually
transforming environmental variables); sorting (by name or factor levels) and moving data; more
powerful individual transform options, that can now combine different samples, variables, numeric
factors and indicators, as well as other worksheets (using the new label matching); more extensive
checks on input data (including validation of aggregation files); and a new Sum tool.
12) Improved output formats, and a wider range of options. Tables in the results windows are
formatted to be able to copy and paste directly into Excel spreadsheets. Many routines will now
allow summary information to be sent to a new worksheet (for further operation or export) instead
of to text-format results. There is a multiple-page print option, useful for large dendrograms, and
additional graphics output formats, including .jpg, .tif, .png. The standard vector format is the
Windows enhanced metafile (.emf). Final point co-ordinates (e.g. of the new position of samples
after rotation of an MDS plot) can also be output.
13) Data sheets now permit factors on the samples (or indicators on the variables) to be read in
from (or saved to) the Excel or text formats, allowing rapid transfer of all information between
formats. Factors are now easily imported from other data sheets, combined to produce composite
levels (e.g. a site-by-time composite factor could be used to Average or Sum over replicates for
each site/time level), and levels can be given default symbols and colours for use in all plots.
14) Improved timer, showing how long a particular analysis has left to run, plus a Stop button, so
that any analysis which threatens to take too long to execute can be swiftly and cleanly terminated.
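
Two of the interface changes listed above, conversion of 3-column input to rectangular format
(item 8) and merging of sheets by label matching (item 9), can be sketched in a few lines of
Python. This is an illustration only: the function names, the species and sample labels, and
the choice to fill absent combinations with zero are all assumptions made for the example, not
a description of how PRIMER itself handles these cases.

```python
def triples_to_rectangular(triples):
    """Convert (sample, variable, value) records into a rectangular table.

    Returns (samples, variables, rows), where rows[v][s] is the value for
    variable v in sample s. ASSUMPTION: combinations absent from the input
    are filled with 0.
    """
    samples, variables, cells = [], [], {}
    for sample, variable, value in triples:
        if sample not in samples:
            samples.append(sample)
        if variable not in variables:
            variables.append(variable)
        cells[(variable, sample)] = value
    rows = [[cells.get((v, s), 0) for s in samples] for v in variables]
    return samples, variables, rows


def merge_by_label(sheet_a, sheet_b, fill=0):
    """Merge two {species: {sample: value}} sheets by matching labels.

    Species and samples may be in any order and only partly overlapping.
    ASSUMPTION: cells missing from both sheets take the value `fill`, and
    sheet_a wins if both sheets supply the same cell.
    """
    species = list(sheet_a) + [sp for sp in sheet_b if sp not in sheet_a]
    samples = []
    for sheet in (sheet_a, sheet_b):
        for row in sheet.values():
            for sample in row:
                if sample not in samples:
                    samples.append(sample)
    merged = {}
    for sp in species:
        row = dict.fromkeys(samples, fill)
        row.update(sheet_b.get(sp, {}))
        row.update(sheet_a.get(sp, {}))
        merged[sp] = row
    return merged


# Hypothetical 3-column records: (sample, species, count).
triples = [("S1", "Capitella", 12), ("S1", "Nephtys", 3), ("S2", "Capitella", 7)]
samples, variables, rows = triples_to_rectangular(triples)

# Hypothetical sheets with partly overlapping species lists.
sheet_a = {"Capitella": {"S1": 12, "S2": 7}, "Nephtys": {"S1": 3, "S2": 0}}
sheet_b = {"Nephtys": {"S3": 5}, "Abra": {"S3": 2}}
merged = merge_by_label(sheet_a, sheet_b)
```

Because matching is by label, the merge only works if the same species is spelled identically in
both sheets, which is the point made at the end of item 9.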


Enhanced graphics
15) In ordination plots (MDS, PCA), more flexible addition of symbols and labels denoting
(different) factors. The 3-d plotting routine is greatly improved, both in respect of rotation of axes
(and points, for MDS) and reflection of axes, with the ability to add labels to points in addition to
symbols. Information displayed on plots is more comprehensive, with the automatic use of subtitles
and a 'history display' to allow clear differentiation between plots from different transform and
similarity options, and a key for bubble size is given when a variable is superimposed.
16) In clustering, keyed symbols can now be added (independently of labels) to dendrograms, and
the latter can be displayed in any one of four orientations. Clusters produced at a fixed similarity
level can be identified and saved as a factor, and can then be displayed on the matching MDS.
Alternatively, an MDS plot can show the results of a cluster analysis as smoothed convex contours,
drawn round points that are clustered at several specified resemblance levels.
17) Points in an ordination (in 2- or 3-d) can be joined by straight line segments, in a defined order,
allowing trends in time, space or environmental gradient to be more readily seen.
18) MDS diagnostics are improved: Shepard plots of distances in the ordination against original
dissimilarities are given for both 2- and 3-d solutions; % contributions of individual points to the
stress level are listed; an alternative fitting scheme allows a different treatment of tied similarities,
and the user has more control over the precision of output stress values.
19) In PCA, vector plots are now permitted (since the axes are linear combinations of the variables)
and any pair or triple of PCs can be plotted in the 2- or 3-d configurations.
20) Dominance plots from a single sheet (abundance or biomass) can have the multiple lines
identified as different levels of a factor (by differing line colours and/or symbols). For ABC curves,
the full set of plots from all samples is now automatically created in one run of the routine. The
(confusing) option in v5 of automatic summation of selected samples has been removed (if needed,
it can be carried out using the new Sum tool prior to entering Dominance or Geometric plot menu).
21) The 'funnel' and 'ellipse' plots for average (and variance in) taxonomic distinctness indices are
also enhanced, with more flexible choice of labels and symbols, colour shading etc.
22) Perhaps the most useful enhancement is the ability to zoom in on the content of any portion of
a plot (ordination, dendrogram, funnel plot, draftsman plot etc), whilst retaining axes and scales,
and having the ability to scroll through the plot at the current magnification. This even extends to
changing the aspect ratio (though not for ordinations!), by first drawing a rectangle around the
portion to magnify. This is particularly useful for dendrograms, which for large numbers of points
need to magnify the sample scale but not the similarity scale.
23) For MDS ordinations, interest is usually less in magnifying a subset of the plot than repeating
the ordination just for that subset of points. Drawing a box around the subset on the plot allows a
re-run of the MDS for those points, with a single click on a menu/toolbar icon!

Additional analyses
24) Pre-treatment operations are greatly expanded: a) standardisation options are now by species or
variable, and by total or maximum; b) a cumulate operation assists in widening the application of
these routines to size classes in particle or body-size distributions; c) variables can be weighted,
e.g. downweighting some species on the basis of misidentification rates. Overall transformation
and normalisation options remain as in v5 but, importantly, all pre-treatments now become an
initial step, separated from similarity calculations, allowing a highly flexible range of alternatives.
25) The new user is still guided to 'safe', standard choices (e.g. Bray-Curtis on transformed species
abundances, normalised Euclidean distance for environmental variables), but the more experienced
user now has access to nearly 50 similarity/dissimilarity/distance measures, drawn from different
literature contexts, and collectively referred to as 'resemblance measures'.
26) New research techniques here are: a) 'dispersion weighting', the downweighting of individual
species counts using their innate heterogeneity in replicates, arising from clustered spatio-temporal
distributions; b) 'taxonomic dissimilarities', extending the concepts of taxonomic distinctness
introduced in PRIMER v5 from diversity measures to dissimilarity coefficients.
27) Missing values are now tolerated in some routines. Where they are sparse and can be assumed
'missing at random', where many fewer variables than samples are selected, and where the variables
can be transformed to approximate normality (e.g. environmental-type variables), an EM
(Expectation-Maximisation) algorithm for optimal prediction of missing data is provided. (Use with care!)
28) Data handling operations such as transforming and ranking now apply to resemblance matrices,
which allows conversion of correlation matrices to similarities, e.g. to obtain environmental
variable MDS plots. Similarities can be explicitly switched to dissimilarities, and vice-versa, and
another resemblance tool is a 'Model matrix', for building model matrices in RELATE tests (non-
parametric Mantel-type): these might represent serial or cyclic trends with replication of groups.
29) The similarity percentages routine (SIMPER), identifying variables that contribute most to the
difference between two groups, is expanded to cover two-factor designs and to apply to Euclidean
distance (e.g. for environmental variables) as well as Bray-Curtis. Other menu items are regrouped:
Bio-Env and BVStep are merged (into BEST) and ANOSIM2 absorbed into the ANOSIM menu;
all global permutation tests now create histograms of null distributions.
30) Several new global significance tests are provided (robust, non-parametric, permutation-based
and exact), to test for presence of meaningful structure before interpreting results and displays:
a) DOMDIS, which computes distances between the curves in dominance plots, which can then be
input to an ANOSIM test of a priori group structure;
b) a BEST match test, which allows for the bias inherent in selection of a subset of variables from
one matrix (e.g. environmental) that best 'explains' the pattern in a second (e.g. species);
c) the SIMPROF (similarity profile) routine, which tests for evidence of structure in an a priori
unstructured set of samples. In combination with clustering and a new facility to condense
specific substructures in dendrograms (by clicking on vertical lines), this can generate trees that
are pruned to objectively-defined groups. A factor defining these groups can be automatically
created.
31) LINKTREE is a new routine to link biotic patterns to environmental variables (or similar linkage
problems). Linkage trees - a non-parametric version of multivariate regression trees or CARTs -
optimise successive binary divisions of biotic patterns using threshold values of environmental
variables, and aim to provide piecemeal 'explanations' of assemblage groupings in terms of single
environmental variables. It is effective in conjunction with the Bio-Env (BEST) routine, or may
provide better explanations where BEST fails because forcing variables are very non-additive. It
can be combined with SIMPROF, to test for the 'significant' structure requiring explanation.
32) Species-accumulation plots now calculate and plot a wide range of richness estimators (Chao
1&2, Jacknife 1&2, Bootstrap, Michaelis-Menten, UGE) as samples are cumulated in original,
specified or permuted orders. Results are sent to a worksheet for easy export.

33) A unique feature of v5 was the calculation of taxonomic distinctness measures of biodiversity,
based on taxonomic or phylogenetic relatedness of species in a sample, and testing for 'biodiversity
loss' by simulating selection from a master species list (TAXDTEST). v6 enhances this not only
by better graphics but by the ability to use a different master list for the simulation than the sample
data (e.g. fossil lists can now be compared with modern ones), and the option to specify a data
sheet that determines the frequency with which each species is drawn from the master species list,
when constructing the 'null model' simulations.

D. Typographic conventions for this manual


Emphases & text symbols

Text in bold indicates the menu items that need to be selected,
> denotes cascading sub-menu items, tab choices, dialog boxes or sub-boxes,
• denotes a button entry in a dialog box (so-called 'radio buttons' - only one can be selected),
✓ indicates a tick in the specified box (so-called 'check boxes' - either on or off),
text inside a cartouche is an instruction to select the suggested entry (e.g. filename) or actually to
type it in, and
( ) & ( ) & ( ) indicate several steps that need to be carried out in the one box, where brackets are
used naturally to split up the different components of the dialog.


For example:
Analyse>Resemblance>(Analyse between•Samples) & (Measure•Bray-Curtis similarity) &
(✓Add dummy variable>Value: 1)
is an instruction to select the main menu item Analyse, the sub-menu item Resemblance, and then to
analyse between samples using Bray-Curtis similarity, adding a dummy species with value 1 for all
samples, prior to computing similarities (this is a modification to Bray-Curtis to make it behave in
a specific way as the samples become very sparse - see page 44). The dialog this corresponds to is:

[Figure: the Analyse menu over the Ekofisk oilfield macrofauna abundance worksheet, and the
Resemblance dialog with the options Analyse between (Samples, Variables), Measure (Bray-Curtis
similarity, Euclidean distance, More (tab)) and Add dummy variable (Value).]

Finding your way around

Cartouches in the margin refer to the subsection headings listed on pages 1 to 6.
Numbers in margins allow the thread of analysis for a specific data example to be followed:
• -> This data set met for the first time
24 -> These data last seen on page 24
38 <- This data set will be returned to on page 38
• <- These data are not seen again in the manual
At the end of the manual there is also an index of occurrences of each data set, in addition to a
general index of topics, including all menu or sub-menu items (displayed in bold).

E. A brief tour through the operation of PRIMER v6

Reading in the examples

The first step is to get data into PRIMER. Data is held in worksheets; you can read in data files in
several formats, or you can enter and edit data directly. The PRIMER CD contains a number of
example data files needed for this manual: if necessary, copy the Examples v6 directory from the
CD to your hard drive. (If you installed to the default location you should find the Examples v6
directory already under C:\Program Files\PRIMER-E\PRIMER 6\Examples v6.) To read in data
select File>Open from the main menu and navigate to, for example, the Examples v6\Ekofisk
subdirectory, selecting the ekma.pri file. This is of marine species assemblages at sites round an
oilfield drilling rig (see the description box from Edit>Properties). The data can be input either as
samples (rows) by variables (columns), which is the convention in statistics, or as variables by
samples, the convention for biological matrices, but PRIMER does need to know which are the
samples and which the variables! Check this from the Edit>Properties menu where (Samples
as•Columns) is the correct choice here.

Reading in non-v6 files

As well as its own v6 format (.pri extension), PRIMER can directly read in text files, Excel files
and the old PRIMER data formats. Check the environmental data file for Ekofisk, in Ekofisk\
ekev.xls, by opening it in Excel, to see the rectangular format that PRIMER expects. Note that it
happens to be transposed (samples by variables) compared with the species matrix. The entries
must be numeric, though row and column labels can be alphanumeric. The sample labels are
unique and match those for the biological data, which will allow comparison of the two arrays,
though the samples need not be in the same order in the two matrices (a change from v5). Larger
matrices than can be handled by Excel in rectangular format (i.e. >255 columns) can be read in as
3-column text files ('sample label, variable label, value' on each line, with tab, comma or other
specified separator), or smaller sub-matrices read in separately and merged in PRIMER.
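
To make the 3-column layout concrete, the following Python sketch (not part of PRIMER) pivots
'sample label, variable label, value' lines into a rectangular samples-by-variables table. The
function name three_column_to_matrix, the zero fill for pairs absent from the input and the
example labels are all assumptions made for illustration.

```python
import csv
import io


def three_column_to_matrix(text, delimiter="\t"):
    """Pivot 'sample label, variable label, value' lines into a rectangular table.

    Returns (samples, variables, rows), where rows[i][j] is the value for
    samples[i] and variables[j]. Pairs absent from the input are filled with
    0.0, which is an assumption of this sketch.
    """
    values = {}
    samples, variables = [], []
    for record in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not record:
            continue  # skip blank lines
        sample, variable, value = (field.strip() for field in record)
        if sample not in samples:
            samples.append(sample)
        if variable not in variables:
            variables.append(variable)
        values[(sample, variable)] = float(value)
    rows = [[values.get((s, v), 0.0) for v in variables] for s in samples]
    return samples, variables, rows


# Hypothetical comma-separated input
example = "S1,Sp A,3\nS1,Sp B,0\nS2,Sp A,5\nS2,Sp C,2\n"
samples, variables, rows = three_column_to_matrix(example, delimiter=",")
```

Labels are kept in order of first appearance, so the order of lines in the text file does not
otherwise matter.
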
Select File>Open from the menus. You can select the file types to read using the box at the bottom
of the dialog. The default setting will read PRIMER 5 (and PRIMER 4) files as well as the new
PRIMER 6 format, but you have to select the Excel option to display the *.xls files. Go through
the steps in reading in this particular Excel file, by File>Open>(File name: ekev.xls) & (Files of
type: Excel files (*.xls))>Open, then in the Excel File Wizard take all the defaults except
(Orientation•Samples as rows) & (Data type•Environmental) on the second dialog box.

Entering data directly in v6

To input data directly into PRIMER choose File>New. For worksheet type choose •Sample data
and in the Sample Data Properties dialog set the number of data rows and columns. You can
specify whether the rows or the columns represent samples, and optionally specify the type of data
(e.g. Data type•Abundance), which simply allows v6 to choose natural defaults. Add meaningful
(and unique) sample and variable labels using, for example, Edit>Labels>Samples. Save the sheet
using File>Save Data As and save frequently as you edit. The default file type is a binary
PRIMER 6 worksheet format (.pri), which is not backwards compatible (i.e. cannot be read by
PRIMER v5, though v5 .pri files can be read by v6). You can also save the data in Excel format.

Add factors or indicators

Some analyses require prior identification of the samples into groups, e.g. different sites, times,
treatments etc. This information is held with the worksheet in the form of one or more factors (e.g.
time) with different levels (e.g. Jan, Feb, Mar, ...). You can create or modify this information by
the Edit>Factors menu, when the active worksheet is the data matrix. Similarly, grouping
structures on the variables are created using Edit>Indicators. Note that you can get the items on
the Edit menu by right-clicking with the mouse when the cursor is over the worksheet.

Use Edit>Factors to look at the factor 'Dist' already set up for the ekma.pri worksheet. This
groups the samples into one of four classes, depending on how close they are to the centre of
drilling activity. Then revert back to the data worksheet by Cancel or OK. (The four levels, A to
D, are defined in the Description box under Edit>Properties).

Pre-treatment of data

Pre-treatment of the data (sometimes in more than one way) is usually desirable. For assemblage
data, transformations will reduce the dominant contribution of abundant species to Bray-Curtis
similarities. Though not usually needed for controlled ('quantitative') sampling, standardising of
samples to relative composition (so sample totals are all 100%) can be achieved by
Analyse>Pre-treatment>Standardise>(Standardise•Samples)&(By•Total). Transformation of all values
(which should be after standardisation, if the latter is appropriate) is achieved by, for example,
Analyse>Pre-treatment>Transform (overall)>Transformation: Square root. For environmental-type
data, such as contaminant concentrations, it may be appropriate to transform to approximately normal
distributions (to justify the use of a Euclidean distance measure), and then to normalise (a location
and scale change) to place different types of variables onto a common measurement scale. See the
Methods manual for the rationale for various choices. Other pre-treatment options include
differential weighting of each variable (via supplied weights or by calculating dispersion properties
of replicates). Each pre-treatment operation creates a new worksheet of transformed/standardised/
normalised data, and sometimes offers the option to send summary statistics to a new worksheet
(e.g. the mean and standard deviation for each variable, when taking Analyse>Pre-treatment>Normalise).
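
The three pre-treatments named above can be sketched in a few lines of Python. This is an
illustration of the arithmetic only, not PRIMER's code: the function names are hypothetical, an
all-zero sample is simply left unchanged, and the standard deviation uses the n-1 divisor, which
is an assumption since the manual does not state which divisor PRIMER uses.

```python
import math


def standardise_by_total(sample):
    """Rescale one sample so that its values sum to 100 (relative composition)."""
    total = sum(sample)
    if total == 0:
        return list(sample)  # assumption: leave an all-zero sample unchanged
    return [100.0 * x / total for x in sample]


def transform_sqrt(sample):
    """Overall square root transformation of every value."""
    return [math.sqrt(x) for x in sample]


def normalise(variable):
    """Location and scale change: subtract the mean, divide by the standard deviation.

    The n-1 divisor is an assumption of this sketch.
    """
    n = len(variable)
    mean = sum(variable) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in variable) / (n - 1))
    return [(x - mean) / sd for x in variable]


# Hypothetical values: one sample of species counts, one environmental variable
relative = standardise_by_total([1, 3, 4])      # percentages summing to 100
rooted = transform_sqrt([4, 9, 0])              # down-weights abundant species
scaled = normalise([1.0, 2.0, 3.0])             # mean 0, standard deviation 1
```

Note the order: standardisation (by sample) comes before transformation, as the text recommends,
whereas normalisation works on each variable across samples.
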

Working with resemblances

'Resemblance' is the general term covering similarity, dissimilarity or distance coefficients. You
can create a resemblance matrix with the Analyse>Resemblance menu item. Similarity matrices
can be saved as individual PRIMER format files (they will be given a .sid extension, as in v5), but
this is rarely necessary in v6, since it is now possible to save the whole workspace.

Click on the title bar of the ekma.pri window, one of the files you opened earlier, to make it the
active window - the title bar becomes a more prominent colour for the active window. Pre-treat
the data by a square root transformation. With the transformed matrix as the active sheet, select
Analyse>Resemblance and take the defaults of (Analyse between•Samples) & (Measure•Bray-Curtis
similarity). This creates a triangular matrix of similarities between every pair of samples, in
a new window, which now becomes the active sheet for several further analyses. (There is a wide
range of other resemblance coefficients that can be accessed via the •More button and tab. The
measures that appear in the general reference book of Legendre P & Legendre L, 1998, Numerical
ecology, 2nd English ed, Elsevier, are numbered in the same way as there.)
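
As a rough illustration of what such a triangular matrix contains, the sketch below computes
Bray-Curtis similarities on the 0-100 scale, using the standard textbook form
100 * (1 - sum|y1 - y2| / sum(y1 + y2)). The function names and data are hypothetical, and the
optional dummy argument mimics the 'Add dummy variable' choice by appending one extra variable
with the same value to both samples; PRIMER's own implementation may differ in detail.

```python
import math


def bray_curtis_similarity(a, b, dummy=None):
    """Bray-Curtis similarity (0-100) between two samples.

    If `dummy` is given, an extra variable with that value is appended to both
    samples first (a sketch of the 'Add dummy variable' option).
    """
    if dummy is not None:
        a = list(a) + [dummy]
        b = list(b) + [dummy]
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    if denominator == 0:
        raise ValueError("Bray-Curtis is undefined when both samples are all zero")
    return 100.0 * (1.0 - numerator / denominator)


def resemblance_matrix(samples):
    """Lower-triangular list of similarities between every pair of samples."""
    return [[bray_curtis_similarity(samples[i], samples[j]) for j in range(i)]
            for i in range(len(samples))]


# Hypothetical abundances for three samples, square-root transformed first
raw = [[4, 0, 9], [1, 0, 16], [0, 25, 0]]
transformed = [[math.sqrt(x) for x in sample] for sample in raw]
sims = resemblance_matrix(transformed)
```

Two entirely empty samples have no defined Bray-Curtis similarity, but with a dummy variable
of 1 they are treated as identical, which is the kind of sparse-sample behaviour the dummy
variable is intended to control.
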

Creating plots

When a resemblance matrix is the active window, the Analyse menu switches to display only the
analyses which work with resemblances (e.g. Analyse>CLUSTER or >MDS). Running routines
on the Analyse menu produces graphical output, results windows and/or further worksheets. The
properties of plots can be amended through the Graph menu. You can also get the Graph menu
items by right-clicking the mouse when the cursor is over the graphics window. Plots can be saved
in PRIMER format (.ppl) but more usefully in several export forms, either pixel-based (.bmp, .jpg,
.png, .tif) or vector-based Windows metafiles (.emf) for further manipulation in Powerpoint etc.

Generate a dendrogram for a cluster analysis using the Bray-Curtis similarity matrix just produced.
With this as the active window, take Analyse>CLUSTER>Main>Cluster mode•Group average,
and the ✓Plot dendrogram check box ticked (but not the SIMPROF box, for now). Click on the
OK button to produce the plot. Try altering the plot's appearance. Click on a horizontal line to
rotate parts of the dendrogram, as if it were a 'mobile' (demonstrating the relative arbitrariness of
the x axis ordering), and click on a vertical line to collapse the tree below that point. Reinstate by
clicking again. Add symbols to the x axis, representing the four pre-defined distance groups for
samples from the oilfield, by Graph>Data labels & symbols>(Symbols: ✓Plot>✓By factor>Dist)
& (Labels: ✓Plot). Experiment with other options, e.g. changing title sizes and fonts on the Titles
tab, unchecking the Plot history box on the General tab, Orientation•Right on Graph>Special etc.

Exploring the workspace

By now, you have created many open windows and may be finding it hard to see the one of interest
by shuffling the windows around. This is more easily done by clearing the PRIMER desktop with
Window>Close All Windows (this does not remove the files from the workspace, unlike the
operation of v5), and then using the Explorer tree on the left of the desktop to identify the plots,
results, worksheets etc that you want to view (click on their name). You can rename any of these
items by File>Rename... or, when the cursor is over a name on the Explorer tree, right-clicking
on the mouse gives you Save, Rename or Delete options (the latter also removes from the
workspace all items that branch down from the deleted one). The branching structure is not just for
display - PRIMER 6 uses it heavily to suggest sensible defaults and to pass information, such as
newly-created factors, down the tree to use in graphs, or even up the tree to the original data!

Workspaces can be saved in their entirety. Do this for the current workspace with File>Save
Workspace As>(File name: Test1); workspace files have extension .pwk and a different icon to
data, resemblance or graph files. Now File>Close Workspace then re-open it again with File>
Open>File name: Test1.pwk. It re-appears exactly as it was saved (it has no dynamic links to files
outside the workspace, so subsequent deleting/editing of these will not change the workspace data).

Selecting data subsets

Sometimes you may wish to omit certain species or select only a subset of samples for analysis.
You can apply selection to one or both of samples and variables by first highlighting the rows and/
or columns, by clicking (and dragging) on the row or column labels. These operations are toggled,
so are cancelled by a simple repeat. (A useful trick is to click in the blank box in the top left corner
of the worksheet, which toggles between highlighting all and highlighting none.) With the correct
subset highlighted (the darkest shaded areas), choose Select>Highlighted. The selected subset is
displayed and analysis operates only on this subset. Revert later to the full worksheet by Select>
All and you can Edit>Clear Highlight (another useful trick is Edit>Invert Highlight and Select>
Highlighted again, to get the complementary set of variables/samples to the one you had originally
selected). Note that if you use this method of selection you must remember that highlighting alone
is not selection: you must first highlight then select the highlights. Alternatively, select samples (or
variables) by their numbers (e.g. Select>Samples>•Sample numbers: 1-5,11-15) or by levels of a
factor (e.g. Select>Samples>•Factor levels>Factor name: Dist>Levels>Include: D).

Any selection is only temporary - the selected portion is displayed but all the original data and
factors are still there, in the background. Saving the current sheet to a file will save the full data
(this is to stop you overwriting your full data with a subset, by mistake). To save only the selected
portion to a file, you must first copy it to a new sheet, with Tools>Duplicate. Only the selected
portion is duplicated. This is also the way you can do crossed selections using factor levels, e.g.
select a subset of the sites, duplicate this sheet, then select a subset of times on the duplicated sheet.
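
The two non-highlighting routes to a selection, by sample numbers and by factor levels, can be
mimicked in Python as follows. This is only a sketch of the idea: the function names are
hypothetical, and the parsing rules for a specification such as '1-5,11-15' are assumed from the
example in the text rather than taken from PRIMER.

```python
def parse_sample_numbers(spec):
    """Expand a specification such as '1-5,11-15' into 1-based sample numbers."""
    numbers = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(piece) for piece in part.split("-"))
            numbers.extend(range(first, last + 1))
        else:
            numbers.append(int(part))
    return numbers


def select_by_factor(labels, factor, include):
    """Return the sample labels whose factor level is one of `include`."""
    return [label for label, level in zip(labels, factor) if level in include]


# Hypothetical worksheet: four samples with a 'Dist' factor
labels = ["S1", "S2", "S3", "S4"]
dist = ["A", "D", "B", "D"]

by_number = parse_sample_numbers("1-5,11-15")
by_level = select_by_factor(labels, dist, {"D"})
```

Both functions return a new list and leave the inputs untouched, echoing the point that a
selection is temporary and the full data remain in the background.
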

Tools menu

Duplicate is just one of several routines to manipulate your data or resemblance worksheets prior
to (or between) analyses, using the Tools menu. When the active matrix is sample data the other
choices cover: Aggregate, which uses a different form of worksheet, an aggregation file (defining
which species belong to which genera, which families etc) to pool species counts, for example, to
higher taxonomic levels; Average and Sum, which are most commonly used to average or total
samples with the same factor level (e.g. average assemblages for each site, across all times);
Check, which identifies potentially problematic data entries for some analysis choices (samples
which are all zero, non-unique sample names, missing values, negative values etc); Merge, a
comprehensive joining tool, which will combine two worksheets into one, using the sample and
variable labels to identify how the join is made (e.g. two arrays of different samples but only
partially overlapping species sets can be merged into a unified matrix - consistent spelling of labels
is important here!); Missing, which provides 'best predictions' of isolated missing values -
maximum likelihood estimates from the EM algorithm - under strict model conditions (use with
great care!); and Rank variables, which can be a useful alternative, for placing environmental-type
data on a common measurement scale, to normalisation after Transform (individual) operations
are carried out. This individual transform option greatly expands the limited range of overall
transforms on the Analyse>Pre-treatment>Transform (overall) menu. The latter are applied in
the same way to all data in the sheet, but the individual transforms apply only to all values (V) in a
highlighted set of variables, and the user may build a specialised transform expression, using
scientific functions, operators, other variables etc. Finally, Transpose switches rows and columns
for the purpose of exporting to another application (note: this is not the same thing as changing the
designation of which labels represent samples and which variables, via Edit>Properties). The
outputs of Tools are usually new worksheets, which become the input to an Analyse option.

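
Two of the Tools operations, Merge and Average, lend themselves to a short Python sketch. It
treats a worksheet as a dictionary of samples, each holding a dictionary of variable values,
which is purely an illustrative data structure. The zero fill for species absent from one
worksheet and the error raised on a repeated sample label are assumptions of the sketch, not
documented PRIMER behaviour.

```python
def merge(sheet_a, sheet_b):
    """Join two sheets {sample: {variable: value}} by matching variable labels.

    Variables missing from a sample are filled with 0 (an assumption), and a
    sample label present in both sheets raises an error (also an assumption).
    """
    variables = []
    for sheet in (sheet_a, sheet_b):
        for values in sheet.values():
            for variable in values:
                if variable not in variables:
                    variables.append(variable)
    merged = {}
    for sheet in (sheet_a, sheet_b):
        for sample, values in sheet.items():
            if sample in merged:
                raise ValueError(f"duplicate sample label: {sample}")
            merged[sample] = {v: values.get(v, 0) for v in variables}
    return merged


def average_by_factor(sheet, factor):
    """Average the samples that share a factor level; `factor` maps sample -> level."""
    groups = {}
    for sample, values in sheet.items():
        groups.setdefault(factor[sample], []).append(values)
    return {level: {v: sum(m[v] for m in members) / len(members) for v in members[0]}
            for level, members in groups.items()}


# Hypothetical surveys with partially overlapping species lists
survey_1 = {"S1": {"Sp A": 4, "Sp B": 2}}
survey_2 = {"S2": {"Sp B": 6, "Sp C": 1}}
merged = merge(survey_1, survey_2)
site_means = average_by_factor(merged, {"S1": "Site 1", "S2": "Site 1"})
```

Because the join is made on labels, a misspelt species name would silently create an extra
variable, which is why consistent spelling matters.
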
Some of the same menu items, e.g. Duplicate, Transform, appear under the Tools menu when the
active sheet is a resemblance matrix, but others are specific to resemblances, e.g. Dissim converts
similarities to dissimilarities (and vice-versa), by subtracting from 100. This is really only needed
for exporting to other applications since internally PRIMER knows which matrices are of which
type (similarity, dissimilarity or distance). It will automatically match them in the right way, e.g. if
you try to correlate a biological Bray-Curtis similarity matrix to an environmental Euclidean
distance matrix (using Analyse>RELATE), low distances will be matched to high similarities.
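
The effect of this matching can be seen in a small Python sketch. Subtracting from 100 is as
described above; the Spearman rank correlation used to compare the two matrices is an assumed
stand-in for whatever Analyse>RELATE computes internally, and the simple ranking function assumes
there are no tied values. All names and numbers are hypothetical.

```python
def similarity_to_dissimilarity(matrix):
    """Convert a lower-triangular matrix on the 0-100 scale by subtracting from 100."""
    return [[100.0 - value for value in row] for row in matrix]


def flatten(matrix):
    """String out the entries of a lower-triangular matrix into one list."""
    return [value for row in matrix for value in row]


def ranks(values):
    """Ranks 1..n, assuming no tied values (a simplification of this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for untied data."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


similarities = [[], [80.0], [20.0, 50.0]]   # hypothetical biological similarities
distances = [[], [1.5], [6.0, 3.2]]         # hypothetical environmental distances

dissimilarities = similarity_to_dissimilarity(similarities)
rho_matched = spearman(flatten(dissimilarities), flatten(distances))
rho_unmatched = spearman(flatten(similarities), flatten(distances))
```

Correlating the raw similarities with distances gives a perfectly negative value here, whereas
converting first gives a perfectly positive one, which is the matching PRIMER performs
automatically.
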

Results windows

Nearly all choices from the Tools or Analyse menu will produce text-format results. These are
separate windows, denoted by notepad icons in the Explorer tree. It is thus easy to find and display
the results from any particular analysis (the v5 single large log of all results no longer exists, its
purpose being more helpfully taken by the Explorer tree). Individual results windows can be saved
to .rtf or .txt format files but, more simply, the usual Windows Edit>Copy (Ctrl+C) works on text
in the results window that has been highlighted, e.g. it can easily be pasted to a Word file. When
results tables are pasted into Excel in this way the entries are placed into a grid automatically. In
Word files, they can be converted from text to tables simply by specifying tab as the separator.

With the Bray-Curtis resemblance matrix from the Ekofisk data as the active sheet, run the one-way
ANOSIM test on the four groups of samples at different distances from the oilfield (the factor
Dist), by Analyse>ANOSIM>General>(Design•One way>Factor A: Dist) & (Max permutations:
999) & (✓Pairwise tests for worksheet) & (✓Plot histogram). Three new windows are created.
Firstly, the results window gives the ANOSIM R statistic (see Methods manual, Chapter 6) for
testing whether assemblage structure varies across the four distance groups, and shows that it does.
The graph displays the histogram of the null (permutation) distribution of R, and shows why the
null hypothesis of 'no difference in assemblage structure between groups' is decisively rejected
(the observed R is well outside the null distribution). The results window then gives the pairwise
tests between pairs of groups. The values of ANOSIM R for each pair have also been sent to a
further triangular worksheet (where they could, for example, be saved to Excel or even, in more
complicated cases, be used as input to a further analysis).
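
For readers who want to see the shape of the calculation, here is a minimal Python sketch of a
one-way ANOSIM, using the usual published definition R = (rB - rW) / (M/2), where rB and rW are
the mean ranks of between-group and within-group dissimilarities and M = n(n-1)/2. It is not
PRIMER's code: the function names, the random (rather than exhaustive) permutations and the
tiny data set are all assumptions for illustration.

```python
import itertools
import random


def average_ranks(values):
    """Ranks from 1 (smallest) upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def anosim_r(dissim, groups):
    """R statistic; `dissim` maps (i, j) with i < j to a dissimilarity."""
    pairs = list(itertools.combinations(range(len(groups)), 2))
    ranks = average_ranks([dissim[pair] for pair in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_between = sum(between) / len(between)
    mean_within = sum(within) / len(within)
    return (mean_between - mean_within) / (len(pairs) / 2.0)


def anosim_test(dissim, groups, permutations=999, seed=0):
    """Observed R and a permutation p-value from randomly relabelled samples."""
    rng = random.Random(seed)
    observed = anosim_r(dissim, groups)
    shuffled = list(groups)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(dissim, shuffled) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (permutations + 1)


# Hypothetical dissimilarities for four samples in two groups
dissim = {(0, 1): 10, (2, 3): 20, (0, 2): 60, (0, 3): 70, (1, 2): 80, (1, 3): 90}
groups = ["A", "A", "B", "B"]
r_observed, p_value = anosim_test(dissim, groups)
```

With only four samples there are very few distinct relabellings, so the p-value cannot be small
however clear the separation; a real test needs adequate replication within groups.
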

Analyse menu

Much of the above is preparatory to the core routines in PRIMER, found on the Analyse menu,
whose operation is covered in detail in the following chapters, and its rationale in the associated
Methods manual. All that will be given here is a brief list. When the active window is a sample
data sheet, the choices include: BEST, selecting variable subsets in one matrix which best match
the multivariate pattern of samples in a different matrix, e.g. 'explaining' biological assemblage
structure with a subset of environmental variables; LINKTREE, a non-parametric multivariate
form of 'classification and regression trees', tackling the same problem as BEST but in a more
piecemeal way, e.g. trying to identify which individual abiotic variables, and what ranges of their
values, appear to be responsible for discriminating different assemblage groupings; DIVERSE,
calculating a wide range of standard diversity measures, including some newer ones based on the
concept of taxonomic or phylogenetic breadth of an assemblage; PCA, ordination by Principal
Components, more useful for physical/chemical/ecotoxicological and similar abiotic data than for
species abundances; SIMPER, 'similarity percentages', identifying variables (e.g. species) that
primarily account for Bray-Curtis dissimilarities or Euclidean distances observed between groups
of samples; SIMPROF, 'similarity profiles', which test for significant evidence of structure among
samples that have no pre-defined grouping (this can be combined with both LINKTREE and
CLUSTER to justify identification and interpretation of clusters); Draftsman Plot, a multiple
scatter plot of pairs of variables (often environmental); Dominance Plot and DOMDIS, producing
and testing groups of dominance curves and ABC plots, which are graphical forms of diversity
measures on species abundance (and biomass), as is a Geometric Class Plot. Finally, a
Species-Accum Plot will graph species-accumulation curves, showing how the observed number of
species increases as new samples are added, and also plotting a wide range of species richness
estimators (the asymptote of species-area curves, assuming 'closed' communities). Some of these
routines (e.g. BEST or ABC curves) require two or more input worksheets: one is always identified
as the active window before the routine is run (e.g. the environmental matrix that will be selected
from in BEST, or the abundance matrix for ABC curves) and the other is supplied to the routine via a
dialog box (e.g. the community similarity matrix in BEST, or the biomass sheet in ABC curves).

When the active window is a resemblance matrix, the Analyse menu covers (as well as CLUSTER
and ANOSIM already met): MDS ordination plots, non-metric multidimensional scaling, which is
central to the PRIMER approach; MVDISP, which compares multivariate dispersion across
groups; RELATE, covering non-parametric Mantel-type tests between resemblance matrices; and
2STAGE, a second-stage analysis (an MDS of MDS's!), useful for comparing the effects of
different choices of data aggregation, transformation, resemblance etc. One routine, TAXDTEST,
is accessed through the Analyse menu when an aggregation matrix is the active window: this tests
biodiversity measures based on taxonomic distinctness for departures from 'expectation'.

1. Data input/output

MANUAL/TUTORIAL

1. Opening, editing and saving data (File, Edit)

Finding the examples

The installation should have placed a number of sub-directories (BCzoo, Bermuda, Biomark,
Clydemac, ...) into an Examples v6 directory (the default installation location is C:\Program Files\
PRIMER-E\PRIMER 6\Examples v6\). They can also be read directly from the CD as you run the
package. You might find it convenient, however, to copy them to a higher level data area on your
hard disk. In what follows, it is assumed that they are in a top-level directory C:\Examples v6\.

The various sub-directories contain the faunal matrices, and sometimes matching environmental
data, for most of the real case studies described in the Methods manual ('Change in marine
communities'). Check this by looking in the sub-directory C:\Examples v6\Ekofisk, of
soft-sediment samples of biota and matching sediment chemistry concentrations, for 39 sites at
different distances from a marine oil-drilling platform. The directory should contain files
ekma(.pri), the community matrix in internal PRIMER 6 format (which cannot be read by other
software, including PRIMER 5) and ekev(.xls), an Excel format file of environmental data for the
same sites.

PRIMER file types

Whether the extensions *.pri, *.xls display, or not, is a function of your Windows set-up; if you
have suppressed them it is still easy to distinguish the different file types by their icon. There are
PRIMER-specific icons and extensions for the following:
(*.pri) sample data in rectangular format, and associated factors, description etc;
(*.sid) triangular matrices of similarity, dissimilarity, distance etc, called 'resemblances' in v6;
(*.agg) aggregation file, assigning species to genera to families etc;
(*.ppl) plot files, holding all the internal PRIMER information that structures the plot;
(*.pwk) workspace files of everything, so the PRIMER desktop can be reconstructed 'as was'.
PRIMER 6 can handle, both for input and output, PRIMER 5 and PRIMER 4 data files (*.pri and
*.pm1), similarity files (*.sid and *.sim), and aggregation files (*.agg and *.pm1), but note that plot
files are not interchangeable with earlier formats in any way. In addition, PRIMER 6 recognises
several standard Windows extensions, e.g. input and output of *.xls (Excel) and *.txt (text) data
files or resemblance matrices; also *.csv, comma-separated text input. There are several options
for *.txt text-format data input. Results windows can be saved in *.txt or *.rtf (rich-text) formats.
PRIMER 6 can also output plots to some standard pixel formats (*.bmp, *.jpg, *.png, *.tif) and
vector-based enhanced metafiles (*.emf). No other file types (extensions) are recognised.
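
When organising a directory of project files outside PRIMER, the extensions listed above can be
turned into a simple lookup. The Python sketch below is a convenience for the reader, not a
PRIMER facility; the dictionary name, the wording of the descriptions and the example filenames
other than ekma.pri and ekev.xls are hypothetical.

```python
import os

# Extensions taken from the list above; descriptions are paraphrased
PRIMER6_FILE_TYPES = {
    ".pri": "sample data worksheet",
    ".sid": "resemblance matrix",
    ".agg": "aggregation file",
    ".ppl": "plot file",
    ".pwk": "workspace",
    ".xls": "Excel data or resemblance matrix",
    ".txt": "text data, resemblance matrix or saved results",
    ".csv": "comma-separated text input",
    ".rtf": "rich-text saved results",
}


def describe(filename):
    """Return a description for a recognised extension, or None otherwise."""
    extension = os.path.splitext(filename)[1].lower()
    return PRIMER6_FILE_TYPES.get(extension)


kinds = {name: describe(name) for name in ["ekma.pri", "ekev.xls", "notes.doc"]}
```

The lookup is case-insensitive on the extension, so EKMA.PRI and ekma.pri are treated alike.
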

Opening the PRIMER 6 desktop

Start the program by double-clicking on the PRIMER 6 icon on your desktop, or selecting the
program from the Start menu in Windows, giving the window below. A third method is to
double-click on a file with a recognised PRIMER extension, e.g. a worksheet file (.pri) or a workspace
file (.pwk), and PRIMER 6 will automatically be launched, with the selected file or workspace placed
in the resulting desktop window. (Note that opening more than one file by double-clicking on a
series of names, in My Computer or Windows Explorer, launches parallel, independent PRIMER
desktops, which is usually not the required outcome. Multiple files can be opened simultaneously
by first launching PRIMER then selecting several files in the File>Open dialog window.) The
PRIMER desktop is separated into two parts: to the left is the Explorer tree which will display
icons for all the sheets, results windows, plots etc, and their interconnections. To the right, the
actual worksheets, results and plot windows are displayed. The current workspace consists of all
items in the Explorer tree (irrespective of which windows are displayed on the right hand side of
the desktop), and all files needed for an analysis must first be opened into the current workspace.

[Figure: the PRIMER 6 desktop on start-up, showing the menu bar (File, View, Tools, Window,
Help) and the toolbar.]

Entering data directly

Most users will already have their data stored in rectangular form in some other software, e.g. as an
Excel spreadsheet, which can be opened directly and straightforwardly (see later). However, data
arrays can be typed directly into PRIMER, if necessary. Select File>New>(•Sample data) and, in
the resulting Sample Data Properties dialog box, type in a title, specify a data type and which way
round the matrix is to be, e.g. (Data type•Abundance) & (Samples as•Columns). You can also
give a description of the data (optional), and you need to enter the number of columns and rows.

[Figure: the File>New menu and the Sample Data Properties dialog box, with Title, Data type
(Abundance, Biomass, Environmental, Unknown/other), Samples as, Number of columns, Number of rows
and Description entries.]

A worksheet of zeros is created into which you can type, by working down the columns, clicking
on the first cell, typing in the number and pressing the Enter key. To edit an earlier entry, double
click on the cell, amend it and again press Enter. To cancel an edit of a cell you have entered by
mistake, press the Esc key. If you inadvertently click on a row or column label (the grey cells at
the margins of the table) that row or column will be highlighted; remove the highlighting by
clicking again on the label (highlighting is a simple on-off 'toggle').

Labelling samples & variables
At this point only the default row (variables V1, V2, ...) and column labels (samples S1, S2, ...)
have been defined, but a set of commonly used operations for worksheets can be found in the lower
part of the Edit menu, including a Labels item. This menu can also be called up by right-clicking
when the mouse cursor is placed within the body of the worksheet. Samples and variables can then
be appropriately labelled: labels benefit from being succinctly descriptive; they must be
consistently spelt from one sheet to another and unique within a sheet. This is because PRIMER 6
(unlike PRIMER 5) makes much use of label matching (e.g. abundance to biomass or species to
environmental variables at the same set of sites, merging of species lists from different studies etc).
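
Before loading sheets that are meant to match, it can be useful to check the two labelling
requirements above (unique within a sheet, consistently spelt between sheets) outside PRIMER. The
following Python sketch is only an illustration: the function names are hypothetical, and it assumes
that labels match by exact, case-sensitive string comparison, which the manual does not state.

```python
from collections import Counter


def duplicate_labels(labels):
    """Return labels occurring more than once (compared after trimming spaces)."""
    counts = Counter(label.strip() for label in labels)
    return sorted(label for label, n in counts.items() if n > 1)


def unmatched_labels(labels_a, labels_b):
    """Return (only in A, only in B), using exact string comparison (an assumption)."""
    a, b = set(labels_a), set(labels_b)
    return sorted(a - b), sorted(b - a)


# Hypothetical sample labels from an abundance sheet and an environmental sheet.
abundance_samples = ["S30", "S36", "S37", "S31"]
environment_samples = ["S30", "S36", "S 37", "S31"]  # stray space in one label

print(duplicate_labels(abundance_samples))
print(unmatched_labels(abundance_samples, environment_samples))
```

Any label reported by either function would need correcting in the source files before the sheets
could be matched on their labels.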

[Figure: the Edit menu (Select all, Select highlighted, Select samples, Select variables, Clear
highlight, Invert highlight, Cut, Copy, Paste, Insert, Delete, Move, Sort, Properties, Labels,
Factors, Indicators, Save Data As) and the Labels dialog box used to type in sample and variable
labels.]

Deleting & inserting rows/cols
In addition to attaching labels, the Edit menu allows a range of other edit functions on the data
entries. For example, to delete whole columns (or rows), highlight them by clicking on their labels
and then take a menu sequence such as Edit>Delete>Columns. No 'Undo' operation is available
so a prompt is given to ask whether such a deletion was really intended. (It is always wise to save
the worksheet - see below - before starting any manipulations that involve deletion or movement,
so that the original worksheet can be reinstated if necessary.) Note that the current cursor position
- the cell in the sheet outlined in black - is ignored; deletion works only on highlighted rows or
columns. In contrast, insertion ignores highlights and uses only the current cursor position, e.g.
Edit>Insert>Row will add a new row immediately above the current position of the cursor.

[Figure: the Edit>Delete menu sequence and the resulting 'Delete worksheet values?' prompt, with OK
and Cancel buttons.]

Moving & sorting rows/cols
Movement of rows or columns uses both highlighting and the cursor position. Rows (or columns)
to be moved are highlighted, and the Edit>Move>Rows operation moves all highlighted rows to
immediately above the current cursor position when moving up, and below the cursor position
when moving down (similarly with moving columns to the right or left - movement is always over
the cursor). In the simple case illustrated below, the same outcome would have been achieved by
Edit>Sort>Rows>(•By labels), since this is an increasing alphabetic sort of the row labels. (Note
that sorting can also be carried out according to some alphanumeric order, held in an indicator. The
latter is the term that PRIMER uses for information associated with each variable - a catalogue
numbering system here perhaps - see section 2 on setting up factors and indicators).
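
The effect of sorting rows by their labels, or by an indicator held alongside them, can be sketched
in a few lines of Python. This is an illustration of the idea only, not PRIMER's own routine: the
function name, labels, counts and catalogue codes are all hypothetical, and Python's default string
ordering (which is case-sensitive) is assumed in place of whatever ordering PRIMER applies.

```python
def sort_rows(labels, rows, keys=None):
    """Reorder rows and their labels by `keys`, or by the labels when no keys are given."""
    if keys is None:
        keys = labels
    # sorted() is stable, so rows with equal keys keep their original relative order.
    order = sorted(range(len(labels)), key=lambda i: keys[i])
    return [labels[i] for i in order], [rows[i] for i in order]


# Hypothetical worksheet: three variables (rows) by three samples (columns).
labels = ["Sp C", "Sp A", "Sp B"]
rows = [[0, 0, 4], [16, 1, 0], [9, 10, 2]]
catalogue = ["K020", "K300", "K100"]  # hypothetical indicator values, one per row

print(sort_rows(labels, rows))             # by row labels
print(sort_rows(labels, rows, catalogue))  # by the indicator
```

In both cases each row of data travels with its label, which is the property that matters when a
sheet is reordered.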

[Figure: a worksheet with highlighted rows moved using the Edit>Move>Rows operation.]

Cut/copying & pasting
The Edit>Cut and Edit>Copy operations send part of a worksheet (or its factors/indicators) to the
clipboard, where they are accessible by other Windows software or can be pasted back into another
region of the active sheet (or factors). Cut and Copy operate much as in Excel etc, on highlighted
regions of the worksheet, the difference here being that highlights must be whole sets of rows, or
columns, or a combination of rows and columns, the highlighted data always being the darkest
displayed cell colour (if nothing is highlighted the cell at the current cursor position is copied).
Edit>Paste places the data from the clipboard onto a highlighted area of the same shape or, if there
is nothing highlighted, onto a rectangle with its upper left corner at the current cursor position.


Saving data & warning prompts
The data sheet can now be saved (as can any item created in the workspace) from the File menu.
File>Save Data As gives a standard type of Windows dialog box, shown below. This allows you
to change to the desired directory, specify a meaningful name for the file (the default is Datan,
where n just numbers each new data sheet in increasing order) and save it in PRIMER 6 format
with .pri extension, e.g. (Save In: Examples v6) & (File name: Test1) & (Save as type: PRIMER
Data Files (*.pri)). This is the standard (binary) format for PRIMER 6 data matrices, which cannot
be read by earlier PRIMER versions or other software, but there are other output choices: PRIMER
Windows v5 (*.pri) and DOS v4 (*.pm1) formats, plain text files (*.txt) or Excel sheets (*.xls).
Any subsequent save of the file to the same directory, using the same name, will give rise to a
warning prompt that you are about to replace the previous file.

[Figure: the File menu and the Save Data As dialog box, with Save as type options PRIMER Data Files
(*.pri), PRIMER 5 Data Files (*.pri), PRIMER 4 Data Files (*.pm1), Text Files (*.txt) and Excel
Files (*.xls), and the prompt 'C:\Examples v6\Test1.pri already exists. Do you want to replace it?'
with Yes and No buttons.]

Saving, closing & opening a workspace
A major innovation in PRIMER 6 is that entire workspaces are now saveable, so there is less
necessity to make repeated saves of individual data files and derived windows, until the data (or
plots etc) have stabilised into a final format that might be required in a different analysis, at another
time, or exported for use in other analysis programs (or presentation graphics) etc. Regular saving
of the workspace is a good idea, however. This can be accomplished initially with File>Save
Workspace As, which gives a Save dialog box similar to the above, but with only the single option
of *.pwk for file types (this is a binary file in PRIMER 6 format, not accessible by any other
software). Subsequent saves of the workspace with the same name, overwriting the existing copy,
are carried out with File>Save Workspace; this will not generate a Save dialog box or a warning
prompt (unlike Save Workspace As). More detail about managing workspaces, and exploring the
linkages between their component items, is given in section 6 but, for now, note that other
functions frequently used are File>Close Workspace (or File>New>Workspace), which leaves a
new, clear workspace ready for opening of new files (and will prompt for a Save Workspace
operation unless one has just been carried out), and opening an already saved workspace (*.pwk
file) with File>Open. This restores the workspace exactly as it was at the point of saving.
The test data matrix saved above is too small to do anything useful with, typical species by samples
matrices having tens or hundreds of species and samples. So, take File>Close Workspace. The
tutorial now opens a real 174 species by 39 samples array (ekma.pri), of soft-sediment macrofaunal
abundances, held in PRIMER format, and a matching 39 samples by 9 environmental variables
matrix (ekev.xls), in Excel format, both files being in the directory C:\Examples v6\Ekofisk.
[Figure: the prompt given on closing a workspace, 'The data for the file 'Workspace' has changed.
Do you want to save it?', with Yes, No and Cancel buttons.]

Setting the initial directory
Firstly, it is convenient to set the initial (default) directory to C:\Examples v6\Ekofisk so that
PRIMER will display files from that directory when opening or saving. This is carried out by
Tools>Options, entering the Path: on the Initial folder tab. This step is not essential, however,
since once the first file from a particular directory is read in, PRIMER defaults to that directory for
further input or output, until a different directory is used or the program is closed down.

[Figure: the Tools>Options dialog box, Initial folder tab, with a folder browser showing the
sub-directories of Examples v6.]

Opening PRIMER files
File>Open>(File name: ekma.pri)>Open reads in the existing Ekofisk species-by-samples array in
PRIMER 6 format. The default file types ('All PRIMER Files') are any PRIMER-specific formats:
Windows binary format files consist of *.pwk for PRIMER 6 workspaces; *.pri, *.sid and *.agg are
respectively v5/v6 datasheets, triangular resemblance matrices (similarity/dissimilarity/distance)
and aggregation files; and *.ppl ('L' not 'one') are v6 plot files. Also included are equivalent files
in text format for the DOS PRIMER 4: *.pm1 ('one' not 'L'!) covers data and aggregation files
(numeric only), *.sim are similarity matrices and *.dis are distances or dissimilarities.

[Figure: the File>Open dialog box and the opened worksheet ekma.pri ('Ekofisk oilfield macrofauna').
The Files of type options are: All PRIMER Files; PRIMER 6 & 5 Files (*.pwk; *.pri; *.sid; *.agg;
*.ppl); PRIMER 4 Files (*.pm1; *.sim; *.dis); Text Files (*.txt; *.csv); Excel Files (*.xls); All
Files (*.*).]

(Ekofisk oilfield fauna)
The abundance file ekma.pri is displayed within the PRIMER desktop window. It shows the typical
sparseness of many species matrices, with many zeros and some large counts, each sample being
the total of three Day grab samples at each site and the sites, S30, S36, S37, S31, ... being ordered
left to right in increasing distance from the centre of oil drilling activity - the design is roughly that
of five radial transects from the centre of the oilfield out to distances of 8 km, at geometrically
increased spacing. See Chapters 10 and 14 of the Methods manual for a diagram of sample sites
and other details of this study (original paper: Gray JS, Clarke KR, Warwick RM, Hobbs G 1990.
Mar Ecol Prog Ser 66: 285-299).
The data matrix, and any window in the PRIMER desktop, can be resized by grabbing and
dragging a corner of the window, exactly as for the desktop itself, in normal Windows practice.
Note that the row and column position of the cursor within the data matrix is indicated at the
bottom right of the desktop.

Properties
Edit>Properties produces the Sample Data Properties dialog box, seen earlier, where information
about the data can be checked and amended, e.g. Title, Data type, Description, numbers of rows
and columns and that the Columns are the Samples in this case. The History box will accumulate
information on Pre-treatment operations such as standardisation, transformation, species weighting
etc (and for resemblance matrices will specify the coefficient used). Properties is also one of the
items that can be selected by right-clicking when the mouse is over the data sheet.

[Figure: the right-click menu on the worksheet and the Sample Data Properties dialog box for
ekma.pri. The Description notes that the sites are in a 5-spoke radial design, arranged in order of
increasing distance from the centre of drilling activity, and that the factor 'Distance Groups'
categorises the sites into 4 groups: D: <250m; C: 250m - 1km; B: 1 - 3.5km; A: >3.5km, from the
oilfield centre.]

Opening Excel files
Often, rectangular data matrices of variables by samples, or samples by variables, will initially
have been entered into Excel, perhaps with different data arrays in different sheets (e.g. abundance
and biomass, % cover and environmental variables, or simply subsets of a larger set of samples). If
Excel 2000 or later is installed on the machine, PRIMER 6 can directly read in Excel files, one
sheet at a time. (Earlier Excel files will also be readable, provided the installed version of Excel
supports these). File>Open gives the Open dialog box shown at the bottom of the previous page,
and, importantly, Files of type: Excel Files (*.xls) must be selected in order to display the
available Excel format file names in that directory. Clicking on a filename and then the Open
button generates a 'wizard' which guides the user through the choices that must be specified: which
Excel worksheet, selected by name, and what Data type (not just rectangular data but triangular
resemblance matrices, perhaps computed from other software, or alphanumeric aggregation files,
specifying which species belong to which genera, families etc).
On the next dialog box, make sure the Title box is checked (the default option) if there is a separate
title line in the top left cell (A1) of the Excel file. Failing to uncheck this box when there is no
additional title line is likely to be the commonest source of error when reading an Excel sheet into
PRIMER. Similarly there is a check box for the presence of row labels, e.g. species names. (This
is the default since they are almost always present. The possibility that they are not is retained
because the earlier DOS PRIMER 4 required these non-numeric row labels, typically species
names, to be in a separate file.) Other choices in this window specify whether the Samples are to
be interpreted as columns or rows, and whether the data is of Abundance, Biomass, Environmental
or Other type, allowing PRIMER 6 to select natural defaults for analysis choices.

Missing or zero values?
The final option is whether any blank cells in the Excel sheet should be interpreted as Missing
values or Zeros. Typically, they will be Zero for species data and Missing for environmental or
other data. The distinction is important for subsequent analysis: most species-by-samples matrices
have large numbers of species that are not present in many samples - they are indicated by zeros,
and this information is properly catered for by an appropriate choice of analysis (e.g. similarity
coefficient). If an environmental variable is not detected at a sample site then that should also be
recorded as a zero, or as the lower detection limit (or perhaps half that limit). If a specific variable
is simply not measured at a site, through random loss of a sample or whatever, then that is properly
a Missing value. Reading in a blank cell from Excel, or editing it to a blank after it has been read
into PRIMER, will display a Missing! entry. Most analyses will require that a selection of samples
or variables be used to exclude all missing values, though section 10 describes the Tools>Missing
option: this attempts to estimate small numbers of values missing at random, i.e. missing
irrespective of their likely value, under strict model assumptions (multivariate normality), which
may be appropriate to some sets of environmental or other variables. Note that, in contrast, random
loss of a replicate community sample from a balanced sampling design is not thought of as
generating a missing value: all the variables (species) are lost for that sample, so it is simply
omitted, and the design becomes a slightly unbalanced one, which does not cause PRIMER problems
in general. Genuinely missing data for community matrices is rare. It supposes that some species
were not looked for in a set of samples, but were identified in a second set of samples, and that
both sets need to be part of one analysis. The species that were not consistently identified would
need to be omitted from analysis in that case using Select>Variables>(•No missing values).
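
The difference between reading a blank as Zero and reading it as Missing, and the idea of keeping
only variables with no missing values, can be sketched in Python as follows. This is a minimal
illustration with hypothetical function names and data; it uses None to stand for a missing entry
and is not a description of how PRIMER stores or selects data.

```python
MISSING = None  # stand-in for a missing entry (an assumption of this sketch)


def parse_cell(text, blank_as_zero):
    """Convert one cell of text to a number, treating a blank as zero or as missing."""
    text = text.strip()
    if text == "":
        return 0.0 if blank_as_zero else MISSING
    return float(text)


def variables_without_missing(names, columns):
    """Return the names of variables whose column of values has no missing entry."""
    return [name for name, column in zip(names, columns)
            if all(value is not MISSING for value in column)]


# Hypothetical environmental data: one list of cell texts per variable.
raw = {"Variable 1": ["0.33", "", "0.5"], "Variable 2": ["32", "57", "39"]}
names = list(raw)
columns = [[parse_cell(cell, blank_as_zero=False) for cell in raw[name]] for name in names]

print(columns)
print(variables_without_missing(names, columns))
```

With blank_as_zero=True the blank cell would instead be read as 0.0, which is the natural choice for
species counts but would be misleading for an environmental variable that was never measured.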
(Ekofisk abiotic data)
Also available for the 39 sites around the Ekofisk oil-field are environmental data on the levels of
total hydrocarbons and metals such as Barium, Strontium, Copper etc in the sediments, and
measures of physical properties, such as % mud and distance from the oilfield centre. These are in
the Excel file ekev.xls, which you should first open in Excel to see the data format. Note that there
is only one sheet defined in this case ('Environmental'), and it is transposed in relation to the
assemblage data, being samples-by-variables rather than variables-by-samples. Its sample and
variable labels are unique, and can be a mix of numbers, letters and spaces of any length (as can be
the title, if present), but data entries are always strictly numeric for any data sheet - avoid formulae
in the cells, and for species data use 1 and 0 for presence and absence. Open the file in PRIMER
with File>Open>(Files of type: Excel Files (*.xls)) & (File name: ekev.xls)>Open>(Excel
worksheet: Environmental) & (Data type•Sample data)>Next>(✓Title) & (✓Row labels) & (Orientation
•Samples as rows) & (Data type•Environmental) & (Blank•Missing value)>Finish.

[Figure: the File>Open dialog box with Files of type: Excel Files (*.xls), the Excel File Wizard
dialog boxes (Excel worksheet: Environmental; Title and Row labels check boxes; Orientation: Samples
as columns or Samples as rows; Data type: Abundance, Biomass, Environmental, Unknown/other; Blank:
Missing value or Zero), and the file ekev.xls as displayed in Microsoft Excel.]
Save the workspace in the C:\Examples v6\Ekofisk directory with File>Save Workspace As>(File
name: ekwk.pwk), for later use, and File>Close Workspace to produce a clear workspace. Further
files will be opened from the C:\Examples v6\Tasmania directory, to demonstrate text file input.

(Tasmanian meiofauna)
This study concerns meiofaunal abundances from a two-way layout of samples on a sand-flat in
Eaglehawk Bay, Tasmania, see Chapters 6 and 12 of the Methods manual. Separate data arrays are
available of nematode and copepod communities associated with disturbed and undisturbed patches
of sediment at four locations across the sandflat, the disturbance being caused by natural burrowing
activity of soldier crabs (original paper: Warwick RM, Clarke KR, Gee JM 1990, J exp mar Biol
Ecol 135: 19-33). The two disturbance conditions (D and U) are referred to as the 'treatments'
(though this is an observational study not a manipulative experiment) and the four locations as
'blocks' (B1 to B4). For each treatment/block combination there are two replicates. Each replicate
is a sediment core for which both nematodes (39 taxa) and copepods (17 taxa) are counted.

Opening several files at once
The directory C:\Examples v6\Tasmania contains the PRIMER 6 format files of separate nematode
and copepod data, tana.pri and tapa.pri. Both can be opened in one step by taking File>Open,
clicking on tana.pri, then clicking on tapa.pri with the Shift or Ctrl key held down. (This operates
in the usual Windows way, with Shift-click highlighting all files between the two items selected,
and Ctrl-click highlighting individual, non-consecutive files in the list.) Both (in general, all)
selected names appear in the File name box and are opened with a single press of the Open button.

Opening the same file twice
PRIMER 6 will allow the same file to be opened into the workspace more than once, since the
response to the Open menu item is to create a copy of the file for entry to PRIMER at that time,
and there is no physical link maintained from the workspace to the original file. Thus amendment
of data entries in the workspace does not affect any outside files (unless, of course, the file is then
Saved As ..., using the same name), and vice-versa, so that a second copy of the file can be opened
without difficulty. PRIMER 6 does, however, demand unique naming of all workspace items, so
any second (and subsequent) attempts to open the same file will add a version number, e.g. tapa(2).
Similarly, the default names for new windows derived from an analysis sequence: Data1, Data2,
..., Resem1, Resem2, ..., Graph1, Graph2, ... are unique, and if changed to more identifiable
names, these too should be unique. Thus an attempt to give the name tapa to a similarity matrix
and a dendrogram plot as well as the data sheet will result in names tapa(2), tapa(3) being assigned.
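
The naming behaviour described here, where a repeated name acquires a version number in brackets,
can be mimicked with a short Python sketch. The function below is hypothetical and simply picks the
lowest unused version number starting from 2; the manual gives only the examples tapa(2) and tapa(3),
so the exact rule PRIMER follows is an assumption.

```python
def unique_name(name, existing):
    """Return `name`, or `name(k)` with the smallest k >= 2 not already in `existing`."""
    if name not in existing:
        return name
    version = 2
    while f"{name}({version})" in existing:
        version += 1
    return f"{name}({version})"


# Naming a data sheet, a similarity matrix and a dendrogram plot all 'tapa'.
workspace = []
for _ in range(3):
    workspace.append(unique_name("tapa", workspace))

print(workspace)
```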
Text-format input files
The Tasmania directory also contains three different text format versions of the copepod samples,
tapatx.txt, tapacs.csv and tapa3c.txt. They are simple text files, the first two being rectangular
variables by samples (or samples by variables) arrays, and differing really only in what is used as a
separator ('delimiter') between the data entries: *.csv files are comma-separated and *.txt are
typically tab-separated (e.g. outputting to *.txt format gives tab delimiters). However, input from
*.txt format is more general: it can also cater for comma-separated or space-separated entries or the
use of any other specified delimiter. In all cases, rows are separated by (hard) carriage returns but
for columns there is no limit on the length of each line, and these will typically be wrapped (with
soft carriage returns) when displayed with a text editor or word processor, as seen below. The third
file is an example of 3-column format, in which each line of the text file has only three columns
separated by tabs (other delimiters are also allowed). Column 1 is the sample label, column 2 the
variable label, and column 3 the numeric data entry. The advantage of this format is that only
non-zero entries need be listed - the blank cells can be automatically filled with zeros - and again
there are no fixed size limits. Naturally, each combination of sample and variable label can only
occur once, otherwise an error results (and the content of the first duplicated combination is
displayed).
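
To make the 3-column layout concrete, the following Python sketch turns such a file into a
rectangular variables-by-samples array, filling unlisted combinations with zeros and stopping at the
first duplicated sample/variable combination. It is an illustration only, not PRIMER's reader: the
function name and example data are hypothetical, a single title line is assumed, and data rows are
numbered from the first line after the title (which may differ from PRIMER's own row numbering).

```python
import csv


def read_three_column(text, delimiter="\t", has_title=True):
    """Parse 3-column text (sample, variable, value) into a zero-filled rectangular array."""
    lines = text.splitlines()
    title = lines.pop(0) if has_title and lines else ""
    samples, variables, values = [], [], {}
    for row_number, fields in enumerate(csv.reader(lines, delimiter=delimiter), start=1):
        if not any(field.strip() for field in fields):
            continue  # skip blank lines
        sample, variable, value = (field.strip() for field in fields)
        if (sample, variable) in values:
            raise ValueError(
                f"duplicate sample/variable combination at data row {row_number}: "
                f"{sample}, {variable}"
            )
        values[(sample, variable)] = float(value)
        # Labels are kept in the order in which they are first encountered.
        if sample not in samples:
            samples.append(sample)
        if variable not in variables:
            variables.append(variable)
    # Rows are variables, columns are samples; unlisted cells become zero.
    matrix = [[values.get((s, v), 0.0) for s in samples] for v in variables]
    return title, samples, variables, matrix


example = "Hypothetical title\nS1\tSp A\t43\nS1\tSp B\t1\nS2\tSp A\t63\n"
title, samples, variables, matrix = read_three_column(example)
print(samples, variables, matrix)
```

Because variables are recorded in order of first appearance, the row order of the resulting array
depends on the order of lines in the file rather than on any alphabetic ordering.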

[Figure: the three text files viewed in a word processor, each starting with the title line
'Tasmanian copepods': tapatx.txt (tab-separated rectangular array, species as rows and samples as
columns), tapacs.csv (the same array, comma-separated) and tapa3c.txt (3-column format, one line per
sample label, species label and count), together with the resulting PRIMER worksheet.]

Read in the first of these text files, by File>Open>(Files of type: Text Files (*.txt, *.csv)) & (File
name: tapatx.txt), using the Text File Wizard dialogs shown below. Repeat for tapacs.csv, the only
difference being addition of (Text delimiters✓Comma) to the third dialog box. For tapa3c.txt, the
figure below shows the error message obtained having inserted at line 7 (with a text editor such as
Notepad) a mistaken repeat of the line 'B1DR1 Leptastacus sp A 30'. When the error is
corrected, the Wizard dialogs proceed exactly as for the other two cases, except that the option
(Shape•3 column) is selected in the second box. Note that the 3-column file enters PRIMER with a
different species order (the order in which the variable names are encountered in the file); it can be
made to coincide by re-ordering the rows alphabetically with Edit>Sort>Rows>(•By labels).

[Figure: the Text File Wizard dialog boxes for 'tapatx.txt' (Data type: Sample data, Resemblance
matrix, Aggregation data; Title; Shape: Rectangular or 3 column; Orientation: Samples as columns or
Samples as rows; Data type: Abundance, Biomass, Environmental, Unknown/other; Blank: Missing value
or Zero; Text delimiters: Tab, Space, Comma, Other; Treat adjacent delimiters as one; Text
encoding), the Edit>Sort>Rows dialog box (By labels, By factor/indicator), and the error message
for tapa3c.txt: 'Duplicate sample, variable name combination. Data row: 7. B1DR1 Leptastacus sp A
30'.]
Entering (& merging) large arrays
There are no fixed size limits for arrays within PRIMER 6, simply an overall limit determined by
the amount of available real memory on the machine. There will, of course, be significant time
constraints for some of the more compute-intensive routines, and there is a limit to how many
samples it is sensible to try and view at the same time in ordination plots, but it is a viable strategy
to place all related data into a single workspace, and data of the same type (e.g. counts for a
specific faunal assemblage in a complete set of samples) into a single worksheet. Defining a
structure of factors on the samples (see next section) will allow the selection of subsets, or
averaging over replicates (or factor levels), needed for a specific analysis. Note that the speed of
simple manipulations on worksheets - amending factors in particular - has dramatically improved
from PRIMER v5 (by using native grid controls), and it is straightforward to perform simple
manipulations on an array of hundreds (or thousands) of species and thousands of samples. The
difficulty is sometimes in getting such data assembled in the first place, given the 255 limit to the
number of columns in Excel, and in cross-tabulations in Access. There are now several viable
PRIMER approaches to this Microsoft constraint, however. When using Excel for data entry, if
one or other of variables and samples is below this 255 limit, then they could be placed in the Excel
columns. The worksheet can always be transposed after entry to PRIMER if this is desired (with
Tools>Transpose). The latter is just a matter of preference: it is irrelevant to PRIMER whether
matrices are displayed as variables-by-samples or samples-by-variables, and transposing does not
alter the identification of samples and variables (e.g. if samples are defined as columns on entry,
after the Transpose operation they will be defined as rows).
If both array dimensions are >255 then the samples could be split into separate sheets (each <255
columns) in a single Excel file, then each separately read into PRIMER, and stitched together, two
at a time, with Tools>Merge. It is preferable that the column labels across all entry sheets be
unique and the variable (species) names consistently spelt, because a default Merge operation can
then be carried out. Note that Merge can even cope with variable (species) lists in the separate
sheets that are not of the same length or in the same order. Typically, this is what is produced by a
simple cross-tabulation from a 'flat-form' file in Access, for example: species that are entirely
absent for one of the subset arrays will not appear in the variable list for that matrix. But if the
species names are consistently spelt, PRIMER will match them up in Merge and fill any newly
created cells with zeros (or Missing!, if this is specified as more appropriate).
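
The label matching that Merge performs can be pictured with a short Python sketch. This is not PRIMER code: the function name merge_sheets, the dictionary layout and the counts are assumptions made purely for illustration, and the sketch presumes that the two sheets share no sample labels.

```python
def merge_sheets(a, b, fill=0):
    """Merge two species-by-samples sheets held as {species: {sample: value}}.

    Species are matched by name; cells newly created by the merge get `fill`.
    Assumes (for this sketch) that the sheets have no sample labels in common.
    """
    samples_a = sorted({s for row in a.values() for s in row})
    samples_b = sorted({s for row in b.values() for s in row})
    # Keep the species order of the first sheet, then append any new species.
    species = list(a) + [sp for sp in b if sp not in a]
    merged = {}
    for sp in species:
        row = {}
        for s in samples_a:
            row[s] = a.get(sp, {}).get(s, fill)
        for s in samples_b:
            row[s] = b.get(sp, {}).get(s, fill)
        merged[sp] = row
    return merged


# Illustrative counts only; the species lists differ in length and content.
sheet1 = {"Ameira sp": {"B1DR1": 43, "B1DR2": 63},
          "Apodopsyllus sp": {"B1DR1": 0, "B1DR2": 0}}
sheet2 = {"Ameira sp": {"B2DR1": 4},
          "Ectinosoma sp": {"B2DR1": 2}}
merged = merge_sheets(sheet1, sheet2)
```

Passing a different fill value (for instance None) would play the role of the Missing! option, if zeros are not appropriate for the data in question.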
All these difficulties can be circumvented, however, by use of text format files. For example, 'flat-
form' Access files can be simply made to output a 3-column format text file (as displayed above)
of all the data, with many tens or hundreds of thousands of rows. This reads simply and directly
into PRIMER 6. The rectangular format text files also have no size limits on entry to PRIMER, if
the database program of the source data can generate such unlimited variables-by-samples or
samples-by-variables arrays.
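
To see why a 3-column file carries the same information as a rectangular array, the following Python sketch pivots one into the other. The column order (sample, variable, value), the Tab separator on input and the function name are assumptions for illustration only; PRIMER does this conversion itself on reading the file.

```python
import csv
import io


def three_column_to_rectangular(text, fill=0):
    """Pivot tab-separated (sample, variable, value) rows into a rectangular table.

    Returns the sample labels (in order of first appearance) and a dict
    mapping each variable to its row of values across those samples.
    """
    table = {}    # variable -> {sample: value}
    samples = []  # sample labels, in order of first appearance
    for sample, variable, value in csv.reader(io.StringIO(text), delimiter="\t"):
        if sample not in samples:
            samples.append(sample)
        table.setdefault(variable, {})[sample] = float(value)
    rect = {v: [row.get(s, fill) for s in samples] for v, row in table.items()}
    return samples, rect


# Hypothetical flat-form output: one row per non-zero (sample, species) record.
flat = "S30\tsp A\t3\nS30\tsp B\t1\nS36\tsp A\t7\n"
samples, rect = three_column_to_rectangular(flat)
```

Combinations that never appear in the flat file (here sp B in S36) are filled with zero, which is why only the non-zero records need to be exported.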

Output data formats

Output format options, with File>Save Data As, are generally the reverse of input choices, with a
few constraints. The default is a PRIMER 6 (binary) file but data sheets (or resemblance matrices)
can also be sent to Excel using Save as type: Excel Files (*.xls). These will always include a title
and row labels, and an error will result if output of more than 255 columns is attempted. The
default text format output from Save as type: Text Files (*.txt) is rectangular, also with a title line
and row labels, and with no size limits on the number of rows and columns. Alternatively
✓3 column format can be selected. In both cases, the separators (delimiters) are Tab characters. It
is also possible to output data in old PRIMER 5 (*.pri, binary) or PRIMER 4 (*.pml, ASCII-text)
formats, though the latter will lose much of the associated information (species names, factors etc).

Handling PRIMER 5 & 4 files

Opening of PRIMER 5 files, using File>Open, is automatic; there is no need for any import dialog.
In comparison to opening PRIMER 6 files, the only adjustment that might arise is the need to
define a Data type using Edit>Properties. This is because PRIMER 5 files did not have a defined
data type, so are all read in by PRIMER 6 as Data type•Unknown/other. PRIMER 5 files also have
no History box (defining which standardisations, transformations etc have been applied, to obtain
the sheet in its present form). It follows that outputting PRIMER 6 files in PRIMER 5 format will
lose the information on Data type and History.
Differences in file format to the old DOS PRIMER 4 data sheets, which are in text format, are
much more substantial. The v4 format is shown below for the file tapav4.pml in C:\Examples
v6\Tasmania. This has a restrictive format (e.g. species-by-samples layout, space separators,
maximum of 80 characters to a line etc) and only the sample labels and not the species names are
given; the latter were held in a separate file, tapav4sp.spc here, with each species name on a new
line, in the same order as the data rows. The format is likely only to be of relevance in the context
of rescuing old v4 data directories. A v4 data file can be read in using File>Open>(Files of type:
All PRIMER Files ..) & (File name: tapav4.pml)>Type•Species-sample, and the species list can be
imported by File>Import>PRIMER 4 Species File>(Files of type: Old PRIMER Species Files
(*.spc)) & (File name: tapav4sp.spc). The latter will simply overwrite the default species labels 1,
2, 3, ... with the species names. Precisely the same effect could be achieved by highlighting and
copying the contents of the *.spc file to the clipboard, and pasting into the label fields, as shown.

[Figure: the v4 file tapav4.pml (title 'Tasmanian copepods', sample labels B1DR1 ... B4UR2, space-separated data rows) in a text editor; the Type dialog with •Species-sample selected; and the species names from tapav4sp.spc being copied and pasted into the label fields of the data sheet.]

Save the workspace in the C:\Examples v6\Tasmania directory, with File>Save Workspace As>
(File name: tawk.pwk), for use in the next and later sections.


2. Factors (and Indicators), identifying sample (and species) groups

If you have been carrying out the manipulations in section 1, by now you will have several sheets
open in the C:\Examples v6\Tasmania workspace, one of the nematode assemblages tana, and
several identical versions of the copepod matrix, tapa. Unclutter your PRIMER desktop by
Window>Close All Windows and then re-display just tana and tapa by clicking on their icons in
the Explorer tree. (If the workspace is clear, then File>Open just tana and tapa.) It is fundamental
to the operation of PRIMER to note that only one window in the workspace can be active at a time,
denoted by the deeper colour title bar. You can select which one to activate by clicking anywhere
on its window, or click on it in the Explorer tree. Menu selections apply only to the active window,
e.g. File>Save As, Edit>Labels, and the Analyse and Tools items (though these may specify one
or more secondary sheets needed for a composite analysis). It is also important to realise that all
the menus are dynamic, with content that will change depending on the context. When the active
window is a rectangular data sheet, different Analyse options (e.g. Resemblance, DIVERSE, PCA,
...) are available than when it is a triangular resemblance matrix (e.g. CLUSTER, MDS, ...).
With tana as the active window, select Edit>Factors from the main menu (or use the shortcut right
click when the cursor is over the data matrix to bring up the Edit menu), and observe that there are
already two factors defined. The 'Treatment' factor splits the 16 Tasmanian sandflat samples into
two levels, namely whether they are from disturbed (D) or undisturbed (U) areas of sediment. The
'Block' factor divides the samples up in a different way, into four levels, the four separate sampling
patches across the sandflat (1 to 4). In statistical terminology, the treatment and block factors are
crossed, meaning that there are samples at every combination of levels of the first and second
factors. Factors are heavily used throughout PRIMER, in at least two main ways:
a) to define a group structure for multivariate hypothesis testing (see ANOSIM, Chapter 6 of the
Methods manual). Such a priori structuring of the samples (prior to seeing the data) plays an
important role in formal inference about sample patterns, and also the interpretation of which
variables (e.g. species) are primarily responsible for distinguishing specific groups (Chapter 7);
b) purely as a means of labelling points on plots, in dendrograms etc, in which case there might be
a different 'level' for every sample, e.g. a fuller or more abbreviated site name than is held in the
sample label. There is no limit on the length or alphanumeric content of a factor level.
Factors are carried around and saved with the PRIMER *.pri form of the worksheet, not saved as
separately identified data. This is in contrast to environmental variables associated with each
biological sample, which are held in a separate sheet - preferably with the same sample labels, and
which could have some or all of the same factors defined. In general, environmental variables are
continuous, and they must be at least ordered categorical, so they can be analysed as numerical data
in their own right. Factors are categorical, and often not ordered (though PRIMER can treat them
as ordered if they are numerical, e.g. to 'join the dots' of a time sequence on an ordination). The
block levels 1 to 4 here are not to be treated as an ordered scale, with 4 considered in some way as
'closer' to 3 than to 1: they are just four separate locations at which observations were made.
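
The idea of crossed factors can be checked mechanically. The Python sketch below is an illustration only (the function is_crossed is not part of PRIMER, and the factor lists are typed in by hand to follow the sample order B1DR1, B1DR2, ..., B4UR2): two factors are crossed if every combination of their levels occurs in at least one sample.

```python
from itertools import product


def is_crossed(factor_a, factor_b):
    """True if every combination of levels of the two factors is observed."""
    observed = set(zip(factor_a, factor_b))
    possible = set(product(set(factor_a), set(factor_b)))
    return observed == possible


# 16 samples: 8 disturbed then 8 undisturbed, two replicates per block.
treatment = ["D"] * 8 + ["U"] * 8
block = [1, 1, 2, 2, 3, 3, 4, 4] * 2

crossed = is_crossed(treatment, block)
# A layout in which each block holds only one treatment is not crossed.
not_crossed = is_crossed(["D", "D", "U", "U"], [1, 1, 2, 2])
```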

Creating factors (for samples)

In the Factors dialog box, obtained from Edit>Factors on tana, add a third factor (which might be
useful in unambiguous labelling of later plots) which produces block designations B1, B2, B3, B4
as an alternative to 1, 2, 3, 4, by Add>(Add factor named: Blok name). (The misspelling is
deliberate!) The cursor should then be at the top of the new (blank) label column, ready to start
typing: B1 (return), B1 (ret), B2 (ret), ..., B4 (ret). When the first 8 rows are completed, highlight
the whole column by clicking on its label, 'Blok name', then take Edit>Fill Down>Pattern, to
place a repeating copy of the highlighted non-blank entries into the highlighted blank portion.

[Figure: the Factors dialog for tana (buttons Add, Combine, Rename, Reorder, Delete, Key, Import, OK, Cancel, Help), with the first 8 rows of the new 'Blok name' column typed in and the remainder filled by Edit>Fill Down>Pattern.]
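
What Fill Down>Pattern does to a partly filled column can be sketched in a few lines of Python. This is a minimal illustration, not PRIMER's implementation: blank cells are represented by None, and the function name is hypothetical.

```python
def fill_down_pattern(column):
    """Repeat the non-blank entries down through the blank (None) cells.

    Assumes the non-blank entries sit at the top of the column, as in the
    'Blok name' example, so the pattern simply cycles down the column.
    """
    pattern = [v for v in column if v is not None]
    if not pattern:
        return list(column)
    return [pattern[i % len(pattern)] for i in range(len(column))]


# First 8 rows typed in by hand, last 8 rows still blank.
column = ["B1", "B1", "B2", "B2", "B3", "B3", "B4", "B4"] + [None] * 8
filled = fill_down_pattern(column)
```

The bottom 8 entries of filled repeat the top 8, which is the block structure needed for the 8 undisturbed samples that follow the 8 disturbed ones.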

An alternative way of achieving the same thing, in this simple case, is to use standard Copy and
Paste operations from the Factors Edit menu. Try this out by first deleting the bottom 8 entries in
the Blok name column, by highlighting them (click on the first cell and drag down, as usual with
Windows) and hitting the Delete key on your keyboard, or Edit>Delete>(Delete data values? OK).
Now highlight the first 8 rows in this column, take Edit>Copy, put the cursor in the 9th box and
Edit>Paste. (Incidentally, do not confuse Edit>Delete with the Delete button on the left hand side
of the Factors dialog box: the latter is for deleting the whole factor that the cursor is currently
positioned in, and will give you a query box saying 'Delete factor(s)?', rather than 'Delete data
values?'. It can also be used to delete several adjacent factors simultaneously, by first highlighting
them - by clicking on the first name, and Shift clicking on the last, at the top of each column). If
you delete something you didn't intend to (and there is no selective Undo button, notice), you can
always cancel the whole edit by taking the Cancel option on the left of the Factors dialog box.

Now correct the spelling of the factor name with Rename>(Rename factor Blok name to: Block
name), and make the Block name factor the second, rather than third factor listed, by Reorder,
clicking on Block name and Move ↑. OK the changes you have made in the Factors dialog and
resave the workspace with File>Save Workspace.
[Figure: the Factors dialog for tana before and after the changes, with columns Treatment, Block, Blok name becoming Treatment, Block name, Block via Rename and Reorder>Move.]

Multiple PRIMER sessions

The distinction between factors in a biotic matrix, and a separate environmental matrix associated
with that data, is sometimes blurred, and similar information could be held in both ways; it depends
simply on how that information is to be used. An example would be the Ekofisk datasheets ekma
(assemblages) and ekev (environment) seen in section 1, where the distance of each sample from
the centre of the oilfield is held in continuous form in ekev, but in discretised form as a factor in
ekma. At this point, you might like to open a second PRIMER desktop, by double-clicking on the
PRIMER icon on your main desktop - this will produce an independent run of PRIMER 6. It is a
good idea to keep unrelated analyses in different workspaces, and the way to open more than one
workspace at a time is to launch separate runs of PRIMER. In this second PRIMER desktop, re-open
the Ekofisk workspace saved in section 1, by File>Open>(File name: ekwk.pwk)>Open. Or,
you should be able to open this workspace with File>Recent Workspaces>C:\Examples v6\
Ekofisk\ekwk.pwk. If you haven't saved this workspace, simply re-open both ekma and ekev.

[Figure: the File menu (New, Open, Close Workspace, Save Workspace, Save Workspace As, Rename Workspace, Recent Items), shown alongside the Explorer tree of the tawk workspace (tana, tapa, tapatx, tapacs, tapa3c, tapav4).]
More Edit>Fill Down options

You can see from looking at the first column of ekev that the sampling sites S30, S36, S37, ... are
ordered in increasing distance from the oilfield. Make ekma the active window and examine its
factors, with Edit>Factors. The 'Dist' factor groups the samples into one of four, increasing,
distance ranges. However, the levels D, C, B, A are alphabetic, not numeric, so PRIMER will treat
them as unordered groups. Much later (section 13), we will require ordered groups, so create these
now with Add>(Add factor named: Dist#) and type in 1 opposite all the D levels, 2 opposite C, etc.
It does not save much time here but, as well as the Fill Down>Pattern feature seen above, there is
also a simple way of entering a run of the same values: type in the first one, say 2 (return) opposite
the first C, then click on that 2 again and drag down to highlight all the consecutive entries that
need to be filled with 2 (or Shift click on the last one), and take Edit>Fill Down>Value.
There is one other Fill Down option, which could make sense to use in this context: add a further
factor, called 'Distance order', click on its heading to highlight the column, and take Edit>Fill
Down>Label number, which will fill the column with the integer sequence 1, 2, 3, ... This is a
ranking of the sites (with no allowance for ties) in terms of their distance from the oilfield, which
can be seen as half way between the discretised grouping of 'Dist' and the actual distances in ekev.
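
The two new factors can be mimicked in Python to make their relationship to 'Dist' explicit. This is a sketch under stated assumptions: the six 'Dist' levels listed are hypothetical stand-ins for the real ekma column, and the variable names are illustrative only.

```python
# Hypothetical 'Dist' levels, one per sample, already in order of distance.
dist = ["D", "D", "C", "C", "B", "A"]

# Dist#: an ordered, numeric recoding of the alphabetic levels (1 for D, 2 for C, ...).
order = {"D": 1, "C": 2, "B": 3, "A": 4}
dist_numeric = [order[level] for level in dist]

# Distance order: what Fill Down>Label number produces, the sequence 1, 2, 3, ...
distance_order = list(range(1, len(dist) + 1))
```

The recoded factor keeps the grouping into four ranges but makes the ordering usable, whereas the label-number factor gives every sample its own rank.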

[Figure: the Factors dialog for ekma, showing the 'Dist' and 'Distance order' columns.]

Closing a PRIMER session

OK the Factors dialog and resave the workspace (File>Save Workspace), and/or data sheet if you
wish (File>Save Data As), and close down this second PRIMER desktop with File>Exit (or click
the '×' in the top right of the window, as usual). If you have not just saved the workspace, you will
be prompted with a warning message, asking whether you want to save the workspace. An answer
of Yes will invoke the File>Save Workspace As dialog box. Whilst PRIMER 6 is multi-tasking
within a workspace, and the different threads from parallel sessions of the software should not, in
general, be able to interfere with each other (a copy is taken of the current version of each file at
the time it is loaded into the workspace, so the original is not then modified until a Save), it is
inefficient to fill the memory of your PC with multiple sessions where this is not necessary.

Combining factors (e.g. to average)

Return to the other PRIMER session, the Tasmania data, that should still be open on your Windows
desktop, and with the tana sheet active, again open the Factors dialog with Edit>Factors.
Combining factors with Combine, another button on the left of the Factors box, can be a very
quick and effective way of creating new factors or composite sample names, in nested or crossed
layouts. Here, take Combine to display a typical selection box (PRIMER uses a similar dialog for
many other analyses, e.g. selecting a subset of the data by levels of a factor). Click on Treatment
and the right-arrow button, then Block and the right-arrow button, to set up which factors are to be
combined and in what order. (Note that the double-arrow buttons move all items from the
'Available' list to the 'Include' list, or back, and
a selection of entries can be moved in one operation by holding the Ctrl key down as the items are
clicked - or the Shift key to obtain a range of items - as in normal Windows practice.) Pressing
OK then gives the new, composite factor with name TreatmentBlock and levels D1, D2, ..., U1,
U2, .... Don't be confused by the fact that there may still only be three factors displayed: if so, the
first column has been shifted to the left, but the box is scrollable horizontally, as well as vertically,
to display all information. You may wish to resize the window by grabbing the side or corner in
the usual Windows manner - all boxes that would benefit from it are resizable in this way.
The combined factor can be used as a composite label on an ordination plot and is essential for
averaging over the replicates in the data, to obtain a matrix of mean values - for each of the 8
treatment/block combinations, here. This is simply achieved with an OK for all the changes you
have made to the Factor information, and then Tools>Average>(Samples•Averages for factor:
TreatmentBlock) & (Variables•No averaging). This creates a new data sheet, Data1, in which the
sample labels are the levels of the combined TreatmentBlock factor (and which carries across the
factor information from the original sheet, as can be seen by taking Edit>Factors on Data1).
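
The Combine and Average steps amount to concatenating factor levels and then taking group means. The Python sketch below shows this for a single variable; it is an illustration only, with hypothetical function names and made-up counts for 8 samples (two blocks, two treatments, two replicates each), and makes no claim about how PRIMER computes the result internally.

```python
from collections import defaultdict


def combine_factors(*factors):
    """Build a composite factor by joining the levels of each factor in turn."""
    return ["".join(str(level) for level in levels) for levels in zip(*factors)]


def average_by_factor(values, factor):
    """Mean of one variable's values within each level of the factor."""
    groups = defaultdict(list)
    for value, level in zip(values, factor):
        groups[level].append(value)
    return {level: sum(vals) / len(vals) for level, vals in groups.items()}


treatment = ["D", "D", "D", "D", "U", "U", "U", "U"]
block = [1, 1, 2, 2, 1, 1, 2, 2]
counts = [10, 8, 12, 14, 0, 2, 3, 5]  # illustrative values for one species

treatment_block = combine_factors(treatment, block)
means = average_by_factor(counts, treatment_block)
```

Each key of means corresponds to one sample label in the averaged sheet, so the two replicates of every treatment/block combination collapse to a single column.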

[Figure: the Factors dialog for tana with the new TreatmentBlock column; the Select factors box (Available: Block name; Include: Treatment, Block) opened by Combine; and the Tools>Average dialog with Samples•Averages for factor: TreatmentBlock and Variables•No averaging, producing the averaged sheet Data1.]

Factor keys

A further button on the Edit>Factors dialog box is Key, which you could try with the factors for
tana. With the cursor somewhere on the TreatmentBlock factor, clicking on the Key button gives a
display of the symbol type and colour for each of the 8 factor levels that will be used on ordination
plots etc, and also the line style for joining points (e.g. in dominance curves, see section 15). Any
of these (local) defaults can be changed by clicking on one of the cells of the display: clicking on a
colour gives a colour chart (of 48 basic colours and a vast range of custom colours), on a symbol
gives 13 symbol types, and on a line gives 5 line styles. Changes will only apply to the specific
factor. They can be made in advance, like this, or later on when the plots are displayed - and the
changes propagate through derived windows, or ones that are precursors to the current window.

[Figure: the Key dialog for the TreatmentBlock factor, with the colour chart (basic and custom colours) opened by clicking on a colour cell.]

Importing factors

We shall see later that new factors can be created at several stages during an analysis (e.g. when the
active window is a resemblance matrix or even a plot) and the information is propagated both
forwards and backwards to any sheet or plot connected to the current window. However, when two
sheets are in the same workspace but otherwise unconnected to each other (neither is derived from
the other) then factor information can be transferred between them using Import on the Factors
dialog box. An example is the tapa sheet already open in the Tasmania workspace. Edit>Factors
shows that it currently has no factors defined, but its samples (and, importantly, their labels) are
identical to those for the tana data sheet. Taking Import>(Worksheet: tana) & (Select) gives
another selection box, listing the factors that will be Included in the import from tana. Drop one of
those, the purely numerical Block factor, by clicking on it, and moving it to the Available list with
the left-arrow button, then three OK's will produce the desired transfer of three factors to the tapa sheet.

[Figure: the Explorer tree of the tawk workspace (tana, Average1, Data1, tapa, tapatx, tapacs, tapa3c, tapav4) and the Import dialog for tapa, with Worksheet: tana selected and the Include list holding Treatment, Block name and TreatmentBlock.]
Label matching

An alternative, but clumsier, means of transfer is to Add blank new factors to the tapa sheet and
Copy and Paste the contents of the tana factors to these cells. This is also less general, because it
will clearly only make sense when the samples are stored in the same order in the two sheets. By
contrast, Import operates by matching up the sample labels in the two files and can therefore re-order
the factor levels appropriately if the samples are in a different order. This is a general feature
of PRIMER 6 - a lot of use is made of label matching across data sets in this way. A corollary, of
course, is that labels have to be unique within a set, and carefully checked for accuracy across sets.
If the two sets of sample labels are not identical, but do refer to the same set of samples, in the
same order, then a Copy and Paste of the factor content is the only way of transferring the factors.
An example of this would be in transferring factors to tapacs.csv from tana.pri, since the samples in
the comma-separated file, tapacs, were labelled slightly differently (DB1R1 not B1DR1 etc).
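
The difference between positional Copy and Paste and label-matched Import can be made concrete with a small Python sketch. The function import_factor and the four-sample lists are assumptions for illustration; the point is only that matching on labels re-orders the levels correctly, and fails loudly when labels differ.

```python
def import_factor(target_labels, source_labels, source_levels):
    """Return factor levels re-ordered to follow target_labels, matched by label."""
    if len(set(source_labels)) != len(source_labels):
        raise ValueError("sample labels must be unique within a sheet")
    lookup = dict(zip(source_labels, source_levels))
    missing = [lab for lab in target_labels if lab not in lookup]
    if missing:
        raise KeyError(f"labels not found in source sheet: {missing}")
    return [lookup[lab] for lab in target_labels]


tana_labels = ["B1DR1", "B1DR2", "B1UR1", "B1UR2"]
tana_treatment = ["D", "D", "U", "U"]

# Same samples held in a different order in the second sheet.
tapa_labels = ["B1UR1", "B1DR1", "B1UR2", "B1DR2"]
tapa_treatment = import_factor(tapa_labels, tana_labels, tana_treatment)
```

With differently spelt labels (DB1R1 rather than B1DR1) the lookup raises an error rather than guessing, which mirrors why only a positional Copy and Paste can work in that situation.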

Importing/exporting factors for .xls/.txt files

An important feature of PRIMER 6 (unlike PRIMER 5) is that factors can now be held with the
spreadsheet that is opened from Excel or a text file, and will automatically be available in the
PRIMER format. Similarly, files that are saved from PRIMER to Excel (*.xls) or text (*.txt)
formats will automatically export the factors also. When the data has samples as columns, any
factors must be placed in the Excel sheet as additional rows at the bottom of the array, separated
from the data by a blank row. When samples are rows, the factors are held as additional columns
to the right of the array, again separated by a blank column.
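
The samples-as-columns layout, with factors as extra rows beneath a blank row, can be illustrated by a small Python parser. This is only a sketch of the idea: it assumes tab-separated cells and a header row of sample labels with no title line, the function name is hypothetical, and the precise file format that PRIMER reads is the one described in its Help system, not this one.

```python
def split_data_and_factors(lines):
    """Split a samples-as-columns sheet into data rows and factor rows.

    `lines` holds a header row of sample labels, then the data rows, a blank
    row, and finally the factor rows; cells are tab-separated.
    """
    rows = [line.rstrip("\n").split("\t") for line in lines]
    header, body = rows[0], rows[1:]
    # Index of the first wholly blank row (or end of sheet if there is none).
    blank = next((i for i, r in enumerate(body) if not any(c.strip() for c in r)),
                 len(body))
    samples = header[1:]
    data = {r[0]: [float(x) for x in r[1:]] for r in body[:blank]}
    factors = {r[0]: dict(zip(samples, r[1:])) for r in body[blank + 1:]}
    return samples, data, factors


sheet = [
    "\tB1DR1\tB1DR2\tB2DR1",
    "Ameira sp\t43\t63\t4",
    "Apodopsyllus sp\t0\t0\t0",
    "",
    "Treatment\tD\tD\tD",
    "Block\t1\t1\t2",
]
samples, data, factors = split_data_and_factors(sheet)
```

The blank row is what separates numeric data from factor levels, so a stray empty row inside the data block would be misread as that separator.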
Examine the format for text files, by looking at tapatx.txt in Word or a text editor, and look at the
Excel format by first saving tana.pri from PRIMER into *.xls format, with File>Save Data As>
(File name: tana) & (Save as type: Excel Files (*.xls)), and then opening the file in Excel. For a
precise description of how text format files of rectangular data arrays (and triangular resemblance
matrices) hold factors and indicators (defined below), see the Help system: Help>Contents>File
types>Text file formats (under the section headed 'Text'). The best way to see the appropriate
input text format is to output an existing PRIMER sheet as a text file. Note that the 3-column text
format does not allow import/export of factors with the data. They are best held as columns in a
separate text or Excel file, and copied/pasted into new factor columns added to the Factors dialog.
[Figure: tapatx.txt (Tasmanian copepods) viewed in Notepad and the Tasmanian nematodes sheet viewed in Excel, each with samples as columns and the Treatment and Block factors held as additional rows beneath the data, separated by a blank row.]

Importing factors from old v4 files

The DOS-based PRIMER v4 held factors in text format conversion files, with *.cnv extensions.
Each file held only one factor, with each row of the form: [sample number]=[factor level]. The file
tapacv.cnv displays the format for the combined treatment/block factor. If relevant to your data,
examine its format with Word or a text editor, then import it to (say) the tapatx datasheet: with
tapatx as the active sheet, take File>Import>PRIMER 4 Conversion File>(File name: tapacv
.cnv)>Open>(Factor/Indicator name: TrtBlk) & (Type•Factor). Check that it has imported as it
should, using Factors on the right click (or Edit) menu.
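
The row form [sample number]=[factor level] is simple enough to read by hand, as the following Python sketch suggests. The function name and the four example rows are hypothetical (they are not the contents of tapacv.cnv), and any further details of the real v4 format are not covered here.

```python
def parse_cnv(text):
    """Parse '[sample number]=[factor level]' rows into {sample number: level}."""
    levels = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        number, _, level = line.partition("=")
        levels[int(number)] = level.strip()
    return levels


# Hypothetical content: samples 1-2 in level D1, samples 3-4 in level D2.
cnv = "1=D1\n2=D1\n3=D2\n4=D2\n"
trt_blk = parse_cnv(cnv)
```

Because the file refers to samples by number rather than by label, the levels only make sense for a data sheet whose samples are in the same order as when the file was written.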

Creating indicators on variables

'Indicator' is the term PRIMER uses for a factor defined on the variables not the samples. It is
convenient to use a separate term because 'factor' has a well-established statistical meaning (e.g. in
ANOVA-type layouts), and refers to structures defined on samples, not on variables. Indicators are
also much less used than factors in practice, with the main application being to select out subsets of
variables for use in the analysis of samples (e.g. only analyse the metals data, rather than all
environmental variables; only analyse the zooplankton data, omitting the phytoplankton species
etc). Creation and manipulation of indicators, however, is almost exactly as for factors, with
parallel choices obtained via Edit>Indicators.
An example is given in the tapa.pri data sheet. Edit>Indicators (also on the right click menu when
the mouse is over the data sheet) shows the indicator 'Genus identified?', which records whether
each taxon has been identified to generic level (1) or not (0). Note the form in which indicators are
held on input or output to Excel (or text) files, by saving tapa in Excel format with File>Save Data
As>(Save as type: Excel Files (*.xls)) & (File name: tapa). Opening the created tapa.xls file in
Excel, you will see the additional indicator column to the right of the sheet, separated by a blank
column from the data matrix itself.

[Figure: the Indicators dialog for tapa, listing the 'Genus identified?' level (1 or 0) for each species, and tapa.xls opened in Excel with the 'Genus identified?' indicator held as an additional column to the right of the data, separated by a blank column.]

Indicators in selection

Selection by indicator levels is demonstrated by Select>Variables>(•Indicator levels)>(Indicator
name: Genus identified?)>Levels>(Include: 1) & (Available: 0), giving a subset of the tapa data
sheet which drops the undetermined species. Of course, for such a small data set there are simpler
ways of dropping these last five species - see the range of selection options in section 8.
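
In effect, selecting by indicator level is a filter on the rows of the data sheet. The Python sketch below illustrates this with three species only; the function name, the dictionary layout and the counts are assumptions for the example rather than anything PRIMER exposes.

```python
def select_by_indicator(sheet, indicator, include):
    """Keep only the variables whose indicator level is in `include`."""
    return {name: row for name, row in sheet.items() if indicator[name] in include}


# Illustrative rows (two samples) and their 'Genus identified?' levels.
tapa = {
    "Ameira sp": [43, 63],
    "Undetermined A": [0, 0],
    "Leptastacus sp A": [30, 97],
}
genus_identified = {"Ameira sp": 1, "Undetermined A": 0, "Leptastacus sp A": 1}

subset = select_by_indicator(tapa, genus_identified, include={1})
```

The original sheet is left untouched, in the same way that a selection in PRIMER can be reversed later with Select>All.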

[Figure: the Select variables dialog with •Indicator levels chosen and Indicator name: Genus identified?, the Levels selection box, and the resulting subset of the tapa sheet.]

Now reverse the selection by Select>All (and Edit>Clear Highlight if you wish), and resave the
tawk.pwk workspace, using File>Save Workspace, for use in the analyses of later sections.

Aggregation files (versus indicators)

Finally, one apparently obvious application for indicators is to specify which species belong to
which higher-order taxonomic groups. If separate multivariate analyses are required by major
phyletic group, for example, then the different phyla should certainly be set up as an indicator on
the species, since this will allow selection. However, the full range of 'nested' indicators
represented by a Linnaean classification (which species belong to which genera, which genera to
which families, families to orders, etc) are usually best held as a data matrix in their own right,
termed an 'aggregation file' in PRIMER. There are options for reading in aggregation files in
PRIMER 6 & 5 (binary) or 4 (text) formats (*.agg or *.pml), also from rectangular Excel and text
files, and most of these formats are also available for output. An example will be seen later, in
sections 4, 9 and 14, though you can see the simple rectangular format now, if you wish, by
opening the file gfagg.agg from the C:\Examples v6\Grdfish directory (and saving for later in
workspace gfwk.pwk).

Taxonomy for NW European shelf groundfish

Species                  Genus          Family           Order               Class
Raja microocellata       Raja           Rajidae          RAJIFORMES          CHONDRICHTHYES
Raja brachyura           Raja           Rajidae          RAJIFORMES          CHONDRICHTHYES
Raja montagui            Raja           Rajidae          RAJIFORMES          CHONDRICHTHYES
Torpedo marmorata        Torpedo        Torpedinidae     TORPEDINIFORMES     CHONDRICHTHYES
Torpedo nobiliana        Torpedo        Torpedinidae     TORPEDINIFORMES     CHONDRICHTHYES
Squalus acanthias        Squalus        Squalidae        SQUALIFORMES        CHONDRICHTHYES
Scyliorhinus canicula    Scyliorhinus   Scyliorhinidae   CARCHARHINIFORMES   CHONDRICHTHYES
...
Individual columns from aggregation files can be copied and pasted into indicators, if needed for
selection operations, but there are several reasons to hold aggregation information primarily in a
separate data matrix. Historically (v4), as the name implies, the main use was for aggregating up
species abundances to genus, family, order, ... level information, to judge the changing
interpretation from a study under coarser identification of the taxa (see Chapter 10 of the Methods
manual). The Tools>Aggregate routine to accomplish this is discussed in section 9. More
recently (v5), diversity measures were introduced which utilised the taxonomic relatedness of
individuals or species found in a sample. These taxonomic distinctness measures (see Chapter 17
of the Methods manual) use aggregation files as part of the calculation - see section 14. Newer
developments (v6) also use such files in constructing resemblance measures between samples
(taxonomic dissimilarity) - see section 4.
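
The aggregation idea described above (summing species abundances up to a coarser taxonomic level)
can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not
PRIMER's own routine: the function name aggregate and the counts are hypothetical, and the
species-to-genus mapping simply reuses genera from the groundfish taxonomy.

```python
from collections import defaultdict

def aggregate(abundance, species_to_group):
    """Sum species rows into higher-level groups, sample by sample.

    abundance: dict mapping species name -> list of values, one per sample.
    species_to_group: dict mapping species name -> group name (e.g. genus).
    """
    n_samples = len(next(iter(abundance.values())))
    totals = defaultdict(lambda: [0.0] * n_samples)
    for species, values in abundance.items():
        group = species_to_group[species]
        for j, value in enumerate(values):
            totals[group][j] += value
    return dict(totals)

# Hypothetical counts for three species in two samples (illustration only).
counts = {
    "Raja brachyura": [3, 0],
    "Raja montagui": [1, 4],
    "Torpedo marmorata": [0, 2],
}
# One column of an aggregation file: species -> genus.
genus_of = {
    "Raja brachyura": "Raja",
    "Raja montagui": "Raja",
    "Torpedo marmorata": "Torpedo",
}
by_genus = aggregate(counts, genus_of)
print(by_genus)
```

Holding the mapping as a separate table, rather than inside the data sheet, means the same counts
can be aggregated to any level (genus, family, order, ...) by swapping the mapping column.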

3. Pre-treatment options

Standardising
How the data are treated, prior to computation of a resemblance matrix (e.g. similarities), can have
an important influence on the final analysis, and such decisions often depend on the practical
context rather than any statistical considerations. For example, standardising the samples (by total)
divides each entry in the data sheet by the total abundance in that sample, across all variables
(species). This would turn assemblage counts for each sample into relative percentages (what is
referred to by statisticians as compositional data), all samples then adding to 100% across species.
It thus removes all differences in total abundance in each sample from the multivariate comparison
of samples. Sometimes this may be desirable, e.g. where the unit of sampling cannot be tightly
controlled. An example is seen below, for the prey taxa in the gut contents of fish predators: the
quantity of food in the gut varies across the fish in an uncontrollable way so is not relevant to a
multivariate comparison of the prey composition, and the data should initially be standardised. On
the other hand, a typical marine impact study, using sediment-dwelling fauna sampled by a corer of
fixed size, more strictly controls the quantity of material in each sample. It might then be
important to use the fact that a potentially impacted site contains 5 times fewer individuals, in total,
than a control site, so standardisation would be undesirable. The philosophy in PRIMER 6 is that
users control all such pre-treatment decisions, combining them in an order under their choice,
appropriate to the context. Each pre-treatment step results in display of a revised datasheet so the
user can see its effect, before proceeding to analysis (or in some cases a further pre-treatment step).
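
As a rough illustration of standardising samples by total, the Python sketch below divides every
entry by its sample (column) total and expresses the result as a percentage. It is not PRIMER code:
the function name, the example volumes and the treatment of an all-zero sample (left as zeros) are
assumptions made for this example.

```python
def standardise_samples(matrix):
    """Express each sample (column) as percentages of its own total.

    matrix: list of rows (variables/species), each a list of values per sample.
    Returns (standardised matrix, list of sample totals).
    """
    n_samples = len(matrix[0])
    totals = [sum(row[j] for row in matrix) for j in range(n_samples)]
    standardised = [
        # Assumption: a sample with zero total is left as zeros.
        [100.0 * row[j] / totals[j] if totals[j] else 0.0 for j in range(n_samples)]
        for row in matrix
    ]
    return standardised, totals

# Hypothetical gut-content volumes: 3 prey categories (rows) x 2 samples (columns).
# The second sample holds ten times less material but the same composition.
volumes = [
    [10.0, 1.0],
    [30.0, 3.0],
    [60.0, 6.0],
]
percent, totals = standardise_samples(volumes)
print(percent)
print(totals)
```

After standardisation the two columns are identical, which is the point made in the text: the
difference in total quantity between samples no longer enters the comparison.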

(W Australia fish diets)
Dietary data on the gut contents of 7 marine fish species found in nearshore waters of the lower
west coast of Western Australia are reported by Hourston, Platell, Valesini & Potter (2004) J mar
Biol Assoc UK 84: 805-817 and Schafer, Platell, Valesini & Potter (2002) J exp mar Biol Ecol
278: 67-92. The data are the volumetric quantity of each of 39 dietary categories (broadly classified
taxa) in a total of 68 samples across the 7 fish species (unbalanced replication), each sample being
from a pool of 5 fish guts. The data matrix can be found in C:\Examples v6\WAfdiet\WAfd.pri.

If you have a workspace currently open, File>Close Workspace (you may be prompted to save the
current workspace before closing, if you have not saved it since making a change), then open
WAfd.pri. Take Analyse>Pre-treatment>Standardise>(Standardise•Samples) & (By•Total) &
(✓Stats to worksheet). You will see from the resulting sheet (Data1) that samples are now
expressed as % composition of each prey category, the columns adding to 100.

[Screenshot: Analyse>Pre-treatment menu and the Standardise dialog (Standardise: Samples (usually) or Variables; Stats to worksheet), with the worksheet of sample totals.]

Stats to worksheet
Several of the routines in PRIMER 6 also incorporate a 'check box' for sending summary statistics
used in that routine to a further worksheet. Here, this results in a second sheet (Data2), which is
just a single column of totals across prey species for each of the 68 gut samples. This is therefore
an easy way of obtaining sample totals in a form that can be further manipulated. Another example
of summary statistics being sent to a separate worksheet is for the Normalise pre-treatment option -
see below - for which the mean and standard deviation of each variable, used in the normalisation
process, can be sent to a separate sheet. Save the workspace for later, with File>Save Workspace
As>(File name: WAwk.pwk), and close it.

Transforming (overall)
Transformation is usually applied to all the entries in an assemblage matrix of counts, biomass, %
cover etc, in order to downweight the contributions of quantitatively dominant species to the
similarities calculated between samples. This is particularly important for the most useful, and
commonly-used, resemblance measures like Bray-Curtis similarity, which do not incorporate any
form of scaling of each species by its total or maximum across all samples (an example of the latter
would be the Gower similarity, which is generally less effective than Bray-Curtis because it
overcompensates by giving rare species exactly the same weight as common ones). The more
severe the initial transformation, the more notice is taken of the less-abundant species in the matrix.
It is for the user to choose a balance between contributions of dominant and rarer species, in the
applied context, picking from the sequence: None, Square root, Fourth root, Log(X+1) and
Presence/absence. (Reduction to presence/absence, i.e. 1/0, is thought of as a transformation
because it would be the logical end-point of taking ever more severe power transforms: square root,
4th root, 8th root, ..., and it is clearly one way in which less abundant species are given the same
weight as abundant ones.) If standardisation of samples by total is also required, for example to
ameliorate the effects of differing sample volumes, it is logical to apply the standardisation first,
then transform. For further discussion of transformations see the Methods manual, chapters 2 & 9.

Open the previously saved workspace ekwk.pwk from the Ekofisk directory, with File>Open. If
this does not exist, open the files ekma.pri and ekev.xls into a clear workspace. With ekma as the
active sheet, choose Analyse>Pre-treatment>Transform (overall)>(Transformation: Square root)
and repeat on ekma with the fourth root transform. Look at the two resulting data sheets, Data1
and Data2, in relation to ekma. (One neat way of doing this is to Window>Close All Windows and
click in the Explorer tree to display only the three windows you wish to see, then take
Window>Tile Vertical.) Note how the (rather severely) fourth-root transformed matrix reflects
mainly the presence/absence structure, with a modest amount of added quantitative information
from the more abundant species.
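
To see how the sequence of transformations progressively compresses large counts, the following
Python sketch applies each option to one hypothetical row of counts. It is an illustration only,
not PRIMER's implementation: the dictionary and function names are invented here, and the use of
the natural logarithm for Log(X+1) is an assumption.

```python
import math

# Transformations in the order listed in the text.
TRANSFORMS = {
    "none": lambda x: x,
    "square root": lambda x: math.sqrt(x),
    "fourth root": lambda x: x ** 0.25,
    "log(x+1)": lambda x: math.log(1.0 + x),  # assumption: natural log
    "presence/absence": lambda x: 1.0 if x > 0 else 0.0,
}

def transform(matrix, name):
    """Apply the named transformation to every entry of a data matrix."""
    f = TRANSFORMS[name]
    return [[f(x) for x in row] for row in matrix]

# Hypothetical counts of one species in four samples: absent, rare, common, dominant.
counts = [[0, 1, 16, 10000]]
for name in TRANSFORMS:
    print(name, [round(v, 2) for v in transform(counts, name)[0]])
```

Untransformed, the dominant count is 10000 times the rare one; after a fourth root it is about 10
times, and under presence/absence the two are equal, which is the sense in which more severe
transforms take more notice of less-abundant species.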

[Screenshot: Analyse menu and the ekma data sheet tiled alongside its square-root and fourth-root transformed versions.]

Normalising variables
Transformations may be appropriate for environmental variables too, though usually for a different
reason (e.g. in order to justify using Euclidean distance as a dissimilarity measure on normalised
variables). However, these are usually selective transformations, required only for some variables,
and with different transforms potentially applicable to variables of different types. The Tools>
Transform (individual) menu item caters for this, and can allow user-defined expressions if a
specific formula is appropriate to a certain variable, rather than the global Analyse>Pre-treatment>
Transform (overall), which applies the same simple power or log transform to all variables.
Individual transforms are discussed in section 9. Here, we shall simply select three of the variables
from the ekev sheet, which do not require transformation, to demonstrate normalisation.

It is typical of a suite of physico-chemical or ecotoxicological ('biomarker') variables that they are
not on comparable measurement scales, unlike species counts or biomass etc. All multivariate
analysis methods are based on resemblances between samples that add up contributions across the
variables. If the similarity or distance coefficient does not have some form of internal adjustment
to place variables on a common scale (for example, Euclidean or Manhattan distance measures do
not), then it is important to pre-treat the data to achieve this. The standard means of doing so is by
normalising. Literature terminology is inconsistent here, but what PRIMER means by normalising
is that from each entry of a single variable one subtracts the mean and divides by the standard
deviation for that variable. This is carried out separately for each variable. It does not convert the
variable to normality - that must be done first (approximately) by transformation - but it makes the
mean 0 and standard deviation 1, so that all variables now take values over roughly the same limits:
typically -2 to +2 covers most of the entries. This process is sometimes known, especially in the
statistical literature, as standardisation. PRIMER reserves the term standardise for scaling positive
quantities only, by dividing by their total or maximum - variables are not additionally centred.
Normalisation achieves a location shift as well as a scale change, and this is crucial in making
environmental variables which are not positive 'amounts' of something comparable with each other
(e.g. zero has no significance for temperature; redox potential can be negative as well as positive,
as can some biomarkers like 'scope for growth'; marine salinities will often fluctuate over a tight
range well away from zero, and so on. Standardisation makes no sense in these cases.)

In the ekwk.pwk workspace that should still be open, find the environmental matrix ekev and select
just the three variables Redox, %Mud and Ba(rium). Section 8 describes the different ways of
selecting but here this is most easily done just by clicking on those column labels, to highlight the
three columns, then Select>Highlighted, which results in only these three being displayed. They
do not need transformation (if they did, this would need to be carried out before normalisation).
Take Analyse>Pre-treatment>Normalise variables>(✓Stats to worksheet), and note how the
resulting variables now take values over comparable ranges, roughly -2 to +2. They are now ready
for entry to Analyse>Resemblance>(Measure•Euclidean distance), on page 45. Save ekwk.pwk.
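
The normalising calculation itself (subtract the mean, divide by the standard deviation, variable
by variable) can be sketched as below. This is a minimal Python illustration, not PRIMER's code:
the readings are hypothetical, and the use of the sample standard deviation (n - 1 divisor) is an
assumption, since the text does not say which divisor is used.

```python
import statistics

def normalise(values):
    """Subtract the mean and divide by the standard deviation of one variable.

    Returns the normalised values together with the mean and SD used,
    mirroring the kind of summary sent by 'Stats to worksheet'.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1 divisor)
    return [(v - mean) / sd for v in values], mean, sd

# Hypothetical readings of one environmental variable across five samples.
redox = [100.0, 150.0, 200.0, 120.0, 180.0]
z, mean, sd = normalise(redox)
print(mean, round(sd, 3))
print([round(v, 3) for v in z])
```

Applying the same function to each variable in turn puts them all on a common, unitless scale
with mean 0 and standard deviation 1, so that no single variable dominates a Euclidean distance
purely because of its measurement units.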

[Screenshot: Analyse>Pre-treatment>Normalise variables applied to Redox, %Mud and Ba from ekev, showing the normalised sheet and the worksheet of means and standard deviations.]

Dispersion weighting of species
When variables are on different measurement scales, there is little viable alternative to normalising
each variable (as above) thus equalising, in effect, their contributions to the multivariate analysis.
When variables are (ostensibly) on the same scale, e.g. species abundances, then their respective
contributions to commonly-used similarity coefficients, such as Bray-Curtis, will differ, based on
the relative magnitude of counts (or transformed counts). Larger abundances are always given more
weight (unless 'transformed out' to purely presence/absence). This may not always be desirable,
however. For example, some numerically dominant species may give highly erratic counts over
replicate samples within a site (or time or condition), perhaps due to an innately high degree of
spatial clumping of individuals (individuals of that species arrive in the sample in clusters). This is
likely to add 'noise' rather than 'signal' to the multivariate analysis, and downweighting of such
species is called for, in relation to other species which are not spatially clustered, but have the
lower variance associated with Poisson counts (the individuals arrive in the sample independently
of each other). The weighting is achieved by the Analyse>Pre-treatment>Dispersion weighting
procedure, one of the new research routines found in PRIMER 6 (Clarke KR, Chapman MG,
Somerfield PJ, Needham HR, 2006, Mar Ecol Progr Ser).

The differential downweighting is carried out by dividing the counts for each species by their index
of dispersion D (variance to mean ratio, a 'clumping' measure), calculated from replicates within
a group (site/time/treatment etc), and then averaged across groups. The procedure is valid under
rather general conditions, which are not unrealistic, but does require: a) data which are genuine
species counts, not densities standardised to some unit volume or area of substrate; b) independent
replicates within each of a set of sample groups, so that there is a basis for assessing within-group
variance structure; and c) those replicates to be of a uniform size (strictly 'quantitative sampling').
Downweighting is only applied where a species shows significant evidence of clumping, this being
tested by an exact permutation test, valid for the very small counts that are typical of many species.
The resulting dispersion-weighted matrix has a common (Poisson-like) variance structure across
species but unchanged relative responses of species in different groups. This is an important point:
there is no attempt here to place greater emphasis on those species which best show up a given
group structure (e.g. best separate control from polluted conditions). Such 'constrained' methods
run the risk of circular arguments: selecting out only those species that tell you the answer you
wanted in the first place! All that dispersion weighting does is divide through each row of the
matrix (species) by a constant, so that a different balance of species contributions will be obtained
by the subsequent analysis. These weights are calculated solely using information from replicates
within each group, not across groups, so a consistent species (low variance-to-mean ratio within
groups) will be given a high relative weight even if it shows no difference at all between groups.
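
The core arithmetic of dispersion weighting can be sketched as follows. This Python example is a
simplification written for illustration, not the published routine: it omits the permutation test
and instead treats any D greater than 1 as evidence of clumping, it skips groups in which the
species is absent when averaging, and all names and counts are hypothetical.

```python
import statistics

def dispersion_index(counts, groups):
    """Average variance-to-mean ratio D over the groups where the species occurs.

    counts: counts of one species, one per replicate.
    groups: group label (e.g. site) for each replicate, same length as counts.
    """
    ratios = []
    for g in sorted(set(groups)):
        reps = [c for c, label in zip(counts, groups) if label == g]
        mean = statistics.mean(reps)
        if mean > 0:  # assumption: groups where the species is absent are skipped
            ratios.append(statistics.variance(reps) / mean)
    return statistics.mean(ratios) if ratios else 0.0

def dispersion_weight(counts, groups):
    """Divide one species' counts by its dispersion index, if clumped."""
    d = dispersion_index(counts, groups)
    # Assumption: the permutation test is omitted; any D > 1 is treated as
    # clumping. The real routine only divides when the test is significant.
    divisor = d if d > 1 else 1.0
    return [c / divisor for c in counts], d

groups = ["A", "A", "A", "B", "B", "B"]
clumped = [0, 30, 0, 0, 0, 60]     # erratic counts within each site
steady = [9, 10, 11, 19, 20, 21]   # consistent counts within each site
weighted_clumped, d_clumped = dispersion_weight(clumped, groups)
weighted_steady, d_steady = dispersion_weight(steady, groups)
print(d_clumped, weighted_clumped)
print(d_steady, weighted_steady)
```

Note that the divisor uses only within-group variability: the steady species keeps its full
weight whether or not its mean differs between sites A and B, while the clumped species is scaled
down by a single constant across all samples.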

If dispersion weighting of a count matrix is contemplated, this pre-treatment step must be carried
out before any transformation. It may still make sense to transform the dispersion weighted data
sheet: a species which has large mean abundance at some sites, and is found in very consistent
numbers in all replicates from those sites, will still tend to dominate the resulting Bray-Curtis
similarities, for example. Transformation now has the strict objective of balancing up the
contributions of consistent abundant species with equally consistent but less numerous species.
Previously, it was really used for this purpose and for reducing the impact of large but erratic
numbers of some species. The latter can now be catered for by dispersion weighting and experience
shows that subsequent transformation is required less often, if at all, and not so severely.

(Fal estuary copepods)
Sediment copepod assemblages (and other fauna) from five creeks of the Fal estuary, SW England,
were analysed by Somerfield PJ, Gee JM, Warwick RM 1994 Mar Ecol Progr Ser 105: 79-88. The
sediments of this estuary are characterised by high and varying concentrations of heavy metals, a
result of tin and copper mining over hundreds of years. The copepod data consist of 23 species
found in 27 samples, consisting of 5 replicate cores spanning each creek (Mylor: M1-M5; Pill: P1-
P5; St Just: J1-J5; Percuil: E1-E5; and 7 from the largest creek, Restronguet: R1-R7). These are in
the Excel worksheet Biota 2 from file C:\Examples v6\Fal. There are also environmental cores (of
silt/clay ratios, heavy metals etc) matching these 27 sample locations, held in sheet Env vars, and
nematode communities, in sheet Biota 1. The Excel file also holds abundance and biomass data for
soft-sediment macrofauna, Biota 3 abund and Biota 3 biom, and aggregation matrices for all three
faunal types, specifying which species are in which genera, families etc. (In fact, this data will not
be used again in the manual but is included here partly as a good example on which to try out a
wide range of the PRIMER tools, when all have been met, and compare with the source paper.)

Rename Data
Close any existing workspace, then in a new workspace, File>Open>(Files of type: Excel Files
(*.xls)) & (File name: Fa.xls)>(Excel worksheet: Biota 2), from the C:\Examples v6\Fal directory.
You might wish to look at the Excel file first (to note that the sheet has a Title, Row labels,
Samples as columns, and is of type Abundance). When the sheet is in PRIMER, change the data
name, with File>Rename Data>(Rename: fapa), change the sheet title, Edit>Properties>(Title:
Copepods), and add a factor, Edit>Factors>Add>(Add factor named: Creek), filling the factor
entries with repeats of the site letter (R, M, P, J or E).

Recent Items
Open the Biota 1 and Env vars data sheets also - here just to demonstrate a slight short-cut when
different sheets are to be opened from the same Excel file. You can cut out a couple of steps by
taking File>Recent Items>1 C:\Examples v6\Fal\Fa.xls, which takes you straight into the Excel
File Wizard to read in Biota 1. A further repeat of this reads in Env vars. They will come in with
the same Data name, Fa (except that PRIMER will preserve the required uniqueness by calling the
second sheet Fa(2)), so you again need to File>Rename Data to fana from Biota 1, and faev from
Env vars. Again, change the title for the former to Nematodes, and Edit>Factors>Import the
factor Creek from the fapa sheet to both fana and faev. Save the workspace if you wish to explore
later these files you have set up, with File>Save Workspace As>(File name: fawk.pwk).

Returning to the dispersion weighting procedure, with the copepod data sheet fapa as the active
window, take Analyse>Pre-treatment>Dispersion weighting>(Factor: Creek) & (Num perms:
1000) & (✓Stats to worksheet). The Data1 sheet gives the dispersion weighted counts, which are
either ready to go into the Analyse>Resemblance routine, or could be mildly transformed before
they do so, with Analyse>Pre-treatment>Transform (overall)>(Transformation: Square root).
There seems little need for this, however, since the dispersion weighting has already succeeded in
downweighting the larger, erratic counts coming from P. littoralis, R. celtica, E. gariene and T.
discipes and the somewhat less erratic P. curticorne and M. falla - the matrix Data1 now has no
dispersion-weighted 'counts' in double figures, and the subsequent untransformed analysis will not
be dominated by a small set of species. Data2 gives dispersion indices D for each species, whether
there is evidence of clumping (i.e. that D departs from 1), and the actual divisor used, which is 1 if
the test is not significant at <5%. Absent species (e.g. Tisbe) are identifiable by a value of D = 0.
(They have a divisor of 1 and remain absent, of course.)

[Screenshot: Dispersion weighting dialog with the resulting Data1 (dispersion-weighted counts) and Data2 (statistics) sheets.]

Other variable weighting
There are other cases in which variables (species) might need prior weighting, e.g. when a species
is known to be often misidentified, its contribution (and those of the species it is mistaken for) can
be reduced by multiplying the entries in the two species through by some downweighting constant.
This is achieved by placing weights for each species in an Indicator (see previous section) and
taking Analyse>Pre-treatment>Weight variables, supplying the indicator name. In this context,
most weights would be 1, with a value less than 1 used for downweighting less-reliably identified
species (since the common similarities, such as Bray-Curtis, are invariant to a change of scale,
default weights could be 100, or any number). A further context in which this routine might be
useful is to convert counts to approximate biomass, using a known average weight of an individual
of each species. Note that dispersion weighting is also, of course, just another special case of
variable weighting, in which weights are the reciprocal of the Divisor column in Data2, above.
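
The remark that only relative weights matter can be checked numerically. The Python sketch below
weights two hypothetical samples and computes Bray-Curtis similarity using its standard definition;
the function names, counts and weights are all invented for this illustration and do not come from
PRIMER.

```python
def bray_curtis(s1, s2):
    """Bray-Curtis similarity (0-100) between two samples of equal length."""
    num = sum(abs(a - b) for a, b in zip(s1, s2))
    den = sum(s1) + sum(s2)
    return 100.0 * (1.0 - num / den)

def weight_variables(sample, weights):
    """Multiply each variable (species) entry by its weight."""
    return [x * w for x, w in zip(sample, weights)]

# Hypothetical counts for four species in two samples.
sample1 = [10, 4, 0, 6]
sample2 = [2, 8, 5, 5]

# Species 2 and 3 are assumed to be confused with each other: downweight both.
weights = [1.0, 0.5, 0.5, 1.0]
scaled = [100 * w for w in weights]  # same relative weights, different scale

sim_raw = bray_curtis(sample1, sample2)
sim_weighted = bray_curtis(weight_variables(sample1, weights),
                           weight_variables(sample2, weights))
sim_scaled = bray_curtis(weight_variables(sample1, scaled),
                         weight_variables(sample2, scaled))
print(sim_raw, sim_weighted, sim_scaled)
```

The weighted similarity differs from the unweighted one, but multiplying every weight by 100
leaves it unchanged, because the common factor cancels between numerator and denominator.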

Mixed data types
Another example might be in attempting to reconcile two different types of data in the same matrix,
e.g. counts of motile organisms and area cover of colonial species. These cases are always
problematic. One solution is to use a similarity measure such as the Gower coefficient, which
scales the range of each species across samples to be identical, but this generally performs badly
because very rare species are given the same weight as very common ones. A preferable alternative
is to use Bray-Curtis similarity as usual, but prior to that Weight variables to convert counts into
approximate area cover, species by species, or both counts and area cover into a rough estimate of
biomass, or even just to balance the two sets of variables against each other in some arbitrary way
(e.g. give the cover numbers 10 times as much weight, or 10 times less weight, keeping the counts
unchanged, and see what difference it makes to the analysis).

Cumulating samples
The remaining option on the Pre-treatment menu is Cumulate samples, which successively adds
up the entries across variables, separately for each sample. It can only be appropriate when all
variables share a common measurement scale, and when the order in which they appear in the
matrix is important; it is thus inappropriate for standard species-by-samples data. It may be useful
in analysing arrays in which variables are different body-size categories of a single species, or
different particle size classes in Particle Size Analysis (PSA) etc, and entries are the frequencies or
quantities of each size class found in each sample. Such data is traditionally analysed by univariate
methods, fitting parametric particle-size distributions and comparing parameter estimates across
samples. That can be problematic: histograms do not fit the models, summary statistics like mean
and variance do not capture subtle features such as bimodality, tests are inappropriate because the
data are not real frequencies, it is difficult to synthesise many such samples etc. This can be neatly
side-stepped by multivariate analysis, defining the similarity of pairs of size-class distributions. To
take into account the ordering of the sizes, and where histograms are not particularly smooth, it
may sometimes be preferable to compare pairs of cumulative distributions (sample 'distribution
functions') rather than the histograms ('density functions').
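
Cumulating a sample is simply a running total over the ordered variables. The short Python sketch
below shows this for two hypothetical size-class histograms (already in % composition); the
function name and the numbers are assumptions made for illustration only.

```python
from itertools import accumulate

def cumulate_sample(values):
    """Running total of the entries of one sample, in worksheet variable order."""
    return list(accumulate(values))

# Hypothetical % composition over six ordered size classes for two samples.
sample_a = [5, 10, 35, 30, 15, 5]
sample_b = [5, 35, 10, 15, 30, 5]
cum_a = cumulate_sample(sample_a)
cum_b = cumulate_sample(sample_b)
print(cum_a)
print(cum_b)
```

Because each cumulated entry depends on all the size classes before it, a shift of material
between neighbouring classes changes the cumulated profile less than a shift between distant
classes, which is how the ordering of the variables enters the comparison.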

(Particle sizes for Danish sediments)
Sediment particle size data from 6 size ranges at 3 sites (A, B, C), at 2 depths (2m and 5m), and 5
replicate samples from each site/depth combination are in C:\Examples v6\Sediment\sed.pri. Open
the file and note that the data are already standardised to % composition (if not, you would need to
take Pre-treatment>Standardise>(By•Total) initially). Run Analyse>Pre-treatment>Cumulate
samples>(Variable order•As worksheet). The variable labels are not now accurate, so you could
replace them by copying/pasting from Edit>Indicators: cumulative to Edit>Labels>Variables.
Save the workspace as sedwk.pwk for use in a later section.

[Screenshot: Cumulate samples dialog (Variable order: As worksheet or By indicator) with the sediment particle-size data sheet.]

4. Resemblance; similarities, dissimilarities and distances

Resemblance matrices
Fundamental to the operation of PRIMER and (explicitly or implicitly) any multivariate analysis, is
an appropriate definition of resemblance between every pair of samples, based on whether the suite
of recorded variables (species, environmental variables, biomarkers, particle-size classes or
whatever) take similar or dissimilar values. What is meant by 'similar' is a function of the context
and purpose of the analysis, and the PRIMER 6 user now has nearly 50 definitions to choose from
(many are covered by the general reference work Legendre P & Legendre L, 1998, Numerical
ecology, 2nd English ed, Elsevier, called L&L from now on). Within PRIMER, similarity is taken
to range over 0 to 100 (perfect similarity), dissimilarity is the complement (100-similarity),
whereas distance ranges from 0 to infinity. PRIMER 6 uses the term 'resemblance' to cover all
three concepts: similarity, dissimilarity or distance. All coefficients are symmetric (resemblance of
samples 1 and 2 is the same as 2 and 1, clearly) so the resemblances between every pair of samples
form a lower triangular matrix, without a diagonal. They are displayed with the upper triangle
absent, and data type (second heading in the window) of Similarity, Dissimilarity or Distance, so it
should always be clear when the active window is a resemblance matrix and when it is a data sheet.
(This matters because the available menu options change with the active window type.) There are
two other triangular data formats: Correlation, which is defined over the range -1 to 1 and is
therefore not directly a similarity (though it might be transformed into one, in at least two ways -
see the Transform option in section 9); and Rank, whose values are the positive integers (with
averaged values for any tied ranks) and which can be used directly as a distance matrix.
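
The lower triangular layout described above can be sketched in Python as follows. This is an
illustration only: the function names and sample values are hypothetical, Euclidean distance is
used simply because its definition is unambiguous, and the similarity-to-dissimilarity conversion
follows the 0 to 100 scale stated in the text.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two samples given as equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def lower_triangle(samples, measure):
    """Resemblances for every pair of samples, as rows of a lower triangle.

    Row i holds the values for sample i against samples 0..i-1 (no diagonal).
    """
    return [[measure(samples[i], samples[j]) for j in range(i)]
            for i in range(1, len(samples))]

def to_dissimilarity(similarity):
    """Complement of a similarity on the 0-100 scale."""
    return 100.0 - similarity

# Hypothetical samples, each a list of values for the same three variables.
samples = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [3.0, 4.0, 12.0]]
tri = lower_triangle(samples, euclidean)
for row in tri:
    print(row)
```

For n samples the triangle holds n(n-1)/2 values, one per pair; symmetry of the measure is what
makes the upper triangle and the diagonal redundant.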

Standard resemblance choices
A detailed discussion of the competing properties of different resemblance matrices is outside the
scope of this manual (see L&L or Clarke KR, Somerfield PJ & Chapman MG, 2006, J exp mar
Biol Ecol). Novice users are recommended to take one of the main options (the defaults): Bray-
Curtis similarity for biological assemblage data; Euclidean distance (having first normalised) for
physico-chemical, biomarker, morphometric data etc, in which variables are not on comparable
ranges or the same measurement scale at all; and (non-normalised) Euclidean distance for
comparing body- and particle-size histograms (previously standardised), growth curves etc.

Bray-Curtis similarity
The most commonly-used similarity coefficient for biological community analysis, because it
obeys many of the 'natural' biological axioms in a way that most other coefficients do not (see
Methods manual), is the Bray-Curtis similarity, defined between samples 1 and 2 as:

$$ S_{17} = 100\left(1 - \frac{\sum_i |y_{i1} - y_{i2}|}{\sum_i y_{i1} + \sum_i y_{i2}}\right) = 100\,\frac{\sum_i \min\{y_{i1}, y_{i2}\}}{\left(\sum_i y_{i1} + \sum_i y_{i2}\right)/2} $$

The two forms may not look identical but they are! Here $y_{i1}$ is the count (or biomass, % cover, ...)
for the ith (of p) species from sample 1, and $\sum_i(\ldots)$ denotes summation over those species. Original
references to coefficient definitions will not be given here: see L&L, whose numbering scheme is
followed wherever possible (hence S17 for Bray-Curtis).
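
The equivalence of the two forms follows from the identity |a - b| = a + b - 2 min(a, b), summed
over species. A quick numerical check in Python is sketched below; the function names and the
counts are hypothetical and serve only to illustrate the definition.

```python
def bray_curtis_diff(y1, y2):
    """First form: 100 * (1 - sum|y_i1 - y_i2| / (sum y_i1 + sum y_i2))."""
    num = sum(abs(a - b) for a, b in zip(y1, y2))
    return 100.0 * (1.0 - num / (sum(y1) + sum(y2)))

def bray_curtis_min(y1, y2):
    """Second form: 100 * sum min(y_i1, y_i2) / ((sum y_i1 + sum y_i2) / 2)."""
    shared = sum(min(a, b) for a, b in zip(y1, y2))
    return 100.0 * shared / ((sum(y1) + sum(y2)) / 2.0)

# Hypothetical counts of four species in two samples.
y1 = [4, 0, 7, 1]
y2 = [2, 3, 7, 0]
print(bray_curtis_diff(y1, y2), bray_curtis_min(y1, y2))
```

The second form makes the interpretation plain: the similarity is the abundance the two samples
share, species by species, as a percentage of their average total abundance.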

Open the workspace C:\Examples v6\Ekofisk\ekwk.pwk from earlier, and click on Data1 to make it
the active window. If the workspace is not available, open ekma and Analyse>Pre-treatment>
Transform (overall)>(Transformation: Square root) to get Data1. Take Analyse>Resemblance
and then the defaults on the Main tab (Analyse between•Samples & Measure•Bray-Curtis). The
defaults are set by the data type; since Data1 is an abundance sheet this will be Bray-Curtis. A
lower triangular matrix is produced, Resem1. Edit>Properties (or right-clicking when over the
matrix, to get the Properties menu item) shows it is of Resemblance type•Similarity from 39
samples. The History box carries through the knowledge of how it was created to a subsequent
Cluster or MDS ordination plot. This box is not user-editable, though the Title and Description
boxes can be altered (changes to the Title, for example, are carried forward to a subsequent plot but
not backwards, of course, to the data sheet Data1).

Now repeat Resemblance directly on ekma, without Pre-treatment. PRIMER tries to help here:
you will get a warning message that no transform has been applied - community matrices usually
require some transformation before calculating Bray-Curtis, though you can happily ignore this
warning if you are interested in the pattern of the few most dominant species only.

[Screenshot: Analyse>Resemblance dialog (Analyse between: Samples or Variables; Measure: Bray-Curtis similarity, Euclidean distance or More (tab)) with the resulting lower triangular similarity matrix and its Properties box (Resemblance: S17 Bray Curtis similarity; Transform: Square root).]

Zero-adjusted Bray-Curtis

A simple modification to the Bray-Curtis coefficient adjusts its behaviour as samples become
vanishingly sparse. Standard Bray-Curtis is undefined for two samples containing no species, and
can fluctuate wildly for near-blank samples - two samples containing just a single individual can
fluctuate between 100% similarity if the individuals are from the same species, to 0% if they are
not. The zero-adjusted Bray-Curtis coefficient (Clarke KR, Somerfield PJ & Chapman MG, 2006,
J exp mar Biol Ecol) damps down this behaviour, in an analogous way to the addition of the
constant 1 in the log(1+x) transformation (to cater for x=0), by adding +2 to the denominator of the
ratio in S17. The simplest way of viewing this is as adding a 'dummy species' to the matrix, taking
the value 1 for all samples. This forces two samples with no content to be 100% similar (they share
the dummy species) and two samples with a single real individual now have some similarity,
whether that species is shared (100%) or not (50%). It is clear that once there are a modest number
of individuals, in either sample, then the adjustment makes no difference. It can only come into
force when the assemblage is virtually denuded, and should only be applied if it makes biological
sense to regard two blank samples as 100% similar, because both are denuded for the same reason.
If blank samples can be present in completely different treatments/sites/times, because of small
sample sizes and highly clustered spatial distributions of organisms, it is probably wiser to remove
the blank samples and use standard Bray-Curtis, rather than calculate the zero-adjusted form.

The adjustment is made by taking the option (✓Add dummy variable)>(Value: 1) on the Main tab
of the Resemblance dialog box. The constant 1 is appropriate to integer counts, being the lowest
non-zero value attainable. This is true whether the data sheet has previously been transformed or
not (the constant remains 1 under any power transform). For data on biomass, % cover etc, the
value could sensibly be chosen similarly as the lowest non-zero entry likely to be recorded (again
the analogy with the log(c+x) transform is appropriate). 'Adding a dummy variable' can be carried
out with all resemblance measures, but will only be effective for those coefficients which treat joint
absences of species as uninformative (e.g. Kulczynski, Czekanowski, Canberra similarity etc). It is
not offered as an option if the Data type is Environmental (it makes no sense in that context).
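
The arithmetic of the adjustment can be checked outside PRIMER with a few lines of Python. This is only an illustrative sketch, not PRIMER code: the function name `bray_curtis` and its `dummy` argument are assumptions made for the example, and the sample vectors are invented.

```python
import math

def bray_curtis(y1, y2, dummy=0.0):
    """Bray-Curtis similarity (0-100) between two samples.

    `dummy` is the value given to a dummy species present in both samples
    (hypothetical argument); dummy=1 gives the zero-adjusted form for counts.
    """
    num = sum(abs(a - b) for a, b in zip(y1, y2))  # dummy adds |d - d| = 0 here
    den = sum(a + b for a, b in zip(y1, y2)) + 2.0 * dummy  # dummy adds 2d here
    if den == 0:
        return math.nan  # standard form is undefined for two blank samples
    return 100.0 * (1.0 - num / den)

blank = [0, 0, 0]
one_a = [1, 0, 0]      # a single individual of species 1
one_b = [0, 1, 0]      # a single individual of species 2
rich1 = [100, 50, 20]
rich2 = [80, 70, 20]

print(bray_curtis(blank, blank))            # undefined (nan)
print(bray_curtis(blank, blank, dummy=1))   # 100.0
print(bray_curtis(one_a, one_b))            # 0.0
print(bray_curtis(one_a, one_b, dummy=1))   # 50.0
print(bray_curtis(rich1, rich2), bray_curtis(rich1, rich2, dummy=1))
```

The last line illustrates the point made in the text: with a modest number of individuals the two forms differ only in the first decimal place or beyond.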


Demonstrate the effect of the zero-adjustment in a simple case by opening a second PRIMER 6
session (double-click on the PRIMER 6 icon) and typing in the simple data sheet shown below, by File>
New>(Type•Sample data)>(Title: Test) & (Samples as•Columns) & (Number of columns: 5) &
(Number of rows: 3). Then Analyse>Resemblance>(Analyse between•Samples) & (Measure•
Bray-Curtis similarity) & (✓Add dummy variable>Value: 1). The warning message about
transformation will appear again but can be ignored (press OK). Contrast the resulting similarities with
those from Bray-Curtis without the dummy species - the adjustment makes a big difference here of
course. Now close this second PRIMER session with File>Exit (no need to save the workspace).

[Screenshot: the Test data sheet, the Resemblance dialog with Add dummy variable ticked, the transformation warning box, and the resulting similarity matrix.]

Euclidean distances

Euclidean distance, an appropriate measure for environmental (and some other) data, is defined as:

  $D_1 = \sqrt{\sum_i (y_{i1} - y_{i2})^2}$

where the $y_{i1}$ and $y_{i2}$ result from pre-treatment by transformation (sometimes) and subsequent
normalisation (often). The outcome is a triangular distance matrix, which orders in the opposite
direction to similarity: high similarity = low distance (= low dissimilarity). Note, however, that
the user does not have to worry about which way round the resemblances are ordered: all routines
will utilise the information given in the Resemblance type to make sensible choices.
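
As a rough illustration of normalisation followed by Euclidean distance, the Python sketch below works on a small invented table. The numbers are hypothetical, not the Ekofisk values, and the normalisation is assumed to be subtraction of the mean and division by the standard deviation with an n - 1 divisor.

```python
import math

def normalise(values):
    """Centre a variable on zero and scale to unit standard deviation.

    The n - 1 divisor is an assumption made for this sketch.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def euclidean(y1, y2):
    """D1: square root of the summed squared differences over variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y1, y2)))

# Hypothetical environmental table: one list per variable, one entry per sample.
env = {
    "Redox": [250.0, 180.0, 90.0],
    "%Mud": [2.0, 10.0, 30.0],
    "Ba": [300.0, 900.0, 4000.0],
}

norm = {name: normalise(vals) for name, vals in env.items()}
samples = list(zip(*norm.values()))   # one tuple of normalised values per sample

d12 = euclidean(samples[0], samples[1])
d13 = euclidean(samples[0], samples[2])
print(round(d12, 3), round(d13, 3))
```

Normalising first puts the three variables on a common, dimensionless scale, so that no single variable dominates the distance simply because of its units.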
The Ekofisk workspace ekwk.pwk should still be open, with the three environmental variables
(Redox, %Mud and Ba) selected from ekev, and then normalised in the data sheet Data3. (Note
that the original ekev matrix can deselect these variables with Select>All, to revert to the full data -
the previous selection will still be shown by highlighting though this can be removed with Edit>
Clear Highlight - but the derived, normalised sheet Data3 contains only the selected three
variables, not the rest of the matrix.) With Data3 as the active matrix, take Analyse>Resemblance
and the default is now Measure•Euclidean distance, because the Data type is Environmental. The
result is a resemblance matrix (Resem2) of type Distance; the History box on the Edit>Properties
dialog shows its derivation as Euclidean distance on normalised data. Save the workspace again
(it is used next on p. 49).


Accessing other resemblance measures

PRIMER 6 allows the user choice of a further 13 distance measures and 6 similarity measures for
quantitative data, and 14 similarity measures utilising presence/absence (P/A) information alone,
mostly referenced by an L&L number. There are also a further 2 P/A dissimilarity measures which
exploit the taxonomic relationships amongst the species, and 10 'other' useful measures (including
correlation coefficients) that mostly do not feature in L&L. These can all be accessed, when the
active window is a data sheet (say Data1 or Data3 in the Ekofisk ekwk workspace), by Analyse>
Resemblance>(Measure•More (tab)) and clicking on the More tab at the top of the dialog box
(don't forget to click the More button as well as the More tab or all options will be greyed out).

[Screenshot: the More tab of the Resemblance dialog, showing the drop-down lists for Similarity P/A, Similarity quantitative, Distance, Taxonomic dissimilarity P/A and Others.]

Distance measures

The distance measures defined by L&L and calculated by PRIMER 6 (in addition to $D_1$) are:

  $D_2 = \sqrt{\frac{1}{p} \sum_i (y_{i1} - y_{i2})^2}$    average distance,

where the number of species p is fixed for all pairs of samples (so this is simply a constant multiple
of Euclidean distance $D_1$ and will therefore give identical dendrograms, ordinations etc);

  $D_3 = \sqrt{2 \left( 1 - \frac{\sum_i y_{i1} y_{i2}}{\sqrt{\sum_i y_{i1}^2 \sum_i y_{i2}^2}} \right)}$    Orloci's chord distance;

  $D_4 = \arccos\left( 1 - \frac{1}{2} D_3^2 \right)$    geodesic metric;

  $D_6 = \left( \sum_i |y_{i1} - y_{i2}|^r \right)^{1/r}$    Minkowski metric,

where r can be specified by the user (note r=1 gives Manhattan, and r=2 Euclidean distance);

  $D_7 = \sum_i |y_{i1} - y_{i2}|$    Manhattan distance,

whose use of absolute rather than squared differences confers slightly better robustness to outliers;

  $D_8 = \frac{1}{p_{12}} \sum_i |y_{i1} - y_{i2}|$    Czekanowski's mean character difference,

in the form where $p_{12}$ is the number of species that are not jointly absent in samples 1 and 2 (the
changing denominator across pairs of samples, from excluding joint absences, can make a big
difference to a coefficient's behaviour, so is indicated clearly by 'exc 0-0' in the drop-down box);

  $D_9 = \frac{1}{2} \sum_i \left| \frac{y_{i1}}{\sum_k y_{k1}} - \frac{y_{i2}}{\sum_k y_{k2}} \right|$    Whittaker's index of association,

whose form is clearly seen to be (to within a constant) Manhattan distance calculated on a data
sheet whose samples are first standardised by total;

  $D_{10} = \sum_i \frac{|y_{i1} - y_{i2}|}{y_{i1} + y_{i2}}$    Canberra metric of Lance & Williams,

which must exclude joint absences so that it can be defined, but is less useful than its averaged
form, divided by $p_{12}$, found as Canberra similarity in the quantitative similarity list;

  $D_{11} = \sqrt{\frac{1}{p_{12}} \sum_i \left( \frac{y_{i1} - y_{i2}}{y_{i1} + y_{i2}} \right)^2}$    Clark's coefficient of divergence,

also in the form in which double zeros are excluded from the summation and the divisor $p_{12}$;

  $D_{15} = \sqrt{\sum_i \frac{1}{y_{i+}} \left( \frac{y_{i1}}{\sum_k y_{k1}} - \frac{y_{i2}}{\sum_k y_{k2}} \right)^2}$    χ² (chi-squared) metric,

where $y_{i+} = \sum_j y_{ij}$, the sum across all samples of the entries for the ith species, and effectively the
same, to within a constant, as the following;

  $D_{16} = \sqrt{y_{++}} \, D_{15}$, where $y_{++} = \sum_i y_{i+}$ is the grand total    χ² distance,

the implicit distance underlying Correspondence Analysis, which is seen to be a type of Euclidean
distance, from samples which are standardised by their totals across species, and then inversely
weighted by species totals across samples (the double standardisation being responsible for the
practical difficulties χ² distance can have with rare species, for which the divisor is near zero); and

  $D_{17} = \sqrt{\sum_i \left( \sqrt{\frac{y_{i1}}{\sum_k y_{k1}}} - \sqrt{\frac{y_{i2}}{\sum_k y_{k2}}} \right)^2}$    Hellinger distance, advocated by Rao,

the only omission above being D13, which is simply the complement of Sørensen similarity, S8.
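
Two of the relationships stated in this list are easy to verify numerically: the Minkowski metric reduces to Manhattan distance at r=1 and Euclidean distance at r=2, and average distance is Euclidean distance divided by the square root of p. The Python sketch below uses invented sample values and hypothetical function names.

```python
import math

def minkowski(y1, y2, r):
    """D6: Minkowski metric with user-specified power r."""
    return sum(abs(a - b) ** r for a, b in zip(y1, y2)) ** (1.0 / r)

def average_distance(y1, y2):
    """D2: square root of the mean squared difference over the p species."""
    p = len(y1)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y1, y2)) / p)

y1 = [3.0, 0.0, 5.0, 1.0]   # invented values for sample 1
y2 = [1.0, 2.0, 5.0, 4.0]   # invented values for sample 2
p = len(y1)

manhattan = minkowski(y1, y2, 1)   # D7
euclid = minkowski(y1, y2, 2)      # D1
average = average_distance(y1, y2)

print(manhattan, euclid, average, euclid / math.sqrt(p))
```

Because the factor 1/sqrt(p) is the same for every pair of samples, D2 and D1 rank all pairs identically, which is why the dendrograms and ordinations do not change.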

Similarity to dissimilarity

L&L also assign D14 to Bray-Curtis dissimilarity, the complement of S17, defined earlier. Though it
would only be needed if exporting to other software, this can be simply achieved in PRIMER by
calculating Bray-Curtis similarity and taking Tools>Dissim, which can turn any similarity matrix
into a dissimilarity (or vice-versa), using the relation D + S = 100.

Quantitative similarity measures

In addition to Bray-Curtis S17, and its zero-adjusted modification, PRIMER 6 also calculates:

  $S_{15} = 100 \cdot \frac{1}{p} \sum_i \left[ 1 - \frac{|y_{i1} - y_{i2}|}{R_i} \right]$, where $R_i = \max_j\{y_{ij}\} - \min_j\{y_{ij}\}$    Gower's coefficient,

where standardisation is by the range $R_i$ of values for the ith species over all samples (effectively
by the maximum since the minimum will usually be zero), and thus shares with χ² distance the
(generally undesirable) property that adding further samples can change existing similarities;


  $S_{18} = 100 \cdot \frac{\sum_i \min\{y_{i1}, y_{i2}\}}{2 / \left[ (1/\sum_i y_{i1}) + (1/\sum_i y_{i2}) \right]}$    Kulczynski similarity,

which can be seen from the second form of S17 to be related to Bray-Curtis, replacing the arithmetic
mean of the sample totals in the denominator of S17 with a harmonic mean;

  $S_{19} = 100 \cdot \frac{1}{p_{12}} \sum_i \left[ 1 - \frac{|y_{i1} - y_{i2}|}{R_i} \right]$    Gower (excluding double zeros),

which is S15 with the fixed total number of species in the matrix (p) being replaced by $p_{12}$, the
number of non-jointly absent species in the two samples being compared - an important difference;

  $S_{21} = 100 (1 - D_{15})$    χ² similarity,

the complement of the χ² metric;

  $S_{Can} = 100 \cdot \left( 1 - \frac{1}{p_{12}} \sum_i \frac{|y_{i1} - y_{i2}|}{y_{i1} + y_{i2}} \right)$    Canberra similarity, in the form used by Stephenson et al,

not numbered by L&L but of more use for species data than its distance form (Canberra metric)
D10, because of the division by the variable species numbers $p_{12}$ (i.e. excluding double zeros); and

  $S_{Och} = 100 \cdot \frac{\sum_i \min\{y_{i1}, y_{i2}\}}{\sqrt{\sum_i y_{i1} \sum_i y_{i2}}}$    quantitative Ochiai similarity,

not defined by Ochiai as such, but it reduces to Ochiai's coefficient (S14) when applied to P/A data.
Clarke et al, 2006, J exp mar Biol Ecol, construct this coefficient (which is intermediate between
Bray-Curtis and Kulczynski because it replaces the denominator with a geometric rather than
arithmetic or harmonic mean), to illustrate that measures with reasonable properties are not difficult
to invent, explaining the superfluity of coefficients available from the literature!
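
The arithmetic, geometric and harmonic mean relationship can be seen directly in a short Python sketch. The function names and sample values are invented for illustration, and both samples are assumed to have non-zero totals.

```python
import math

def shared(y1, y2):
    """Numerator common to all three coefficients: sum of species-wise minima."""
    return sum(min(a, b) for a, b in zip(y1, y2))

def bray_curtis(y1, y2):
    # second form of S17: denominator is the arithmetic mean of the sample totals
    return 100.0 * shared(y1, y2) / ((sum(y1) + sum(y2)) / 2.0)

def ochiai_quant(y1, y2):
    # denominator is the geometric mean of the sample totals
    return 100.0 * shared(y1, y2) / math.sqrt(sum(y1) * sum(y2))

def kulczynski(y1, y2):
    # S18: denominator is the harmonic mean of the sample totals
    return 100.0 * shared(y1, y2) / (2.0 / (1.0 / sum(y1) + 1.0 / sum(y2)))

y1 = [10.0, 0.0, 4.0, 1.0]    # invented sample, total 15
y2 = [2.0, 3.0, 4.0, 36.0]    # invented sample, total 45

bc, och, kul = bray_curtis(y1, y2), ochiai_quant(y1, y2), kulczynski(y1, y2)
print(round(bc, 2), round(och, 2), round(kul, 2))
```

Since the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, the three similarities are ordered Bray-Curtis, Ochiai, Kulczynski, and they coincide when the two sample totals are equal.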


Pres/Abs similarity measures

There are numerous similarity measures defined for simple species lists, i.e. when the data consist
only of presence (1) or absence (0) of each species in each sample. Any similarity coefficient
between samples 1 and 2 must then be a combination of four numbers a: the number of species
present in both samples, b: the number present in 1 but absent from 2, c: the number absent in 1 but
present in 2, d: the number absent from both. Clearly, the coefficient must be symmetric in b and
c, and the more biologically useful coefficients are also not a function of joint absences, d. There
still remain a large number of options, given by L&L as:

  $S_1 = 100 \cdot \frac{a+d}{a+b+c+d}$    simple matching;
  $S_2 = 100 \cdot \frac{a+d}{a+2b+2c+d}$    Rogers & Tanimoto;
  $S_3 = 100 \cdot \frac{2a+2d}{2a+b+c+2d}$;
  $S_4 = 100 \cdot \frac{a+d}{b+c}$;
  $S_5 = 25 \cdot \left[ \frac{a}{a+b} + \frac{a}{a+c} + \frac{d}{b+d} + \frac{d}{c+d} \right]$;
  $S_6 = 100 \cdot \frac{a}{\sqrt{(a+b)(a+c)}} \cdot \frac{d}{\sqrt{(b+d)(c+d)}}$;
  $S_7 = 100 \cdot \frac{a}{a+b+c}$    Jaccard;
  $S_8 = 100 \cdot \frac{2a}{2a+b+c}$    Sørensen;
  $S_9 = 100 \cdot \frac{3a}{3a+b+c}$;
  $S_{10} = 100 \cdot \frac{a}{a+2b+2c}$;
  $S_{11} = 100 \cdot \frac{a}{a+b+c+d}$    Russell & Rao;
  $S_{13} = 50 \cdot \left[ \frac{a}{a+b} + \frac{a}{a+c} \right]$    Kulczynski;
  $S_{14} = 100 \cdot \frac{a}{\sqrt{(a+b)(a+c)}}$    Ochiai;
  $S_{26} = 100 \cdot \frac{a + (d/2)}{a+b+c+d}$    Faith.


The most frequently met are Sørensen, which is Bray-Curtis calculated on P/A data, and Jaccard.
The definition shows how alike they are (in fact they are monotonically related, so the procedures
in PRIMER which are based only on rank resemblance matrices - i.e. most of them - will give the
same outcome for these two coefficients). Note that a quantitative matrix input to one of these
calculations will automatically be reduced to 1 & 0 form before computation (but for clarity of
working it might still be advisable to carry out the initial presence/absence step explicitly).
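
Both statements can be checked with a small Python sketch: Bray-Curtis on presence/absence data equals Sørensen, and Sørensen is an increasing function of Jaccard (from the definitions, S8 = 200.S7/(100 + S7)). The function names and the two sample vectors are invented for the example.

```python
def abcd(y1, y2):
    """Counts a, b, c, d for two samples; quantitative entries are reduced to P/A."""
    a = sum(1 for u, v in zip(y1, y2) if u > 0 and v > 0)
    b = sum(1 for u, v in zip(y1, y2) if u > 0 and v == 0)
    c = sum(1 for u, v in zip(y1, y2) if u == 0 and v > 0)
    d = sum(1 for u, v in zip(y1, y2) if u == 0 and v == 0)
    return a, b, c, d

def jaccard(y1, y2):
    a, b, c, _ = abcd(y1, y2)
    return 100.0 * a / (a + b + c)

def sorensen(y1, y2):
    a, b, c, _ = abcd(y1, y2)
    return 100.0 * 2 * a / (2 * a + b + c)

def bray_curtis(y1, y2):
    num = sum(abs(u - v) for u, v in zip(y1, y2))
    den = sum(u + v for u, v in zip(y1, y2))
    return 100.0 * (1.0 - num / den)

y1 = [5, 0, 2, 0, 1]   # invented counts
y2 = [1, 3, 0, 0, 7]
pa1 = [1 if v > 0 else 0 for v in y1]   # explicit presence/absence step
pa2 = [1 if v > 0 else 0 for v in y2]

s7 = jaccard(y1, y2)
s8 = sorensen(y1, y2)
print(s7, s8, bray_curtis(pa1, pa2), 200.0 * s7 / (100.0 + s7))
```

Because the mapping from S7 to S8 is strictly increasing, the two coefficients always rank pairs of samples in the same order.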
Demonstrate this point for the Ekofisk abundance data, by calculating three similarity matrices
which should be identical: re-open the workspace ekwk.pwk (saved on p. 45) and carry out the following.
a) With ekma as active window, Analyse>Pre-treatment>Transform (overall)>(Transformation:
Presence/absence) to produce a purely P/A matrix, which by now will probably be called Data5.
Compute Bray-Curtis on this with Analyse>Resemblance>(Measure•Bray-Curtis similarity), and
no added dummy variable - there are no sparse samples here, so the latter would make negligible
difference to the relative similarities in any case. The resulting resemblance is probably Resem3.
b) Compute Sørensen on Data5, with Analyse>Resemblance>Measure•More(tab) and More>
(•Similarity P/A: S8 Sorensen) giving Resem4.
c) Recalculate Sørensen, directly with the quantitative ekma as active window, to give Resem5.
Tidy up the desktop by Window>Close All Windows and clicking on the three resemblance icons
in the Explorer tree, Resem3, Resem4, Resem5, to display only these three, resizing the windows
manually as necessary (or with Window>Tile Vertical). Save the workspace again.

[Screenshot: the Explorer tree for the ekwk workspace, showing Resem3 and Resem4 derived from Data5, and Resem5 derived directly from ekma.]

Taxonomic distinctness/aggregation files

A later section (14) discusses univariate diversity indices that can be computed from each sample,
including biodiversity measures that are based on the relatedness of the species making up a simple
species list (presence/absence data) - see Chapter 17 of the Methods manual. Though in theory
this relatedness could be genetic, phylogenetic or even functional, PRIMER 6 implements the idea
in terms of taxonomic distinctness, namely the distances travelled in connecting every pair of
species through a tree with a fixed set of levels (typically, a Linnaean taxonomy). If, on average,
these distances are large, then the sample is considered biodiverse. A necessary input here is an
'aggregation' file (seen on p36), defining which species belong to which genera, families, orders,
etc, and from this, path weights $\omega_{ij}$ are calculated between every pair of species, i and j. Always, $\omega_{ij}$
takes the value 100 for two species that are connected at the most distant level. To be more precise:
if the final column heading in the aggregation file is phylum then two species in different phyla are
defined to be 100 units apart (do not add a final column, say kingdom, for which all species have
the same entry, Animalia; you could then only attain value 100 for species in different kingdoms).

By default, intervening levels are considered to be equally-spaced. For example, for a hierarchy of
species from different classes all in the same phylum, with the five levels of species, genus, family,
order and class, two species in the same genus are 20 units apart, in different genera but the same
family are 40 units apart, etc. This can be overruled in two ways: either a user can define his/her
own step branch-lengths, which will again be rescaled to a maximum of 100 for two species in
different top-level groups, whatever scale is input for the absolute steps; or the information in the
aggregation matrix about taxon richness at each hierarchical level can be used (a level in the tree
which has almost as many taxa as the level below it gives rise to a step of shorter branch-length).

Taxonomic dissimilarity measures

This concept of taxonomic distinctness can be carried over from a diversity index to a dissimilarity
coefficient. Two such measures are given under the Analyse>Resemblance>More>(•Taxonomic
dissimilarity P/A:) menu tab. Both are presence/absence measures only, indicated by the plus sign
superscript: Γ+ (upper case Greek gamma) is a natural extension of Bray-Curtis dissimilarity on
P/A data, i.e. the complement of Sørensen S8, and Θ+ (upper case Greek theta) similarly extends
Kulczynski P/A dissimilarity, the complement of S13. They are formally defined as:

  $\Gamma^+ = \frac{\sum_{i=1}^{s_1} \min_j(\omega_{ij}) + \sum_{j=1}^{s_2} \min_i(\omega_{ij})}{s_1 + s_2}$,    $\Theta^+ = \frac{1}{2} \left[ \frac{1}{s_1} \sum_{i=1}^{s_1} \min_j(\omega_{ij}) + \frac{1}{s_2} \sum_{j=1}^{s_2} \min_i(\omega_{ij}) \right]$

where there are $s_1$ species present in sample 1 and $s_2$ in sample 2, and $\omega_{ij}$ is the distance through the
tree from species i of sample 1 (i = 1, 2, ..., $s_1$) to species j of sample 2 (j = 1, 2, ..., $s_2$). This is
almost simpler to express in words: for each species one finds the most closely related species in
the opposite sample, then averages these minimum path lengths over all ($s_1 + s_2$) species, to obtain
Γ+. (If the nearest relation in the opposite sample is the same species, the path length is defined to
be zero, of course). For Θ+, these averages are calculated separately, i.e. the average path length
for all species in sample 1 to their nearest neighbours in sample 2, then for all species in sample 2
to their nearest neighbour in sample 1. These two averages are then themselves averaged.
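
The verbal recipe above translates almost line for line into code. The Python sketch below is an illustration only, not PRIMER's implementation: it assumes equally spaced levels, represents each species as a tuple of taxon names from the finest level to the coarsest, and uses invented taxon names.

```python
def path_weight(tax_a, tax_b):
    """Path length (0-100) between two species, assuming equally spaced levels.

    Each argument is a tuple of names from the finest level (species) up to the
    coarsest (e.g. class). With n levels, two species that first meet k levels
    up the tree are 100 * k / n apart; identical species are 0 apart.
    """
    n = len(tax_a)
    k = 0
    for level in range(n):
        if tax_a[level] != tax_b[level]:
            k = level + 1   # highest level at which the two still differ
    return 100.0 * k / n

def gamma_theta(sample1, sample2):
    """Return (Gamma+, Theta+) for two species lists of taxonomy tuples."""
    min12 = [min(path_weight(s, t) for t in sample2) for s in sample1]
    min21 = [min(path_weight(t, s) for s in sample1) for t in sample2]
    gamma = (sum(min12) + sum(min21)) / (len(sample1) + len(sample2))
    theta = 0.5 * (sum(min12) / len(sample1) + sum(min21) / len(sample2))
    return gamma, theta

# Hypothetical taxonomy: (species, genus, family, order, class)
sp1 = ("sp1", "g1", "f1", "o1", "c1")
sp2 = ("sp2", "g1", "f1", "o1", "c1")   # same genus as sp1
sp3 = ("sp3", "g2", "f1", "o1", "c1")   # same family, different genus
sp4 = ("sp4", "g3", "f2", "o2", "c2")   # different class

gamma, theta = gamma_theta([sp1, sp2], [sp1, sp3, sp4])
print(gamma, theta)
```

Here the minimum path lengths are 0 and 20 for the two species of sample 1, and 0, 40 and 100 for the three species of sample 2, so the two coefficients differ because the samples contain different numbers of species.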
Θ+ was defined (and referred to as an 'optimal mapping statistic', denoted M) by Clarke KR &
Warwick RM, 1998, Oecologia 113: 278-289, and Γ+ is (to within a constant) the TD of Izsak C &
Price ARG, 2001, Mar Ecol Progr Ser 215: 69-77. They are clearly closely related, and will be
identical when $s_1 = s_2$. Their use is in ordinating samples from widely-spread biogeographic
regions with few, if any, shared species, but which will always have higher-order taxa in common.
They also provide a certain amount of robustness in dissimilarity value to mistakes or inconsistent
identification at the finest taxonomic levels (see Clarke KR, Somerfield PJ, Chapman MG, 2006, J
exp mar Biol Ecol for examples of their use in ordination).

As noted above, these taxonomic-based measures reduce to well-known P/A dissimilarities (Γ+ to
Sørensen and Θ+ to Kulczynski) when the hierarchical tree collapses, i.e. all species are in a single
higher-order group. Then the path lengths are either 0 or 100 (species have a match in the opposite
sample or they do not). The other two measures listed under the Taxonomic dissimilarity menu are
thus these limiting special cases, the dissimilarity forms of Bray-Curtis P/A and Kulczynski P/A:

  B+ = 100 - S8,    K+ = 100 - S13.

The same coefficients can also be obtained, of course, by selecting S8 or S13 from the Similarity P/A
menu, and then taking Tools>Dissim to convert from similarities to dissimilarities.
.. .
(Groundfish of European shelf waters)

Assemblage data from 93 groundfish species, those that could be reliably sampled and identified in
beam-trawl surveys by research vessels from several countries surrounding NW European shelf
waters, were analysed by Rogers SI, Clarke KR & Reynolds JD, 1999, J anim Ecol 68: 769-782.
The data matrix, C:\Examples v6\Groundfish\gfa.pri is of 277 locations (ICES quarter-rectangles)
sampled in the third quarter of the year, over the period 1990-96, with the values being mean catch
rates corrected to number of fish per Sm beam trawl per hour. The 277 sites are divided a priori
into 9 coastal areas (1 to 9 in factor area: 1=Bristol Channel, 2=Western Irish Sea, ..., 9=E Central
North Sea, see the Edit>Properties box). Also available is a PRIMER format aggregation file,
gfagg.agg, seen briefly on page 36. Open both gfa and gfagg into a New workspace, saved as gfwk.

First transform the gfa.pri data to presence/absence with the usual Analyse>Pre-treatment>
Transform (overall) - this step is logical but not essential, as noted earlier, because P/A measures
do this conversion automatically if quantitative data are provided. On the resulting Data1 sheet,
take Analyse>Resemblance>•More(tab) & More>(•Taxonomic dissimilarity P/A: Gamma+)>
Taxonomy, and observe the default settings on the Taxonomy (Data) dialog box, which are the
ones you need. The aggregation matrix is gfagg, the sample data (gfa) does consist of Species level
variables, and you wish to use all the taxonomic links in gfagg (from Species to Class levels) when
calculating 'nearest neighbour' distances. Also, the Weights•User specified>Weights menu shows
how the branch-length steps are set by default to be equal: the rescaling will then give possible path
lengths between species in opposite samples of 0, 20, 40, 60, 80, 100, as described earlier.
Alternatively, the tree could be flattened at the top end, for example, by setting the final (Class)
branch-length step to zero, so that two species in different classes are considered no more distant
than two species in different orders. In a similar way, the finer taxonomic groups could be
compressed by a zero Species entry, or perhaps a graduated scale could be used: entries of 5, 4, 3,
2, 1 would distribute more weight to the finer-level taxonomy.
Timing bar, interrupts, multi-tasking/processors

Take the defaults to set the calculations running. Dissimilarities are computed between all pairs
from 277 samples (i.e. 38226 of them), each one involving a comparison from the aggregation
matrix of each present species with all species from the opposite sample. That this runs in a few
seconds is testament to the computing power of the modern PC, and the efficiency of coding in the
Microsoft .NET environment: compute-intensive operations are typically 4 or 5 times faster than
for the VB code in PRIMER 5. A timing bar appears (in light and dark green) at the bottom of the
desktop, allowing the user to judge how long the full calculation is likely to take. Though this will
not be long here, in other contexts it may become obvious that too demanding a task has been set to
be run at the current time, and the execution should be stopped. This is achieved by clicking the
Stop Tasks icon on the Tool Bar (equivalently, take Tools>Stop Tasks), which will result in a
clean interruption to the task. PRIMER 6 is also fully multi-tasking, thus if an analysis looks like it
will take a long time, it can be set running and other, less intensive, manipulations carried out
simultaneously in the same PRIMER workspace. PRIMER 6 will also automatically exploit
multiple processors, where configured. Note that Stop Tasks will interrupt all current processing in
the PRIMER desktop, but should not damage the workspace in any way. Save gfwk.pwk for later
(it is used again on p. 62).

[Screenshot: the More tab and the Taxonomy (Data) dialog for the Groundfish data, showing the aggregation matrix, the level settings and the Weights options (User specified or Taxon richness).]
Analysing between variables

Chapters 2 and 7 of the Methods manual discuss 'species analyses', in which resemblances are
calculated between pairs of species, rather than pairs of samples. Thus, two species are considered
perfectly similar if they co-occur across samples with numbers or biomass in strict proportion, for
quantitative data. As with sample similarities, Bray-Curtis (and related) coefficients are often
appropriate here too, since they refuse to regard the absence of two species at a particular site as
evidence for 'similarity' of those species (a clay-living and a gravel-living species are not similar
because neither is found at sandy sites). A typical analysis would first standardise with Analyse>
Pre-treatment>Standardise>(Standardise•Variables) & (By•Total), but species which are absent
from all sites must first be removed to avoid division by zero. Then Analyse>Resemblance>
(Analyse between•Variables) & (Measure•Bray-Curtis similarity). However, this approach is often
not successful because of the large degree of random 'noise' in the sampling of most species,
especially the rarer ones. Typically, an initial severe pruning of the species to only the commonest
is required, to make any progress at all - see section 8 for selection options. (A better approach to
identifying informative species is to retain resemblances between samples which, importantly, does
not require any reduction in the number of species in the matrix, and identify the contribution of
each species to between-sample similarities, as in the SIMPER routine of section 12.)
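The species-analysis steps described above can be sketched outside PRIMER. The following is only an illustrative Python sketch, not PRIMER's own code: the function names and the small data set are hypothetical, and standardising each species to a total of 100 is an assumption about the scaling used.

```python
# Illustrative sketch only: names and data are hypothetical, not part of PRIMER.

def standardise_by_total(matrix):
    """matrix maps species name -> list of values across samples.
    Species absent from all samples are dropped first (avoiding division by
    zero), then each remaining species is rescaled to sum to 100."""
    kept = {name: row for name, row in matrix.items() if sum(row) > 0}
    return {name: [100.0 * v / sum(row) for v in row] for name, row in kept.items()}

def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity (0 to 100) between two equal-length profiles."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return 100.0 * (1.0 - numerator / denominator)

counts = {
    "sp_A": [10, 20, 0, 0],
    "sp_B": [1, 2, 0, 0],   # strictly proportional to sp_A
    "sp_C": [0, 0, 5, 5],   # never co-occurs with sp_A
    "sp_D": [0, 0, 0, 0],   # absent everywhere, so it is removed
}
standardised = standardise_by_total(counts)
for first, second in [("sp_A", "sp_B"), ("sp_A", "sp_C")]:
    s = bray_curtis_similarity(standardised[first], standardised[second])
    print(first, second, round(s, 3))
```

In this sketch the two proportional species come out as perfectly similar, and the two species that never co-occur have zero similarity, matching the behaviour described in the text.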

Correlation between variables

One context in which resemblances between variables are potentially more useful is in dealing with
environmental variables, biomarkers, morphological variables and so on, partly because there can
often be fewer of these than species in a diverse assemblage, and also because most variables have
non-negligible content: the problem of rare species, containing little information and giving highly
variable resemblances with other rare species, does not arise. Given that environmental-type data
usually have variables that are either on different measurement scales or are non-comparable on the
same scale (e.g. 1 ppm of Hg or TBT content is not equivalent to 1 ppm of Fe), correlation is an
obvious choice, since it automatically normalises each variable in the resemblance calculation. The
final set of options under Analyse>Resemblance>More, under •Others, thus gives four choices of
correlation coefficient $\rho$, namely standard product-moment correlation:

$$\rho^{p} = \frac{\sum_j (y_{1j} - \bar{y}_{1.})(y_{2j} - \bar{y}_{2.})}{\sqrt{\sum_j (y_{1j} - \bar{y}_{1.})^2 \, \sum_j (y_{2j} - \bar{y}_{2.})^2}} \qquad \text{Pearson correlation,}$$

where $\bar{y}_{1.} = (\sum_j y_{1j})/n$ is the average of the n sample readings for variable 1, etc; and three
non-parametric choices, based only on rank values ($r_{ij}$), the numbers 1, 2, 3, ..., n across samples j, for
each variable i. Spearman and Kendall are standard non-parametric coefficients, Spearman simply
being Pearson correlation calculated on these ranks, which reduces to:
$$\rho^{s} = 1 - \frac{6}{n(n^2 - 1)} \sum_j (r_{1j} - r_{2j})^2 \qquad \text{Spearman rank correlation,}$$

(this is modified in a standard way when there are tied ranks). The construction of Kendall
correlation appears somewhat different (see Kendall MG 1970, Rank correlation methods, Griffin,
London), but in practice it tends to track Spearman closely, with lower absolute values. A weighted
form of the Spearman coefficient gives more emphasis to small ranks (high variable values):

$$\rho^{w} = 1 - \frac{6}{n(n - 1)} \sum_j \frac{(r_{1j} - r_{2j})^2}{r_{1j} + r_{2j}} \qquad \text{Weighted Spearman rank correlation,}$$
but this really only makes sense in an asymmetric context (see Chapter 11 of the Methods manual,
where this coefficient is motivated by correlating the entries of two resemblance matrices, thus
emphasising matching pairs of high similarities). All four correlations are in the range (-1, 1), so if
required for subsequent ordination in which points denote variables (so that strongly correlated
variables are placed close together), care is needed in deciding how best to convert them to
similarities, ranging over (0, 100). The choice is usually between $S = 50(1+\rho)$ and $S = 100|\rho|$, both
treating strongly positively correlated variables as highly similar, the latter also implying high
similarity from large negative correlations but the former regarding correlations near -1 as
implying very low similarity. The context should usually make clear which is the right choice.
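The correlation coefficients and the two conversions to similarity can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not PRIMER's implementation: all function names are hypothetical, ties are handled by computing Pearson correlation on mean ranks (one standard approach; the manual does not specify which modification PRIMER uses), and the weighted form takes ready-made ranks as input.

```python
import math

# Illustrative sketch only: function names are hypothetical, not PRIMER's.

def mean_ranks(values):
    """Ranks 1..n in increasing order of value; tied values share their mean rank."""
    return [sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2.0
            for x in values]

def pearson(y1, y2):
    n = len(y1)
    m1, m2 = sum(y1) / n, sum(y2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(y1, y2))
    sxx = sum((a - m1) ** 2 for a in y1)
    syy = sum((b - m2) ** 2 for b in y2)
    return sxy / math.sqrt(sxx * syy)

def spearman(y1, y2):
    """Pearson correlation of the ranks (assumed tie handling: mean ranks)."""
    return pearson(mean_ranks(y1), mean_ranks(y2))

def spearman_untied(r1, r2):
    """Closed form for ranks without ties."""
    n = len(r1)
    return 1.0 - 6.0 / (n * (n * n - 1)) * sum((a - b) ** 2 for a, b in zip(r1, r2))

def weighted_spearman(r1, r2):
    """Weighted form: squared rank differences are divided by the rank sum."""
    n = len(r1)
    return 1.0 - 6.0 / (n * (n - 1)) * sum((a - b) ** 2 / (a + b) for a, b in zip(r1, r2))

def to_similarity(rho, absolute=False):
    """S = 100|rho| if absolute, otherwise S = 50(1 + rho)."""
    return 100.0 * abs(rho) if absolute else 50.0 * (1.0 + rho)

rho = spearman([1.2, 3.4, 2.2, 9.0], [10.0, 2.0, 5.0, 1.0])
print(rho, to_similarity(rho), to_similarity(rho, absolute=True))
```

For a strongly negative correlation the two conversions disagree sharply (near 0 versus near 100), which is the choice the text says must be made from the context.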


(Biomarkers from N Sea flounder)

Data in the file C:\Examples v6\Biomark\brbm.pri are a subset of those from the IOC Bremerhaven
workshop (Stebbing & Dethlefsen 1993 Mar Ecol Prog Ser 91), consisting of a multivariate suite
of 'health' biomarkers measured in flounder caught at 5 sites in the N Sea (S3, S5, S6, S7, S9),
with 10 replicates per site. The 11 variables are biochemical and subcellular (e.g. EROD induction,
levels of oxyradicals, lysosomal stability), which mix ordered categorical and continuous data, so
relationships between variables might arguably best be captured by a non-parametric correlation.

Close any existing workspace and open brbm.pri. The automatic reduction of variables to rank
order across all samples, in calculating Spearman or Kendall correlations, makes it unnecessary to
carry out either prior transformation or normalisation, so simply take Analyse>Resemblance>
(Analyse between•Variables) & (Measure•More (tab)), and on the More tab: •Others>Spearman
rank correlation. Note that high lysosomal stability (AO or NRR) is associated with low levels of
oxyradicals etc - both indicating contaminant impact - so this is a case where, if an ordination plot
of variables was required, correlations should be transformed to similarities by taking absolute
values. Transform expressions are covered in more detail in section 9 but here a simple Tools>
Transform on the correlation matrix, using Expression: 100*ABS(V), would provide the right
input to a variables ordination. More usual, and useful, would be a samples ordination: the
compatible sample similarities here would be produced by Tools>Rank variables on the brbm.pri
data matrix, putting the resulting data sheet into Analyse>Resemblance>(Analyse between•
Samples) & (Measure•Euclidean distance).
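The samples analysis described above (rank each variable across samples, then take Euclidean distance between samples) can be sketched as follows. This is a hedged Python illustration with made-up numbers, not the brbm.pri data; the function names are hypothetical and the use of mean ranks for ties is an assumption.

```python
import math

# Illustrative sketch only: data and function names are hypothetical.

def mean_ranks(values):
    """Ranks 1..n in increasing order of value; tied values share their mean rank."""
    return [sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2.0
            for x in values]

def rank_variables(data):
    """data: list of samples (rows), each a list of variable readings (columns).
    Each variable (column) is replaced by its ranks across all samples."""
    ranked_columns = [mean_ranks(list(column)) for column in zip(*data)]
    return [list(row) for row in zip(*ranked_columns)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Three samples measured on two variables with very different scales.
data = [[10.0, 0.5],
        [20.0, 0.1],
        [30.0, 0.9]]
ranked = rank_variables(data)
print(ranked)
print(euclidean(ranked[0], ranked[1]))
```

Because every variable is reduced to the same 1..n rank scale, no prior transformation or normalisation is needed before the distance calculation.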

[Screenshots: brbm data sheet ('Bremerhaven IOC workshop - biomarkers') with the Analyse menu; Resemblance dialog (Analyse between: Variables; Others: correlation options); resulting 'Correlation (-1 to 1)' matrix; Tools>Transform dialog with Expression: 100*ABS(V); ranked data sheet, Resemblance dialog (Analyse between: Samples; Measure: Euclidean distance) and resulting 'Distance (0 to inf)' matrix.]


Saving & opening triangular matrices

A resemblance matrix can be opened or saved in internal PRIMER v6 (the default) and v5 formats,
both having .sid extensions, in the earlier DOS v4 .sim format, or as a .txt text file. The text option
gives the choice of triangular or 3-column formats (the latter being especially useful for 'unpeeling'
a set of resemblances into a single column, or vice-versa). Another text file option (✓Save whole
matrix) allows output of the full square matrix, i.e. including diagonal entries and the upper triangle
(the transpose of the lower triangle, naturally). For exporting triangular matrices to other software,
the text format therefore has the greatest flexibility, and would be needed for large arrays because
of the <255 column constraint in Excel, but the final option (.xls) would be a common choice.
Save the original correlation matrix (Resem1) in Excel format, with File>Save Resem As>(Save as
type: Excel Files (*.xls)) & (File name: brbmcor), and open brbmcor.xls in Excel. Note the simple
format of a title (in cell A1), row and column headings in column A and row 2, and lower
triangular entries. Contrast this with 3-column .txt format output of the sample distances (Resem3).
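The relationship between the triangular, 3-column and whole-matrix layouts can be illustrated with a short Python sketch. The function names, the in-memory representation and the row ordering of the 3-column output are assumptions made for illustration; they do not describe PRIMER's file formats in detail (see the Help system for those).

```python
# Illustrative sketch only: representation and names are hypothetical.

def to_three_column(labels, lower):
    """lower[i] holds the resemblances of item i with items 0..i-1.
    Returns (label, label, value) rows, 'unpeeling' the triangle into one column."""
    rows = []
    for j in range(len(labels)):
        for i in range(j + 1, len(labels)):
            rows.append((labels[j], labels[i], lower[i][j]))
    return rows

def to_square(lower, diagonal):
    """Full square matrix: the upper triangle is the transpose of the lower one.
    The diagonal value depends on the measure (e.g. 1 for correlation, 0 for distance)."""
    n = len(lower)
    square = [[diagonal] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            square[i][j] = square[j][i] = lower[i][j]
    return square

labels = ["A", "B", "C"]
lower = [[], [0.5], [0.2, 0.8]]
for row in to_three_column(labels, lower):
    print(*row, sep="\t")
print(to_square(lower, diagonal=1.0))
```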

[Screenshots: text format save options (triangular or 3 column format) and the resulting 3-column text file of sample distances for 'Bremerhaven IOC workshop - biomarkers', listing pairs of sample labels (S3-1 with S3-2, S3-3, ...) and their distances.]


Factors or indicators would be held, as with data files, as extra rows at the end of the sheet,
separated by a blank row. You could see this by also saving to Excel the samples distance matrix,
Resem3, above. (For more precise details of text output formats when saving triangular matrices
see the v6 Help system: Help>Contents>File types>Text file formats, under the section 'Text').

These Excel and .txt formats (and options) are also what is required (and available) for entry of a
triangular matrix into PRIMER 6, if resemblances have been calculated by external software, e.g.
genetic routines. Entries are allowed in the diagonal and upper right triangle of an input sheet but
will be ignored by PRIMER. Now close the file brbmcor.xls in Excel and try re-opening it into
PRIMER. In the Excel File Wizard, specify Data type•Resemblance matrix, look at the check
boxes for whether titles and row labels are expected (they will be present on a matrix output by v6
to Excel in the first place), and set Resemblance type•Correlation - the information about the type
of triangular matrix is lost in the process of outputting to Excel and re-inputting to v6. Note that
the newly input sheet is named brbmcor(2) to distinguish it from its previous version, brbmcor,
which is still in the workspace. Save the workspace as brbmwk.pwk for later analysis.

Other coefficients

Returning to the listing of options under the More tab in Analyse>Resemblance, five resemblance
measures loosely based on likelihood-ratio tests are also included in the •Others list. They are
essentially distance-type measures, in that they increase as samples become less alike. All are
motivated by a (usually unrealistic) model in which individuals of a species are randomly
distributed in space or time (i.e. the data are strict counts, Poisson distributed), independently of
other species, and with the mean count differing over species. A generalised likelihood ratio
(GLR) test that two samples come from the same assemblage produces the test statistic:

$$D^{BinD} = 2 \sum_i \left[ y_{i1} \log\left(\frac{y_{i1}}{y_{i1} + y_{i2}}\right) + y_{i2} \log\left(\frac{y_{i2}}{y_{i1} + y_{i2}}\right) + (y_{i1} + y_{i2}) \log 2 \right] \qquad \text{Binomial deviance,}$$

where the sum is over all p species as usual (note the first two terms do go to zero, unambiguously,
when $y_{i1}$ and $y_{i2}$ are zero, respectively). In fact, the coefficient is of the form $2\sum[O \log(O/E)]$, where
$O = y_{i1}$ or $y_{i2}$ and $E = (y_{i1} + y_{i2})/2$ are the observed and expected values in a chi-squared type test of
equality of species i counts. The more familiar Wald test statistic for this situation is $\sum[(O - E)^2/E]$,
but the two measures are likely to behave very similarly in practice (both having large-sample
distributions of $\chi^2$ on p df). A more useful variant of the latter is therefore given under •Others, by
simply dividing the chi-squared by the number of non jointly-absent species for these two samples:
$$D^{Wald} = \frac{1}{p_{12}} \sum_i \left[ \frac{(y_{i1} - y_{i2})^2}{y_{i1} + y_{i2}} \right] \qquad \text{Wald (chi-squared) coefficient,}$$

thus making this form of the coefficient independent of joint absences. This could be further
modified in a natural way, to make it more robust to large $y_{ij}$ (outliers) whilst preserving similar
behaviour, by replacing a sum of squares with a sum of absolute values:

$$D^{Chi} = \frac{1}{p_{12}} \sum_i \left[ \frac{|y_{i1} - y_{i2}|}{\sqrt{y_{i1} + y_{i2}}} \right] \qquad \text{'Chi' statistic.}$$

All three coefficients above are not dimensionless, i.e. they make sense only when applied to real
counts and not densities, biomass, % cover etc. Millar & Anderson 2004, J exp mar Biol Ecol 305:
191-221 therefore suggest a scale-invariant form of the first one:

$$D^{BinD}_{s} = \sum_i \frac{1}{y_{i1} + y_{i2}} \left[ y_{i1} \log\left(\frac{y_{i1}}{y_{i1} + y_{i2}}\right) + y_{i2} \log\left(\frac{y_{i2}}{y_{i1} + y_{i2}}\right) + (y_{i1} + y_{i2}) \log 2 \right] \qquad \text{Binomial deviance (scaled).}$$

(They choose to drop the 2 outside the sum and work in logs to the base 10, so for consistency with
that paper, PRIMER does the same. Resulting analyses would be unchanged either way, since the
difference is just the same constant multiplier for all pairs of samples). Because of the close link
between likelihood ratio and Wald statistics, $D^{BinD}_{s}$ is seen to be a form of Clark's divergence, $D_{11}$,
though without the adjustment for double zeros that comes through the $p_{12}$ divisor.
Cao, Bark & Williams 1997 Hydrobiologia 347: 25-40 suggested a coefficient which has been
advocated or used in subsequent studies. It looks very reminiscent of the likelihood ratio, but with
an important switch of the $y_{i1}$ and $y_{i2}$ inside the logs:

$$D^{CY} = -\frac{1}{p_{12}} \sum_i \frac{1}{y_{i1} + y_{i2}} \left[ y_{i1} \log\left(\frac{y_{i2}}{y_{i1} + y_{i2}}\right) + y_{i2} \log\left(\frac{y_{i1}}{y_{i1} + y_{i2}}\right) + (y_{i1} + y_{i2}) \log 2 \right] \qquad \text{CY dissimilarity.}$$

(It does take positive values in spite of the negative sign outside the sum!) It, too, contains the
important $p_{12}$ denominator adjustment, but is now undefined when either $y_{i1} = 0$ or $y_{i2} = 0$, which is
much of the time, in fact. Zeros have to be replaced with a small positive number therefore, and
the outcome is sensitive to this choice. No theoretical basis has been advanced for this coefficient,
and it does not have an intuitively simple form, so any good operational properties it may possess
must be somewhat fortuitous, and it is probably best avoided by the novice user.
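The first four likelihood-based measures can be written out directly from their definitions. The sketch below is an illustrative Python version under stated assumptions, not PRIMER's code: the function names are hypothetical, jointly absent species are taken to contribute nothing to the scaled deviance (where the term is otherwise 0/0), and no check is made for two entirely empty samples (which would give $p_{12} = 0$).

```python
import math

# Illustrative sketch only: function names and zero-handling are assumptions.

def _y_log_share(y, total, log):
    """y * log(y / total), taken as 0 when y is 0."""
    return 0.0 if y == 0 else y * log(y / total)

def binomial_deviance(y1, y2, scaled=False):
    """Unscaled: natural logs with the factor 2. Scaled: logs to base 10,
    no factor 2, each species term divided by its total count."""
    log = math.log10 if scaled else math.log
    total = 0.0
    for a, b in zip(y1, y2):
        s = a + b
        if s == 0:
            continue  # jointly absent species: assumed to contribute nothing
        term = _y_log_share(a, s, log) + _y_log_share(b, s, log) + s * log(2)
        total += term / s if scaled else term
    return total if scaled else 2.0 * total

def _p12(y1, y2):
    """Number of species not jointly absent from the two samples."""
    return sum(1 for a, b in zip(y1, y2) if a + b > 0)

def wald(y1, y2):
    return sum((a - b) ** 2 / (a + b) for a, b in zip(y1, y2) if a + b > 0) / _p12(y1, y2)

def chi(y1, y2):
    return sum(abs(a - b) / math.sqrt(a + b) for a, b in zip(y1, y2) if a + b > 0) / _p12(y1, y2)

sample_1 = [4, 0, 3, 0]
sample_2 = [0, 0, 3, 2]
print(binomial_deviance(sample_1, sample_2))
print(binomial_deviance(sample_1, sample_2, scaled=True))
print(wald(sample_1, sample_2), chi(sample_1, sample_2))
```

Multiplying all counts by a constant changes the first three values but leaves the scaled deviance unchanged, which is the scale-invariance property the text refers to.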
Between-curve distances

Finally, another potentially important application of multivariate methods, introduced under
'Cumulating samples' (page 42), is the analysis of structured sets of curves or pseudo-frequency
distributions (termed profiles), e.g. particle- or body-size analyses, or growth curves, with several
replicate profiles from each of a number of sites, times, treatments etc. Simple univariate statistical
treatment of the size variable is often impossible because of the inherent serial correlation problems
('repeated measures') of, for example, tracking the body size of a single organism through time, or
the lack of a proper frequency distribution structure in histograms of particle sizes (in no sense is
one counting independent particles into the sampling device, to give multinomial frequencies). A
viable multivariate alternative is to treat the independent units as the profiles and define
dissimilarities or distances among them, inputting these pairwise resemblances into, particularly,
the non-parametric testing structure (ANOSIM tests) discussed in section 12. Suitable distance
measures to calculate between pairs of curves include Euclidean distance $D_1$ (or its square),
Manhattan distance $D_7$ and, particularly for comparing cumulative curves:

$$D^{max} = \max_i |y_{i1} - y_{i2}| \qquad \text{Maximum distance,}$$

which is the final option on the •Others list. The maximum by which two cumulative frequency
curves depart from each other, taken over all the size categories, is the basis of the Kolmogorov-
Smirnov test, but the testing structure there relies heavily on real (multinomial) frequencies. Where
this is not the case, as often, maximum departure may still be a sensible measure of dissimilarity of
two curves to feed into multivariate analysis, though Manhattan (or Euclidean) distance is likely to
be at least as good, since it sums positive contributions across the entire size range.
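As a small worked illustration of the between-curve measures, the following Python sketch cumulates two hypothetical size profiles to percentages and compares them by maximum and Manhattan distance. The profile values and function names are invented for the example, and cumulating to percentages is an assumption about how the curves were prepared.

```python
from itertools import accumulate

# Illustrative sketch only: profiles and function names are hypothetical.

def cumulative_percent(profile):
    """Cumulate a profile over its size categories and express it as % of the total."""
    total = sum(profile)
    return [100.0 * c / total for c in accumulate(profile)]

def maximum_distance(curve_1, curve_2):
    """Largest absolute departure between two curves over all categories."""
    return max(abs(a - b) for a, b in zip(curve_1, curve_2))

def manhattan_distance(curve_1, curve_2):
    """Sum of absolute departures over all categories."""
    return sum(abs(a - b) for a, b in zip(curve_1, curve_2))

profile_1 = [1, 1, 2]   # amounts in three increasing size classes
profile_2 = [2, 1, 1]
curve_1 = cumulative_percent(profile_1)
curve_2 = cumulative_percent(profile_2)
print(curve_1, curve_2)
print(maximum_distance(curve_1, curve_2), manhattan_distance(curve_1, curve_2))
```

The maximum distance uses only the single largest gap between the curves, whereas the Manhattan distance accumulates the gaps across the whole size range.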


5. Hierarchical clustering (CLUSTER, SIMPROF)

Clustering & linkage choices

PRIMER carries out simple agglomerative, hierarchical clustering (see Chapter 3 of the Methods
manual) using the Analyse>CLUSTER menu, when the active sheet is a resemblance matrix. (In
contrast, a form of binary, divisive clustering is also available, though mainly used for a more
specialised purpose - see the LINKTREE routine in section 11). The output is a dendrogram, i.e.
tree diagram, displaying the grouping of samples (usually) into successively smaller numbers of
clusters, of ever-larger size, as the threshold level of similarity at which two groups are considered
to merge into one is steadily decreased (or the dissimilarity/distance increased). The routine can be
applied directly to any of the triangular matrices produced by the Analyse>Resemblance menu,
with the exception of the correlation matrices which need first to be converted into (positive)
similarity values, in some appropriate way for the practical context (see page 52). The Analyse>
CLUSTER routine allows a choice of linkage options (single or 'nearest neighbour', complete or
'furthest neighbour', or group average), which define how the resemblance between two groups of
samples is calculated from the cross-group resemblances of pairs of samples.
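How the three linkage options turn cross-group resemblances into a single group resemblance can be sketched with a deliberately naive agglomerative loop. This Python sketch is only an illustration under stated assumptions: it works on similarities (so 'nearest neighbour' means the highest cross-group similarity), the function names and merge-list format are hypothetical, and it makes no attempt at the efficiency or tie-breaking rules of the CLUSTER routine.

```python
from itertools import combinations

# Illustrative sketch only: names and the merge-list format are hypothetical.

def group_resemblance(sim, group_a, group_b, mode):
    """Similarity between two groups from the cross-group pairs of samples."""
    cross = [sim[i][j] for i in group_a for j in group_b]
    if mode == "single":      # nearest neighbour: most similar cross pair
        return max(cross)
    if mode == "complete":    # furthest neighbour: least similar cross pair
        return min(cross)
    if mode == "average":     # group average: mean over all cross pairs
        return sum(cross) / len(cross)
    raise ValueError("unknown cluster mode: " + mode)

def cluster(sim, mode="average"):
    """Repeatedly merge the two most similar groups.
    Returns a list of (members_a, members_b, similarity_level) merges."""
    groups = [(i,) for i in range(len(sim))]
    merges = []
    while len(groups) > 1:
        a, b = max(combinations(groups, 2),
                   key=lambda pair: group_resemblance(sim, pair[0], pair[1], mode))
        level = group_resemblance(sim, a, b, mode)
        groups = [g for g in groups if g not in (a, b)] + [a + b]
        merges.append((a, b, level))
    return merges

# Full symmetric similarity matrix (0 to 100) for four samples.
sim = [[100, 90, 20, 10],
       [90, 100, 30, 40],
       [20, 30, 100, 80],
       [10, 40, 80, 100]]
for mode in ("single", "complete", "average"):
    print(mode, cluster(sim, mode))
```

In this small example the first two merges are the same for all three options, but the level at which the two pairs finally join differs (40, 10 and 25 respectively), which is exactly the effect of the linkage choice.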

SIMPROF tests

PRIMER 6 also now incorporates a series of 'similarity profile' (SIMPROF) permutation tests,
looking for statistically significant evidence of genuine clusters in samples which are a priori
unstructured (e.g. single samples from each of a number of sites). If this option is taken, tests are
performed, at every node of the completed dendrogram, that the group being sub-divided has
'significant' internal structure (e.g. that samples in that group appear to show evidence of
multivariate pattern). The test results are displayed by a colour convention on the dendrogram plot
(samples connected by red lines cannot be significantly differentiated) and test statistics are given
in the results (for general information on Results windows see section 6). The dendrogram itself is
very rapidly calculated, but the SIMPROF routine is highly compute-intensive, given the large
number of permutations necessary for each nodal test, and potentially large number of nodes, so the
SIMPROF option should not be routinely employed for large resemblance matrices. A selective
form of it, providing more detailed output for a single set of samples, is available through the
Analyse>SIMPROF menu, when the active sheet is a (selection of a) data matrix. (The SIMPROF
test can also be employed in conjunction with LINKTREE, see section 11).

Modifying plots in PRIMER

Though PRIMER 6 does not attempt to replicate all the facilities available in graphics presentation
software, there are a large number of graphics options available to modify dendrograms, many of
them shared in a consistent interface for ordinations and other PRIMER plots. General features are
that plots can be: resized; titles, sub-titles and axis titles edited, and font colours, sizes and types
changed; these and history display removed altogether; data (usually sample) labels replaced with
factor levels and/or symbols, the latter having a choice of symbol size, type and colour; lines
thickened (unselectively); axes rescaled to specification or even logged (though this will not usually
be appropriate for a dendrogram), etc. Features specific to dendrograms include: the ability to
orient the plot in any of the four directions; to display a slice through the tree at a fixed
resemblance level, and create a factor that defines the (sample) groups at that threshold; to rotate
sub-groups of the tree, in any permissible way; and to collapse the detail of specific sub-groups, so
that the overall structure of a large tree can be better displayed. The fine detail is seen by another
significant addition to all PRIMER 6 plots: a flexible zoom operation which maintains the position
of labelling and axis scaling while zooming in on the content of the plot. Importantly for
dendrograms, the aspect ratio of the zoomed box can also be changed, allowing clear presentation of
detailed structure (this latter feature does not operate with ordinations, for good reason - see later!).
For printing or saving dendrograms (and other plots), in a variety of formats, see section 6.

(Exe estuary nematodes)

Assemblage data on 140 species of free-living marine nematodes at 19 sites (S1-S19) in the
intertidal soft sediments of the Exe estuary, UK, are in data file C:\Examples v6\Exe\exna.pri; the entries
are average counts over 6 bi-monthly samples, an analysis of the full data showing no clear
evidence of seasonality. The file exev.pri in the same directory contains 6 environmental variables
for the sediments at those 19 sites: median particle diameter, depth of the water table, depth of the
anoxic layer, height up the shore, % organics and interstitial salinity. (For the original data analysis
see Field JG, Clarke KR, Warwick RM, 1982, Mar Ecol Prog Ser 8: 37-52).


Close the existing workspace, if any, and open exna.pri, pre-treating the samples with a fourth root
transform (see section 3), and calculating Bray-Curtis resemblances between samples (section 4).
With the latter as the active window, enter the clustering routine with Analyse>CLUSTER>
Cluster mode•Group average, taking the other defaults (✓Plot dendrogram, which would almost
always be required, but not the SIMPROF test option for now).
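The pre-treatment and resemblance steps feeding this cluster analysis can be sketched as below. This is an illustrative Python outline on a tiny invented matrix, not the exna.pri data; the function names are hypothetical and the lower-triangular list layout is simply a convenient assumption.

```python
# Illustrative sketch only: data and function names are hypothetical.

def fourth_root(samples):
    """Apply a fourth root transform to every entry (rows are samples)."""
    return [[v ** 0.25 for v in row] for row in samples]

def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity (0 to 100) between two samples."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return 100.0 * (1.0 - numerator / denominator)

def lower_triangle(samples):
    """Row i holds the similarities of sample i with samples 0..i-1."""
    return [[bray_curtis_similarity(samples[i], samples[j]) for j in range(i)]
            for i in range(len(samples))]

counts = [[16, 0, 81],    # sample 1: counts of three species
          [16, 0, 81],    # sample 2: identical to sample 1
          [0, 1, 0]]      # sample 3: shares no species with the others
resemblances = lower_triangle(fourth_root(counts))
for row in resemblances:
    print([round(s, 2) for s in row])
```

The fourth root transform down-weights the dominant species before similarities are computed; identical samples then score 100 and samples with no species in common score 0.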

[Screenshots: CLUSTER dialog (Cluster mode: Single linkage, Complete linkage or Group average; SIMPROF test check box) and the resulting dendrogram, 'Exe nematodes', Group average.]


Data labels & symbols menu

When the active window is a plot, levels of a factor can be displayed in place of (sample) labels
and/or represented by differing symbols, with an accompanying symbol key, using the Graph>
Data labels & symbols menu. (Alternatively, the same choices result from right-clicking when the
mouse is over the plot.) If the relevant (✓By factor) check box is ticked, a list of previously-
defined factors can be selected from, independently for labels and symbols, so that checking all
four boxes results in a 2-factor annotation of samples on the plot. Note that if the (Labels>By
factor) box is not checked, but (Labels>✓Plot) is, then the displayed labels are the sample labels
from the resemblance matrix; if the (Symbols>By factor) box is not checked, but (Symbols>✓Plot)
is, then a uniform symbol is displayed. This default symbol can be changed by clicking on the
Symbol: or Colour: areas in the (Symbols>Default) box. The differing symbol colours and shapes
can be redefined by clicking on the (Symbols>Key) button, giving the same options as previously
shown (Factor keys on page 43).

Symbol & text sizes

Label font sizes, types, colours etc can be changed with the (Labels>Font) button, and symbol
sizes by increasing or decreasing the default value of 100 in (Symbols>Size: 100). Note that all
such size parameters in PRIMER 6, whether for symbols or text (data labels, main or sub-titles,
axes titles and scales etc), are given relative to a default value of 100, rather than being expressed
in terms of a typeface point size, for example. This allows all plots to be perfectly scaleable as
their windows are resized or printed/saved, without the need for continual redefinition of sizes.


In datasheet exna.pri, two of the environmental variables (from exev.pri) have also been coded as
simple binary factors: interstitial salinity (Sal) as Lo (<25%) or Hi (>71% of full seawater); and
depth of the blackened anoxic layer (H2S) as Shall (<7.5cm) or Deep (≥20cm). Look at the factors
with Edit>Factors on the exna window. As noted in section 2, there is often a choice as to whether
environmental variables associated with each assemblage sample are held as a separate data matrix,
or as factors within the biological sheet. Here, it is useful to hold some of the data in both forms.
With the dendrogram plot as the active window, take Graph>Data labels & symbols>(Symbols>
(✓Plot & ✓By factor>Sal)) & (Labels>(✓Plot & ✓By factor>H2S)).
[Screenshots: Graph menu and the Graph Options dialog, Data labels & symbols tab (Labels: Plot, By factor, Size, Font; Symbols: Plot, By factor, Default symbol and colour).]


Editing plot titles/scales

Then, still in the Graph Options dialog box, take the Titles tab and edit main and sub-title content
appropriately, also altering the title font sizes and types, e.g. (Main title: Font>Size: 175) & (Sub
title: Font>Colour:[choose black] & [Italic check box off]). From the Y axis tab, change the title
to (Y title: Bray-Curtis similarity), also respecifying the y axis scale by (✓Specify scale)>(Y
interval: 10). On the X axis tab, take (✓Reverse vertical text).

[Screenshots: Graph Options dialog, Titles tab (Main title: 'Exe estuary nematode assemblages at 19 sites'; Sub title: 'Anoxic layer (Deep or Shallow)'), Y axis tab (Y title: Bray-Curtis similarity; Specify scale with Y min, Y max and Y interval; Log scale; Reverse vertical text) and the Font dialog.]

General plot menu

Finally, on the General tab (also reached directly from Graph>General), thicken up all lines with
(Line width: 1.5), remove the display of the calculation history (transformation, similarity measure
etc) by unchecking the (Plot history) box, and change the text in the symbol key box to bold and a
different type by (✓Plot keys)>Keys font>(Font: Times New Roman) & (✓Bold).

[Screenshots: Graph Options dialog, General tab (Overall font scale, Info font, Plot keys, Keys font, Plot history, History font) with the Font dialog, and the resulting dendrogram: 'Exe estuary nematode assemblages at 19 sites', sub-title 'Anoxic layer (Deep or Shallow)', y axis Bray-Curtis similarity (0 to 100).]

Edit/create new factors from plots

Though the above dendrogram shows the way these two (binarised) factors of interstitial salinity
(Sal) and anoxic layer (H2S) 'explain' some of the cluster groupings of assemblages at the 19 sites,
it could sometimes be more revealing to view them as a single combined factor with four levels,
thus allowing display of four different symbols for these, while retaining the original sample labels.
A combined factor can be created under either Edit button under the Data labels & symbols tab,
from the Graph menu. Edit simply calls up the same Factors dialog box that could have been
obtained from a data sheet or resemblance matrix by Edit>Factors (or the right-click menu). As
noted in section 2, new factors (or amendments to existing ones) can be produced at the graph stage
and will be immediately available to the plot, as well as back propagated to connected data sheets.
When combining factors, a separator character is sometimes convenient: create one in the Factors
dialog box by Add>(Add factor named: -), type - into the first cell and highlight the whole column
(by clicking on its header) then Edit>Fill Down>Value. Now Combine and move Sal, then -, then
H2S from the Available to the Include box with the appropriate button, giving the combined factor
Sal-H2S. If (Symbols>✓By factor) is now taken as Sal-H2S, and the (Labels>By factor) box unchecked
(also remove the subtitle by simply deleting its text under the Titles tab), it becomes even clearer
that these two combinations of factors can account for much of the clustering structure seen in the
dendrogram (next page), though not the division between the groups (S6, S11) and (S7, S8, S9).
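The effect of combining two factors with a separator can be mimicked in a couple of lines. The Python sketch below is purely illustrative: the function name is hypothetical and the factor levels shown are invented stand-ins, not the actual exna.pri factor values.

```python
# Illustrative sketch only: function name and factor values are hypothetical.

def combine_factors(factors, names, separator="-"):
    """factors maps factor name -> list of levels, one per sample.
    Returns the combined level for each sample, joined in the order of names."""
    columns = [factors[name] for name in names]
    return [separator.join(levels) for levels in zip(*columns)]

factors = {
    "Sal": ["Lo", "Lo", "Hi", "Hi"],
    "H2S": ["Shall", "Deep", "Shall", "Deep"],
}
factors["Sal-H2S"] = combine_factors(factors, ["Sal", "H2S"])
print(factors["Sal-H2S"])
```

Two binary factors give up to four combined levels, each of which can then be assigned its own symbol on the plot.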

Special menu for slicing & orientation of dendrograms

It is also sometimes possible, including in dendrograms, to create new factors as a result of a plot
and for this one needs the Special menu. With the mouse over any plot, right-clicking produces a
choice of the Data labels & symbols, General or Special dialog boxes. The first two have exactly
the same format for all plots (clustering, ordination, draftsman plots, dominance curves, linkage
trees, species-accumulation curves etc) though several of their options may be 'greyed out' as
irrelevant, but the Special plot menus hold the options appropriate only to each type of plot.
Dendrogram orientation can be selected by, e.g. Graph>Special>(Orientation•Left), and a slice
drawn through the diagram at a specified resemblance, with (Slicing>✓Show slice>Resemblance:
30) say. The option to create a factor, with levels (a, b, c, ...) corresponding to the groups generated
by that slice, is available with the button for Create factor>(Add factor named: 30% slice). Whilst
the symbols could display this new factor here, for visual reinforcement of clusters, it is more
useful in annotating a subsequent ordination, to judge its agreement with the cluster analysis.
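Slicing a dendrogram at a fixed similarity and turning the resulting groups into factor levels can be sketched as follows. This Python illustration assumes a hypothetical merge-list representation of the tree (pairs of member tuples with the similarity at which they join), assumes that groups joining exactly at the slice level count as merged, and supports at most 26 groups; none of this describes how PRIMER stores or slices its dendrograms.

```python
import string

# Illustrative sketch only: the merge-list format and names are hypothetical.

def slice_dendrogram(n_samples, merges, level):
    """merges: list of (members_a, members_b, similarity), e.g. from a
    group-average clustering. Returns one level (a, b, c, ...) per sample,
    for the groups that exist when the tree is cut at the given similarity."""
    group_of = list(range(n_samples))
    for members_a, members_b, similarity in merges:
        if similarity >= level:
            keep, replace = group_of[members_a[0]], group_of[members_b[0]]
            group_of = [keep if g == replace else g for g in group_of]
    names = {}
    factor = []
    for g in group_of:
        if g not in names:
            names[g] = string.ascii_lowercase[len(names)]
        factor.append(names[g])
    return factor

# Four samples: 0 and 1 join at 90, 2 and 3 at 80, and the two pairs at 25.
merges = [((0,), (1,), 90.0), ((2,), (3,), 80.0), ((0, 1), (2, 3), 25.0)]
print(slice_dendrogram(4, merges, level=30))
print(slice_dendrogram(4, merges, level=85))
```

Raising the slice level splits the samples into more, smaller groups, so the new factor gains more levels.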

[Screenshots: Factors dialog (Combine, Rename, Reorder, Delete, Key, Import buttons) showing factors Sal, H2S and the combined factor Sal-H2S for sites S1 to S19; Combine dialog (Available and Include boxes); dendrogram 'Exe estuary nematode assemblages at 19 sites' oriented left, with symbols for Sal-H2S and x axis Bray-Curtis similarity (0 to 100); Dendrogram Options dialog (Orientation: Up, Right, Down, Left; Slicing: Show slice, Resemblance: 30; Create factor).]

Save the workspace as exwk.pwk for later ordination analysis, and close it.

Rotating & condensing dendrograms
The order of samples on the (by default) x axis of a dendrogram is to a large extent arbitrary, since
all arrangements of samples along the axis, which do not lead to vertical and horizontal lines
intersecting, are equally satisfactory displays - think of the dendrogram as a 'mobile', of horizontal
rigid rods and vertical strings, that can be rotated at will. Such rotations can be achieved by
clicking on any of the horizontal 'rods' and, whilst it is not appropriate to use this feature to
re-arrange samples close to some desired a priori sequence(!), it can be useful in displaying visual
agreement between clusters from different analysis choices (e.g. transformations), or comparisons
of abiotic and biotic groupings for the same set of samples. Clicking on vertical 'strings', in
contrast, collapses the clustering structure under the selected point, replacing it with a single dashed
(green) line, to indicate the presence of condensed structure. Such lines are labelled (A), (B), (C), ...
with a 'text pane', below the plot, defining which sample labels are contained in each of the hidden
structures (the text pane can be suppressed from the Graph Options dialog). For a dendrogram with
many samples, this feature should make it possible to view the overall (coarse-level) structure, and
the fine-level grouping can then be seen by zooming in on areas of the original dendrogram.
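
The 'mobile' picture can be made concrete with a small sketch. The Python fragment here is an illustration only and not part of PRIMER: it assumes a dendrogram stored as nested tuples (left branch, right branch, similarity level), with made-up sample labels and levels, and shows that swapping the two branches under a horizontal 'rod' changes the left-to-right order of the samples but not the set of clusters.

```python
# Hypothetical representation: a node is (left, right, similarity); a leaf is a label.

def leaves(node):
    """Sample labels in left-to-right plotting order."""
    if isinstance(node, str):
        return [node]
    left, right, _ = node
    return leaves(left) + leaves(right)


def clusters(node):
    """Every cluster defined by the tree, as a set of frozensets of labels."""
    if isinstance(node, str):
        return {frozenset([node])}
    left, right, _ = node
    return clusters(left) | clusters(right) | {frozenset(leaves(node))}


def rotate(node):
    """Swap the two branches hanging from the top 'rod' of this (sub)tree."""
    left, right, similarity = node
    return (right, left, similarity)


# Made-up tree: (S1, S2) join at 90, S3 joins them at 70, (S4, S5) join at 80.
tree = ((("S1", "S2", 90.0), "S3", 70.0), ("S4", "S5", 80.0), 40.0)
rotated = rotate(tree)

print(leaves(tree))
print(leaves(rotated))
print(clusters(tree) == clusters(rotated))
```

The same grouping is displayed either way, which is why rotation is safe for presentation but carries no analytical meaning.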

61

Re-open the workspace gfwk from page 50, in C:\Examples v6\Grdfish, with datasheet gfa of 277
samples from 93 groundfish species, captured in research trawl surveys of 9 areas of European
shelf waters (factor: area). From this, produce a cluster dendrogram based on Bray-Curtis
similarities from square root transformed abundances, by: Analyse>Pre-treatment>Transform (overall)
>(Transformation: Square root), then Analyse>Resemblance>(Analyse between•Samples) &
(Measure•Bray-Curtis similarity) and Analyse>CLUSTER>(Cluster mode•Group average). Do
not take the SIMPROF test option here, or it will never finish! (Also, for a dramatic demonstration
of why group average linkage is generally superior to the 'chaining' that single linkage can produce
- see Chapter 3 of the Methods manual - repeat the clustering with the single linkage option.)
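
For readers who want to see what the three menu steps compute, here is a minimal sketch in plain Python. It is not PRIMER code: the function names and the four-sample, three-species counts are invented for illustration. It applies a square root transform, computes Bray-Curtis similarity as 100 x (1 - sum|y1 - y2| / sum(y1 + y2)), and then merges groups by group average linkage (the mean of all between-group similarities).

```python
import math
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis similarity (0-100) between two samples."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return 100.0 * (1.0 - num / den) if den > 0 else 0.0


def average_similarity(g, h, sim):
    """Mean of all similarities between members of group g and group h."""
    return sum(sim[frozenset((i, j))] for i in g for j in h) / (len(g) * len(h))


def group_average(samples):
    """Return the merges as (group, group, similarity level), in order."""
    sim = {frozenset((i, j)): bray_curtis(samples[i], samples[j])
           for i, j in combinations(samples, 2)}
    groups = [(label,) for label in samples]
    merges = []
    while len(groups) > 1:
        # Join the two groups with the highest average similarity.
        g, h = max(combinations(groups, 2),
                   key=lambda pair: average_similarity(pair[0], pair[1], sim))
        level = average_similarity(g, h, sim)
        groups = [x for x in groups if x not in (g, h)] + [g + h]
        merges.append((g, h, round(level, 2)))
    return merges


# Invented counts: rows are samples, columns are species.
counts = {"A": [0, 4, 16], "B": [0, 4, 9], "C": [9, 1, 0], "D": [16, 1, 0]}
transformed = {k: [math.sqrt(v) for v in vals] for k, vals in counts.items()}
merges = group_average(transformed)
for g, h, level in merges:
    print(g, h, level)
```

Replacing the `max(...)` criterion with the single highest pairwise similarity between two groups would give single linkage, which is one way to explore the 'chaining' behaviour mentioned above.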

[Figure: Graph1 - dendrogram, 'Groundfish NW European shelf', group average linkage]

Using Graph>Data labels & symbols, add symbols for factor area, but the 277 samples make any
area groups too small to see. So, click on (say) two of the vertical lines to collapse the large middle
section of the tree. Note the appearance of a text pane under the plot, listing the samples in the
condensed branches, labelled (A) and (B). Also see how the structure may be arbitrarily rotated by
clicking on, say, the top horizontal line, to rotate the sample S173 to the other side of the tree.

[Figure: condensed groundfish dendrogram, with the text pane under the plot listing the samples in the collapsed branches (A) and (B)]

Ordering factor levels in keys
Factor levels are normally treated as unordered categories, so in the symbol key they are listed in
the order in which the levels are met in the Factors dialog. For neatness, it may be preferable for
the key to display the levels in a different order, especially here where they are numeric. Taking
Key either from the Factors dialog box, or directly from the Data labels & symbols tab, gives the
symbol colour/shape selection dialog seen earlier (page 33). A series of (Move>↓) and (Move>↑)
operations can re-arrange the levels in any desired order, without changing the assignment of
symbols and colours to levels. (Alternatively, go back to the data matrix gfa.pri and Edit>Sort>
Columns>•By factor/indicator>area, then re-run the transform, similarity and cluster. The sample
labels are now in order of increasing area number so the symbol key will display in this order also.)
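
The difference between 'order met' and numeric order is easy to see in a few lines of Python. This is only an illustration with invented factor values, not a description of how PRIMER stores factors.

```python
# Invented 'area' factor values, one per sample, in data sheet order.
area = ["3", "1", "10", "2", "1", "3", "10"]

# Levels in the order they are first met (the default key order described above).
first_met = list(dict.fromkeys(area))

# The same levels in numeric order; sorting them as text would place "10" before "2".
numeric = sorted(first_met, key=int)

print(first_met)
print(numeric)
```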

62
[Figure: symbol Key dialog (Symbol, Size, Plot, By factor, Move buttons) and the condensed dendrogram with its text pane for branches (A) and (B)]


Zooming dendrograms
Zooming is invoked by Graph>Zoom In or Zoom Out from the main menu, or by clicking on the
Zoom In or Zoom Out icons on the Tool Bar. The cursor changes to the corresponding zoom symbol when
over the plot, and left-clicking zooms one step in or out. To leave the plot in its current (possibly
zoomed) state and return to default operation, click on the pointer icon on the Tool Bar, or take
Graph>Pointer (remember that the graph menu can also be obtained at any time by right-clicking when
the cursor is over the plot). To restore the plot to its original, unmagnified state, you will need to
select Zoom Out and left-click on the plot several times (as rapidly as you like).
Instead of zooming by incremental steps, you can go straight to the final zoomed area by drawing a
box around the area to magnify: with the cursor in the usual pointer mode (click on the pointer icon
on the Tool Bar if necessary), draw a box by left-clicking and holding at one corner and dragging over
the required rectangle, then releasing, in usual Windows fashion. A single click on the Zoom In icon
on the Tool Bar (or Graph>Zoom In) will take you straight to the zoomed area. (The process is reversed,
as usual, by taking Zoom Out and rapidly left-clicking on the plot several times.) But note that,
unlike the incremental zooming, which preserves the 'aspect ratio' (the displayed y:x axis ratio) of
the diagram, a rectangular zoom will change the aspect ratio so that all the information within the
box is magnified into the current window size, however long and thin (or short and fat) the drawn
rectangle originally was. This is a powerful feature for zooming on dendrograms, since a long, thin
rectangle allows you to view a subset of the samples across the whole similarity scale. Under
zooming, note that the axes are always shown, even when the zoomed area is well away from them,
and scroll bars are displayed on the axes. By clicking and dragging these back and forth (or up and
down) the whole tree can be viewed, piecemeal, at the current aspect ratio and magnification.
Reverse the condensing of the middle section to reinstate the full tree (rotating and collapsing are
'toggles', switched on and off by repeated clicking on the same line) and now try to zoom in on the
fine detail. Repeated use of the zoom-in cursor from the Zoom In icon is not effective. By the time the
symbols are visible, the similarity scale is too narrow to see the clustering structure (even using the
vertical scroll bar). What is needed is a change in aspect ratio: reverse the zoom operation (by
repeated use of the zoom-out cursor from the Zoom Out icon), change to the pointer, with the pointer
icon, and draw a tall, narrow box round part of the dendrogram, then select the Zoom In icon again. A
visible dendrogram results, which can be scanned across, using the horizontal scroll bar. Save and
close gfwk.
63
[Figure: PRIMER window showing the zoomed groundfish dendrogram, with scroll bars on the axes]


SIMPROF method
The similarity profile test (SIMPROF), introduced on page 57, is a permutation test of the null
hypothesis that a specified set of samples, which are not a priori divided into groups, do not differ
from each other in multivariate structure. (This is not to be confused with the ANOSIM test of
section 12, which tests pre-defined group structures - replicate samples from different sites, times,
treatments, etc.) The similarity profile itself is the set of all resemblances between the specified
samples, ranked from smallest to largest, with the ordered resemblances then plotted (y-axis) against
their rank (x-axis); see the examples on page 67, the thin (red) line. The departure of this curve from
its 'expected' shape under the null hypothesis is the basis of the test. For example, if there is
genuine clustering within a set of samples, there will be many more smaller similarities and larger
similarities than if all the samples came from the same community (and therefore all had medium
similarities to each other). The 'expected' profile shape is obtained by permuting the entries for
each variable (e.g. species) across that subset of samples, separately for each variable; this certainly
produces a null condition in which samples have no group structure, and the simulations realistically
fix the variable values (e.g. have the same pattern of rare and common species, with the same
counts). This random rearrangement of the variable entries across the samples, separately for each
variable, is carried out 1000 times (by default, though other values can be selected), producing
1000 'expected' similarity profiles. The 1000 values at each rank are averaged to produce a mean
profile, the thicker (dark blue) line on the page 67 plots; the 95% range of similarities at each rank is
also shown, as black dashed lines. The summed absolute distance (π) between the real similarity
profile and this simulated mean profile is the test statistic. A further 999 (by default) simulated
profiles are then generated and π computed between each of these and the mean simulated profile
(from the first set of 1000). This defines the range of likely values of the test statistic under the
null hypothesis, and the real π is compared to this, as with any test (e.g. see Methods manual
Chapter 6). It gives a significance level, e.g. p<0.1% if the real π is larger than any of the 999
simulated values - clear evidence of group structure. (Note, two separate sets of c. 1000 simulations
are used to ensure absolute independence of the real π from the simulated values.)
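
The steps just described can be sketched in plain Python. This is a reading of the text above, not PRIMER's implementation: the function names, the toy data and the seed are invented, π is taken as the summed absolute distance exactly as worded here (any constant rescaling would not change the p value), and the significance level is assumed to be computed as (number of simulated π >= real π, plus 1) / (number of simulations, plus 1).

```python
import random


def bray_curtis(a, b):
    """Bray-Curtis similarity (0-100) between two samples."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return 100.0 * (1.0 - num / den) if den > 0 else 0.0


def profile(data):
    """All between-sample similarities, ranked from smallest to largest."""
    n = len(data)
    return sorted(bray_curtis(data[i], data[j])
                  for i in range(n) for j in range(i + 1, n))


def permuted(data, rng):
    """Shuffle each variable's entries across the samples, separately per variable."""
    columns = [list(col) for col in zip(*data)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def simprof(data, n_mean=1000, n_sim=999, seed=1):
    """Return (pi for the real profile, significance level in %)."""
    rng = random.Random(seed)
    real = profile(data)
    # First set of permutations: the mean 'expected' profile.
    sims = [profile(permuted(data, rng)) for _ in range(n_mean)]
    mean = [sum(vals) / n_mean for vals in zip(*sims)]

    def departure(prof):
        return sum(abs(x - m) for x, m in zip(prof, mean))

    pi_real = departure(real)
    # Second, separate set of permutations: the null distribution of pi.
    null = [departure(profile(permuted(data, rng))) for _ in range(n_sim)]
    n_ge = sum(1 for v in null if v >= pi_real)
    return pi_real, 100.0 * (n_ge + 1) / (n_sim + 1)


# Invented data: rows are samples, columns are species; two obvious groups.
grouped = [
    [10, 8, 0, 1], [12, 9, 1, 0], [9, 11, 0, 0], [11, 10, 1, 1],
    [0, 1, 10, 9], [1, 0, 12, 8], [0, 0, 9, 11], [1, 1, 11, 10],
]
pi_real, p_percent = simprof(grouped, n_mean=200, n_sim=199)
print(round(pi_real, 1), p_percent)
```

Fewer permutations are used in the call than the defaults quoted in the text, purely to keep the toy run short.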

64

(Bristol Channel zooplankton)
Densities from 24 species of zooplankton at 57 sites in the Bristol Channel and Severn Estuary,
collected by double-oblique net hauls, are given in C:\Examples v6\BCzoo\bcza.pri. The sites were
taken over a fairly regularly-spaced grid (Fig 3.2 of the Methods manual), and seasonally averaged
to give one sample per site. There is therefore no a priori structure of groups and replicates within
groups (though there is a natural salinity gradient, described by a factor with 9 numeric levels). The
original data is from Collins NR, Williams R, 1982, Mar Ecol Prog Ser 9: 1-11, who identify four
main clusters of sites. It is relevant to ask what the statistical evidence is for there being such a
division, and whether any of the sub-structure within those groups can justifiably be interpreted.
Open the data file C:\Examples v6\BCzoo\bcza.pri and produce the cluster dendrogram from
Bray-Curtis similarities on 4th root transformed densities, with Analyse>Pre-treatment>Transform
(overall)>(Transformation: Fourth root); Analyse>Resemblance>(Analyse between•Samples) &
(Measure•Bray-Curtis similarity); and Analyse>CLUSTER>(Cluster mode•Group average), but
this time taking the option (✓SIMPROF test). Look at the dialog under the SIMPROF tab, though
the defaults can be taken for nearly all: the matrix whose species rows will be independently
permuted is Data1 (the 4th root transformed data sheet); the Resemblance for the randomly
permuted matrices must again be Bray-Curtis, naturally; the % significance level is conventionally
taken as 5 (though could be made more stringent, given that several tests are involved); and 1000
permutations are used to calculate the mean similarity profile, with 999 to generate the null
distribution of the departure statistic, π. The only non-default option to take is: ✓Create factor>
(Add factor named: SprofGps).

[Figure: CLUSTER dialog. Main tab: Cluster mode (Single linkage, Complete linkage, Group average), Plot dendrogram, SIMPROF test. SIMPROF tab: Data sheet: Data1; Resemblance; Num permutations (Mean, Simulations); Sig level (%); Create factor, Add factor named: SprofGps]

These SIMPROF tests will run in a reasonable time here (if in doubt, keep your eye on the green
progress bar at the bottom of the main window and, if necessary, terminate the run with the Stop
tasks button on the Tool Bar). The resulting dendrogram shows the four main groups of sites, which
Collins & Williams characterised as true estuarine, estuarine and marine, euryhaline marine
and stenohaline marine. That the 57 sites are, broadly speaking, grouped by salinity can be seen by
displaying the salinity factor as a symbol: Graph>Data labels & symbols>(Symbols>(✓Plot &
✓By factor>Sal)), and you could tidy up the plot by arranging the legend for Sal in strict numerical
order (using the Key button on Data labels & symbols, or from Edit>Factors on the data or
resemblance matrix, as at the top of page 63). You may also need to rotate one or two of the main
groups, by clicking on horizontal lines in the dendrogram, to make them exactly match the example
below. (Note the subtitle change, using the Titles tab in the Graph Options dialog.) These 4 groups
are also the only ones found by the SIMPROF test (called a, b, c, d in the SprofGps factor) and they
are identified by the black lines in the dendrogram. All other branches are coloured red, indicating
that SIMPROF can find no statistical evidence for any sub-structure within these.
65
[Figure: dendrogram, 'Bristol Channel zooplankton', subtitle 'Groups: estuarine, estuarine & marine, euryhaline marine, stenohaline marine'; Transform: Fourth root; Resemblance: S17 Bray Curtis similarity; similarity scale 20 to 100; x-axis: Samples; symbol key: Sal, levels 1 to 9]


CLUSTER results window
In addition to the dendrogram plot itself, Analyse>CLUSTER (like all analysis routines) produces
a separate 'Results' window (e.g. CLUSTER1) which firstly lists the conditions under which the
analysis was run (e.g. whether on a selection of the matrix, with what linkage option etc), and then
outputs any text-form information. For succinctness, the output will often use the sample numbers
(1-57) rather than the sample labels (which here are stations 1-29, 31-58, confusingly, since station
30 was not sampled), so a listing is initially given of the numbers and their corresponding labels
(ending with sample 57 being station 58). Then the results specify, in numeric form, how the
dendrogram is constructed, in case the precise numbers are needed for another purpose: sample
numbers 47 & 48 (stations 48 & 49) are the first to group, at similarity 92.78, with the new group
labelled 58, then 31 & 36 group at 91.59, ..., 16 & 64 (i.e. 16 & 14 & 21) at 85.45 etc. The most
useful outputs here, however, are the SIMPROF test outcomes. These are read from the bottom
(upwards) of the results window: π = 6.4 (p<0.1%, its most extreme value for 999 permutations) for
a test that all samples are from the same assemblage; and π = 3.4 & 2.7 (p<0.1% for both) for the
successive splits, at 46.0% and 51.4% similarity, of the three right-hand groups. Site 12 is
borderline for splitting from the rest of the left-hand group, at 54.2% similarity (π = 2.4, p<6%), but
there is no evidence for the apparent division of the second group into two at 60.0% similarity (π
= 1.0, p<30%), or any of the other groups. Tests of finer-level structure are not carried out, if the
differentiation of the coarser level structure is not significant, so only 7 tests are needed here.
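
If the merge list is needed 'for another purpose', it can be expanded back into sample membership with a few lines of Python. The sketch assumes each line has the form 'a+b -> new at level', optionally followed by '; Pi: ... Sig(%): ...', as the results window appears to print it; the five-sample listing and its values are invented for illustration.

```python
import re

# Assumed line format: "a+b -> new at level", with an optional SIMPROF suffix.
MERGE = re.compile(r"\s*(\d+)\+(\d+)\s*->\s*(\d+)\s+at\s+([\d.]+)")


def expand_merges(lines, n_samples):
    """Map every group number to its sample numbers, and to its merge level."""
    members = {i: [i] for i in range(1, n_samples + 1)}
    levels = {}
    for line in lines:
        match = MERGE.match(line)
        if not match:
            continue  # skip anything that is not a merge line
        a, b, new = (int(match.group(k)) for k in (1, 2, 3))
        members[new] = members[a] + members[b]
        levels[new] = float(match.group(4))
    return members, levels


# Invented listing for 5 samples; new groups are numbered from 6 upwards.
listing = [
    "1+2 -> 6 at 90.0",
    "4+5 -> 7 at 80.0",
    "3+6 -> 8 at 70.0; Pi: 0.9 Sig(%): 30.0",
    "7+8 -> 9 at 40.0; Pi: 5.0 Sig(%): 0.1",
]
members, levels = expand_merges(listing, n_samples=5)
print(sorted(members[8]), levels[8])
print(sorted(members[9]), levels[9])
```

This mirrors the reading '16 & 64 (i.e. 16 & 14 & 21)' in the text: a group number simply stands for all the samples merged into it so far.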
[Figure: CLUSTER1 results window, 'Hierarchical Cluster analysis'. It lists the Resemblance worksheet (Resem1, Data type: Similarity, Selection: All), Parameters (Cluster mode: Group average, Simprof test), the Data worksheet (Data1, Data type: Abundance), the Simprof parameters (simulation permutations: 999) and the 'Combining' list of merges, e.g. 47+48 -> 58 at 92.78, 31+36 -> 59 at 91.59, ..., with Pi and Sig(%) values against each merge at which a SIMPROF test was run]

66

SIMPROF direct run
SIMPROF can be run directly using Analyse>SIMPROF, rather than as part of another analysis
such as CLUSTER (above) or LINKTREE (section 11). In that case, the active window must be
the data sheet, the rectangular matrix whose variables are permuted randomly (and independently)
across the samples. SIMPROF must always have such an underlying data matrix available - it
cannot operate solely on a triangular resemblance window. Thus when the SIMPROF option is
taken in CLUSTER - which is run when the active window is a triangular matrix - v6 uses its
internal knowledge of how that resemblance matrix was calculated to specify the correct data
matrix, as a default under Data sheet: on the Analyse>CLUSTER>SIMPROF tab. Change this
default at your peril! - its main purpose is simply to remind you that SIMPROF always works on
the underlying rectangular array, not the triangular matrix.
Direct runs of SIMPROF are used to test for evidence of internal group structure in the full set of
samples that are submitted to it, i.e. a single test rather than the (usually large) series of subset tests
in the CLUSTER option. The advantage of doing a single test at a time is that more information
can be output, including a plot of the real similarity profile (red line), the mean simulated profile
(dark blue continuous line) and the upper and lower limits within which 99% (say) of the simulated
profiles lie, at each rank value (broken lines). Another output (optionally selected by checking
✓Stats to worksheet) is of the data used to plot the similarity profile graph. This worksheet will
have a number of rows equal to the number of entries in the resemblance matrix, containing as
'variables': the real ranked similarities; the simulated mean similarities; the lowest and highest
similarities obtained, at that rank, over all permutations (not shown on the plot); and the lower and
upper 99% limits (or whatever specified) of the simulated values at that rank.
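
The layout of that worksheet can be mimicked with a short Python sketch. It is illustrative only: the column names follow the worksheet headings (Actual, Mean, Min, Max, Lower P, Upper P), but the function name, the tiny invented profiles and, in particular, the simple rule used to pick the percentile limits are assumptions, since the manual does not say how PRIMER interpolates them.

```python
def rank_summary(real, simulated, limits=99.0):
    """One row per rank: real value plus summaries of the simulated values."""
    tail = (100.0 - limits) / 200.0  # fraction trimmed from each end
    rows = []
    for rank, actual in enumerate(real):
        vals = sorted(prof[rank] for prof in simulated)
        k = int(tail * (len(vals) - 1))  # assumed (crude) percentile rule
        rows.append({
            "Actual": actual,
            "Mean": sum(vals) / len(vals),
            "Min": vals[0],
            "Max": vals[-1],
            "Lower P": vals[k],
            "Upper P": vals[len(vals) - 1 - k],
        })
    return rows


# Invented profiles: one real profile and three simulated ones, three ranks each.
real = [90.0, 60.0, 20.0]
simulated = [[80.0, 55.0, 30.0], [84.0, 50.0, 28.0], [82.0, 60.0, 32.0]]
rows = rank_summary(real, simulated)
for row in rows:
    print(row)
```

With only three simulated profiles the 99% limits coincide with the minimum and maximum; with 1000 profiles they would trim a few values from each tail.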
The Bristol Channel zooplankton workspace should still be open - if not, repeat the steps at the top
of page 65, of 4th-root transform and Bray-Curtis similarity on C:\Examples v6\BCzoo\bcza.pri.
With the transformed data matrix Data1 as the active window, take Analyse>SIMPROF, taking the
defaults but also (✓Stats to worksheet). This is testing for any evidence at all for community
structure differences in the full set of 57 stations. Graph2 (and Data2) show that whilst very large
and small similarities could be obtained by chance under the null hypothesis of no group structure
(the extreme left and right side of the plot), there are more relatively high and relatively low
similarities from the real matrix than can be generated by permutation under the null. In other
words, the real similarity profile (in red) falls a long way outside the 99% limits of the simulated
profiles, virtually throughout its length - and this is captured formally in the π statistic.
[Figure: similarity profile plot (legend: Actual, Mean, Lower, Upper; x-axis: Rank); the SIMPROF dialog (Resemblance; Num permutations: Mean, Simulations; Do plots; Do limits, % limits: 99; Stats to worksheet; Pi values to file); and the resulting worksheet with columns Actual, Mean, Min, Max, Lower P, Upper P]

67


Histograms of null distributions
The second plot produced, as in all permutation tests in PRIMER v6 (e.g. in ANOSIM, RELATE,
BEST etc), is a histogram of the null hypothesis distribution of the test statistic (π), in this case
measuring departure of each simulated profile from the simulated mean profile. The value of π for
the real similarity profile is also indicated on this plot (vertical dashed line), giving a visual
assessment of how likely the real statistic π is to have come from this null hypothesis distribution -
which is formalised in a p value, given in the results window for a SIMPROF run. A final option
(✓Pi values to file) is to export the π values from the 999 simulations (or however many specified)
to a text file, with a .txt extension, which simply puts each π on a new row. This is not commonly
used but is an available option with all such permutation tests in v6; it allows the user to redraw the
histogram of π under the null hypothesis with other software, or examine its parametric form etc.
Look at the histogram plot from the above SIMPROF run, probably labelled Graph3, showing how
different the true π is from the simulations. The histogram can be displayed in finer detail with, say,
Graph>Special>Bin size: 0.02 and restricting the x-axis to the range of the histogram, e.g. with
More>X axis tab>✓Specify scale>(X min: 0) & (X max: 2) & (X interval: 0.2). Finally, the results
window, SIMPROF1, simply gives the departure statistic π (= 6.4) and its p value (<0.1%); we also
saw this information in the last row of the results window for the earlier CLUSTER run. Any minor
discrepancy in the two π statistics (or p values), between running SIMPROF directly and as an
adjunct to CLUSTER, is due simply to the fact that only 1000 simulations are used to calculate the
mean profile and departures from it: each new run will therefore produce slightly different answers.
If greater precision in significance levels is desirable - which it is not in general! - then simply
increase the number of permutations (there are almost always a vast number possible for this test).
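
As an example of working with the exported π values in other software, the Python sketch here reads a file with one value per row, bins the values with a chosen bin size, and recomputes the significance level. The file contents are invented (an in-memory stand-in is used rather than a real export), the function names are hypothetical, and the significance rule, (count of simulated π >= real π, plus 1) / (number of simulations, plus 1), is an assumption chosen to agree with the 0.1% reported for 0 out of 999.

```python
import io
import math


def read_pi_values(handle):
    """Read one pi value per row, ignoring blank rows."""
    return [float(line) for line in handle if line.strip()]


def significance_percent(pi_real, null_values):
    """Assumed rule: (number of null values >= real pi, plus 1) / (N + 1), in %."""
    n_ge = sum(1 for v in null_values if v >= pi_real)
    return 100.0 * (n_ge + 1) / (len(null_values) + 1)


def histogram(values, bin_size):
    """Return (bin start, count) pairs for the occupied bins, in order."""
    counts = {}
    for v in values:
        # Rounding first guards against values such as 0.06 / 0.02 = 2.9999...
        k = math.floor(round(v / bin_size, 9))
        counts[k] = counts.get(k, 0) + 1
    return [(k * bin_size, counts[k]) for k in sorted(counts)]


# Stand-in for an exported .txt file (invented values).
exported = io.StringIO("0.51\n0.53\n0.535\n0.61\n0.98\n")
null_values = read_pi_values(exported)

bins = histogram(null_values, bin_size=0.02)
for start, count in bins:
    print(f"{start:.2f} {count}")
print(significance_percent(6.413, null_values))
```

With a real export, `open(path)` on the saved text file would take the place of the `io.StringIO` stand-in.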

[Figure: SIMPROF1 results window, with the Histogram Plot dialog (Bin size: 0.02)]

Similarity profile
Data worksheet
Name: Data1
Data type: Abundance
Sample selection: All
Variable selection: All

Parameters
Permutations for mean profile: 1000
Simulation permutations: 999
Plot limits: 99%
Resemblance:
Analyse between: Samples
Resemblance measure: S17 Bray Curtis

Global Test
Sample statistic (Pi): 6.413
Significance level of sample statistic: 0.1%
Number of permutations: 999 (Random sample)
Number of permuted statistics greater than or equal to Pi: 0


SIMPROF on a subset of samples
A direct run of SIMPROF, with all the additional information this outputs c.f. its call in CLUSTER,
can be carried out on any subset of samples simply by selecting that subset from the data matrix
before entering SIMPROF. The different methods of selection will not be met until section 8, but
if you are already familiar with this you might like to note that the created factor SprofGps, with
levels a, b, c, d, that was produced by the earlier CLUSTER run (page 65), could be used to select
each group in turn (e.g. on Data1, Select>Samples>•Factor levels>Factor name: SprofGps>
Levels>Include a). On input to SIMPROF this gives a very different real profile, falling within the
99% limits across much of the range, and π within the null histogram (albeit the upper tail, p=6%).
Save the current state of the BCzoo workspace in bcwk.pwk, for later use (page 140).

68
5. Clustering

:· SOOROI?". ,.·~ SIMPROF can b e run directly using Analyse>SIMPROF, rather than as part of another analysis
1.... direct run "·"
-1
·;'.. such as CLUSTER (above) or LINK.TREE .
(section 11) . In that case, the active window must be
}~-· ·. ~ :.::.. · the data sheet, the rectangular matrix whose variables are permuted randomly (and independently)
across the samples. SIMPROF must always have such an underlying d~ta matrix available - it
cannot operate solely on a triangular .resemblance window. Thus when the SIMPROF option is
taken in CLUSTER - which is run when the active window is a triangular matrix - v6 uses its
internal knowledge of how that resemblance matrix was calculated to specify the correct data
matrix, as a default under Data sheet: on the A nalyse>CLUSTER>SIMPROF tab. Change this
default at your peril! - its main purpose is simply to remind you that SIMPROF always works on
the underlying rectangular array not the triangular matrix.
Direct runs of SIMPROF are used to test for evidence of internal group structure in the full set of
samples that are submitted to it, i.e. a single test rather than the (usually large) series of subset tests
in the CLUSTER option. The advantage of doing a single test at a time is that more information
can be output, including a plot of the real similarity profile (red line), the mean simulated profile
(dark blue contin}lOUS line) and the upp~r and lower limits within which 99% (say) of the simulated
profiles lie, at each rank value (broken lines). Another output (optionally selected by checking
.!Stats to worksheet) is of the data used to plot the similarity profile graph. This worksheet will
have a number of rows equal to the number of entries in the resemblance matrix, containing as
'variables': the real ranked similarities; the simulated mean similarities; the lowest and highest
similarjties obtained, at that rank, over all permutations (not shown on the plot); and the lower and
upper 99% limits (or whatever specified) of the simulated values at that rank .
.The Bristol Channel zooplankton workspace should still be open - if not repeat the steps at the top
of page 65; of 4th-root transform and Bray-Curtis similarity on C :\Examples v6\BCzoo\bcza.pri.
:· I
With the transformed data matrix Datal as the active window, Analyse>SIMPROF, taking the
defaults but also (.!Stats to worksheet). This is testing for any evidence at all for communily
,, structure differences in the full set of 57 stations. Graph2 (and Data2) show that whilst very large
and small similarities could be obtained by chance under the null hypothesis of no group structures
(the extreme left and right side of the plot), there are more relatively high and relatively low
similarities from the real matrix than can be generated by permutation under the null. In other
words, the real similarity profile (in red) falls a long way outside the 99% lim its of the simulated
profiles, virtually throughout its length - and this is captured formally in the re statistic.

--.A.:luJI
--Mun
- - · - - · lowtr
-----· u C>tf

---,- ---·- -~

;SJ~RDf..,-;,-.: ':.·. , . .:. . :;,


:I 100 J I
500
I
1000
Rcsemblonce ... Rink

(),1pul Num porrmt~llons


Meon:
00o plots
0oa limls ~-]
% imls: Simulations: Actuel Mean Min Mox Lower P Upper P "
[Sf---~
1999 92.763 82.337 I 76.057 94.263 76 .837 90.506
•~r-:---+--:9-1.5~8':'"'9 80.112 1 15:005··- ·ee.068T76.ai·a· ·-05~555·
~ots 10 worksheet e1 .099 j 78.ss1'"174 .8w • 84.829 ! 75.1 - 63.96

" 0 Pl volues to fi e 4
aa·-'---+-
. 5
88_.0_08+.28~2 p:.~~1 i
~.532l 74J1°4 ....82. 3?l
87 .445 77.406 , 73.133 81 .844 1 74 .122 81.084
6 87.27 7~~ '_!~~4.CSiJ48 73.571=:_iio.201
OK Conc:ol Help
7 86.298 76.356 72.554 1 81 .746 73.5 , 1s.rs1
6 86.284 75.917 72.482. 79.838 173.210779-:00S ...

67

\\l - -- - .
---- --·· ..
5. Clustering

Histograms The second plot produced, as in all permutation tests in PRIMER v6 (e.g. in ANOSIM, RELATE,
of null BEST etc), is a histogram of the null hypothesis distribution of the test statistic (1t), in this case
distributions measuring departure of each simulated profile from the simulated mean profile. The value of 1t for
the real similarity profile is also indicated on this plot (vertical dashed line), giving a visual
assessment of how likely the real statistic 1t is to have come from this null hypothesis distribution -
which is formalised in a p value, given in the results window for a SlMPROF run. A final option '·
(./Pi values to file) is to export the 1t values from the 999 simulations (or however many speci tied)
to a text file, with a .txt extension, which simply puts each 1t on a new row. This is not commonly
used but is an available option with all such permutation tests in v6; it allows the user to redraw the
histogram of 1t under the null hypothesis with other software, or examine its parametric fonn etc.
Look at the histogram plot from the above SIMPROF run, probably labelled Graph3, showing how
different the true 1t is from the simulations. The histogram can be displayed in finer detail with, say
Graph>Special>Bin size: 0.02 and restricting the x-axis to the range of the histogram, e.g. with
More>X axis ta~>./Specify scale>(X min: 0) & (X max: 2) & (X interval: 0.2). Finally, the results
window, SIMPROFl, simply gives the departure statistic 1t (= 6.4) and its p value (<0 .1%); we also
saw this information in the last row of the results window for the earlier CLUSTER run. Any minor
discrepancy in the two 1t statistics (or p values), between running SIMPROF directly and as an
adjunct to CLUSTER, is due simply to the fact that only 1000 simulations are used to calculate the
mean profile and departures from it: each new run will therefore produce slightly different answers.
If greater precision in significance levels is desirable - which it is not in general! - then simply
increase the number of.permutations (there are almost always a vast number possible for this test).

,,
Similarity profile e
Date! workshee t
i
Name : Dar.a l
' Dar. a cype: Abunda nce
Sampl e s elec t.ion: All
Vari able se l ect.ion: Al l
l
Pai:c!lne t ei:.s ·- . ...
Pe r mur.ar.1ons tor mean profi le : 1000 Histogram Plot ' . , ·
1
S i mular. i on pe r mur.at i ons : 999 ' •• a
·Plot limir.s : 99% Br.size: "·
Resembla nce : Jo.02!
, Analyse ber.ueen: Samples ·~
Re s emblance mea sure: 517 Bray Curt i s

Glob al Test
I OK Cone el Help

Sample sr.a r.ist ic (Pi): 6.413


Siqniticance level ot sample statistic: 0.1%
NwN:>er ot permur.ations: 999 (Random sample)
NW!'be r ot permut ed s tati s r.ics qreater than or e qual to Pi: 0
v i

SIMPROF on a subset of samples
A direct run of SIMPROF, with all the additional information this outputs compared with its call in
CLUSTER, can be carried out on any subset of samples simply by selecting that subset from the data
matrix before entering SIMPROF. The different methods of selection will not be met until section 8,
but if you are already familiar with this you might like to note that the created factor SprofGps,
with levels a, b, c, d, that was produced by the earlier CLUSTER run (page 65), could be used to
select each group in turn (e.g. on Data1, Select>Samples>•Factor levels>Factor name: SprofGps>
Levels>Include a). On input to SIMPROF this gives a very different real profile, falling within the
99% limits across much of the range, and π within the null histogram (albeit the upper tail, p=6%).
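
The idea of selecting samples by the level of a factor can be pictured with a short Python sketch. It is only an analogy for the Select>Samples>•Factor levels operation, not how PRIMER implements it; the function name, sample labels and factor values are all made up for illustration.

```python
def select_by_factor_level(sample_labels, factor_levels, include):
    """Return the labels of samples whose factor level is in `include`."""
    return [label for label, level in zip(sample_labels, factor_levels)
            if level in include]


# Hypothetical sample labels and SprofGps levels, for illustration only.
samples = ["S1", "S2", "S3", "S4", "S5"]
sprof_gps = ["a", "a", "b", "c", "a"]

group_a = select_by_factor_level(samples, sprof_gps, {"a"})
print(group_a)
```

Only the samples carrying level a are kept, and it is this reduced set that would then be passed to the next analysis.
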
Save the current state of the BCzoo workspace in bcwk.pwk, for later use.

68
6. Workspace

6. Managing the workspace (Window, View, File)

Explorer tree
Someone working through the earlier examples cannot fail to have noticed another innovation in
v6, namely the Explorer tree displayed to the left of the PRIMER desktop. This allows the user to
manage the workspace - the collection of input files, results, plots, data sheets generated etc - in an
orderly fashion, finding and activating windows only when needed, and saving the entire
workspace structure in a single operation (and data file) for later retrieval. The Explorer displays
the logical relationships between the constituents of the workspace, making it easy to navigate but
also, importantly, reflecting the program's internal knowledge of these inter-relationships. v6 uses
this structure to select sensible defaults and even to pass information (such as newly created
factors) through the tree - by both forward and backward propagation, where this makes sense.
For example, if you have been analysing the data in the previous section, of zooplankton from the
Bristol Channel (workspace bcwk.pwk, pages 65-68), your current workspace may look like:
[Screenshot: PRIMER desktop for workspace bcwk, with the Explorer tree listing bcza, Overall Transform1, Data1, Resemblance1, Resem1, CLUSTER1, Graph1, SIMPROF1, Graph2, Graph3, Data2, SIMPROF2, Graph4, Graph5 and Data3]

Closing, redisplaying & tiling windows
From the Explorer tree (left panel of the PRIMER desktop), you can see that the workspace has been
saved (or renamed) as bcwk, with every row below this representing a window held in the
workspace, whether open or not. This is an important difference from PRIMER v5, where if a
window was closed, by clicking the close button at its top right, it was lost to the workspace and
would need to be re-entered or re-created. In v6, you can close down windows individually - or all
of them at once, using Window>Close All Windows - and all windows remain in the workspace,
and can simply be re-displayed by clicking on their name (or icon) in the Explorer tree. The option
to tile or cascade windows may also be useful here, so that a common way of tidying up the display
area (right panel of the PRIMER desktop) is to close all windows, click on the Explorer names for
the ones you want to re-display and then Window>Tile Horizontal or Tile Vertical or Cascade.

[Screenshot: Window menu (Cascade, Tile Horizontal, Tile Vertical, Arrange Icons, Close All Windows, More Windows...) alongside the Explorer tree for bcwk]

69

Minimising windows
For consistency with v5 and other Windows software, there is also the option of minimising
windows to the bottom of the PRIMER desktop, by clicking on the usual minimising button,
but this is no longer necessary or useful with v6, it being easier simply to close the window and
retrieve it from the Explorer tree.

View menu
There will be situations in which, in order to display as much of a plot or datasheet on screen as
possible, it is desirable to temporarily hide some of the other features, to widen and heighten the
display area. In standard Windows fashion, the View menu allows three features to be toggled off
and on again: ✓Explorer (the left panel), ✓Tool Bar (icons in the top row) and ✓Status Bar
(bottom row, used to display the position of the cursor in a data matrix, the progress of a
calculation etc).

[Screenshot: View menu, with ✓Tool Bar (Ctrl+T) and ✓Status Bar (Ctrl+B)]

Understanding the Explorer tree
The Explorer tree, however, is the key to managing your work pattern in v6. Returning to the
Bristol Channel zooplankton workspace: after the workspace icon and name, bcwk, each entry in
the list has an icon describing the type of window represented and they are linked in the hierarchy
of analysis steps. bcza is a species by samples rectangular data matrix, initially subjected to a
4th-root transform: Overall Transform1 is the results window for that operation. All operations
on the Analyse menu, and on the main section of the Tools menu, produce such a results window,
which then leads down the branch in the Explorer tree to a derived data matrix or resemblance (in
the case of the Tools menu and the top section of the Analyse menu), and often to one or more plot
files (in the middle and lower sections of the Analyse menu), sometimes with further data sheets.
(In fact, what distinguishes the operations in the Tools menu from those in Edit, Select or File is
that a results window is always produced in the former and never in the latter.) The transformed
datasheet is Data1, which becomes the starting point for all further analysis here. Bray-Curtis
resemblance is calculated on this: Resemblance1 is the results window (which here just records
the choice of coefficient, but in other circumstances might also list a selection of samples and
variables made before that computation, see section 8). The resulting similarities are in Resem1,
the icon denoting a triangular resemblance matrix. This in turn is input to cluster analysis, and we
have seen the results window CLUSTER1 and dendrogram Graph1 before (page 66).

This is the end of that analysis strand but another one branches off from the transformed data
matrix Data1, with results SIMPROF1 and Graph2, Graph3 and Data2 all being
outputs of this one SIMPROF run, as illustrated on pages 67 and 68. This Explorer tree structure
passes information forwards and backwards between the windows: look at the factors in the data
matrix bcza at the top of the tree, using Edit>Factors when it is the active window, and you
will note the presence of the new factor SprofGps (the 4 main clusters, labelled a, b, c, d). This was
created much further down the tree, in the CLUSTER run on Resem1 in fact.

[Screenshot: Explorer tree for bcwk, with the results windows Overall Transform1 and Resemblance1, and the Edit>Factors sheet for bcza showing the factor SprofGps]

70

Deleting items from a workspace
If part of an analysis is wrong or unhelpful, and you wish to delete that part from the workspace,
simply click on the results name or icon in the Explorer tree and use the File>Delete Results
menu (alternatively, a right click when over the Explorer tree gives a 'floating menu' which also
includes the delete operation). This results window, and all other items below it on the same branch,
will be erased from the workspace. This is not a reversible operation, so you are prompted to make
sure that this is what you wanted to achieve. (Note that there are no 'Undo' options in PRIMER, in
the interests of computational efficiency and size of databases that can be handled - there are
significant speed and memory overheads to implementing Undo features.) This delete operation
can be used on any item in the Explorer tree and the main File menu will change to reflect the
active window type. Thus, File>Delete Data will delete everything in the workspace when the
active window is a data matrix at the top of the tree, such as bcza, from which everything stems.
(Of course it will not delete the original file, bcza.pri, from the \Examples v6\BCzoo directory.)
If your bcwk workspace is still open, try deleting (one of) the SIMPROF runs by File>Delete
Results, and note how the Graphs and subsidiary data matrix produced by that run are also erased.
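
The effect of deleting an item together with everything below it on the same branch can be modelled with a small tree structure. The Python sketch below is a hypothetical model for illustration, not PRIMER's internal representation; the class and method names are invented, and the item names follow the bcwk workspace described in the text.

```python
class Item:
    """One entry in a workspace tree (hypothetical model, not PRIMER's)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def names(self):
        """This item's name followed by all names below it on its branch."""
        result = [self.name]
        for child in self.children:
            result.extend(child.names())
        return result

    def delete(self):
        """Detach this item from its parent; its whole branch goes with it."""
        removed = self.names()
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None
        return removed


# Part of the bcwk tree, following the relationships described in the text.
bcza = Item("bcza")
transform = Item("Overall Transform1", bcza)
data1 = Item("Data1", transform)
resemblance1 = Item("Resemblance1", data1)
resem1 = Item("Resem1", resemblance1)
cluster1 = Item("CLUSTER1", resem1)
Item("Graph1", cluster1)
simprof1 = Item("SIMPROF1", data1)
for child_name in ("Graph2", "Graph3", "Data2"):
    Item(child_name, simprof1)

removed = simprof1.delete()
print(removed)        # SIMPROF1 and the items erased along with it
print(bcza.names())   # what remains in the workspace
```

Calling delete on the top item, bcza, would by the same logic remove every entry, which mirrors the behaviour of File>Delete Data on the data matrix from which everything stems.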

[Screenshot: File menu (Close Workspace, Save Workspace, Save Workspace As..., Save Results As..., Rename Workspace..., Rename Results..., Page Setup..., Print Preview..., Print...) and the Explorer tree for bcwk before and after deleting SIMPROF1, with the prompt: Delete 'SIMPROF1' from workspace and everything below it]

Rolling up branches of the tree
If analyses within a workspace are needed in the longer term, and therefore not to be deleted, but
are cluttering up the Explorer tree display so that it is cumbersome to navigate around, then they
can be temporarily hidden. When clicked, the 'roll-up' icon ⊟ collapses the tree structure below
that point, replacing it with the 'roll-out' icon ⊞ (see above); clicking this reverses the effect.

Renaming items
Note also how the different examples of the same icon in the tree are given serially ordered names:
Data1, Data2, ..., Graph1, Graph2, ..., etc. These names can be changed by (left) clicking on the
name or icon in the Explorer tree, to make it the active window, and taking File>Rename Data (if
the window is a data sheet, otherwise the menu item changes to Rename Results, Rename Resem,
etc.) An alternative is to right click on the name or icon to do the same operation from the floating
menu that results. Another possibility is just to click twice, slowly, on the name (not the icon) in
the Explorer tree, and the cursor should now be in an edit box around the name, which is
highlighted, and it can be retyped or (with a third click) edited in the usual Windows manner.
Saving plots
The name of an item can also be changed as part of the process of saving it. In general there is not
the same need to save individual derived data sheets, resemblance matrices, plots etc, that there was
in earlier versions of PRIMER, because the whole workspace can be saved in a single operation
(see below). Nonetheless, there may be occasions (covered on pages 20, 27 and 54) when data or
resemblance matrices need to be saved in an export format (Excel or text file), or in internal v6 (or
v5) binary format, with extensions *.pri and *.sid, perhaps so that they can be opened in a different
workspace. Similar options are available for graph (and results) windows, all such save operations
being accessed by clicking on the relevant name (/window) and taking, e.g. File>Save Graph As.

71

For the Bristol Channel zooplankton workspace bcwk, try saving the dendrogram Graph1 with
File>Save Graph As>(File name: bczaden) & (Save as type: PRIMER Plot Files (*.ppl)). Such a
*.ppl file is PRIMER's internal v6 binary format for graphics, and is not accessible in any other
software (even PRIMER v5). Its main use is likely to be in passing plots to other v6 users.
Demonstrate this by launching a parallel run of PRIMER 6 (click on the icon on the Windows
desktop) to generate a second PRIMER desktop with an empty workspace, and open this newly
created plot file with File>Open>(File name: bczaden). You will see that, in spite of the link to its
original data and resemblance matrix being cut, the plot is complete in itself, and still capable of
being modified. It even holds with it the background information on factors, inherited from the
data file, so all the modifications seen in section 5 can be implemented: not just resizing, retitling,
suppressing keys and history boxes, zooming, condensing, rotating, etc, but also changing the
displayed symbols to a different factor. Try Graph>Data labels & symbols>Symbols>✓Plot>
✓By factor: SprofGps, if that factor is available from your previous analysis on page 65. Or try
creating a new factor which is a fixed 50% similarity slice, by Graph>Special>Slicing>✓Show
slice>Resemblance: 50>Create factor>Add factor named: 50%slice, and then displaying the
resulting three groups as differing symbols, with Data labels & symbols, as just described. Of
course, a new factor created here (the 50%slice) is not automatically available in the original bcwk
workspace. It is important to realise that there is no dynamic linking between sheets in different
workspaces, or between windows in a workspace and any outside data file (even if it is in PRIMER
format, *.pri, *.ppl etc); only an unlinked copy of any file is ever entered into PRIMER.

Vector vs. pixel plots
Closing this second PRIMER desktop and returning to the original bcwk workspace, note the other
options for saving the dendrogram bczaden with File>Save Graph As>(Save as type:). There are
two vector formats, Windows Enhanced Metafile and Windows Enhanced Plus Metafile, both
with standard Windows *.emf extensions. At the time of writing, the first is preferable, with much
current software as yet unable to cope with the newer, second format. Vector format plots will
usually be the best option for exporting graphics from PRIMER into other applications, for fine
tuning of title placement etc, in graphics presentation software. When taken into Powerpoint, for
example, with Insert>Picture>From File, a Draw>Ungroup operation will convert such a vector
plot into its component points, lines, text boxes etc, so that even drastic subsequent modification is
possible. Note that this is the graphics format that is transferred via the clipboard if the Edit>Copy
menu is taken when the active window is a graph: Edit>Paste in Powerpoint or Word, for example,
will paste a vector format plot into those applications.
In contrast, the other plot output options from PRIMER all produce bitmap (pixel-based) files:
standard *.bmp, *.png, *.tif and *.jpg. Subsequent modification options are then limited. However,
if the plot can be put into satisfactory finalised form just using the manipulations available within
PRIMER, then high-quality output is certainly possible through the bitmap route. Saving the plot in
one of these formats allows specification of the resolution, e.g. (Plot Size•As screen) or (Plot
Size•Specify)>(Width: 1024) & (Height: 768). The latter are the default number of pixels but larger
or smaller numbers can be entered (preserving the aspect ratio if it is an MDS plot! - see section 7).
The files will generally be much larger than for vector plots.
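
Choosing a different pixel width while preserving the aspect ratio is simple proportional arithmetic, sketched here in Python. The function name is hypothetical, and the default 1024 by 768 values are simply the defaults quoted above; for an MDS plot the base width and height would be those of the plot itself.

```python
def scaled_height(new_width, base_width=1024, base_height=768):
    """Height in pixels that keeps the base aspect ratio at `new_width`."""
    return round(new_width * base_height / base_width)


# Doubling the default width of 1024 pixels doubles the default height of 768.
print(scaled_height(2048))  # 1536
```

Entering the resulting pair in (Width:) and (Height:) keeps the picture from being stretched in either direction.
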
[Screenshot: File menu and Save Graph As dialog (File name: bczaden; Save as type: PRIMER Plot Files (*.ppl), Values (*.txt), Windows Enh Metafile Files (*.emf), Windows Enh Plus Metafile Files (*.emf), bitmap files), with the Plot Size options •As screen and •Specify (Width: 1024)]

72

Saving graph values
Certain graphs, such as MDS ordinations (section 7) or Cluster dendrograms, can be validly rotated
in (effectively) an infinity of ways after the results window is generated, perhaps to align them
better with a previous run under different transformation or coefficient choice. The plot is always
saved in its currently rotated state, naturally, but this will not correspond to the co-ordinate
positions of ordination points, for example, which are listed in the results window. In order to
allow current ordination co-ordinates (or in the case of a dendrogram, the ordering of samples on
the x-axis under the current rotation) to be accessible to the user, an option to Save Graph Values is
provided. This can be run in two ways, by File>Save Graph As>(Save as type: Values (*.txt)) or
more directly by File>Save Graph Values As. The end result in both cases is a text file containing
either x,y or x,y,z co-ordinate points for an ordination (each point to a line and tab separated within
a line), or a list of the current order of samples in the dendrogram (each sample label to a line).
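
A values file of this kind is easy to read back into other software. The Python sketch below parses tab-separated co-ordinates, one point per line; it assumes the file holds only numeric columns (no header row or sample labels), which the description above does not confirm, and the function name and co-ordinate values are made up for illustration.

```python
import io


def read_ordination_values(lines):
    """Parse tab-separated co-ordinates, one ordination point per line."""
    points = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        points.append(tuple(float(field) for field in line.split("\t")))
    return points


# Made-up 2-d co-ordinates standing in for a saved values file.
example = io.StringIO("-1.03\t-0.19\n-1.11\t-0.29\n0.15\t0.59\n")
print(read_ordination_values(example))
```

The same function accepts an open file object in place of the in-memory example, and returns three-element tuples when the file holds x,y,z co-ordinates.
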
Saving results
Results windows can be saved in just the same way, when they are active, e.g. on CLUSTER1,
File>Save Results As>(File name: bcdenres) & (Save as type: Rich Text Files (*.rtf)) will save
the window on page 66, containing the sequence of SIMPROF tests, to a rich text format file. The
latter preserves any formatting in the results window when taken directly into Word, for example.
The alternative is to Save as type: Text Files (*.txt) which outputs all text in fixed size Courier 10
format. In practice, most of the information in results windows is displayed rather simply, with
only headings in different size fonts for example, so the distinction is rather unimportant.
Adding notes
It is not permitted to edit directly the information in a results window. This tells you what operation
or analysis was actually carried out, and what the outcome was, and should remain immutable, to
avoid subsequent confusion (e.g. if the workspace is revisited at a later date). You can, of course,
copy (Edit>Copy or Ctrl-C from the keyboard) and paste selected information to an external text
file, Word document or other application, via the usual Windows clipboard. However, if you need
to annotate the PRIMER session actually within the workspace, e.g. clarifying or commenting on
analysis steps or results, this can be achieved by Add Note, selected from the floating 'drop-down'
menu that appears when you right click on any item in the Explorer tree. A blank Note window
is opened, for you to type in annotations as appropriate. This is displayed in the Explorer tree on a
branch leading down from the originally clicked item, which could just be the workspace name, in
which case the Note would always appear at the bottom of the tree - a convenient place to put
'read-me' information. Text can be pasted into the note from the clipboard (Edit>Paste or Ctrl-V),
whether this comes from outside or from elsewhere in the PRIMER session (e.g. part of a results
window, information copied from the Edit>Properties>Description: box, etc). You can even copy
graphs (or highlighted portions of data sheets, see section 8) into a note window, using Edit>Copy
on a plot window and Edit>Paste within the note. So, it is possible to create a note-form summary
of the main features of the analysis, held within the workspace (though the lack of formatting
facilities is always going to make this only an intermediate step, at best). Note windows can be
renamed, deleted and saved, as with any other Explorer tree item, the save operation again being a
choice of .txt or .rtf formats (*.rtf is needed to preserve any plots in the output file).

[Screenshot: floating menu on the Explorer tree (Close Workspace, Save Workspace As..., Rename Workspace..., Save Item As..., Rename Item..., Delete Item..., Add Note) and a Note1 window in the bcwk workspace holding an example annotation on the cluster analysis]

73


Printing results and graphs
Direct printing from PRIMER is also possible for analysis endpoints such as results windows, all
graphics windows, and notes (but not data sheets or resemblance matrices, which are generally too
large and unwieldy for easy printing - selections of them are best saved to Excel, or via text format
files to other applications which have sophisticated tools for manipulating large blocks of data into
printable form). For results and note windows, File>Page Setup offers standard Windows choices
of •Portrait or •Landscape, paper size, margin sizes etc, and a Printer button to select from the
installed printers (including any Network connection choice). For plots, Page Setup adds an extra
initial dialog, to allow a single graph to spread over several pages, both vertically and horizontally.
This can be useful when attempting to generate a single, readable, hard-copy dendrogram from
numerous samples. (File>Print Preview shows the outcome, with 1, 2, 3, 4 or 6 pages displayed
at a time. Note how a repeat of the margins of the picture helps the overlay of the printed sheets.)
The Page button on the Page Setup dialog then returns to the same sequence of orientation, printer
choices etc as for printing results. The File>Print menu itself has the standard Windows format.

[Screenshot: File menus for graph and results windows, with the Page Setup dialogs (Multi page plot: Hor. pages, Vert. pages; Orientation: •Portrait or •Landscape; Margins (millimeters): Left, Right, Top, Bottom) and the printer selection dialog (Name, Status, Type, Where, Comment, Network...)]

Workspace planning
Finally, the most important thing about the new workspace concept in v6 is that entire workspaces
can be saved. Saving, opening and closing workspaces was dealt with on page 20; all that needs
noting here is that a workspace re-opens with branches in the Explorer tree rolled out ⊟ only as far
as they need be to list all the windows that were displayed when the workspace was last saved. All
other branches are rolled up ⊞. One Explorer tree cannot contain multiple workspaces, though
multiple launches of the PRIMER desktop are possible, each with a different workspace (or copies
of the same workspace! - which will be unlinked). In general, it seems desirable to use a new
workspace for each new project, so that analyses from the same study are all held together, but
unrelated data are never put into a common workspace. Starting with fresh workspaces at new
stages of an ongoing analysis also seems wise: a workspace that gets unwieldy is likely to consume
too much memory, and datasheets (with their created factors etc) can always be moved without loss
of information, by saving them in PRIMER binary format (*.pri) and reopening them in the new
workspace. If a single workspace is persisted with, and added to over time, it is still wise to rename
it every so often, so that earlier analyses can be retrieved if something disastrous does happen!
When it is so easy to save all information in a single operation, it is tempting to rely on the
workspace as the only form in which the data is saved, and there are advantages in doing so - the
workspace contains much relevant history about how the analyses and data sets relate to each other.
Nonetheless, there is always a good case for saving the basic data matrix, from which everything
else can be rederived if necessary, as a data file in its own right - either in PRIMER *.pri format or
in Excel or text format, depending on size.
Save (with File>Save Workspace) and close the Bristol Channel zooplankton workspace bcwk.

74
7.MDS

7. Non-metric multi-dimensional scaling (MDS)

MDS rationale
Chapter 5 of the methods manual describes the operation and rationale of MDS ordination; its
purpose is to represent the samples as points in low-dimensional space (usually 2-d) such that the
relative distances apart of all points are in the same rank order as the relative dissimilarities (or
distances) of the samples, as measured by some appropriate resemblance matrix calculated on the
(possibly transformed) data matrix. The interpretation of an MDS is therefore straightforward:
points that are close together represent samples that are very similar in community composition (or
environmental variables, or biomarker responses, or particle size distributions etc), and points that
are far apart correspond to very different values of the variable set. The algorithm is an iterative
one and not guaranteed to converge to the optimal solution, hence the need to run it for a number of
random restarts. The default is 25 but if the run time for a single restart is not an issue (i.e. the
number of samples is not very large) it may be worth doubling this number, to be fairly sure that a
better solution could not be found (one can never be 100% certain!). A reasonable criterion for
deciding that enough iterations have been performed is that the same (lowest) stress value is
obtained from more than a handful of the restarts; 'stress' indicates how faithfully the
high-dimensional relationships among the samples are represented in the 2-d (or 3-d) ordination plots
(see Chapter 5 of the methods manual for definition and interpretation of stress).
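
The informal stopping criterion, that the same lowest stress should turn up in more than a handful of restarts, can be checked with a few lines of Python. This is a sketch for illustration: the function name is hypothetical, the stress values are invented, and comparing stresses after rounding to two decimal places is an assumption that mirrors the precision normally shown in the results window.

```python
def lowest_stress_summary(stress_values, decimals=2):
    """Return (lowest stress, number of restarts attaining it)."""
    rounded = [round(value, decimals) for value in stress_values]
    lowest = min(rounded)
    return lowest, rounded.count(lowest)


# Illustrative stress values from six hypothetical restarts.
lowest, hits = lowest_stress_summary([0.05, 0.05, 0.06, 0.05, 0.07, 0.05])
print(lowest, hits)  # 0.05 4
```

If the count is only one or two, that is a hint to increase the number of restarts before trusting the configuration.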

Running an MDS
From directory C:\Examples v6\Exe, File>Open the workspace exwk from the Exe estuary study
of nematodes (first met on page 57). If this does not exist, open data file exna and re-run the cluster
analysis: Analyse>Pre-treatment>Transform (overall)>Transformation: Fourth root; Analyse>
Resemblance; and Analyse>CLUSTER, taking all the default options. The Bray-Curtis similarity
matrix between samples is in Resem1; with this as the active window (e.g. click on its icon), run
Analyse>MDS>(Number of restarts: 50) & (✓Shepard diagrams), with defaults for other options.
The outcome is a results window, MDS1, and Graph2 to Graph5, which are the 2-d and 3-d MDS
plots, and Shepard diagrams for the 2-d and 3-d cases. For Graph2, remove the symbols with
Graph>Data labels & symbols, unchecking the (Symbols>Plot) box, and adding (Labels>✓Plot>
✓By factor: site no) - see pages 57, 58 for an example of the Data labels & symbols dialog.

[Screenshot: Analyse>MDS dialog (Minimum stress: 0.01; ✓Shepard diagrams; ✓Configuration plot) and the MDS1 results window for the Exe nematodes, listing the best 3-d configuration (Stress: 0.03) and the table of stress values for each restart (3-d minimum stress: 0.03; 2-d minimum stress: 0.05)]

75

MDS results window
The MDS1 window first lists the final co-ordinates of the samples for the best 3-d plot attained in
all restarts (50, here). Where there are several equally small stress values, the first such
configuration is selected (here the lowest stress is 0.03, a very small value, indicating that the 3-d
ordination is a near-perfect representation of the high-dimensional assemblage structure at these 19
sites). A column to the right of the co-ordinates, headed %, is a breakdown of the total stress of 0.03
into contributions from each of the 19 samples (sites 12, 16 and 19 account for nearly half). In a case
where the stress is larger, this is intended as a diagnostic tool for identifying sample points that do
not fit well in the low-dimensional ordination space. (It is not always as useful as it might first
appear, however, since outliers can 'drag the ordination space towards them' so that their
contribution to the final stress is low - they have pushed that stress elsewhere in the diagram. There
is a close analogy with the effects of outliers on contributions to sums of squares in regression.)
Below the 3-d listing come the co-ordinates of the 2-d ordination with lowest stress (0.05 here, also
sufficiently low to give an excellent representation of the high-dimensional data), and then a listing
of the 3-d and 2-d stress values from all 50 restarts. The same minimum stress solution occurs
many times - more than half of the restarts for both 2-d and 3-d runs - so is unlikely to be
bettered by further restarts. Note that your run of MDS will not produce exactly this sequence of
stress values, because each repeat is from a different random start, with the random seed drawn
from the computer's timing mechanism. Each restart involves an iterative cycle (alternate fitting of
non-metric regression to the Shepard diagram and adjustment of the configuration points by
steepest descent - see Chapter 5 of the methods manual). If the iteration has not converged within
100 cycles then it terminates: this has happened here for a few of the restarts with the 3-d solution,
indicated by ** after the quoted stress value. (It probably suggests that two equally good solutions
are available, and the algorithm is cycling back and forth between them; this happens quite often
when the structure can fit easily into lower dimensions, so that stress is very low.)

Shepard diagrams
That the stress is low here is also clear from the Shepard diagrams (Graph4&5), which are scatter
plots of the pairwise distances between samples in the final ordination against (dis)similarities in
the resemblance matrix (see also Fig. 5.2 in the methods manual). Stress measures the departure of
points (blue) from the best-fitting increasing regression line (black). When all points lie on the line,
all statements of this form are true: 'site 15 is closer to 16 in the ordination than site 5 is to site 6,
because the similarity between sites 15 and 16 is greater than that between 5 and 6'. In other words,
rank order relationships are exactly preserved and stress is zero. Note the regression does not need
to be linear, and is certainly not here (it is not true that if 5 and 6 are twice as dissimilar as 15 and
16 then they are twice as far apart in the plot) - this is what gives non-metric MDS its flexibility.
Shepard diagrams have the standard options to change font size, titles, axis scaling, symbol type,
colour and size etc, available with all plots through Graph>General or Graph>Data labels &
symbols (pages 57-59); irrelevant options in these dialog boxes are 'greyed-out'.

[Figure: Graph Options dialog (General tab) and Shepard diagrams, with Similarity on the x axis]

Accuracy & fit scheme
For the MDS run above, defaults were taken for two of the options in the MDS dialog (page 75). A
change to Minimum stress: 0.001 would both increase the accuracy with which stress values are
reported in the results window, and also decrease (from 0.01 to 0.001) the lower limit of stress at
which the iteration decides that it has effectively reached a perfect solution. Reporting stress to a
third decimal place can be useful in deciding whether a batch of restarts, which have the same
stress to two decimal places, are really the same solution. However, it is unwise to take small
differences in stress too seriously: solutions with nearly the same stress usually lead to the same
interpretation and, in any case, a low-dimensional ordination is only an approximation to the real
high-dimensional pattern, and not necessarily a very good one. In fact, it can be quite revealing to
look at repeat MDS runs with only one restart, which much of the time will therefore converge to
an inferior solution, and observe which points differ from their placement in the optimal solution.
The other default option previously taken was Kruskal fit scheme•1. This is by far the commonest
choice in practical use of non-metric MDS (it was the only choice in PRIMER v5). Basically, it
allows dissimilarities which are equal (tied ranks) to be represented in the final ordination by
distances which are not equal, whereas Kruskal fit scheme•2 constrains those plot distances to also
be equal. This is probably an unnecessary constraint for routine use, though it is included here for
comparison. For example, in the Exe nematode data there are a number of sites which are 100%
dissimilar from each other (they have no species in common), as can be seen from the right hand
side of the Shepard diagrams on the last page (similarity = 0%). The first fit scheme allows these to
be represented by a range of distances (about 1.6 to 2.4); the second fit scheme forces them to
equality and makes the approximation in low-dimensional space less flexible (higher stress).
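
The effect of the two treatments of ties on stress can be sketched as follows. This assumes that the two fit schemes correspond to Kruskal's 'primary' and 'secondary' approaches to tied dissimilarities, and that stress is Kruskal's formula 1; the function names and the four data pairs are hypothetical, and this is not PRIMER's own code.

```python
import numpy as np

def weighted_monotone_fit(values, weights):
    """Weighted least-squares non-decreasing fit (pool-adjacent-violators)."""
    blocks = []  # each block holds [weighted sum, total weight, number of entries]
    for v, w in zip(values, weights):
        blocks.append([v * w, w, 1])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, w_top, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += w_top
            blocks[-1][2] += n
    return np.concatenate([np.full(n, s / w) for s, w, n in blocks])

def fitted_distances(dissim, dist, scheme=1):
    """Fitted regression values, returned in the original order of the pairs."""
    dissim = np.asarray(dissim, dtype=float)
    dist = np.asarray(dist, dtype=float)
    if scheme == 1:
        # Tied dissimilarities may keep unequal distances: order ties by distance
        order = np.lexsort((dist, dissim))
        fit = np.empty_like(dist)
        fit[order] = weighted_monotone_fit(dist[order], np.ones(len(dist)))
        return fit
    # Scheme 2: one fitted value per distinct dissimilarity, so ties are forced equal
    _, inverse, counts = np.unique(dissim, return_inverse=True, return_counts=True)
    means = np.bincount(inverse, weights=dist) / counts
    return weighted_monotone_fit(means, counts)[inverse]

def stress(dissim, dist, scheme=1):
    d = np.asarray(dist, dtype=float)
    d_hat = fitted_distances(dissim, d, scheme)
    return float(np.sqrt(np.sum((d - d_hat) ** 2) / np.sum(d ** 2)))

# Two pairs share the same dissimilarity (2) but have different plot distances
dissim = [1, 2, 2, 3]
dist = [1.0, 3.0, 2.0, 4.0]
stress_scheme1 = stress(dissim, dist, scheme=1)
stress_scheme2 = stress(dissim, dist, scheme=2)
```

For these values the first scheme gives zero stress, since the unequal distances for the tied pair are accepted, whereas the second scheme penalises them and gives a higher stress.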

[Figure: MDS dialog (Minimum stress, Number of restarts, Kruskal fit scheme options), results window listing the stress values from the restarts, and a Shepard diagram]

Ordination plots in 2-d
Turning now to the purpose of an MDS analysis, the ordination plot (or 'configuration'), in 2-d
initially, the first point to note is that there are no meaningful absolute scales for the axes. An MDS
with zero stress retains only the rank order information about the entries in the resemblance matrix,
so there can be no conserved units of measurement and no absolute direction in the configuration.
Plots can be arbitrarily rotated or reflected in any of the axes without changing their meaning. What
is not arbitrary of course - in fact central to the method - is relative distances apart of points.
Changing the aspect ratio of an MDS plot (shrinking or expanding it along one axis) destroys the
key inferences, of the form 'samples 15 and 16 are closer together than 5 and 6 hence they are more
compositionally similar' (even though the direction of 15 to 16 is perpendicular to that of 5 to 6).
Within PRIMER, this is not a concern. For Graph2 in the Exe workspace, try reshaping the plot
window differentially, by clicking and dragging on one side of the window only, and the plot
preserves its original aspect ratio rather than taking up the shape of the new window (as it would do
with a cluster dendrogram for example). You should be careful not to reshape the plot later, having
saved or copied it via the clipboard to a graphics presentation package, or if you wish to copy the
MDS co-ordinates from the results window to an Excel file for replotting in an outside application
(most plotting programs do not preserve the aspect ratio!). Note, incidentally, that all such tables of
data found in results windows are tab-spaced so can be copied and pasted directly into an Excel
sheet in correct tabular form, without difficulty - another improvement on v5.
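
The point about rotation, reflection and aspect ratio can be checked numerically. The sketch below uses three hypothetical points (not the Exe co-ordinates) and helper names of our own choosing: rotating or flipping leaves every inter-point distance unchanged, whereas stretching one axis can reverse which pair of points is the closer.

```python
import numpy as np

def pairwise_distances(xy):
    """Euclidean distances between all pairs of rows of xy."""
    diff = xy[:, None, :] - xy[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def rotate(xy, degrees):
    """Rotate all points about the origin by the given angle."""
    t = np.radians(degrees)
    r = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    return xy @ r.T

# Hypothetical 2-d configuration of three samples
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.8]])

d_original = pairwise_distances(points)
d_rotated = pairwise_distances(rotate(points, 37.0))
d_flipped = pairwise_distances(points * np.array([-1.0, 1.0]))    # flip X
d_stretched = pairwise_distances(points * np.array([1.0, 2.0]))   # double the y axis

# Rotation and reflection change nothing; the stretch makes samples 0 and 2
# appear further apart than samples 0 and 1, reversing the original ordering.
```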

"'1
[Figure: Exe nematodes 2-d MDS plot, shown with the Graph Options dialog]
Rotating and flipping the ordination
For the 2-d MDS configuration, Graph2, rotate and/or flip the plot (say to match up with Graph6,
the 2-d ordination using fit scheme 2) by Graph>Rotate Data and Graph>Flip X or Flip Y. The
same options can be obtained by right-clicking when over the plot, giving a 'drop-down' menu
located at the cursor. With Rotate Data, the cursor changes to a hand and by clicking, holding
and dragging, the plot can be continuously rotated within its current rectangular frame (which, in
the normal convention for 2-d plots, is designed to have greater width than height). Note how the
configuration always retains the correct relative distance apart of points. Any orientation is equally
valid, remember, so it is just a matter of convenience or clarity as to which to choose. When the
plot is at the desired orientation, the hand can be changed back to the normal cursor by
Graph>Pointer, so that the points are not then grabbed and rotated by accident (as tended to
happen in v5 when a cursor over an MDS plot was always in the form of a hand). A third way of
achieving the Rotate Data option (hand cursor) is to select it from the Tool Bar, by clicking on the
Rotate data icon. To return to the standard cursor, click on the Pointer icon on the Tool Bar.

[Figure: Graph2 right-click menu (Pointer, Zoom In, Zoom Out, Rotate Data etc) and the Exe nematodes 2-d MDS plot (stress 0.05) in two orientations]

Linking MDS plots to cluster analysis
The 2-d ordination plot shows a clear separation of the nematode assemblages at these 19 sites into
about 5 groups (which Chapter 11 of the methods manual shows can be related to sediment
properties such as the median particle size, anoxic layer depth and interstitial salinity). It is a good
check on the adequacy of the low-dimensional approximations seen in MDS and cluster analyses,
to the real high-dimensional structure (there are 140 species variables in exna!), to look at the MDS
and dendrogram results in combination and there are now two ways of doing this in v6.


Firstly, the clusters that are defined for a fixed similarity slice through the dendrogram can be
automatically put into a factor, with levels defining the different groups (as on page 60). This factor
can then be displayed as differing symbols on the MDS, and the agreement noted. An alternative
factor created from a CLUSTER run is the (variable similarity) grouping from SIMPROF tests.
For the Exe nematode data exna, you should already have created a dendrogram (Graph1) and, on
page 61, a factor for a 30% similarity slice through this. (If not, recreate the factor from the
dendrogram by Graph>Special>Slicing✓Show slice>Resemblance: 30>Create factor>Add factor
named: 30% slice.) Returning to the MDS (Graph2), use Graph>Data labels & symbols to set
Symbols✓Plot>✓By factor: 30% slice, leaving Labels✓Plot>✓By factor: site no. (When plotting
both labels and symbols, the symbol is centred on the point, with the label above. On its own, either
is correctly centred.) You might also like to re-run the CLUSTER routine on Resem1, taking the
SIMPROF option to create a new factor of minimal-sized groupings (as on page 65), and
superimpose this factor as symbols on the MDS. The difference between the two approaches is that
the latter gives clusters for which there is no statistical case for interpreting within-group structure.
This may be too fine a division for practical classification of a large number of samples: that may
require a coarser division at an arbitrary (lower) fixed similarity level.
[Figure: Graph Options dialog (Data labels & symbols tab) and the Exe nematodes 2-d MDS plot (stress 0.05) with symbols for the 30% slice factor, groups a to g]

Special menu for configuration plots
The Data labels & symbols and General menus (Graph Options dialog box) have the same format
whatever the plot, but the effect of Graph>Special differs for each type of graph. Here it produces
a Configuration Plot dialog box (which is shared with PCA, section 10), and covers: a) overlaying
groups from a cluster analysis as enclosing contours; b) trajectory plots, in which points are joined
up in sequence; c) bubble plots, superimposing a variable as circles of differing sizes, on the points
in the ordination. Three other options are more relevant to PCA than MDS: d) a toggle between 2-d
and 3-d plots - in MDS this is not needed because the 3-d plots are separate graphs (because a 2-d
MDS is not two of the axes of a 3-d MDS, but an independent fit); e) a choice of which axes of the
plot display which ordination axes, more relevant to PCA; f) toggling on or off a vector variable
plot (only for PCA again because it is a linear technique). Options not available are 'greyed out'.

Cluster overlays on MDS plots
The second way in which a cluster analysis can be displayed on a 2-d MDS plot, so that the level of
agreement can be assessed, is not for (variable-level) SIMPROF groups but for fixed resemblance
levels, at which slices are drawn through the dendrogram. For each slice, denoted by a different
line type and colour, a contour line is drawn round each of the clusters it defines. The 'slackness'
determines the smoothness of the enclosing line (how loosely drawn it is around the points in that
group). Slack%: 0 gives the tightest enclosing polygon (the 'convex hull' of the points).
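
For the zero-slack case, the enclosing polygon of a cluster can be sketched with the standard monotone chain method for a convex hull. How the contour is loosened for larger slackness values is not described here, so only Slack%: 0 is illustrated; the function name and the co-ordinates are hypothetical.

```python
def convex_hull(points):
    """Convex hull of 2-d points, as a list of vertices in anticlockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is anticlockwise
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]

# Hypothetical cluster of five samples; the middle one lies inside the hull
cluster = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
hull = convex_hull(cluster)
```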


For the Exe nematode MDS, Graph2, in the current exwk workspace, remove the symbols that are
displayed with the 30% slice factor, using Graph>Data labels & symbols (take off the Plot check
box under Symbols but leave Labels✓Plot). Take Graph>Special>Clusters✓Overlay clusters>
(Dendrogram plot: Graph1) & (Resemblance levels: 10, 20, 30) & (Slack%: 80). Experiment with
other slackness values (0, 40, 120 etc), and change the colours and line types with the Contour tab
in the Graph Options dialog, i.e. Data labels & symbols>Contour>Key and double-click on a
colour or style box to get the same option screens as for Factor keys (page 33).

[Figure: Configuration Plot dialog (Overlay clusters, Dendrogram plot: Graph1, Resemblance levels) and the Exe nematodes 2-d MDS with cluster contours at similarity levels 10, 20 and 30]

The agreement between clustering and MDS plots is exemplary, giving confidence in both, as
approximations to the high-dimensional pattern.
138 <-- Save and close the exwk workspace. In this form, the data will not be needed again.

(Clyde dump-ground macrofauna)
Directory C:\Examples v6\Clydemac contains counts (clma.pri) and biomass (clmb.pri) data on 84
species of benthic macrofauna in soft sediments at 12 sites in the Firth of Clyde (data pooled over
several grab samples). The sites S1 to S12, sampled in 1983, were on an E-W transect across the
Garroch Head sewage-sludge dumpground, with site S6 in the vicinity of its centre, and S1 and S12
several kms distant (see Fig. 1.5 in the methods manual). Also in the directory is an environmental
file (clev.pri) for the same sites, consisting of 10 sediment concentration variables, of the metals
Cu, Mn, Co, Ni, Zn, Cd, Pb, Cr and %carbon and %nitrogen, and the water depth. (Original data
from Pearson TH, Blackstock J, 1984. Dunstaffnage Lab Report, Oban, Scotland.)

Overlay trajectory
The points in an MDS plot can be joined by straight line segments, in the increasing order of levels
of a factor defined for those samples. This can be useful in demonstrating trajectories of multi-
variate change through time or over spatial transects. The factor needs to be purely numeric
(alphanumeric levels are not considered to be ordered).
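
The ordering behind a trajectory overlay can be sketched as below: samples are sorted by the numeric level of the factor and consecutive samples are joined. The function name, labels and levels are hypothetical, and rejecting non-numeric levels with an error is an assumption made for the illustration, not a description of what PRIMER does.

```python
def trajectory_segments(levels):
    """Pairs of sample labels to join, in increasing order of a numeric factor.

    levels maps each sample label to its factor level.
    """
    for label, level in levels.items():
        # Alphanumeric levels carry no ordering, so they are refused here
        if isinstance(level, bool) or not isinstance(level, (int, float)):
            raise ValueError(f"factor level for {label!r} is not numeric")
    ordered = sorted(levels, key=lambda label: levels[label])
    return list(zip(ordered, ordered[1:]))

# Hypothetical samples listed out of transect order
site_number = {"S3": 3, "S1": 1, "S2": 2}
segments = trajectory_segments(site_number)
```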
Open the Clyde abundance, biomass and environmental files (you can open all at once by selecting
them in the Open dialog box, clicking whilst holding down the Ctrl or Shift key, as usual). Analyse
only the biomass data clmb for now. Create a numeric factor of site numbers 1 to 12, by Edit>
Factors>Add>(Add factor named: Site#), highlighting the new column (click on its header) and
Edit>Fill Down>Label number. Alternatively, clma already has this factor defined, so Edit>
Factors>Import>(Worksheet: clma)>Select will just import this factor for you (see pages 32, 34).


Now, pre-treat clmb with a square root transform, taking Bray-Curtis similarities between samples,
running Analyse>MDS with a Minimum stress: 0.001, otherwise taking the defaults. On Graph1,
from the right-click menu take Data labels & symbols and for symbols leave on (Symbols✓Plot)
but turn off (By factor), but for labels turn on both. Returning to the graph, now take Special from
the right-click (or Graph) menu, and Trajectory>✓Overlay trajectory>Trajectory numeric factor:
Site#. (You might also need to flip or rotate that plot to obtain the orientation shown below.) The
clear pattern is of steady community change approaching the dumpsite centre (1 to 6) matched by a
similar reversion (along a somewhat different trajectory) in the other arm of the transect (7 to 12),
with the two end-point sites 1 and 12 (over 10km apart) sharing relatively similar communities.
Save the Clyde workspace as clwk for use in a later section, and File>Close Workspace.
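
The pre-treatment and resemblance steps can be written out for a tiny example. The sketch below applies a square root transform and then the usual Bray-Curtis similarity, 100(1 - sum|y1 - y2| / sum(y1 + y2)), between every pair of samples. The data are hypothetical, rows are taken as samples and columns as species (an arrangement chosen for the sketch), and pairs of entirely empty samples are not handled.

```python
import numpy as np

def bray_curtis_similarity(samples):
    """Percentage Bray-Curtis similarity between all pairs of rows (samples)."""
    y = np.asarray(samples, dtype=float)
    n = y.shape[0]
    sim = np.empty((n, n))
    for j in range(n):
        for k in range(n):
            numerator = np.abs(y[j] - y[k]).sum()
            denominator = (y[j] + y[k]).sum()
            sim[j, k] = 100.0 * (1.0 - numerator / denominator)
    return sim

# Hypothetical biomass values: 3 samples (rows) by 3 species (columns)
raw = np.array([[4.0, 0.0, 9.0],
                [1.0, 0.0, 9.0],
                [0.0, 16.0, 0.0]])
transformed = np.sqrt(raw)                      # square root transform
similarity = bray_curtis_similarity(transformed)
```

Samples 1 and 2 share both of their species and come out at about 89% similar, whereas sample 3 has no species in common with either and is 0% similar to both.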

[Figure: Configuration Plot dialog with Overlay trajectory checked, the Clyde biomass 2-d MDS with the trajectory joining sites 1 to 12, and the MDS results window]

(Ekofisk oil-field study)
The soft-sediment macrofauna assemblages sampled at 39 sites, at different distances from the
Ekofisk oilfield (post-drilling), were introduced on page 21, factors set up on page 31 and the
workspace ekwk last saved on page 49. Transformations of the count data ekma were considered
on page 38 and, for the square-root transformed sheet Data1, Bray-Curtis similarities computed on
page 43, giving Resem1. Recreate the latter similarity matrix if not available, as on page 43.
Run Analyse>MDS on Resem1 to obtain the 2-d ordination plot Graph1. You may need to rotate,
with the Rotate data icon on the toolbar, or flip X or Y from the right-click menu to obtain the plot
shown. (Return the cursor to a pointer, with the Pointer icon on the toolbar, when you are happy with
the rotation, to avoid rotating further by accident.) Already displayed as four different symbols is the
factor Dist, which puts the 39 sites into a priori defined groups depending on their distance from the
centre of oil drilling activity, D: < 250m; C: 250m - 1km; B: 1km - 3.5km; A: > 3.5km. Remove the site
labels using the Data labels & symbols menu (uncheck the Plot box under Labels), to unclutter the
plot, perhaps increasing the symbol size (Size: 130 under Symbols from this menu).
A clear pattern is seen of steady change in the benthic community as the rig is approached (left to
right in the plot). Note that sites within a few hundred metres of each other, close to the rig (D),
have quite variable assemblages but distant sites (A), some of which are 16 km or more apart, are
more tightly clustered - the opposite of that expected if the communities were unimpacted by
drilling mud disposal. The clear distinction between groups B and A (confirmed by ANOSIM,
section 12) is good evidence for an impact extending to more than 3 km from the oilfield centre.
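
The a priori grouping by distance amounts to a simple lookup, sketched below. The group limits are those quoted above; which group a site falls into when it lies exactly on a boundary is not stated, so the choices made here (for instance, exactly 1km counted as B) are assumptions, as is the function name.

```python
def distance_group(metres):
    """Assign a site to group D, C, B or A from its distance to the drilling centre."""
    if metres < 250:
        return "D"      # < 250m
    if metres < 1000:
        return "C"      # 250m - 1km (upper limit assumed exclusive)
    if metres <= 3500:
        return "B"      # 1km - 3.5km
    return "A"          # > 3.5km

# Hypothetical distances, in metres
groups = [distance_group(d) for d in (100, 500, 2000, 8000)]
```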

[Figure: Ekofisk oilfield macrofauna 2-d MDS (stress 0.11) with symbols for the factor Dist, alongside the Factors dialog]

Species bubble plots
The above MDS draws on all 174 species (with more abundant species given greater weight) but
only certain species will be responsible for creating the observed gradient - others will be largely
'noise'. The behaviour of a single species over the sites can best be seen by a bubble plot (Special
menu), in which circles are superimposed at each point, of size related to abundance at that site.
On the Ekofisk plot, Graph1, take Graph>Special>(Plot type•2D bubble) and under Bubble data
choose (Worksheet: ekma)>Variable: Harmothoe castanea, for example. This species is clearly
found only at sites furthest from the oilfield, in fairly modest but predictable numbers - the key
shows that the largest bubble represents a mean of 3 individuals per sample (in an average of 3 Day
grabs from that site), on a continuous scale for which a count of zero is a vanishingly small circle.
It usually aids clarity of bubble plots to add back all the sample points as labels (thus Data labels
& symbols>Labels✓Plot>✓By factor: Dist), so the group labels are plotted at the bubble centres.
Now repeat with different species: e.g. Stenhelais limicola (also near the start of the list), for which
the generally larger counts are found closer to the rig; Pholoe inornata, whose large counts are now
within a few hundred metres of the centre, though not at the two most central sites to the right of
the ordination; Chaetozone setosa (about 30 species down), an opportunist whose very large counts
are now at the most impacted sites; and many species which are seen to be scattered randomly.

Duplicating graphs
Bubble plots can only be plotted with one superimposed variable at a time. In general, more than
one is unlikely to give the clarity for a publishable (monochrome) plot, though if you have a
particular application in which two variables could successfully be superimposed - because the
species counts were mainly found in different parts of the MDS space - then this could be
constructed in Powerpoint (say) from the separate plots output from PRIMER. More helpful would
usually be to view several bubble plots side-by-side in PRIMER. Firstly get the 2-d MDS plot,
Graph1, into the desired format (e.g. from Graph>General uncheck the Plot history box, perhaps
increase the Overall font scale to 130, and from the Titles tab, delete the main title - if it is blank
the plot will reutilise the space - and reduce the subtitle font size to 80, which also controls the key
title size). Then use the Tools>Duplicate menu item on Graph1, to generate new plots Graph3,
Graph4, Graph5. Superimpose different species on each plot, close all windows apart from these
four (e.g. Window>Close All Windows and redisplay Graph1, 3, 4, 5 by clicking on them in the
Explorer tree). Then take Window>Tile Horizontal or Vertical. Clearly, though this is an
excellent way of displaying species establishing a group or gradient structure, it is not a good way
of identifying them in the first place - for this see SIMPER or BEST (sections 12, 13).

[Figure: Configuration Plot dialog (Plot type: 2D bubble; Bubble data worksheet ekma), the Explorer tree for the ekwk workspace, and four tiled Ekofisk oilfield macrofauna MDS bubble plots]

Environment bubble plots
It is not just information on the variables underlying the ordination (species, here) that can be
displayed as bubble plots but any variable set defined over the same sites. Within the same
workspace there should already be open an environmental sheet ekev (introduced on page 23).
Make it the active sheet and check that it is all selected - since only a subset was considered on
page 39 - by Select>All (and Edit>Clear Highlight if you wish, though the latter is not essential,
see the next section). Now, returning to Graph1 as the active window, if the worksheet under
Bubble data (from the Graph>Special menu) is chosen to be ekev, abiotic variables such as total
hydrocarbons (THC), barium (Ba) etc are available to be selected as bubbles, superimposed on the
biotic MDS. Note that the sites need not be in the same order in the biotic and abiotic sheets, and
the environmental matrix could even have a larger number of sites. The key feature is that the site
labels must be spelt identically in the two data sets, so that all samples in the biotic matrix can be
matched in the abiotic data (or vice-versa, if you are doing the analysis in the other direction, with
an environmental ordination providing gradient(s) on which species information is superimposed).
A failure to find a matching sample will give the message 'Samples not in Bubble data selection!'
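
The matching of samples by label can be pictured as a lookup keyed on the site label, so that neither the order of sites nor the presence of extra sites in the second sheet matters. The sketch below is illustrative only: the function name, the site labels and the values are hypothetical, and the error text is merely modelled on the message quoted above.

```python
def match_bubble_values(ordination_labels, bubble_sheet):
    """Look up a bubble value for every sample in the ordination, by label.

    bubble_sheet maps site labels to values of the variable to superimpose.
    """
    missing = [label for label in ordination_labels if label not in bubble_sheet]
    if missing:
        raise KeyError("Samples not in Bubble data selection: " + ", ".join(missing))
    return [bubble_sheet[label] for label in ordination_labels]

# Hypothetical sheets: the environmental sheet is in a different order
# and contains one extra site, which is simply ignored
biotic_sites = ["S1", "S2", "S3"]
environmental = {"S3": 12.5, "S9": 40.0, "S1": 3.1, "S2": 7.8}
values = match_bubble_values(biotic_sites, environmental)
```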

Plot values on bubbles
To see how well the assemblage pattern relates to sediment hydrocarbon concentrations at these
sites, superimpose THC on Graph1, the biotic MDS, with (Worksheet: ekev>Variable: THC) &
(✓Values as labels). The latter allows the actual values of the superimposed variable to be plotted
as numbers at the centre of the bubbles (and moves any existing label upwards). This needs some
care to avoid looking cluttered: you will need to switch off the ordinary plot labels (Data labels &
symbols tab), restore the Sub title font (Title tab) and Overall font scale (General tab) to their
defaults of 100, and perhaps adjust the font scale of the plotted values. The latter is on the Bubble
tab, which can be accessed in the usual way from the right-click menu: General or Data labels &
symbols, or it has another access through Scale under Bubble data on the Special menu. Either
way, you may need to take Font and adjust the (Size:) box smaller or larger. Also, try rescaling all
circles proportionately, so that the maximum value of the superimposed variable is represented by a
circle of 150% of the default size, with Bubble scale: 150 on the Bubble tab. Then turn off the key
(since the values on the plot make it redundant), by unchecking the Plot keys box on the General
tab, and reinstating the main title and sub title to give the relevant information there (Title tab).

[Figure: Configuration Plot dialog (Bubble data: Worksheet ekev, Values as labels), the Bubble tab with its font settings, and the MDS for macrofaunal assemblages at 39 Ekofisk sites with superimposed values]

The resulting plot shows that THC is strongly elevated only within a few hundred metres of the rig
and, whilst it can 'explain' the different (and variable) assemblage structure in D compared to other
groups, it cannot explain the differences out to 3 km or so (e.g. between B and A). On the other
hand, barium - a potential indicator of the extent of dispersal of drilling muds - does display a
discrete drop from all groups D, C, B to the apparently background levels in the A (>3.5km) sites.
(Maximum bubble size needs scaling down for this, to 75 say, since there are many large circles.)


Changing bubble scale & colours
The bubble plot for phi mean (grain size in φ units) shows no apparent association with the gradient
but this may be because the circle scaling is unhelpful in this case. The default is to make a circle
of zero size correspond to a zero value for the superimposed variable; that may be appropriate for
positive quantities (species counts and biomass, contaminant concentrations whose background
level is close to zero etc) but is not relevant for all variables. Some may take negative values (e.g.
redox potential), for others zero may play no special role (e.g. temperature in °F), and some may be
on log scales and/or take all values far removed from zero (e.g. phi coefficients, salinity in coastal
marine studies etc). Scaling a zero circle to the approximate min of the data (with the largest-sized
circle staying as the approximate max) is more sensible, and achieved by ✓Specify scale on the
Bubble tab (e.g. from Special>Bubble data>Scale). This gives suggested Bubble min: 2.8 and
Bubble max: 3.3 for phi mean, which are the rounded min and max of the data (greyed out and
non-operational until ✓Specify scale is checked - a useful way of checking on the rough range of a
variable for a large data set!). Other values could be chosen, e.g. if bubbles needed to be scaled
uniformly over several variables. Also in this dialog is the chance to change the bubble fill and
border colours (including to white), by clicking on the displayed colours as usual. (Also reinstate
the key from the General tab, with ✓Plot keys and uncheck the 'Values as labels' box on the
Special menu.) The revised bubble plot for phi mean is now more informative but only shows a
slight, and unconvincing, increase across the gradient. Save and close the workspace ekwk.
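
Why the default scaling hides the pattern for phi mean can be seen from a small calculation. The sketch assumes that circle size increases linearly from zero at the bubble minimum to the full bubble scale at the bubble maximum; whether 'size' means diameter or area is not stated here, so it is left as an abstract percentage, and the function and parameter names are hypothetical.

```python
def bubble_size(value, bubble_min=0.0, bubble_max=1.0, bubble_scale=100.0):
    """Circle size as a percentage of the default maximum circle."""
    if bubble_max <= bubble_min:
        raise ValueError("bubble_max must exceed bubble_min")
    fraction = (value - bubble_min) / (bubble_max - bubble_min)
    fraction = min(max(fraction, 0.0), 1.0)   # values outside the range are clamped
    return bubble_scale * fraction

# Default scaling (zero value = zero circle) for values spanning 2.8 to 3.3:
default_small = bubble_size(2.8, 0.0, 3.3)    # about 85
default_large = bubble_size(3.3, 0.0, 3.3)    # 100
# With the scale specified as Bubble min 2.8 and Bubble max 3.3:
rescaled_small = bubble_size(2.8, 2.8, 3.3)   # 0
rescaled_large = bubble_size(3.3, 2.8, 3.3)   # 100
```

Under the default, every circle is between about 85% and 100% of the maximum size, so differences are barely visible; after rescaling the same values span the whole range of sizes.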
[Figure: Bubble tab (Bubble scale, Specify scale, Bubble min, Bubble max, Colour, Boundary colour) and the revised grain size (phi units) bubble plot on the MDS for macrofaunal assemblages at 39 Ekofisk sites]

3-d MDS graphs
In the three cases above, MDS stress in 2-d (0.05 and 0.11) has been low (see guidelines in Chapter
5 of the methods manual), but this will not always be the case and a much lower stress might be
obtained in 3-d. This MDS is a separate calculation to the 2-d plot, so is given in the Explorer tree
as a different graph (in contrast to PCA, section 10, where the 2-d plot is the first 2 axes of the 3-d
solution. Though 2-d plots of all 3 pairs of MDS axes are a viable alternative to the 3-d plot, e.g.
for publication, the first 2 axes of the 3-d MDS should not be viewed in isolation: if a 2-d plot is
needed then it should be the 2-d MDS solution). The 3-d MDS can be arbitrarily rotated or flipped
on X, Y or Z axes. There are logically two different ways of rotating the picture: as with 2-d MDS,
the axes can remain fixed and the points rotate within them (use the Rotate data icon from the Tool
Bar or Rotate Data from the right-click menu) or, as is more usual with 3-d plots viewed as a 2-d
image, the points remain fixed within the current box but the axes can be rotated (use the
corresponding icon or Rotate Axes). In another improvement to v5, a full range of annotation is now
possible, with Data labels & symbols, just as for 2-d plots (though there are no 3-d bubble plots).


Open the W Australian fish diet workspace WAwk in C:\Examples v6\WAfdiet (introduced on
page 37), or if not available, re-open the data WAfd and Analyse>Pre-treatment>Standardise>
(Standardise Samples) & (By Total), to remove (some of) the effect of differences in gut fullness.
Transform Data1 with square root (Analyse>Pre-treatment>Transform (overall)) to downweight
the otherwise dominant polychaete and copepod diet categories, and create Bray-Curtis similarities
in Resem1 (with Analyse>Resemblance). Produce the 2-d and 3-d MDS plots, Graph1 and Graph2
(with Analyse>MDS). Note the history box now includes the standardisation step (seen on the plot
and also from Edit>Properties on the similarity matrix, Resem1). The results window, MDS1,
shows that the best 2-d solution, found about 50% of the time, has a rather high stress of 0.20,
whereas the 3-d stress is substantially lower, at 0.13. The 3-d plot, Graph2, is therefore likely to be
the more useful one to examine (3-d stress will always be lower than 2-d but sometimes the gain
will only be marginal, and not worth the extra difficulty of visualising the configuration in 3-d).
When examining the 3-d plot, it will help to switch off the labels (from Graph>Data labels &
symbols), and make more use of rotating the axes rather than the points. You may also want
to change some of the colours or symbols (the + sign for fish species S. robustus is rather indistinct
here), using Key or Data labels & symbols and double-clicking on the symbol/colour to change.
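The pre-treatment chain described above (standardise samples by total, square-root transform, Bray-Curtis similarity) can be sketched outside PRIMER as follows. This is only an illustrative sketch: the function names and the two small samples are hypothetical, and the similarity is written in the usual 0 to 100 form, 100(1 - sum|y1 - y2| / sum(y1 + y2)).

```python
from math import sqrt

def standardise_by_total(sample):
    """Express each entry as a percentage of the sample total."""
    total = sum(sample)
    return [100.0 * x / total for x in sample] if total > 0 else list(sample)

def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity (0 to 100) between two samples."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return 100.0 * (1.0 - num / den) if den > 0 else float("nan")

# Hypothetical gut-content volumes for two samples over four dietary categories:
# the same dietary composition, but the second sample has five times the gut fullness.
s1 = [8.0, 2.0, 0.0, 0.0]
s2 = [40.0, 10.0, 0.0, 0.0]

# Similarity on the raw volumes is pulled down by the difference in fullness.
raw = bray_curtis_similarity(s1, s2)

# Standardise by total, then square-root transform, then compare again.
pre1 = [sqrt(x) for x in standardise_by_total(s1)]
pre2 = [sqrt(x) for x in standardise_by_total(s2)]
treated = bray_curtis_similarity(pre1, pre2)

print(round(raw, 1), round(treated, 1))
```

In this made-up case the raw similarity is low purely because of the difference in total volume, whereas after standardisation the two samples are identical in composition.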

[Screenshot: Explorer tree for the WAwk workspace and the Edit>Properties dialog for Resem1 (Diets of 7 nearshore fish species from WA). History: Standardise Samples by Total; Transform: Square root; Resemblance: S17 Bray Curtis similarity. Description: Dietary data from 7 marine fish species, sampled during the day at several nearshore sites in Western Australia (lower west coast), during spring/summer seasons. Data is volumetric contribution made by each of 39 broad dietary categories to the total content of fish guts (thus reflecting both composition and gut fullness).]


Zooming 3-d MDS plots

It is also usually advisable to zoom the 3-d plot by at least one step: zoom operates in the same way
for all graphics, namely click the zoom-in icon on the Tool Bar (or Zoom In on the right-click menu) and the
cursor changes to a zoom-in symbol. Clicking on the plot now zooms in one step; use the scroll bars to centralise
the points in the zoomed area (and use the zoom-out icon on the Tool Bar or Zoom Out to reverse the zoom, the
cursor now changing to a zoom-out symbol).

[Screenshot: Data labels & symbols dialog for the 3-d MDS plot, and the right-click menu with Data labels & symbols, General, Special, Pointer, Zoom In, Zoom Out, Rotate Data, Rotate Axes and Flip items.]

It is clear from rotating the 3-d plot that some fish species (e.g. the congeneric Sillago schomburgkii
and S. bassensis) occupy largely different regions of the prey space represented by the MDS,
and others (e.g. Spratelloides vittata) have a greatly restricted diet in comparison. (For more formal
examination of such differences see the ANOSIM test in section 12 and MVDISP in section 13).
One of the replicates of S. schomburgkii is also seen to be suspiciously far away from all the other
points (replace the labels to see that it is sample B4). On inspection of the original datasheet
WAfd, sample B4 proves to have very little gut content. If it is required to remove this and repeat
the MDS without it, then a selection can be made from the original matrix (next section) and the
whole analysis repeated, but a quicker way in some circumstances might be to use the MDS subset
feature, using selection by rectangle, as on the next page.

Rectangular zoom for 2-d MDS plots

In a 2-d MDS, with the cursor as the standard pointer, a rectangle can be drawn around any
section of the plot (by clicking, holding and dragging with the mouse, in usual Windows fashion).
This is the alternative way of zooming in on just a small section of the 2-d MDS space: having
drawn a box over the area, click on the zoom-in icon on the Tool Bar, and only that area will be
displayed, at greater magnification (though the scroll bars will allow you to scroll over other areas
at the same magnification). This was seen on page 64 for cluster plots but, unlike dendrograms, the
aspect ratio of the plotted points is not altered to match that of the rectangle - to do so would be to
destroy the interpretation of the MDS! However, the display box itself does take the rectangle
shape and content, and this is a neat way of changing the surrounding box shape for a 2-d MDS.


[Figure: Diets of 7 nearshore fish species from WA. Left: 2-d MDS plot with a rectangle drawn around a section of the points; right: the plot restricted to that rectangle.]

MDS subset plots

Zooming into the fine detail of a highly cluttered MDS plot, with rather too many points, by
successive clicking with the magnification cursor, can sometimes be surprisingly informative.
However, it is not the only, or even the best, possibility. If attention is to be focussed on only a
subset of points, then it would be desirable for the 2-d ordination to display those points in the best
possible relationship to each other, matching their similarities in high-dimensional space. If it is no
longer necessary to display the relationships also with all the omitted points, then the optimal 2-d
configuration for just the subset of points will almost certainly change, and will be more accurate.
In other words, the MDS ordination should be repeated from scratch, on any selected subset of
samples. The same is true in the example shown above: if it is thought desirable and valid to omit
the sparse sample B4, then the right-hand plot should be a new MDS solution, without the outlier.
This can be very simply achieved in v6, directly from the 2-d plot, using the rectangular selection
technique shown above, followed by Graph>MDS subset, which is also an icon on the Tool
Bar (and on the right-click menu). This automatically selects the surrounded points from the
resemblance matrix and displays the MDS dialog box (normally obtained by Analyse>MDS). So,
successive re-runs of MDS on smaller and smaller subsets of samples are possible, and can be
especially useful when a multivariate structure contains major outliers - points that have lower
similarity to all other samples than any of the other samples do amongst themselves. This leads to a
'collapsed' MDS plot, with the outlier on one side and all other samples at a single point, and it is
usually necessary to strip out such outliers (or redefine similarity - see the discussion on page 44).
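The definition of a major outlier given above, and the extraction of the sub-matrix that a subset re-run would start from, can be sketched in a few lines. This is an illustration only, not PRIMER's code: the function names, the sample labels and the similarity values are all hypothetical.

```python
def major_outliers(sim, labels):
    """Labels whose similarity to every other sample is lower than the
    smallest similarity found among the remaining samples."""
    n = len(labels)
    found = []
    for k in range(n):
        others = [i for i in range(n) if i != k]
        highest_to_k = max(sim[k][i] for i in others)
        lowest_among_rest = min(sim[i][j] for i in others for j in others if i < j)
        if highest_to_k < lowest_among_rest:
            found.append(labels[k])
    return found

def submatrix(sim, labels, keep):
    """Rows and columns of the resemblance matrix for the kept samples only."""
    idx = [labels.index(name) for name in keep]
    return [[sim[i][j] for j in idx] for i in idx]

# Hypothetical similarity matrix (0 to 100) for four samples; X is the outlier.
labels = ["P", "Q", "R", "X"]
sim = [
    [100, 60, 70, 5],
    [60, 100, 65, 8],
    [70, 65, 100, 3],
    [5, 8, 3, 100],
]

outliers = major_outliers(sim, labels)
keep = [name for name in labels if name not in outliers]
reduced = submatrix(sim, labels, keep)  # input for a fresh ordination of the subset
print(outliers, reduced)
```

The point of the sketch is that the reduced matrix is taken from the original resemblances, so the ordination of the subset is computed afresh and is not a zoomed view of the old plot.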

[Screenshot: 2-d MDS plot with sample B4 lying outside the selection rectangle, and the MDS dialog box (Number of restarts; Kruskal fit scheme) displayed by Graph>MDS subset.]

The MDS subset icon makes this easy, but there will also often be the need to re-analyse subsets
chosen in other ways, e.g. by levels of a factor, and the next section details selection techniques.

8. Selection

8. Highlighting and selection (Select)

Highlight and select

There are many cases in which analyses of different subsets of the samples or species are required.
This can be easily achieved, without the need to create large numbers of separate datasheets, by
temporarily selecting subsets from a single sheet, analysing them (and thus creating new branches
on the Explorer tree, with the results windows listing the selection used for any particular branch),
and then restoring the full data set. There are several different ways to select subsets, described
below, but it is important to keep in mind the distinction between highlighting and selection. The
act of clicking on a row and/or column header highlights that row and/or column; it does not select
it. Once you are happy that you have highlighted the correct set of samples (and/or variables) you
can select them using the Select>Highlighted menu. Highlighting is just an intermediate stage, and
has functions other than selection (e.g. to identify samples in a datasheet that need individual
transformation, whilst the rest of the matrix remains unchanged). Alternatively, highlighting can be
bypassed altogether and selection made by other direct choices from the Select menu.
Selection by highlighting

The W Australian fish diet workspace, C:\Examples v6\WAfdiet\WAwk.pwk, should still be open
from the previous page (section 7). If not available, recreate it, as on page 37. Standardisation of
the original datasheet WAfd produced a subsidiary datasheet Data2 of total gut fullness for each
sample (itself an average over 5 fish guts). It can be seen that dropping samples whose total gut
fullness is, say, less than 10% (since it could be argued that they contain little information and will
have large variability in similarity with other samples) would necessitate exclusion of samples A9,
B3 and B4. Thus, with the standardised datasheet Data1 as the active window, highlight all
columns except these three. There are various options here which you should experiment with.
Clicking on a column label highlights that column - in light blue shading - and is a toggle action (a
second click turns off the highlighting). Clicking, holding and dragging the cursor across column
headers will highlight a sequence of samples, as will the usual Windows action of clicking on the
first, then holding down the Shift key when clicking on the last. (The Ctrl key has no effect; also
note that the toggling action is set so that intermediate columns which are already highlighted will
not be turned off if a wider sequence of columns ranging across them is highlighted in this way).

[Screenshot: datasheet Data1 (Diets of 7 nearshore fish species from WA; Biomass) with a set of columns highlighted.]

However, the easiest way of highlighting all except a few columns is to highlight all data, by
clicking in the blank cell at the top left of the sheet, then click on the A9, B3 and B4 labels to
dehighlight just those. (Note that the top left cell is also a toggle, so a second click is a convenient way
of clearing all highlights, though this can also be done by Edit>Clear Highlight).
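The 'highlight everything, then toggle off a few' route can be mimicked with a set of highlighted labels. The sketch below is hypothetical throughout: the gut fullness totals are invented for illustration (they are not the Data2 values), and `toggle` is simply a stand-in for clicking a column label.

```python
# Hypothetical total gut fullness (%) per sample; invented values, not those in Data2.
fullness = {"A8": 35.0, "A9": 6.5, "B2": 22.0, "B3": 4.0, "B4": 1.2, "B5": 48.0}

def toggle(highlighted, label):
    """Clicking a column label switches its highlight on or off."""
    highlighted.symmetric_difference_update({label})

# Click the blank top-left cell: every column is highlighted.
highlighted = set(fullness)

# Click the labels of samples with less than 10% total gut fullness to dehighlight them.
for label, total in fullness.items():
    if total < 10.0:
        toggle(highlighted, label)

# Selecting the highlighted columns keeps them in their original sheet order.
selection = [label for label in fullness if label in highlighted]
print(selection)
```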


In the default Windows set-up, cells in the table have one of three backgrounds: white, light blue
or dark grey. Three colours are necessary because highlighting can also be by rows, or rows and
columns simultaneously. The rule is that the cells with the darkest background are those that are
highlighted. You will see this best by turning off all highlights then clicking on a random set of row
and column labels: the intersections are considered the highlighted part of the matrix. (Individual
cells in the table cannot be highlighted by clicking on them; it is not meaningful to be able to select,
say, only B10 Nematodes and B8 Isopods. Do not think of a data matrix as a conventional
spreadsheet: only a limited set of operations make sense for datasheets, and v6 is able to make
major speed and memory gains, compared with v5, by having its own specialised grid control.)
[Screenshot: datasheet with highlighted rows and columns, and the right-click menu.]

When all except columns A9, B3 and B4 are highlighted, selection of these highlighted samples is
by the Select>Highlighted menu item. Alternatively, right click when over the data and a drop-
down menu will appear, containing a combination of operations from the Edit and Select menus,
including Select highlighted. The matrix entries now have a different (turquoise) background
indicating that you are operating with a selection - a new datasheet window is not created.
[Screenshot: datasheet Data1 after Select>Highlighted, with the selected entries on a turquoise background.]

Duplicating a selected worksheet

Though there is less necessity in PRIMER v6 (compared with v5) to save selections or other
derived sheets as *.pri (or *.xls, *.txt) files - the whole workspace being easily saveable now -
there will be occasions when a subset of the data is needed in a different workspace or for use with
different software. Saving the current form of the sheet, with File>Save Data As (page 20), saves
the full data not just the selection: this is a security feature designed to make it difficult to overwrite
the complete data set with a temporarily selected subset of it. To force a save of only the selection,
you must first duplicate the current sheet with Tools>Duplicate. This creates a new sheet (Data4,
see below), which throws away all but the selected data - and the factors/indicators etc relevant to
it - so saving the duplicate achieves the objective. Even where exporting is not the aim, there will
be occasions when the selected data is needed in a new window in the current workspace, starting a
new branch on the left of the Explorer tree. Tools>Duplicate does this automatically, so that
subsequent changes to factors etc (in Data4) are not back-propagated to the full data (in Data1).
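The distinction between a temporary selection, a save and a duplicate can be pictured with a toy model. The `Sheet` class below is entirely hypothetical (it is not how PRIMER stores data); it only mirrors the three behaviours described above: a selection leaves the full data in place, a save writes the full data, and a duplicate is an independent copy of the selected part.

```python
import copy

class Sheet:
    """Toy stand-in for a datasheet: labels, one value list per sample, and factors."""

    def __init__(self, labels, values, factors):
        self.labels = list(labels)
        self.values = copy.deepcopy(values)        # values[i] belongs to labels[i]
        self.factors = copy.deepcopy(factors)      # factor name -> one level per sample
        self.selected = list(range(len(self.labels)))  # everything selected at first

    def select(self, keep):
        """Temporarily select a subset of samples; the full data stay in the sheet."""
        self.selected = [i for i, lab in enumerate(self.labels) if lab in keep]

    def save_rows(self):
        """What a plain save writes in this sketch: the full data, not the selection."""
        return list(zip(self.labels, self.values))

    def duplicate(self):
        """New, independent sheet holding only the selected samples and their factors."""
        idx = self.selected
        return Sheet(
            [self.labels[i] for i in idx],
            [self.values[i] for i in idx],
            {name: [levels[i] for i in idx] for name, levels in self.factors.items()},
        )

# Hypothetical miniature sheet with invented values.
data1 = Sheet(["A8", "A9", "B3"],
              [[1.0, 2.0], [0.1, 0.0], [0.0, 0.2]],
              {"species": ["A.ogilbyi", "A.ogilbyi", "S.schomb."]})
data1.select({"A8"})
data4 = data1.duplicate()
data4.factors["species"][0] = "edited"   # change made in the duplicate only

print(len(data1.save_rows()), data4.labels, data1.factors["species"][0])
```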


[Screenshot: Tools>Duplicate applied to Data1 (Diets of 7 nearshore fish species from WA; Biomass), creating Data4 under a new Duplicate1 branch of the Explorer tree. The results window lists Data type: Biomass; Sample selection: 1-8, 10-18, 21-68; Variable selection: All.]

Deselecting

Select>All (also on the drop-down menu as Select all) reverses the selection, to reinstate the full
datasheet - the highlights are retained so it is easy to change some of them (or even reverse them,
with Edit>Invert Highlight, see example on previous page) and reselect with Select>Highlighted.

Selecting by factor levels

The highlighting route to selection can be bypassed altogether using the other options on the Select
main menu, Select>Samples and Select>Variables. To select only those samples from the three
congeneric Sillago species, i.e. sample labels starting B, E and G, one can use the factors that have
already been set up to identify these different levels: S.schomb., S.bassen., S.vittata from the factor
species, or from species full name, or equally, 2, 5 and 7 from the numeric factor species#. (Don't
get confused between 'samples' and 'species' here! The 'species' variables are the different dietary
categories - though most are identified to a level higher than species - and the samples are the
repeat pools of fish guts, from the different fish species, the sampling mechanism being predation.)

[Screenshot: datasheet Data1 (Diets of 7 nearshore fish species from WA; Biomass) with its right-click menu and the factors listing, showing the factors species, species full name and species# for each sample (e.g. A.ogilbyi, Atherinomorus ogilbyi, 1; S.schomb., Sillago schomburgkii, 2).]

With the transformed datasheet Data1 as the active window, take Select>Samples>•Factor levels>
(Factor name: species)>Levels, giving a standard Selection window (see also pages 32 and 33),
with boxes listing levels to Include, and those Available but not included. Move across all items to
the Available list with the move-all arrow button then, using the single arrow button,
move back the desired levels: S.schomb., S.bassen.,
S.vittata to the Include list. This can be either singly, or all of them can be highlighted with Ctrl
clicks (a range would use Shift click), in the usual Windows manner, and then all taken across to
the Include box with the arrow button.

[Screenshot: Select Samples dialog (•Sample numbers, •Factor levels with Factor name: species) and the 'Select levels for factor' window with its Available and Include lists.]
Multiple selections

The effect is to make Data1 a selection of all samples from these three Sillago species. This is in
spite of the fact that, in the above example, Data1 started the operation as another selection -
excluding samples A9, B3 and B4. That prior selection has been ignored: it is important to realise
that each new selection is a fresh operation, on the full array that is held in that datasheet. If,
instead, a compounding of the two selections is required, then that is easily achieved, in at least two
ways. One option is to take the current selection (all B, E and G samples) and highlight everything
displayed, except B3 and B4, then Select>Highlighted. (This is logically sound because all the
omitted A, C, D, F samples from the first selection are not highlighted at that point.) This would
retain a single copy of the matrix Data1 at an unchanged point in the Explorer tree. A more general
alternative, which may be more relevant if a complicated multiple sequence of selections is needed,
is simply to Tools>Duplicate the sheet after every selection, then do the next selection operation
on the new matrix. So, if the above selection of the Sillago samples had taken place on Data4 and
not Data1, the results would automatically have excluded B3 and B4. The outcome is the same, but
the difference is that Data4 is now a separate sheet from Data1, at a new starting point on the tree. A
third obvious possibility - with the same generality as the second approach, but with the original
outcome of the multiple selection being on a single worksheet (rather than a series of copies) - is to
create factor combinations (Factors>Combine), which will allow simple selection of one (or more)
of the levels of this compound factor.
To illustrate this, save and close down the WAwk workspace, and re-open the Tasmanian meiofauna
directory, C:\Examples v6\Tasmania\tawk, seen extensively in section 2 (pages 29 to 36) on
factors. There are only 16 samples, 39 nematode (tana.pri) and 17 copepod (tapa.pri) species; the
small scale helps for illustrative purposes (though in the real context would make most selections
quicker by simple highlighting). The study design has two factors: 'treatment' (disturbed D, and
undisturbed U, sediment patches), and 'blocks' (4 areas of sand-flat, B1 to B4). An example of a
2-factor selection for the nematode sheet tana would be to select distinct sand patches within each
treatment, say blocks 1 and 3 for D, and blocks 2 and 4 for U. Use the TreatmentBlock combined
factor (if you don't already have this from page 32, repeat Factors>Combine and put Treatment
and Block in Include). In Select>Samples>•Factor levels>(Factor name: TreatmentBlock)>Levels,
leave D1, D3, U2, U4 in the Include list and take D2, D4, U1, U3 across to Available.

Selecting by number

It may sometimes be easier to use the sample numbers, here Select>Samples>•Sample numbers>
1,2,5,6,11,12,15,16, though this is more likely to be useful where such numerical lists are output in
results (e.g. by the BEST routine, section 11), and can be copied and pasted into this dialog box.
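The equivalence between selecting levels of the combined factor and selecting by sample number can be checked with a short sketch. The sample ordering used here is an assumption made for illustration (disturbed samples first, blocks 1 to 4, two replicates per treatment-block combination); it is chosen because it reproduces the sample numbers quoted above.

```python
# Assumed ordering of the 16 samples (hypothetical): all D samples, then all U samples,
# blocks 1 to 4 within each treatment, two replicates per treatment-block combination.
treatment = ["D"] * 8 + ["U"] * 8
block = [1, 1, 2, 2, 3, 3, 4, 4] * 2

# Combined factor, in the manner of a Treatment and Block combination.
treatment_block = [f"{t}{b}" for t, b in zip(treatment, block)]

# Levels left in the Include list.
include = {"D1", "D3", "U2", "U4"}

# 1-based sample numbers of the selected samples.
sample_numbers = [i + 1 for i, level in enumerate(treatment_block) if level in include]
print(sample_numbers)
```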


[Screenshot: Select Samples dialog used twice on the nematode sheet: with •Factor levels (Factor name: TreatmentBlock), and with •Sample numbers: 1-2, 5-6, 11-12, 15-16. Each dialog also offers •No missing values.]

Selecting variables

Any of the options for selecting samples are also available for selecting variables, e.g. selecting by
variable numbers or by levels of an indicator (in effect, these are 'factors' on the variables rather
than samples, and operate in just the same way). Page 36 shows an example of copepod data (tapa)
in this workspace, selecting by the indicator Genus identified?. All taxa which are identified to
genus (with indicator value 1) are selected. This could have been done just as easily by Select>
Variables>•Variable numbers 1-12 or by highlighting the first 12 rows and Select>Highlighted.

Selecting by 'most important'

There are, however, two other selection methods under Select>Variables that are specific to
selecting species (or other taxon-type) variables, in which matrix entries are positive 'amounts' of
that species (counts, biomass, area cover etc). The idea is to be able to drop species which are
sufficiently rare or in small numbers that they are not a substantial component of the overall counts
(or biomass etc) in any sample. (Note, however, that removing rarer species in this way is not
required for most of the methods in PRIMER, based on Bray-Curtis similarities for example, and
should be done only where there is good reason for it, e.g. use of a resemblance coefficient which
is sensitive to rare species - such as chi-squared distance or Gower). The option to Select>
Variables>(•Use those that contribute at least 5 %) applied to the copepod counts in tapa would
drop species which, for every sample, account for <5% of its total abundance, leaving only 7 of the
original 17 species in the selected sheet. Alternatively, the number of species to retain can be
specified, rather than the %, but the principle is the same. Taking Select>Variables>(•Use n-most
important where n is 7) generates the same set of species, naturally. If n is larger, say 10, then in
order to be retained, the threshold percentage that a species must contribute somewhere will drop -
in fact a threshold of around 3% will leave 10 species. If n is smaller, say 5, then a higher
percentage cut-off is needed (10% will do it). The algorithm simply varies the cut-off percentage
until the matrix gets left with exactly the number of species n requested. This means of species
selection (rather than, say, by ranking their total abundance across all samples) is preferable
because it retains species which are important in impoverished sites, with low total abundance.
It is important to note that Select>Variables operates in combination with Select>Samples (unlike
repeated Select>Samples or Select>Variables operations on their own). This is the behaviour that
would be expected, e.g. if a sample selection is in operation then the 'most important' 10 species
are determined only with regard to that selection, not using all the samples.
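The two 'most important' rules can be sketched as follows. This is a hedged illustration rather than PRIMER's own algorithm: the function names and the small abundance matrix are hypothetical, and the n-most-important rule is implemented by ranking species on their maximum percentage contribution to any sample, which gives the same set as lowering a cut-off until n species remain (ties aside).

```python
def percent_contributions(matrix):
    """matrix[i][j] is the amount of species i in sample j; returns % of each sample total."""
    n_samples = len(matrix[0])
    totals = [sum(row[j] for row in matrix) for j in range(n_samples)]
    return [[100.0 * row[j] / totals[j] if totals[j] > 0 else 0.0
             for j in range(n_samples)] for row in matrix]

def select_at_least(matrix, percent):
    """Keep species contributing at least `percent` of the total in some sample."""
    pct = percent_contributions(matrix)
    return [i for i, row in enumerate(pct) if max(row) >= percent]

def select_n_most_important(matrix, n):
    """Keep the n species with the largest maximum percentage contribution."""
    pct = percent_contributions(matrix)
    ranked = sorted(range(len(matrix)), key=lambda i: max(pct[i]), reverse=True)
    return sorted(ranked[:n])

def top_n_by_total(matrix, n):
    """Alternative shown for contrast: rank species by total abundance over all samples."""
    ranked = sorted(range(len(matrix)), key=lambda i: sum(matrix[i]), reverse=True)
    return sorted(ranked[:n])

# Hypothetical counts: 4 species (rows) by 3 samples (columns); sample 3 is impoverished.
abund = [
    [90, 50, 1],   # species 0
    [8, 45, 1],    # species 1
    [2, 4, 0],     # species 2
    [0, 1, 8],     # species 3: few individuals overall, but dominant in sample 3
]

print(select_at_least(abund, 5.0))        # species reaching 5% somewhere
print(select_n_most_important(abund, 2))  # keeps species 3
print(top_n_by_total(abund, 2))           # would lose species 3
```

In the invented data, species 3 has a small total abundance but makes up most of the impoverished third sample, so it is retained by the percentage-based rule and lost by the total-abundance ranking.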

[Screenshot: Select Variables dialog, with options including •Indicator levels (Indicator name: Genus identified?), •Use n-most important where n is, •Use those that contribute at least, and •No missing values.]

Excluding missing data

A final option present in Select>Samples (and Select>Variables) is to retain only the subset of
samples (or variables) that contain no missing values. The latter are indicated by Missing! in the
sheet, and are discussed in detail at the top of page 23. The option is only likely to apply for
'environmental' type data (physico-chemical variables, biomarkers etc), rather than assemblages,
where unobserved species are usually properly treated as a zero count rather than a Missing entry.
Most routines in PRIMER require complete matrices, either by excluding the row or column of a
missing entry or, in some circumstances (see section 10), by estimation of the missing cells.
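Retaining only complete samples, or only complete variables, amounts to a simple filter. The sketch below uses `None` as a stand-in for a Missing! entry; the site names, variable names and values are hypothetical.

```python
MISSING = None  # stand-in for a Missing! entry

# Hypothetical environmental data: one list of values per sample, one value per variable.
variables = ["pH", "temperature", "Cu"]
env = {
    "site1": [7.9, 12.0, 0.3],
    "site2": [8.1, MISSING, 0.4],
    "site3": [8.0, 11.5, 0.2],
}

# Samples with no missing value in any variable.
complete_samples = [s for s, vals in env.items()
                    if all(v is not MISSING for v in vals)]

# Variables with no missing value in any sample.
complete_variables = [name for k, name in enumerate(variables)
                      if all(vals[k] is not MISSING for vals in env.values())]

print(complete_samples, complete_variables)
```

A single missing cell can therefore be dealt with by dropping either its sample or its variable, and which is preferable depends on what the analysis can least afford to lose.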

Selection in resemblance matrices

When the active window is a resemblance matrix, selection can take place just as for a datasheet,
by Select>Highlighted or Select>Samples>(•Sample numbers) or (•Factor levels). Selecting by
factors from a large similarity matrix, to do subset ordinations, is commonly done. Another option
is added here: selection of only the rows and columns containing at least one value above or below
a specified threshold. This is mainly used for picking out highly collinear environmental variables
from a large correlation matrix (correlations >0.95 or <-0.95 say), see section 11, but is shown
here by forming the Bray-Curtis matrix on 4th-root transformed nematode abundances, from tana,
and Select>Samples>(•Values<40) to display the lowest similarities in the matrix. Then Select>
All (and Edit>Clear Highlight if you wish) to restore the full matrix and save the workspace tawk.
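Threshold selection on a resemblance matrix can be sketched as a filter over rows. The function below is a hypothetical illustration: it assumes that the diagonal (each sample compared with itself) is ignored, and the similarity values are invented.

```python
def select_by_threshold(sim, labels, threshold, below=True):
    """Labels of rows holding at least one off-diagonal value below (or above) the threshold.

    Assumption of this sketch: self-similarities on the diagonal are ignored.
    """
    n = len(labels)
    keep = []
    for i in range(n):
        off_diag = [sim[i][j] for j in range(n) if j != i]
        if below:
            hit = any(v < threshold for v in off_diag)
        else:
            hit = any(v > threshold for v in off_diag)
        if hit:
            keep.append(labels[i])
    return keep

# Hypothetical Bray-Curtis similarities (0 to 100) among four samples.
labels = ["a", "b", "c", "d"]
sim = [
    [100, 75, 38, 60],
    [75, 100, 52, 61],
    [38, 52, 100, 45],
    [60, 61, 45, 100],
]

print(select_by_threshold(sim, labels, 40.0))               # rows with a similarity below 40
print(select_by_threshold(sim, labels, 70.0, below=False))  # rows with a similarity above 70
```

Because the matrix is symmetric, a value below the threshold always brings in both of the samples it connects.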

[Screenshot: tawk workspace with the nematode resemblance matrix as the active window, and the Select Samples dialog offering •Sample numbers, •Factor levels, •Undefined values and selection by values below a threshold (here 40).]

9. Data tools

9. General data manipulation (Tools)

Tools v Edit menu

Both the Edit and Tools main menus carry out 'housekeeping' manipulations on a dataset (or
resemblance and aggregation matrices) which are generally rather straightforward, and with an
obvious outcome, as opposed to the Analyse menu which contains the main core of the statistical
routines. The main difference between Edit and Tools is that items on the main body of the Tools
menu create a results window, and in nearly all cases also produce a derived sheet of the same type,
e.g. a new data sheet from a data sheet. (There are two miscellaneous items at the bottom of the
Tools menu, Stop Tasks and Options, which do not fit into these rules, but are there because this
is the conventional place for them in Windows applications). Items on the Edit menu, on the other
hand, never produce a results window and operate on the current sheet, changing its entries in some
way (sorting labels, inserting or deleting rows or columns, copying and pasting them, defining new
factors or indicators associated with the sheet, etc), and do not write the revised matrix to a new
window. Edit operations on data sheets themselves can therefore not be undone, except by re-
reading in the external data and starting the manipulations again. Tools operations can be repeated,
however, perhaps with different options, simply by going back to the previous data sheet - which is
always left unchanged. Some Tools items apply when the active window is either a data,
resemblance or aggregation sheet, though with differences in detailed operation, whereas others are
specific to the window type.
Open the gfwk workspace, saved on page 63, having been used to demonstrate cluster analysis. If
not available, open the data file C:\Examples v6\Grdfish\gfa, of species counts from 277 samples in
9 sea areas of the NW European shelf (factor area), and also the aggregation file gfagg, defining the
Linnaean taxonomy of genera, families, orders and classes for the 93 groundfish species monitored.
If one doesn't exist, create a resemblance matrix (Resem1) in any way you like. Compare the
choices on the Tools menu when the active window is a data, resemblance or aggregation sheet.

[Screenshot: the Tools menu as it appears for three active windows: the data sheet gfa (Groundfish NW European shelf; Abundance), a resemblance matrix (Dissimilarity, 0 to 100) and the aggregation sheet gfagg (taxonomy for the NW European shelf groundfish; Species, Genus, Family). All three menus end with Stop Tasks and Options.]

Average and Sum for samples

Tools>Average and Tools>Sum operate on data sheets only, in exactly the same way. For each
variable, they average (or total) across all samples with the same level of a specified factor, e.g.
creating site or time means (or sums). The resulting data sheet is another multivariate array which
can be input to much the same analysis routines (Resemblance, MDS etc) as the original matrix.
Averages are taken for the specified factor (not across it). For example, for samples from a 2-way
layout of s sites and t times, with one sample for each site×time combination, Tools>Average for
the factor site gives time-averaged site means (and vice-versa). Any factors in the original matrix
which are not the subject of the averaging are carried across intact to the new sheet. Thus if a
further factor area had categorised the sites into groups, then the factor area would still be available
in the time-averaged matrix. (If the averaging mixes different levels of this third factor then
'Undefined!' entries are given in the Edit>Factors listings). In the case of a sites by times study
with replicates at each site×time combination, and a combined SiteTime factor (created using
Factors>Combine, page 32), averaging for the SiteTime factor is commonly used to create the st
cell means for each variable, to input to Analyse>Resemblance, MDS etc. An example of a
combined factor used for averaging replicates in a 2-way layout is shown on pages 32, 33. If the
replication is balanced, with n samples for each site×time combination, then Average and Sum will
lead to the same ordination because one matrix is simply n times the other and nearly all
resemblance measures are unaltered (in relative terms) by an overall scale change. If the replication
is unbalanced then it is unwise to use Sum, at least for Bray-Curtis similarity, because the outcome
will be sensitive to the different total abundances caused by differing n; Average is preferable.
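The claim that Average and Sum agree for balanced replication, but not for unbalanced replication, can be checked numerically. The sketch below is illustrative only: the function names, sample names and counts are hypothetical, and Bray-Curtis similarity is computed on the untransformed group values.

```python
def aggregate(data, factor, how="average"):
    """Average or sum the samples sharing a factor level.

    data: {sample: [values]}, factor: {sample: level}.
    """
    groups = {}
    for sample, vals in data.items():
        groups.setdefault(factor[sample], []).append(vals)
    out = {}
    for level, rows in groups.items():
        totals = [sum(col) for col in zip(*rows)]
        out[level] = totals if how == "sum" else [t / len(rows) for t in totals]
    return out

def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity (0 to 100) between two samples."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return 100.0 * (1.0 - num / den)

# Hypothetical counts of three species; two replicates at S, two or three at T.
data = {"s1": [10, 0, 4], "s2": [6, 2, 4],
        "t1": [2, 8, 0], "t2": [4, 6, 2], "t3": [3, 7, 1]}
site = {"s1": "S", "s2": "S", "t1": "T", "t2": "T", "t3": "T"}

def similarity(samples, how):
    """Similarity between the S and T groups, aggregated over the given samples."""
    agg = aggregate({k: data[k] for k in samples}, site, how)
    return bray_curtis_similarity(agg["S"], agg["T"])

balanced = ["s1", "s2", "t1", "t2"]          # n = 2 in both groups
unbalanced = ["s1", "s2", "t1", "t2", "t3"]  # n = 2 and n = 3

bal_avg, bal_sum = similarity(balanced, "average"), similarity(balanced, "sum")
unb_avg, unb_sum = similarity(unbalanced, "average"), similarity(unbalanced, "sum")
print(round(bal_avg, 2), round(bal_sum, 2), round(unb_avg, 2), round(unb_sum, 2))
```

With equal n the sums are just n times the means, so the similarity is unchanged; adding a third replicate to one group leaves the similarity between means as it was here, but shifts the similarity between sums.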
For the groundfish abundances gfa, the 277 samples make for a cluttered MDS, with a rather high
2-d stress arising from the large number of points and the substantial 'noise' in individual samples
(see below, for 4th-root transformed abundances and Bray-Curtis similarities). That there are
assemblage differences between areas is still clear (and appropriate to test by 1-way ANOSIM, see
section 12) but perhaps better summarised by MDS on the area means. On gfa, take Tools>
Average>(Samples•Averages for factor: area) & (Variables•No averaging) and the outputs are a
results window Average1 (just specifying the factor averaged for) and a new data sheet, again with
93 species but now 9 columns - the sample means for each area. Retitle the sheet with Edit>
Properties and rename it gfax with File>Rename Data. (Note that the factor sample#, which
previously just held the label order 1,2,...,277, is now undefined). On gfax, with 4th-root transform
(Analyse>Pre-treatment) and Bray-Curtis (Analyse>Resemblance), the Analyse>MDS shows
the relationship among the mean assemblages. The latitudinal gradient is clear and the layout
mirrors the geographical location of the areas, Fig. 17.10 of the Methods manual. (The plot below
needed rotation and axis flips to achieve this; it is also zoomed in on a taller, thinner rectangular
boundary than the default, see pages 87, 88. Note that areas 5 and 6 are almost coincident.)

[Figure: 2-d MDS of the 277 gfa samples (2D Stress: 0.21), with a symbol key for areas 1-9; the
Tools>Average dialog (Samples: Averages for factor 'area'; Variables: No averaging); the results
window Average1 and the new data sheet 'Groundfish NW European shelf - area averages'; the factor
listing with 'Undefined!' entries; and the 2-d MDS of the 9 area means.]


Average and Sum for variables
The Tools>Average and Tools>Sum menus also give the possibility of averaging (or totalling)
over variables for each sample, using the different levels of an indicator. Applications of this are
less common since variables can be on different measurement scales, so summing them makes no
sense. It could be useful when they are on a common scale, e.g. a Sum over species within each of
a number of trophic groups (the indicator). Similarly sensible might be a Sum for a trivial indicator
which takes the same level for all species, producing sample totals (total abundance, biomass etc,
or if the matrix is first reduced to pres/abs, total species in each sample). There are easier ways of
achieving this however, e.g. totals can be output from Pre-treatment>Standardise (page 37), or as
indices of each sample under Analyse>DIVERSE (S and N measures, section 14). Summing over
both samples and variables at the same time is also logically possible but rarely used, e.g. a sum of
all replicates over all species within each trophic group for each of a number of sites.
Aggregation
Summing over species subsets for each sample is more commonly used when examining how
choice of taxonomic identification level affects a multivariate analysis. Species-level abundance is
best aggregated (up to genus, family, order level etc) not by defining indicators on the species -
though this could be done - but by exploiting the third type of input sheet, the 'aggregation matrix',
essentially a tree structure with fixed levels. Unlike data sheets and resemblance matrices its entries
do not need to be numeric. An example is gfagg(.agg), seen on page 36, which should be open in
the current workspace gfwk. It lists which (groundfish) species belong to which genera, families,
orders etc. On an active data sheet, Tools>Aggregate uses the aggregation matrix to pool the
species counts (or whatever) into coarser taxonomic groups so that genus-, family-level etc MDS
plots can be produced. Though gfa and gfagg have identical species lists, in general there need not
be the same number of species in the data and aggregation sheets (nor need they be in the same
order). Aggregation files can be comprehensive faunal lists, only a few of which are 'looked up' for
any one data set, but it is important that species names are consistently spelt (including spaces). If a
species name is not found, a warning message is displayed, the results window lists which names
were not matched, and these species are retained as separate variables in the higher-level matrix.
With the averaged data matrix gfax (93 species by 9 areas) as the active window, take Tools>
Aggregate>(Aggregation worksheet: gfagg) & (From level: Species) & (To level: Genus), and
rename the resulting sheet appropriately. Similarly, create data pooled to family and order levels,
and produce MDS plots for all three (same transform, same similarity), to match the earlier species-
level plot. Try misspelling a species name (e.g. Raja neavus) to observe the consequences.
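The pooling and the treatment of unmatched names described above can be sketched in a few lines.
This is an illustration only, not PRIMER's implementation: the function name aggregate, the
dictionary layout and all the numbers are assumptions made for the example.

```python
def aggregate(data, lookup):
    """Pool species rows into higher-level groups.

    data   : dict mapping species name -> list of values (one per sample)
    lookup : dict mapping species name -> group name (e.g. its genus)
    Returns the pooled dict and the list of species names not found in lookup.
    """
    pooled, unmatched = {}, []
    for species, values in data.items():
        if species in lookup:
            group = lookup[species]
        else:
            group = species            # unmatched names stay as separate variables
            unmatched.append(species)
        if group in pooled:
            pooled[group] = [p + v for p, v in zip(pooled[group], values)]
        else:
            pooled[group] = list(values)
    return pooled, unmatched


# Hypothetical species-to-genus lookup and made-up counts for two samples.
lookup = {
    "Raja radiata": "Raja",
    "Raja clavata": "Raja",
    "Raja naevus": "Raja",
    "Torpedo marmorata": "Torpedo",
}
data = {
    "Raja radiata": [1, 0],
    "Raja clavata": [2, 5],
    "Raja neavus": [0, 2],            # deliberately misspelt, as in the exercise
    "Torpedo marmorata": [3, 0],
}
pooled, unmatched = aggregate(data, lookup)
```

The misspelt name is reported in unmatched and survives as its own row alongside the pooled
genus rows, which mirrors the behaviour the text describes.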

[Figure: the gfax data sheet 'Groundfish NW European shelf - area averages'; the Edit menu used to
alter a species label; the Tools>Aggregate dialog (Aggregation worksheet: gfagg; To level: Genus);
the genus-level sheet; and the warning 'Some labels were unmatched', with the results window
listing 'Unmatched labels: Raja neavus' and 'Proportion of unmatched labels: 0.01'.]


Check on aggregation files
Use the open aggregation file, gfagg, to show the more limited set of Tools items available when
the active window is an aggregation matrix. Tools>Duplicate has been seen previously for graphs
and worksheets (pages 82, 90). Here it has the same effect, taking a copy of the gfagg window,
called Agg1, to the head of a fresh branch in the Explorer tree. Insert errors in this copy to be
picked up by Tools>Check: a) overwrite Raja clavata in the Species column with Raja radiata (by
just double clicking in the cell and typing in the incorrect name); b) remove the family name
Rajidae from row 2 (again double click in the cell to highlight the content and delete it); c) change
the first Torpedo entry in the Genus column (row 8) to Raja; d) blank the Species and Genus entries
for the Squalidae (row 10) family; and e) misspell the family name Scyliorhinidae in row 11. (You
can find row and column numbers in the table because the bottom right status bar of the PRIMER
desktop always displays the current cursor position.) Then Tools>Check will find three types of
error: the duplicate species name (row 4), the missing values (blanks) in rows 2 and 10, and three
entries which are inconsistent. Row 2 column 2 (not column 3 note) is the first inconsistency
because Raja has been established by row 1 to be a genus name in the Rajidae family, thus cannot
(row 2) also be a valid genus name in a different family (the 'blank' family). Similarly, Torpedo
marmorata (row 8) cannot be in the Raja genus but have a Torpedinidae family, because a previous
line has established that Raja is a name in the Rajidae family. The final inconsistency is not flagged
in row 11 (where the misspelling is) but row 12, because it is only then that the contradiction is
exposed that Scyliorhinus is in the Scyliorhinidae family when it has previously had a family entry
of Sillyorinidae. Detective work is therefore sometimes needed to establish the earlier cause of an
inconsistency which is only spotted on a later line (where it may not be an error). Note that the
blank entries in row 10 are not classed as inconsistent (we may genuinely never be able to identify
Squalidae to finer than family level), but blank entries are best avoided altogether since they cause
great confusion. Fill them with non-blank names from the immediate right or left, depending on the
context (usually fill right to left). Here, fill the Species and Genus entries with Squalidae; the
routine does not object to the same name being used in different taxonomic levels.
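The three kinds of check can be mimicked with a short sketch. The rules coded here are inferred
from the description above (first occurrence of a name fixes its parent; a later row that
contradicts it is flagged in the column of the lower-level name; blanks are reported as missing
but not as inconsistent). They are an assumption, not PRIMER's actual algorithm, and the small
table is a cut-down, hypothetical version of the exercise.

```python
def check_aggregation(rows):
    """rows: list of tuples (Species, Genus, Family, Order); '' marks a blank cell.

    Returns (duplicates, missing, inconsistent) using 1-based row and column numbers.
    """
    duplicates, missing, inconsistent = [], [], []
    seen_species = set()
    parent_of = {}  # (column index, name) -> parent name first recorded for it
    for r, row in enumerate(rows, start=1):
        for c, name in enumerate(row, start=1):
            if name == "":
                missing.append((r, c))
        species = row[0]
        if species:
            if species in seen_species:
                duplicates.append((r, species))
            seen_species.add(species)
        for c in range(len(row) - 1):
            child, parent = row[c], row[c + 1]
            if child == "":
                continue  # blanks are not classed as inconsistent
            key = (c, child)
            if key not in parent_of:
                parent_of[key] = parent
            elif parent_of[key] != parent:
                inconsistent.append((r, c + 1))
    return duplicates, missing, inconsistent


rows = [
    ("Raja radiata", "Raja", "Rajidae", "RAJIFORMES"),
    ("Raja naevus", "Raja", "", "RAJIFORMES"),                         # family blanked
    ("Raja radiata", "Raja", "Rajidae", "RAJIFORMES"),                 # duplicate species
    ("Torpedo marmorata", "Raja", "Torpedinidae", "TORPEDINIFORMES"),  # wrong genus
    ("", "", "Squalidae", "SQUALIFORMES"),                             # blanks only
]
duplicates, missing, inconsistent = check_aggregation(rows)
```

As in the text, the blanked family in row 2 is reported as missing in column 3 but as
inconsistent in column 2, because it is the genus name Raja whose family no longer agrees.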

[Figure: the Agg1 aggregation sheet (columns Species, Genus, Family, Order) with the errors
inserted, the Tools>Check dialog (Inconsistent values; Duplicate species) and its results window:
Duplicate Species: row 4, Raja radiata (number of duplicate species: 1)
Missing Values (row, column): (2, 3), (10, 1), (10, 2) (number of missing values: 3)
Inconsistent Values (row, column): (2, 2), (8, 2), (12, 2) (number of inconsistent values: 3)]

Tree menu
The other Tools menu item for aggregation sheets is distinctive to this case, namely Tools>Tree; it
simply displays the hierarchical structure of an aggregation file in the same way as the Explorer
tree, in a left-hand panel. Successive clicks on the [+] icons unroll the taxonomic structure, and it
can be rolled back with [-]. No operations can be performed on the display in this state.

[Figure: the Tools>Tree display for gfagg, 'Taxonomy for NW European shelf groundfish', unrolled
from CHONDRICHTHYES through RAJIFORMES and Rajidae to the Raja species, with the other orders
and OSTEICHTHYES still rolled up.]

Check on datasheets & resemblances
When the active window is a datasheet, Tools>Check can check for the following: a) ✓Missing
values, identified in the sheet by 'Missing!', and which might have been read in as blank cells in an
Excel worksheet for example; b) ✓Negative values, which are not appropriate for abundance-type
data analysed by Bray-Curtis, though common for environmental variables (especially normalised)
input to Euclidean distance; c) ✓Duplicate sample (and/or) variable labels, which are tolerated for
some analyses (warnings are usually given) but are best avoided wherever possible; d) ✓All zero
samples (and/or) variables; and e) ✓Estimated values, displayed in red type in the matrix. The
latter come from applying Tools>Missing, see page 118, to environmental variables (or other
normally distributed data) containing Missing! cells, which otherwise would not be tolerated by
analysis routines requiring complete data. All or any combination of the 7 boxes can be ticked.
Whether it is important to check for a particular attribute depends on the analysis. For example,
species (variables) which are zero over all samples will be ignored when Bray-Curtis similarity is
computed between samples, and can safely be left in the matrix, but all-zero samples are potentially
more of a problem since Bray-Curtis similarity between two blank samples is set to 'Undefined!'.
Dependent on the context, these samples might best be omitted, or a different similarity used (e.g.
zero-adjusted Bray-Curtis, page 44), or the entry left as 'Undefined!', i.e. treated as unknown.
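A minimal sketch of most of these datasheet checks is given below. It is not PRIMER code: the
function name check_datasheet, the use of None for a Missing! cell, the 0-based cell positions and
the tiny matrix are all assumptions for illustration, and the check for estimated values is left out
because it depends on how PRIMER marks such cells internally.

```python
def check_datasheet(var_labels, sample_labels, matrix):
    """matrix[i][j] is the value for variable i in sample j; None marks a missing cell.

    Cell positions are returned as 0-based (variable index, sample index) pairs.
    """
    def duplicated(labels):
        return sorted({x for x in labels if labels.count(x) > 1})

    def all_zero(values):
        return all(v == 0 for v in values)  # a missing cell (None) does not count as zero

    cells = [(i, j, v) for i, row in enumerate(matrix) for j, v in enumerate(row)]
    columns = list(zip(*matrix))
    return {
        "missing": [(i, j) for i, j, v in cells if v is None],
        "negative": [(i, j) for i, j, v in cells if v is not None and v < 0],
        "duplicate_samples": duplicated(sample_labels),
        "duplicate_variables": duplicated(var_labels),
        "all_zero_samples": [sample_labels[j] for j, col in enumerate(columns) if all_zero(col)],
        "all_zero_variables": [var_labels[i] for i, row in enumerate(matrix) if all_zero(row)],
    }


# Made-up 3 variables x 3 samples sheet with one of each problem.
report = check_datasheet(
    ["sp1", "sp2", "sp2"],
    ["A", "B", "C"],
    [[1, 0, None],
     [0, 0, -2],
     [0, 0, 0]],
)
```

Sample B is all zero here, so a Bray-Curtis similarity between it and another all-zero sample
would involve dividing zero by zero, which is why such entries are reported as undefined.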

[Figure: the Tools>Check dialog for a datasheet (tick boxes including Negative values, Duplicate
samples, Duplicate variables and All zero samples) and its results window, reporting 'No missing
values in selection', 'No negative values in selection', 'No duplicate variables in selection',
'No estimated values in selection' and 'No all zero samples in selection'.]

Undefined resemblances
When the active window is a resemblance sheet, Tools>Check looks for only three data attributes:
a) ✓Undefined values, arising as suggested above; b) ✓Out of bounds values, for distance
coefficients (or transformations) that return very large or small values (e.g. displayed as entries of
Infinity or -Infinity); and c) ✓Duplicate labels, as above. Blanking a cell in a resemblance matrix
sets it to Undefined! status, and many of the core routines using resemblance matrices (e.g. MDS,
Cluster, ANOSIM) are carefully written in v6 to tolerate a few Undefined! entries, treating them as
unknown. (You can see that knowing the similarities S12, S13, S14, S23, S24 might enable you to place
4 samples in relation to each other without knowing similarity S34). Blanking out Infinite values, to
Undefined!, is one possibility therefore, but others may be equally good or better (replacing by a
large, but finite value, modifying the coefficient or transformation which generated them etc).


Duplicating resemblances
Copy any of the Bray-Curtis matrices from averaged groundfish data to a new resemblance sheet,
using Tools>Duplicate. Again note that this starts a fresh branch in the Explorer tree, so changes
introduced to demonstrate Check features will not impact on the original data. (Only Duplicate on
graphs leaves the copied window next to its original in the tree, and linked to the prior steps, since
graphs are always an endpoint rather than the start of a fresh analysis).
Blank a few randomly chosen values in the copied matrix (double-click on the cell contents and hit
the delete key). Running Tools>Check picks up those now Undefined! entries, but MDS will
accept the matrix in this form and still produce a plot which is rather similar to the original. Save
and close gfwk for later use.

[Figure: the duplicated Bray-Curtis resemblance sheet with four entries blanked to Undefined!; the
Tools>Check dialog and its results window (Undefined Values listed by row and column; number of
undefined values: 4); and the resulting 2-d MDS, 'Groundfish NW European shelf - area averages:
species level data with 4 random similarity deletions' (Transform: Fourth root; Resemblance: S17
Bray Curtis similarity; 2D Stress: 0.05).]

Merging (Join) options
The Tools>Merge menu allows a range of merge operations on two rectangular data sheets,
subsuming the more restricted Join options in PRIMER v5, though the latter can still be simply
obtained. For example, two matrices whose rows (say) are of different variable sets (faunal and
algal species perhaps) but from the same sample labels, are automatically joined end-to-end by
Tools>Merge, with the upper half as the active sheet and the lower half supplied in the 'Second
worksheet:' box in the Merge dialog. Similarly, two sheets with the same variable labels (species as
rows say) but different sample labels (e.g. the same set of study sites in different years), will be
placed side-by-side. In PRIMER5 any two matrices whose sizes were conformable were stitched
together side-by-side or end-to-end, irrespective of whether their label sets matched. In PRIMER6,
by contrast, label matching is taken very seriously, so that two data sheets with different faunal sets
could have the same sample labels in a different order and they would be correctly merged, the
sample order being re-arranged appropriately before the arrays are joined.
94→
The nematode and copepod datasheets from 16 samples at a Tasmanian sand-flat (C:\Examples v6\
Tasmania) were last encountered in workspace tawk on page 94, under Selection, and in an old data
format (v4) on page 28. However, it is less confusing when returning to this data in a later example
(2-way ANOSIM, section 12) to do the current exercise in a new workspace that you can delete
afterwards. From the Tasmania directory File>Open tana and tapa (*.pri v6 format). With tana as
the active window, run Tools>Merge>(Second worksheet: tapa) & (Samples•Merge (strict names))
& (Variables•Merge (strict names)) & (New cells•Zero) & (Combined cells•Error), i.e. all the
default options. (The latter two options of new cells or combined cells do not come into play here,
but are discussed later.) The resulting merged datasheet Data1 now has 56 rows (the 39 nematode
species then 17 copepods). The results window Merge1 shows that all the samples matched (and
the species did not) in the way expected. The title for the new sheet Data1 is taken from the first
(active) window, so to avoid confusion should be changed using Edit>Properties.
Now, re-order the columns in tapa by Edit>Sort>Columns>(•By labels), which sorts the samples
in a different (alphabetic) order for the copepod matrix: B1DR1, B1DR2, B1UR1, B1UR2, ... than
the nematode sample order: B1DR1, B1DR2, B2DR1, B2DR2, .... Nevertheless, a re-run of
Tools>Merge with tana as the active window, and with exactly the same options (•Merge etc), will
result in a merged datasheet Data2 which is identical to Data1 (with an identical results window),
the ordering of samples having been taken from the first (active) window.
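The effect of matching on labels rather than on position can be sketched as follows. This is an
illustration under stated assumptions, not PRIMER's code: the function name merge_strict, the
(labels, rows) layout, the variable names 'nematode A' and 'copepod A' and all values are
hypothetical.

```python
def merge_strict(first, second):
    """Stack two variables-by-samples sheets that share the same sample labels.

    Each sheet is (sample_labels, {variable_label: [values in that sample order]}).
    The sample order of the result is taken from the first (active) sheet.
    """
    samples1, rows1 = first
    samples2, rows2 = second
    if sorted(samples1) != sorted(samples2):
        raise ValueError("sample label sets differ")
    position = {label: k for k, label in enumerate(samples2)}
    merged = {name: list(vals) for name, vals in rows1.items()}
    for name, vals in rows2.items():
        if name in merged:
            raise ValueError("variable label appears in both sheets: " + name)
        # Re-arrange the second sheet's values into the first sheet's sample order.
        merged[name] = [vals[position[s]] for s in samples1]
    return samples1, merged


nematodes = (["B1DR1", "B2DR1", "B1UR1"], {"nematode A": [0, 1, 0]})
# Same samples, but held in alphabetic order in the second sheet.
copepods = (["B1DR1", "B1UR1", "B2DR1"], {"copepod A": [43, 4, 63]})
samples, merged = merge_strict(nematodes, copepods)
```

Because values are looked up by sample label, sorting the second sheet beforehand makes no
difference to the merged result.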


[Figure: the Edit>Sort dialog (By labels) applied to the 'Tasmanian copepods' sheet; the
Tools>Merge dialog (Second worksheet; Samples and Variables: Merge (strict names) or Join (rename
duplicates); New cells: Zero, Missing or Error; Combined cells: Error or Summed); the merged sheet;
and the results window Merge1:
Samples - Merged labels: 16; Unmatched primary labels: 0; Unmatched secondary labels: 0
Variables - Merged labels: 0; Unmatched primary labels: 39; Unmatched secondary labels: 17]

Combined cells in Merge
Occasionally, use of strict label names does not give the desired outcome, and the default
behaviour can be changed to force PRIMER6 to treat an identical label, but in a different
matrix, as a different name. For example, this might be needed when species names
have not been provided for either set, and the variable labels are just the numbers 1, 2, 3, ....
Species 1 in the first set is not to be taken as the same variable as species 1 in the second set, and
the default options in Merge will cause difficulty in this case. Equally possible is the opposite case
where the species names match in the two matrices, but the same sample labels are repeated,
though should not be equated. Samples collected in year 1 might be labelled by their site
identification. A second matrix of data from those sites in year 2 might use exactly the same set of
sample labels, i.e. without reference to the year. This causes no confusion if the matrices are to be
analysed separately, but a Merge under the default of strict name-matching would place the two
matrices on top of each other (because they have exactly the same row and column labels!). The
two options given in such a case are (Combined cells•Summed) or (Combined cells•Error). The
first literally adds the two matrices, element by element. Very often though, this is not the desired
behaviour, so the default is the second option: if a Merge instruction results in an attempt to
combine two cells, an error results.
Into the same workspace, now take File>Open on the data files tanav4 and tapav4 (in *.pm1
format, from the old DOS-based PRIMER4), which should be read in as Type•Species-sample.
These are the same nematode and copepod matrices as tana and tapa except that PRIMER v4 held
species lists as separate files so both tanav4 and tapav4 have variables numbered just 1, 2, 3, ...,
though the species are different in the two matrices. A Tools>Merge on them, using the default of
Variables•Merge (strict names) will potentially give combined cells. Try this first with Combined
cells•Error in place, to note the error message and the fact that execution then stops. Then repeat
with Combined cells•Summed, so those cells with the same species and sample numbers are
simply added together. This may occasionally be a useful option, e.g. it would allow for easy
collation of data for the same samples by several different observers (though it must be debatable
whether such a piecemeal approach to data matrix construction - losing information on potential
observer differences - is often desirable!)
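The two Combined cells behaviours can be pictured with a small sketch. It is not PRIMER code:
the function name merge_cells, the cell-dictionary layout and the counts are assumptions chosen
only to show a clash between identically numbered variables.

```python
def merge_cells(first, second, combined="error"):
    """Merge two sheets held as {(variable_label, sample_label): value} dictionaries.

    combined = "error"  : stop as soon as the same cell occurs in both sheets
    combined = "summed" : add the two values together
    """
    merged = dict(first)
    for key, value in second.items():
        if key in merged:
            if combined == "error":
                raise ValueError("combined cell at variable %r, sample %r" % key)
            merged[key] = merged[key] + value
        else:
            merged[key] = value
    return merged


# Made-up counts; both sheets number their variables 1, 2, 3, ...
first = {("1", "B1DR1"): 4, ("2", "B1DR1"): 12}
second = {("1", "B1DR1"): 43, ("3", "B1DR1"): 0}

try:
    merge_cells(first, second, combined="error")
    stopped = False
except ValueError:
    stopped = True  # execution stops at the first combined cell

summed = merge_cells(first, second, combined="summed")
```

Under "summed" the clashing cell for variable 1 simply holds 4 + 43, which is only meaningful
if the two sheets really do record the same variable in the same sample.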


[Figure: the Tools>Merge dialog for 'Tasmanian nematodes' (tanav4, variables labelled 1, 2, 3, ...)
with (Variables•Merge (strict names)) and (Combined cells•Error), giving the error message
'Combined cells occurred and user specified error'; and the same merge repeated with (Combined
cells•Summed).]
Avoiding strict label matching
The best policy to avoid confusion is to use precise, unique species and sample labels (typically,
the sample label would be a conglomeration of all the different study design factors and a replicate
number). However, conflicting criteria can sometimes arise, e.g. when the pattern of sites from year
1 is to be compared with the pattern in year 2, using the RELATE test (section 13) on the two
separate similarity matrices, identical sample (site) labels are ideally needed in both arrays, so they
can be matched. But, as just pointed out, a Merge of the two data sheets underlying these
similarities (so that both year 1 and 2 sites can be seen on the same MDS say) requires the sample
labels to be different. Thus, PRIMER6 is not dogmatic about label matching: several routines,
including Merge (and RELATE) are able to 'fudge' the matching and provide a natural
alternative, where this is likely to be convenient.
In Merge, this is by the •Join (rename duplicates) option, used either for Samples or Variables (or
conceivably both, to create a block diagonal matrix, though this possibility is rather pathological!).
For tanav4 and tapav4 to be correctly placed one under the other, even though they share species
labels, take Tools>Merge>(Variables•Join (rename duplicates)) with defaults for the other options
(i.e. Samples•Merge (strict names), and there should be no new or combined cells to worry about).
The copepod species are relabelled 1(2), 2(2), 3(2), ... to distinguish them from nematodes 1, 2, ....
[Figure: the Tools>Merge dialog with (Variables•Join (rename duplicates)), and the joined sheet in
which the copepod variables carry the labels 1(2), 2(2), 3(2), ....]


Merging non-uniform species lists
Perhaps the greatest benefit of the strict label matching in PRIMER v6 is the ability to Merge
assemblage data when two sets of samples, taken at different times or places, are not recorded on a
common data sheet, with predetermined taxonomic categories. Species names (or other operational
taxonomic units) must be consistently spelt (even to spaces) in the separate lists, so that exact
matching of variable names can take place. But there is then no necessity that the two sheets hold
the same set of species, in the same order. Typically, lists will be of different length, with some
species in each list not appearing in the other. Using •Merge (strict names) copes automatically
with this, filling any spaces created in the merged array either with (New cells•Zero), relevant for
assemblage-type data, or with (New cells•Missing), more appropriate for environmental variables.
A third option (New cells•Error) stops the procedure with an error message if any new cells are
created. This can be a useful safeguard if the intention was to join two data sheets with exactly the
same set of variables - an error alerts you to the fact that there may be variable names misspelt.
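The three New cells options can be illustrated with a short sketch. It is not PRIMER code: the
function name merge_with_new_cells, the nested-dictionary layout, the use of None for a Missing!
cell, and the species and sample labels are all hypothetical, and the two sheets are assumed to have
no sample labels in common (so combined cells do not arise).

```python
MISSING = None  # stand-in for a Missing! cell (an assumption of this sketch)


def merge_with_new_cells(first, second, new_cells="zero"):
    """Merge two sheets held as {species: {sample: value}} with differing species lists.

    new_cells = "zero" | "missing" | "error" decides how cells with no source value are filled.
    """
    samples = []
    for sheet in (first, second):
        for row in sheet.values():
            for s in row:
                if s not in samples:
                    samples.append(s)
    merged = {}
    for sheet in (first, second):
        for species, row in sheet.items():
            merged.setdefault(species, {}).update(row)
    for species, row in merged.items():
        for s in samples:
            if s not in row:
                if new_cells == "error":
                    raise ValueError("new cell needed for %r in sample %r" % (species, s))
                row[s] = 0 if new_cells == "zero" else MISSING
    return merged


# Made-up cover values: species B is shared, A and C each occur in one sheet only.
early = {"species A": {"88A1": 3}, "species B": {"88A1": 7}}
later = {"species B": {"91A1": 1}, "species C": {"91A1": 2}}

as_zero = merge_with_new_cells(early, later, new_cells="zero")
as_missing = merge_with_new_cells(early, later, new_cells="missing")
```

Zero is the natural fill for assemblage data (a species not listed was not recorded), whereas
treating an unmeasured environmental variable as zero would usually be wrong.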
(Phuket data on coral transects)
Live cover of a coral reef assemblage was recorded from 'plotless line-samples' (of 10m length)
perpendicular to, and at 10m spacing along three onshore-offshore transects (A-C) at Cape Panwa,
Phuket, Thailand. Samples taken in the early years (1983, 86, 87 and 88) are described in Clarke
KR, Warwick RM & Brown BE 1993, Mar Ecol Prog Ser 102: 153-160, and in later years (1991 to
2000, omitting 96) in Brown BE, Clarke KR & Warwick RM 2002, Mar Biol 141: 21-29. The early
years straddle sedimentation impacts from dredging operations for a new deep-water port (1986/7),
and the later years include a prolonged Indian Ocean high pressure event with desiccation impacts
from lowered sea levels (1998). Data are in directory C:\Examples v6\Phuket, and this example
uses only transect A, with 12 line samples taken along its length each year, the early and later years
being in files kpcA1 (37 species) and kpcA2 (43 species). Open the two year sets in a new
workspace, and note their different but overlapping species lists. Merge them using strict names
and with zeros in new cells, rename the window, amend the title and sort the (now 53) species in
the merged sheet alphabetically (Edit>Sort>Rows>•By labels). Save the workspace as kpAwk.

[Figure: the kpcA1 and kpcA2 sheets in the new workspace, the Tools>Merge dialog (Merge (strict
names); New cells: Zero; Combined cells: Error) and the results window Merge1:
Samples - Merged labels: 0; Unmatched primary labels: 48; Unmatched secondary labels: 108
Variables - Merged labels: 27; Unmatched primary labels: 10]


81→
Open the workspace clwk from the Clyde sludge dumpground survey (first met on page 80), in
C:\Examples v6\Clydemac. If not already open, read in the environmental file clev of 11 variables
from transect sites S1 to S12 (across the dump centre at S6) which will be used now and through
the next section.

Transposing the datasheet
As is conventional in 'classical' multivariate statistics, this is a samples×variables array, rather than
the variables×samples matrices mainly seen so far (the convention in ecology). The difference has
arisen because biologists typically have p (species) > n (samples), whereas classic normality-based
multivariate methods require n > p, and it is generally neater to put the larger set of labels into rows
(and this also suits lengthy species names). It does not matter in PRIMER which way round the
matrices are held, the important specification being which axis holds the samples (rows or
columns?). That is changed by (Samples as•Columns) or (Samples as•Rows) on Edit>Properties
and not by transposing the array (so that columns turn to rows and rows to columns). However, a
Tools>Transpose operation may sometimes be helpful in displaying a sheet in PRIMER or, more
likely, before saving the data to an external file, when another software application needs a
particular orientation. One example might be when a variables×samples array has been constructed
in PRIMER with >255 samples (columns) but <255 variables (rows). This cannot be exported to a
current version of Excel because of its 255 column limit, but can be saved in Excel *.xls format if it
is first transposed in PRIMER to a samples×variables array.
Take Tools>Transpose on clev and note that the Samples/Variables designation also switches.

[Figure: the Clyde environmental sheet before and after Tools>Transpose, with the sites (S1, S2,
S3, ...) and the variables (Cu, Mn, Co, ..., Cd, Dep, %C, %N) exchanging rows and columns.]

Transform (individual)
Transformation of a whole datasheet prior to calculation of resemblance measures has already been
discussed under the Analyse>Pre-treatment menu in section 3. However, particularly for
environmental data, it may be desirable to transform only some variables, or to use different
transformations for different variables, e.g. in order that the optimal conditions for calculating normalised
Euclidean distances (roughly speaking, normally distributed variables) are obtained. This rationale
is discussed in Chapter 11 of the Methods manual, and the practical aids to making transformation
choices (e.g. Analyse>Draftsman Plot) are covered in the next section. Transformation of only
part of a matrix uses highlighting (not selection) to identify those variables needing the same
transform, and Tools>Transform (individual) will produce a new matrix containing all variables
but with highlighted ones transformed (and automatically renamed, if requested). Further variables,
needing a different transform, can then be highlighted and transformed. The process may take
several such steps but is easily accomplished because the variables that are to be left untransformed
at each stage are never dropped (as they would be if selection rather than highlighting were used).
Note that if nothing in the matrix is highlighted, then the transform is applied to all entries.

Transform expressions
The transform operation itself can be any of the previous pre-treatment options: square root, fourth
root, log, reduction to pres/abs, using the Expressions: Sqr(V), V^0.25 (= Sqr(Sqr(V))), log(1+V),
PA(V) respectively, in which V (value) stands for any highlighted data entry (note that upper or
lower case is not important in the expressions). But it is not limited to these: many other transforms
can be constructed. In fact any expression using the Basic language syntax is permitted, involving
operators: +, -, * (times), / (divide), ^ (power); functions: Sqr, Log (to base e) etc as above, and Abs
(absolute value), Atn (arctan), Exp (exponential), Int (integer part of a number) and many others;
and even logical operators: =, <, >, <=, >=, which return -1 if true, 0 if false. (An example of the
latter might be to draw attention to cells with large counts using an expression like V>1000). For a
comprehensive list of expression options take Tools>Transform (individual)>Help and click on
Transform expression. Operations can extend still further, to generate new entries as combinations
of columns (and even factors/indicators or other worksheets!), with scope for great sophistication.
With the transposed matrix Data2 of clev (or clev itself) as the active window, highlight all 8
sediment concentration variables of heavy metals, and take menu Tools>Transform (individual)>
Expression: log(0.1+V). There are zeros in the matrix so a small constant has to be added so that the
logarithm is defined: it is fairly conventional to use the smallest non-zero entry for the constant. On
OK, the resulting Data3 sheet is seen to have transformed just the first 8 variables. Now, working
from Data3, highlight the %C and %N variables, and transform again, this time with Expression:
Sqr(V). This could be simply typed in, as with the log expression, or you can select the function
from the Pick box: (Type•Function) & (Item: SQR(.))>Pick. The action of the Pick button is to
place the selected function around the entry already in the Expression box (just V in this case),
giving SQR(V) (case not being important, remember). If you have not taken the automatic option to
(✓Rename variables), then use Edit>Labels to change the labels to reflect the transform, and save
the workspace (clwk) so that Data4 is ready for environmental analysis in the next section. The
rationale for these transform choices, and for not transforming water depth at all, is given there.
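
The highlight-then-transform sequence can be mimicked outside PRIMER. The following Python sketch is illustrative only and is not PRIMER code: the helper transform_highlighted, the miniature sheet and its values are hypothetical, and a dictionary of lists stands in for a worksheet.

```python
import math


def transform_highlighted(sheet, highlighted, func):
    """Return a new sheet in which only the highlighted variables are transformed.

    Untouched variables are carried over unchanged, and an empty highlight
    means "transform everything", as described in the text.
    """
    targets = set(highlighted) or set(sheet)
    return {
        name: [func(v) for v in values] if name in targets else list(values)
        for name, values in sheet.items()
    }


# Hypothetical miniature sheet: variable name -> values across three samples.
data2 = {"Cd": [0.0, 0.2, 3.0], "%C": [4.0, 9.0, 1.0], "Dep": [144.0, 152.0, 140.0]}

# Step 1: log(0.1 + V) on the highlighted metal only.
data3 = transform_highlighted(data2, {"Cd"}, lambda v: math.log(0.1 + v))
# Step 2: Sqr(V) on the highlighted organic variable, working from the previous result.
data4 = transform_highlighted(data3, {"%C"}, math.sqrt)

print(data4)
```

Because every variable is copied into each new sheet, water depth (Dep) survives both steps untransformed, which is the point of using highlighting rather than selection.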

[Screenshot: the Tools menu and the Tools>Transform (individual) dialog ("Selected data taken. Only highlighted data transformed."), with Expression: SQR(V), the Rename variables (if unique) option, and the Pick box with Type•Function and items including FIX(.), INT(.), LOG(.), RND(), SGN(.), SIN(.), TAN(.) and PA(.).]

Expressions combining variables

For an example of an Expression combining two (or more) variables, go back to the transposed
array Data2 and copy it (Tools>Duplicate) - which is always a good idea when experimenting! -
to give Data5. The aim, say, is to create a new row (variable) which is the C:N ratio, so first take
Edit>Insert>Row, which places the new row just above the current position of the cursor, then
appropriately rename and highlight it (and nothing else). Whatever manipulations we do are now
placed only in that new row. Take Tools>Transform (individual) and delete the V from the
Expression box (that refers only to values in the new row, which we shall not use, since they are
all zero). Then, in the Pick area take ((Type•Variable) & (Item: %C)>Pick) & ((Type•Variable) &
(Item: %N)>Pick), which creates the two variables we need in the Expression box. Manually insert
the divide symbol (/) between them, to give the final Expression: VAR("%C")/VAR("%N"), and OK
now gives a new sheet, Data6, with the added C:N ratio variable.
[Screenshot: the Edit menu (Clear Highlight, Invert Highlight, Cut, Copy, Paste) and the Tools>Transform (individual) dialog on the Clyde environmental sheet, with Type•Variable chosen in the Pick box and the item list including the new C:N ratio row.]

An alternative (e.g. if you just intend to take this new variable back into the earlier transformed
sheet) is not to bother inserting a new row into Data5. Instead, highlight just its %C row (which
will now be 'V' in the Expression box), and Tools>Transform it with Expression: V/VAR("%N").
In the new sheet, this will have overwritten the %C row with the C:N ratio. Either way, you can
now put the new C:N variable back into the transformed sheet Data4 simply by highlighting it, then
Select>Highlighted and Tools>Merge>Second worksheet: Data4, taking the defaults.
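
As a rough analogue of building the C:N ratio from two existing variables, the sketch below adds a derived variable to a copy of a sheet. It is not PRIMER code: add_ratio_variable and the sample values are hypothetical, and the denominator is assumed never to be zero.

```python
def add_ratio_variable(sheet, numerator, denominator, new_name):
    """Return a copy of the sheet with an extra variable: numerator / denominator."""
    new_sheet = {name: list(values) for name, values in sheet.items()}
    new_sheet[new_name] = [
        n / d for n, d in zip(sheet[numerator], sheet[denominator])
    ]
    return new_sheet


# Hypothetical values for two samples; %N is assumed to be non-zero everywhere.
data5 = {"%C": [3.0, 6.0], "%N": [0.5, 0.25]}
data6 = add_ratio_variable(data5, "%C", "%N", "C:N ratio")

print(data6["C:N ratio"])
```

Working on a copy matches the advice in the text to duplicate a sheet before experimenting.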

Expressions combining worksheets

Similarly, expressions can combine samples, or even factors (or indicators) on those samples (or
variables) - and expressions can even incorporate different worksheets. In fact some of the most
useful applications of complex expressions are in combinations of data from related worksheets,
such as the abundance (clma) and biomass (clmb) arrays of macrofaunal assemblages from the
Clyde study. The key facts to keep in mind when constructing complex expressions are that V
stands for any entry in the active sheet that is highlighted, and the result will be placed into these
highlighted cells only (which could mean the whole array, if there is no highlighting), and that
maintaining strict labelling across worksheets will make it easier to understand what the expression
calculates for each cell (though, as elsewhere in PRIMER, if Transform is given two data sheets
that have conformable dimensions but not consistent labelling, then it will give the user the chance
to relax strict matching and assume that samples or variables are presented in the same order).

If not already in the workspace, open clma and clmb. One useful way of combining abundance and
biomass information from the same set of samples is seen in equation (15.1) of the Methods
manual, namely 'pseudo-production' P = A^0.25 × B^0.75. With clma active, turn off any highlighting
(with Edit>Clear Highlight), or highlight everything (the effect is the same), then take Tools>
Transform (individual)>Expression: (V^0.25)*(Work("clmb")^0.75). You can use (Type•Worksheet)
& (Item: clmb)>Pick to remind you of the syntax - it produces Work("clmb") - but type the
rest; the counts are held in V (= Work("clma") since clma is active). The result is a new sheet.
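
The cell-by-cell combination of two worksheets can be sketched as follows. This is a hypothetical Python analogue of the expression (V^0.25)*(Work("clmb")^0.75), not PRIMER's implementation: the function name, the strict label check and the toy abundance and biomass values are all assumptions made for illustration.

```python
def pseudo_production(abundance_sheet, biomass_sheet):
    """Combine two species-by-sample sheets cell by cell: P = A^0.25 * B^0.75."""
    # Strict label matching: both sheets must hold exactly the same species.
    if abundance_sheet.keys() != biomass_sheet.keys():
        raise ValueError("species labels do not match between the two sheets")
    return {
        species: [
            a ** 0.25 * b ** 0.75
            for a, b in zip(abundance_sheet[species], biomass_sheet[species])
        ]
        for species in abundance_sheet
    }


# Hypothetical counts and biomasses for two species in two samples.
abundance = {"sp1": [16.0, 0.0], "sp2": [1.0, 81.0]}
biomass = {"sp1": [16.0, 0.0], "sp2": [1.0, 16.0]}

production = pseudo_production(abundance, biomass)
print(production)
```

Raising an error on mismatched labels is the strictest choice; the text notes that PRIMER instead offers to relax matching when the two sheets have conformable dimensions.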

[Screenshot: the clma and clmb data sheets, the Transform results window (worksheet clma, Data type: Abundance, Sample selection: All, Variable selection: All, Expression: (V^0.25)*(WORK("clmb")^0.75), all samples and variables transformed) and the resulting sheet.]

Average body mass matrix (B/A)

A commonly required variation of this that needs slightly more care is average body mass for each
species in each sample. This is simply B/A, but needs to cater for the many cases when A (and B)
are zero. On clma again, so that V represents abundances, Expression: Work("clmb")/(V - (V=0))
under Tools>Transform (individual) will do the trick because V=0 returns the value -1 (true)
when abundance is zero. The bottom line (denominator) is then 1, i.e. >0, and the expression
returns the reasonable value of 0 (assuming that B=0 when A=0!). Where there are positive counts,
V>0, the expression (V=0) returns the value 0 (false) so that the correct ratio of B/A is calculated.
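
The zero-guard trick relies on the Basic-style convention that a true comparison evaluates to -1. Python uses True == 1 instead, so the sketch below reproduces the convention explicitly; the helper names are hypothetical and the code only illustrates the arithmetic.

```python
def basic_equals(x, y):
    """Mimic the comparison described in the text: -1 if true, 0 if false."""
    return -1 if x == y else 0


def average_body_mass(biomass, abundance):
    """Counterpart of Work("clmb")/(V - (V=0)) for a single cell."""
    # When abundance is 0 the denominator becomes 0 - (-1) = 1, so the result
    # is biomass / 1, i.e. 0 provided biomass is also 0 for an absent species.
    return biomass / (abundance - basic_equals(abundance, 0))


print(average_body_mass(6.0, 3))   # species present: ordinary ratio B/A
print(average_body_mass(0.0, 0))   # species absent: no division by zero
```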
An illustration of error trapping in Transform for matching of entries is obtained by copying clmb
(Tools>Duplicate) to create Data9, then Edit>Labels>Variables on this to delete all the species
labels (click Label header and Edit>Delete). A sheet must have labels so PRIMER substitutes its
defaults of (V1), (V2), etc. Now run the above calculation on clma, but with Data9 replacing clmb.
A warning message says that it could not find (variable) labels to match, but the two matrices are
the same size so the option is given of proceeding anyway, on the assumption that the species order
matches. We know it does here, so continue, to give Data10 of B/A. Re-run having deselected one
of the rows in Data9, however, and an irrecoverable error message occurs - a match is impossible.

[Screenshot: the Transform dialog with Expression: WORK("Data9")/(V-(V=0)), the warning "Some labels not matched: skip matching and take same order as worksheet selections?", and the error message "Cannot match labels, even relaxed".]

Transform on resemblances

Tools>Transform works in the same way for an active window which is a resemblance rather than
data array: similar transform expressions are possible, though the use of auxiliary information is
limited to Type•Function and Type•Worksheet. An example of a function applied to a triangular
matrix was seen at the top of page 54, where a correlation matrix between biomarker variables was
turned into similarities using Expression: 100*Abs(V) (dropping minus signs and scaling over 0 to
100). An example using two resemblance matrices can be constructed with the Clyde data, namely
the Bray-Curtis site similarities averaged over abundance and biomass measures. So, instead of
combining the data matrices (as in the earlier A^0.25 × B^0.75), average the A and B resemblances.

There may already be a biomass sheet, Resem1, based on square root transformation of clmb then
Bray-Curtis similarity. Repeat for clma (Analyse>Pre-treatment>Transform (overall)>
Transformation: Square root and Analyse>Resemblance>Measure•Bray-Curtis), to give Resem2. With
Resem1 as the active sheet, take Tools>Transform>Expression: (V+Work("Resem2"))/2 and run
Analyse>MDS on the resulting Resem3, giving a similar pattern to the biomass MDS on page 81.
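
The two resemblance expressions mentioned here, 100*Abs(V) and (V+Work("Resem2"))/2, amount to simple element-wise operations on a triangular matrix. The sketch below is a hypothetical Python analogue: storing the lower triangle as a dictionary keyed by sample pairs is an assumption, and the similarity values are invented.

```python
def correlation_to_similarity(r):
    """Counterpart of 100*Abs(V): drop the sign and rescale to 0-100."""
    return 100 * abs(r)


def average_resemblances(resem_1, resem_2):
    """Counterpart of (V+Work("Resem2"))/2 for matrices stored as {(row, col): value}."""
    if resem_1.keys() != resem_2.keys():
        raise ValueError("the two resemblance matrices do not hold the same sample pairs")
    return {pair: (resem_1[pair] + resem_2[pair]) / 2 for pair in resem_1}


# Hypothetical Bray-Curtis similarities for three sites (lower triangle only).
biomass_sim = {("S2", "S1"): 40.0, ("S3", "S1"): 20.0, ("S3", "S2"): 60.0}
abundance_sim = {("S2", "S1"): 50.0, ("S3", "S1"): 30.0, ("S3", "S2"): 50.0}

mean_sim = average_resemblances(biomass_sim, abundance_sim)
print(mean_sim)
print(correlation_to_similarity(-0.5))
```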

[Screenshot: the resemblance sheets "Clyde abundance" and "Clyde mean of B and A similarities" (Similarity, 0 to 100), the Transform dialog with Expression: (V+WORK("Resem2"))/2, and the MDS plot "Clyde macrofauna (average biom & abund)".]

Ranked variables

Returning to environmental datasheets (e.g. clev), though the choice of appropriate transformation
prior to running a PCA (or MDS) ordination is helped by graphing the variables in pairwise scatter
plots (Draftsman Plot and PCA, next section), this is not an automated procedure and requires a
little experience to become comfortable with. An alternative, eliminating the need for choice (but
arguably losing some sensitivity in the ensuing analysis), is to replace variables by their ranks,
namely the numbers 1, 2, 3, ... for largest to smallest values across samples (modified where
necessary to substitute average ranks for tied values). The main advantage is that outliers - values
that are untypically large and which would otherwise make an inordinate contribution to the PCA
axes and Euclidean distance measures - are given much less weight. For example, a variable whose
values over the samples, in decreasing order, are: 25, 9, 7, 6, 6, 6, 4, 2, 2, 0 would generate ranks:
1, 2, 3, 5, 5, 5, 7, 8.5, 8.5, 10 respectively, and the effect is to make the outlying value of 25 no
different than if it had been 15 or 10. Ranking each variable (separately) also removes the need for
normalising the resulting array, which is needed (after transformation) with the standard approach,
to ensure that all environmental variables take values across comparable ranges. Ranking places all
variables on a common measurement scale, the numbers 1 to n (where n is the number of samples).

For clev, take Tools>Rank variables and look at the outcome. Put this matrix through Analyse>
Resemblance>Measure•Euclidean distance and Analyse>MDS. In order to overlay a trajectory on
the MDS (as above and on page 81), you will need to import the Site# factor from clmb to clev
(Factors>Import, see page 34).
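
The ranking rule for a single variable (largest value gets rank 1, tied values share their average rank) can be checked with a short Python sketch. The function name is hypothetical and this is only an illustration of the rule as stated, not PRIMER's own routine.

```python
def rank_descending(values):
    """Rank values so that the largest gets rank 1; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the block while the following values tie with the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start+1 .. end+1 (1-based) are shared equally by the tied block.
        average_rank = (start + end + 2) / 2
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


example = [25, 9, 7, 6, 6, 6, 4, 2, 2, 0]
print(rank_descending(example))
# [1.0, 2.0, 3.0, 5.0, 5.0, 5.0, 7.0, 8.5, 8.5, 10.0]
```

Replacing the outlying 25 by 15 or 10 leaves the ranks unchanged, which is the down-weighting of outliers that the text describes.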

[Screenshot: the Tools menu with Rank variables, the clev sheet and its ranked version "Clyde environmental variables (ranked)", and the MDS plot from Euclidean distance on the ranked variables.]
Ranked resemblances

Tools>Rank is also a menu option when a resemblance matrix is active, but it operates a little
differently. This time, all elements of the triangular matrix are ranked simultaneously (rather than
separate ranking of the rows or columns of the rectangular data sheet). Don't get these two possible
rankings confused: it is easy to fall into the trap of thinking that, because a ranked data matrix will
be the same whether ranked from original or transformed data, if you are intending to rank the
similarity matrix then initial transformation of the data does not matter. This is quite wrong, of
course - ranking the similarity matrix is by no means the same as ranking the data then calculating
the similarities! In fact, whilst ranking the data may play a marginally useful role for handling
outliers in environmental matrices (as above), it rarely makes sense for assemblage data because it
destroys the special nature of the (many) zero responses, which would be assigned different tied
ranks for different species. Ranking the resemblances, however, is rather central to the approach in
PRIMER: many of the core routines (ANOSIM, RELATE, BEST, ...) start from the ranked form
of the similarity matrix, and a zero-stress MDS ordination also relies entirely on this rank order.
For all routines, however, it is not necessary to enter the ranked form of the triangular matrix: if the
result depends only on the ranks, this will be part of the internal calculation on the similarities. The
option is here mainly to help you visualise the computations underlying ANOSIM tests etc (see the
difference in mean rank dissimilarities that is the R statistic, equation 6.1 of the Methods manual).

For the Euclidean distance matrix (perhaps Resem4) from the above Clyde environmental data (the
displayed Data12), take Tools>Rank to create Resem5. Note that the entries are the numbers 1, 2,
..., 66, with just one pair of ties. Importantly, the convention PRIMER adopts here is always to
return a distance-type matrix from the Rank operation, irrespective of whether it is given a
similarity or dissimilarity/distance matrix (the menu item could have been called Tools>Distance rank).
Thus rank 1 corresponds to samples (S11 & S12), which are closest together environmentally, and
rank 66 to those furthest apart (S6 & S12). Now take Tools>Rank on the similarity matrix from
the macrofaunal biomass data (Resem1, from the earlier calculations on page 81). The resulting
ranks again form a distance matrix, Resem6, with the closest sites in assemblage terms (rank 1)
being S3 & S4, and several pairs of sites tied on the largest, most distant rank (average of 61.5) -
e.g. S6 with S1, S2, S11, S12 etc, which are all pairs of sites with no species in common.

PRIMER handles its (distance) ranks in this slightly unconventional way to reassure the user that,
on the many occasions when two sets of resemblances are compared to see if they are arranging the
samples in the same high-dimensional pattern (e.g. assemblage v. environment, Bray-Curtis v.
Chi-squared or Euclidean measures, biomarkers v. tissue burdens etc), he/she never has to worry about
whether the two resemblance matrices are the 'same way round' (whether high values correspond
to large or small differences between samples) - this is always internally adjusted to be consistent.
You can force PRIMER to do the stupid thing, e.g. run MDS the 'wrong way round', making it
place sites that should be similar at the greatest distance apart and sites that have little in common
close together (with resultant very high stress levels!). But you can only do this by providing
PRIMER with a genuine similarity matrix and calling it a dissimilarity, by changing (Resemblance
type•Similarity) to (Resemblance type•Dissimilarity) in the dialog box under Edit>Properties.
Save the Clyde workspace clwk, and leave it open for the PCA ordination of the next section.
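
The 'distance rank' convention can be sketched in a few lines of Python. This is a hypothetical illustration rather than PRIMER's code: the function name, the dictionary storage of the lower triangle and the toy similarities are assumptions. All entries are ranked together, rank 1 always goes to the closest pair, and ties share their average rank.

```python
def distance_ranks(resemblances, is_similarity):
    """Rank all entries of a triangular matrix together; rank 1 = closest pair."""
    # Closest pairs first: largest similarity, or smallest distance.
    pairs = sorted(resemblances, key=lambda p: resemblances[p], reverse=is_similarity)
    ranks = {}
    start = 0
    while start < len(pairs):
        end = start
        while end + 1 < len(pairs) and resemblances[pairs[end + 1]] == resemblances[pairs[start]]:
            end += 1
        for pair in pairs[start:end + 1]:
            ranks[pair] = (start + end + 2) / 2  # average of the tied positions
        start = end + 1
    return ranks


# Hypothetical similarities for four sites; three pairs share no species (similarity 0).
similarity = {
    ("S2", "S1"): 80.0, ("S3", "S1"): 0.0, ("S3", "S2"): 50.0,
    ("S4", "S1"): 0.0, ("S4", "S2"): 20.0, ("S4", "S3"): 0.0,
}
# The equivalent dissimilarities, defined here as 100 minus the similarity.
dissimilarity = {pair: 100.0 - value for pair, value in similarity.items()}

ranks_from_similarity = distance_ranks(similarity, is_similarity=True)
ranks_from_dissimilarity = distance_ranks(dissimilarity, is_similarity=False)
print(ranks_from_similarity)
```

Both calls give the same ranks, so two matrices can be compared without worrying about which 'way round' each one is; the three zero-similarity pairs tie on the largest rank, as in the Resem6 example.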

[Screenshot: "Clyde environmental data" as a Distance (0 to inf) matrix and its Rank (1 to inf) form, and "Clyde assemblage biomass" as a similarity matrix and its Rank (1 to inf) form.]

Tools Options menu

All other items on the Tools menu have now been dealt with or are covered elsewhere: a) Dissim
for resemblance matrices on page 47; b) Missing on data arrays in section 10; c) Model matrix in
section 13; d) Stop Tasks on page 51, leaving just Tools>Options, which is the same for all types
of active window. It provides default settings for the initial directory (see page 21) and the page
width - the number of characters in the fixed-spaced font used for results windows. This is initially
set to 200 and only comes into play with the few routines (notably SIMPER, section 12 but also
DIVERSE, section 14) that can generate wide lists of results. If this default is set to a smaller value
than will allow a single span of results columns, they are split into sets and listed a batch at a time.
In practice, since DIVERSE essentially produces a matrix of samples (rows) by diversity indices
(columns), it is usually preferable to direct this to a new worksheet, where it can be analysed as a
further multivariate array, or be exported to Excel or a univariate statistics package.

The Graphs tab on Tools>Options sets a few of the global defaults for all plots. Firstly, in the
Symbol area, Shape and Colour can be set for all graphs on which a single symbol type is plotted,
e.g. a draftsman plot, or an ordination which does not plot different symbol types (by factor).
Symbol Size can also be set, and applies also to symbols plotted by factor. Secondly, in the Bubble
area, defaults can be set for bubble fill and boundary colours and the scale of the largest bubble
(see MDS ordinations, page 82 onwards). Thirdly, overall font scale defaults can be altered. The
shapes, colours and values in all these boxes become the defaults for matching choices in the Graph
Options dialog (Graph>Data labels & symbols or General or the Bubble tab), but note that
default changes are not retrospective: they apply only to plots created from that point onward.
[Screenshot: the Tools menu for a resemblance matrix (Model Matrix, Check, Dissim, Duplicate, Rank, Transform, Options) and the Tools>Options dialog, showing Results width (characters): 200 and the Graphs tab with Symbol (Shape, Colour, Size), Bubble (Colour, Boundary, Scale: 100) and Font (Scale: 100) settings.]

10. Analysing environmental variables (Draftsman Plot, PCA)


Environment-type data

PRIMER uses the term 'environmental variables' as a shorthand for a wide variety of data types
(including biological data!), extending well beyond the archetypal case of physical or chemical
measurements made on the environment surrounding an assemblage sample. 'Environment-type'
variables can also include matrices of biomarker responses (biochemical, sub-cellular or whole
body 'health' indicators from individual organisms, page 53), morphometric measurements on
individuals (perhaps with the aim of separating putative species), PSA data (size-class spectra for
soil/sediment/water particulates, page 42), organism body-size distributions, etc. The unifying
factor for these disparate examples is that: a) they all give rise to multivariate arrays of variables by
samples, which can be analysed by the methods in PRIMER; b) the criteria which lead to use of
community-type similarity measures such as Bray-Curtis are not appropriate (e.g. always positive
entries, with many zeros and zero playing a special role - joint absences carrying no information,
samples with no species in common having zero similarity - and always a common measurement
scale across variables, of abundance, biomass, % cover etc). Instead, resemblance between samples
of environment-type variables is better described by standard distance measures such as Euclidean
distance (page 45), where zero plays no special role (e.g. zero temperature, but on what scale?),
where negative values can occur (indeed will occur if normalising different scales to common
units, page 39), and where positive similarity is always inferred if two samples have the same value
of a variable, even a zero value (e.g. neither sample has a detectable PCB or Hg level, neither
sample has particles > size x, etc). The key message here is that whole assemblage data is different,
and requires the specialised methods that are at the core of PRIMER (community similarity
measures, non-metric MDS plots, non-parametric ANOSIM tests etc), whereas environmental-type
data is more standard and can usually (sometimes after individual transformations and
normalisation) be treated by 'classic' approaches of Euclidean distances and Principal Components
(PCA) ordination. The derivation and purpose of PCA is covered in Chapter 4 of the Methods manual.

Draftsman plots

Normalisation (subtracting the mean and dividing by the standard deviation, for each variable),
followed by Euclidean distance or PCA calculation, operates more effectively the closer the data is
to approximate (multivariate) normality. This is not a prerequisite of PCA, but it is the genesis of the
method, and it is certainly true that if the data is strongly skewed the outliers will dominate the PC
axes and will often lead to poor-quality interpretation. Transformations of specific variables, or
groups of similar variables, will often be desirable (using Tools>Transform (individual), see the
last section). A useful aid to transformation choice is given by Analyse>Draftsman Plot, namely
pairwise scatter plots between all (selected) variables. Two things are being looked for here. Firstly,
are the samples roughly symmetrically distributed across the range of each variable? (Imagine
dropping the points down into equal-sized 'bins' along the axis; is this histogram roughly
bell-shaped - or at least symmetric - rather than heavily skewed to one side?) Secondly, if there are
strong relationships between some pairs of variables, are these roughly linear rather than strongly
curvilinear? This is also a characteristic of (approximate) multivariate normality and an
underpinning assumption of PCA, that ordinary product-moment correlations describe the dependence
between variables, and correlation is a measure of linear relationship. Examination of the draftsman
plot can therefore suggest possible transformations. If a distribution is right-skewed (bulk of the
distribution to the left, with stragglers to the right) then a √y (mild) or log y (strong) transform is
called for. Use log(c+y) if y can be zero or negative, to make all values strictly positive before
taking the log. If it is heavily skewed to the left, consider an inverse transform, 1/(c+y) where c is
close to zero, or a reverse transform, log(c-y) where c is larger than the maximum y. Try to use
similar transforms for similar types of variables, and don't be too pernickety! Logically, you will
want to use the same transform each time you analyse new data in the same context, and
over-detailed choices will preclude that. The idea is only to avoid the worst effects of extreme outliers,
when working on original abiotic scales that do not represent the true relationships between the
samples (which the assemblage is responding to, for example - it is often the case that
dose-response relationships for organisms to contaminants are more appropriate on log concentration
scales). If you are still suffering agonies of indecision(!), then a purely automatic approach is given
on page 108, namely to replace all variables by their ranks. This certainly achieves the twin aims of
a symmetric distribution and linear relationships (see draftsman plot below) but it must lose a little
sensitivity (organisms will not be responding solely to the rank order of dose levels!).
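
The manual judges skewness by eye from the draftsman plot. As a rough numerical stand-in, the sketch below computes the ordinary moment coefficient of skewness for a right-skewed variable before and after the mild (square root) and strong (log) transforms. The coefficient is not something the text says PRIMER reports, and the data values are hypothetical.

```python
import math


def skewness(values):
    """Moment coefficient of skewness: positive means a long right-hand tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical concentrations: bulk of values to the left, one straggler to the right.
raw = [1.0, 2.0, 2.0, 3.0, 4.0, 50.0]

for label, transformed in [
    ("none", raw),
    ("sqrt(y)", [math.sqrt(y) for y in raw]),
    ("log(y)", [math.log(y) for y in raw]),
]:
    print(f"{label:8s} skewness = {skewness(transformed):.2f}")
```

For these values the skewness falls as the transform strengthens, which is the sense in which the square root is 'mild' and the log 'strong'. A single number cannot replace looking at the pairwise plots, which also reveal curvilinear relationships.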


The workspace clwk for the Clyde dumpground study should still be open from the previous
section. If not, open C:\Examples v6\Clydemac\clev, of 11 environmental variables from the 12
sites (page 80). Take Analyse>Draftsman Plot and try out the usual graphics options such as
changing font and symbol sizes and amending the title (Graph>Data labels & symbols and Titles
tab), and zooming in on a viewable-sized portion of the plot by drawing a box and clicking on the
zoom icon on the tool bar (or Graph>Zoom In), then scrolling back and forth (or up and down)
across the various plots. Also change the name of the plot to drft untrans (File>Rename Graph).
[Screenshot: the clev data sheet, the Analyse menu with Draftsman Plot, and the graph window drft untrans, "Clyde, untransformed environmental variables: draftsman plot", showing pairwise scatter plots of Cu, Mn, Co, Ni, Zn, Cd, Pb, Cr, Dep, %C and %N.]
Most variables are right-skewed, which is why they were log transformed with Tools>Transform
(individual) on page 105. A milder, square root, transform was used for the two organics, %C and
%N, but this was simply to illustrate that different transformations for different variables are
possible - a log transform throughout is probably better (excepting for water depth, a very different
type of variable, which is seen to be more symmetric and is not transformed. Co and Ni are not
right-skewed so could also be omitted from the transform, though it is logical to treat them in the same
way as other heavy metals). Re-run the transform from page 105, with logs (except Dep), rename
the result clevt and take Draftsman Plot, this time with (✓Correlations to worksheet), renaming
the plot drft trans.
[Screenshot: the worksheet "Clyde transformed environmental - Pearson" (Correlation, -1 to 1) and the draftsman plot drft trans for ln Cu, ln Mn, ln Co, ln Ni, ln Zn, ln Cd, ln Pb, ln Cr, Dep, ln %C and ln %N, shown with scales turned off.]


The scales are inevitably unreadable on the full draftsman plot, so the above figure takes the only
graphic option which is specific to draftsman plots under the Graph>Special menu, to turn off
✓Show scales. Leaving the scales in, when they are readable (e.g. under zooming), does make the
point however, that even in their transformed state the variables take values over very different
ranges, and normalisation will be required (after transformation) before running a PCA. The
correlation matrix shows that many of these variables are highly inter-correlated (many tend to
increase near the dumpground centre, unsurprisingly). This is not a concern for the PCA ordination
undertaken next: part of the point of a multivariate analysis is to represent high-dimensional data in
low-dimensional space, and this will actually be more successful if many of these variables are
inter-correlated, so that the points effectively lie in a 2- or 3-d subspace of the 11-d space. (It is
much more of a concern for linkage techniques, which try to 'explain' an assemblage structure in
terms of a small number of driving environmental variables, see the next section.)

The final possibility is to sidestep individual transformations altogether and work with the variable
ranks (page 108) - a different type of transformation. As noted earlier, the variables are then forced
to be symmetric, any (monotonic) relationships are certain to be linear, the variables are placed on
a common measurement scale (ranks 1 to 12) and there can, by definition, be no outliers - but the
loss of the measurement scale has some drawbacks in using PC axes for prediction. The correlation
matrix is now of Spearman's ρs, because this is ordinary Pearson correlation computed on ranks.
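
The statement that Spearman's coefficient is the Pearson correlation computed on ranks can be verified directly. In the Python sketch below the function names and the data are hypothetical. Ranks run from smallest to largest here, the reverse of the PRIMER convention, which makes no difference to a correlation provided both variables are ranked the same way.

```python
import math


def average_ranks(values):
    """Ranks from smallest (1) to largest, with ties sharing their average rank."""
    ordered = sorted(values)
    # A value first found at 0-based index i and occurring c times occupies
    # positions i+1 .. i+c, whose average is i + (c + 1) / 2.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pearson(x, y):
    """Ordinary product-moment correlation."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical pair of variables: monotonic but strongly curvilinear, with an outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 100.0]

print(f"Pearson  = {pearson(x, y):.3f}")
print(f"Spearman = {spearman(x, y):.3f}")
```

The Pearson value is pulled below 1 by the curvature and the outlier, whereas the Spearman value is exactly 1 because ranking makes any monotonic relationship linear.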

[Screenshot: the Analyse menu, the Draftsman Plot dialog with the Correlations to worksheet option, and the draftsman plot of the ranked Clyde environmental variables (Cu, Mn, Co, Ni, Zn, Cd, Pb, Cr, Dep, %C, %N), with axes running over ranks 1 to 12.]

Principal Components Analysis

PCA is an ordination in which samples, regarded as points in the high-dimensional variable space
(11-d here), are projected onto a 'best-fitting' plane, or other low-dimensional solution - the user
can specify how many principal components (new axes) are required, and the routine offers 2-d and
3-d plots of any combination of these PC's. The purpose of the new axes is to capture as much of
the variability in the original space as possible, and the extent to which the first few PC's allow an
accurate representation of the true relationship between the samples in the original high-d space is
summarised by the '%variation explained' (a ratio of eigenvalues). The PC's are simply a rotation
of the original axes and thus a linear combination of the input variables (the coefficients are termed
eigenvectors); PRIMER 6 allows for superimposition of these vectors on the 2-d PC plot. The
co-ordinates of the samples on the PC axes are called the principal component scores, and these are
output to the results, along with the %variance explained by each axis and the linear coefficients
defining each PC. Chapter 4 of the Methods manual explains the concepts in a little more detail.
For clevt, a log(O. l+x) transformed form of all environmental variables (except Dep) in the Clyde
workspace, run Analyse>Pre-treatment>Normalise variables (see page 39), renaming the result
clevtn. Note that the results window holds the mean and standard deviation used in normalising
each variable, which could instead have been sent to a worksheet. On clevtn take Analysc> PCA,
·.. giving two outputs: a detailed results window with three sections (Eigenvalues, Eigenvectors and
Scores), and the 2-d PCA plot with superimposed vectors (blue circle and Jines).
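The sequence just described (normalise, extract eigenvalues and eigenvectors, then compute scores
and %variation explained) can be sketched outside PRIMER. The following is a minimal illustration
in Python with NumPy, not PRIMER's own code: the function name pca and the small data matrix are
hypothetical, and normalising with the sample standard deviation (divisor n-1) is an assumption.

```python
import numpy as np

def pca(data, n_pcs=5):
    """PCA of a samples x variables array after normalising each variable."""
    x = np.asarray(data, dtype=float)
    mean = x.mean(axis=0)
    sd = x.std(axis=0, ddof=1)                 # assumption: divisor n-1
    z = (x - mean) / sd                        # normalised variables
    corr = np.cov(z, rowvar=False)             # equals the correlation matrix of x
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]      # largest eigenvalue first
    eigenvalues = eigenvalues[order][:n_pcs]
    eigenvectors = eigenvectors[:, order][:, :n_pcs]
    # The eigenvalues of a correlation matrix sum to the number of variables.
    pct_variation = 100.0 * eigenvalues / x.shape[1]
    scores = z @ eigenvectors                  # sample co-ordinates on the PC axes
    return eigenvalues, eigenvectors, pct_variation, scores

# Hypothetical 6 samples x 3 variables, purely for illustration.
example = np.array([[1.0, 2.1, 0.5],
                    [2.0, 3.9, 0.7],
                    [3.0, 6.2, 0.4],
                    [4.0, 8.1, 0.9],
                    [5.0, 9.8, 0.6],
                    [6.0, 12.2, 0.8]])
eigenvalues, eigenvectors, pct_variation, scores = pca(example, n_pcs=3)
cumulative = np.cumsum(pct_variation)          # analogue of Cum.%Variation
```

The sign of each eigenvector returned by such a routine is arbitrary, so a sketch like this may
give axes reflected relative to those PRIMER plots.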

10. PCA

[Figure: screenshots of the Analyse>Pre-treatment>Normalise variables results (mean and SD of each
variable), the Analyse>PCA dialog, the PCA results window (Eigenvalues, Eigenvectors, Principal
Component Scores) and the 2-d PCA plot (PC1 v PC2) with superimposed variable vectors, for the
Clyde environmental variables (log transformed and normalised).]
PCA eigenvector plot

Though the superimposition of vectors has a tendency to clutter the plot (they can be turned off by
unchecking the 'Show variable vectors' box on the Graph>Special menu), one can still see the
changing contaminant load along this E-W transect of sampling sites (Fig. 1.5 in the Methods
manual). The end points S1 and S12 lie close together and there is a strong trend from S1 to the
dump centre at S6 (right to left on axis PC1), and a reversal of that trend for S6 to S12, moving
away from the dump centre. The trajectory is somewhat different on the PC2 axis, however, for the
two arms of the transect. The results window (heading Eigenvalues) shows that a 2-d PCA is a very
good description of structure in the higher (11-d) space, PC1 accounting for much of the variability
(62%) and PC2 most of the remainder (a further 27%), i.e. 89% between them, which is untypically
high. The Eigenvectors give the linear combinations which define the axes:

PC1 = -0.378(ln Cu)* + 0.213(ln Mn)* + 0.075(ln Co)* - 0.149(ln Ni)* - ... ;
PC2 = 0.035(ln Cu)* - 0.418(ln Mn)* - 0.539(ln Co)* - 0.466(ln Ni)* - ...

(the asterisks being a reminder that the transformed variables are normalised). It is the coefficients
in these equations (eigenvectors) that the vector plot shows in graphical form (e.g. ln Cu has
coefficients -0.378 and 0.035, so its main contribution is to the first axis, increasing from right to
left because of the negative sign, with only a slight increase in the positive PC2 direction). The
vector length reflects the importance of that variable's contribution to these particular two PC axes,
in relation to all possible PC axes - if the line reaches the circle then none of that variable's other
coefficients in the Eigenvectors table will differ from 0. The vector plot (or the results) shows that
PC1 is a roughly equally weighted combination of most of the heavy metals, Cu, Zn, Cd, Pb, Cr
and organics, but not Co, Mn, Ni and Depth. The situation is reversed on the PC2 axis, with the
first batch scarcely contributing at all, but the second set all decreasing strongly in the positive PC2
direction. So, the first PC gives a natural way of combining the different contaminant levels into a
single summary variable that characterises the main contaminant gradient from the sludge-dumping.
See Chapters 4 and 11 of the Methods manual for more on this particular example, but
the principle of using a Principal Component axis as a natural, objective combination of a suite of
variables is one that applies equally strongly to biomarkers, morphometric measures, water-quality
'metrics' etc. The only difference in the latter case is that the metrics may already be standardised
to a common impact scale (0 to 10, perhaps) so no prior transformation or normalisation is needed
before PCA is carried out. For morphometric measurements too, transformation is often not needed
and lengths, widths etc may be in common units, but normalisation may still be needed if widely
different ranges of measurements are involved (overall body length, setae width), to stop the larger
measurements completely dominating the PC's. For typical biomarker suites, transformation would
need to be considered and normalisation would be essential, since different scales are involved.
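As a small worked illustration of the vector lengths, the sketch below takes the PC1 and PC2
coefficients quoted above for ln Cu and ln Co. It assumes (this is a reading of the text, not a
statement of PRIMER's plotting code) that the circle has radius 1 and that a variable's plotted
length is the square root of the sum of its squared coefficients on the two displayed axes. The
function name vector_length is hypothetical.

```python
import math

def vector_length(coef_x, coef_y):
    """Length of a variable's vector on a 2-d PC plot (assumed unit circle)."""
    return math.hypot(coef_x, coef_y)

# Coefficients on PC1 and PC2 as quoted in the text.
ln_cu = vector_length(-0.378, 0.035)
ln_co = vector_length(0.075, -0.539)

# Squared length: the share of the variable's total squared coefficients
# (which sum to 1 over all PC axes) carried by these two axes.
share_cu = ln_cu ** 2
share_co = ln_co ** 2
```

On this reading neither vector reaches the circle, so both variables must also have non-zero
coefficients on some of the higher PC's.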
PC scores

The final table in the results window is headed Principal Component Scores (this can alternatively
be sent to a new worksheet by checking (✓Scores to worksheet) in the dialog from Analyse>PCA,
which makes it easier to export them to other software or use them elsewhere in PRIMER, e.g. to
calculate Euclidean distances between sites in PC spaces of different size). The scores are simply
the x, y (or x, y, z etc) co-ordinates of the sites on the PCA plot - their values on each PC, obtained
by substituting the (normalised) variable values into the above linear equations for PC1, PC2, etc. It
is the ability to generate a numerical score for any fresh set of values for the same suite of variables
which is one of the strengths of PCA. If values from a new site are recorded as (Cu0, Mn0, Co0, ...)
we can see where it fits on the previous contaminant scale by calculating:

PC1 = -0.378{[(ln Cu0)-4.046]/0.9243} + 0.213{[(ln Mn0)-6.062]/0.7578} + ...

where the means and standard deviations used in the normalisations were given in the results
window from running Analyse>Pre-treatment>Normalise variables on the original (logged) data
set. This is the main downside to using rank variables in a PCA (which on other grounds has much
going for it): it is harder to relate new sites to the PCA on the original set of samples.
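The calculation for a new site can be sketched as follows. Only the two terms quoted in the
equation above (Cu and Mn) are included, so the result is a partial sum rather than a full PC1
score; the remaining nine terms would be added in the same way from the Eigenvectors table and the
normalisation results. The function name partial_pc1 and the new-site values are hypothetical, and
the log is taken as printed in the equation (ln of the recorded value).

```python
import math

# Coefficient, mean and standard deviation for the two terms quoted in the text.
PC1_TERMS = {
    "Cu": {"coef": -0.378, "mean": 4.046, "sd": 0.9243},
    "Mn": {"coef": 0.213, "mean": 6.062, "sd": 0.7578},
}

def partial_pc1(raw_values):
    """Sum the PC1 terms available in PC1_TERMS for one new site."""
    total = 0.0
    for name, raw in raw_values.items():
        term = PC1_TERMS[name]
        normalised = (math.log(raw) - term["mean"]) / term["sd"]
        total += term["coef"] * normalised
    return total

# Hypothetical measurements at a new site, for illustration only.
new_site = {"Cu": 100.0, "Mn": 400.0}
partial = partial_pc1(new_site)
```

A site whose logged values equal the original means contributes zero, and a site one standard
deviation above the mean in ln Cu contributes -0.378 from that term.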
Analyse>PCA allows one other selection, namely (Maximum number of PCs: 5) by default.
Increasing this number will print more columns of PC vectors in the results window (PC6, PC7,
etc), and will allow selection of these higher PCs to be plotted in pairs or triples in the 2-d or 3-d
PC configuration. However, it is rarely helpful to interpret more than the first 3 or 4 PC's, so the
default computation of the first 5 is usually perfectly adequate. It is important to note that nothing
changes at all in the first 5 sets of vectors if it is decided to calculate axes 6 to 10, say. Each
lower-dimension configuration is a projection from the higher-dimensional solution, which therefore
just involves dropping out the higher axes. This is not true of MDS ordination, for which the 2-d
solution is recalculated from scratch, and is not just the first two dimensions of the 3-d solution.
PCA plot options

Many of the options for manipulating PCA configurations are exactly the same as for MDS plots,
covered on pages 78 to 88, so will not be repeated here - only features that differ will be illustrated.
General rotation is not allowed in a PCA: directions have defined meanings as the axis of greatest
variation, the axis perpendicular to this with the greatest variation amongst that unaccounted for by
the first axis, etc. Any axis can, however, be reflected ('flipped') without affecting the
interpretation in any way. Which direction the algorithm chooses to plot an axis - to the right or
left, up or down, in or out etc - is arbitrary (though repeatable). In fact, it is inconvenient in this
case that the MDS plots of an abundance and biomass average for assemblage data, and of the ranked
form of the environmental data (pages 108 and 109), both plotted sites near the end of the transect
to the left and the dump centre to the right (and also S1 at the top not the bottom of the y axis). So,
reflect the PCA plot: from the right-click menu when the cursor is over the plot, take Flip X (or
Graph>Flip X) and then Flip Y, and note the reflection of both the points and the vectors (of
course). This does mean, inevitably however, that the information already written to the results
window is now slightly incorrect: the signs of the first two eigenvectors and the first two sets of
scores should mentally be switched (+ to - and - to +), since Cu, Zn, Cd, Pb, etc are now increasing
strongly to the right. The current scores will always be output by File>Save Graph Values As
however (see page 73), just as they are for current MDS or CLUSTER rotation states.
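The effect of flipping can be checked numerically: reversing the sign of an axis changes the signs
of the corresponding eigenvector coefficients and scores but leaves every between-sample distance
unchanged. The sketch below is illustrative only; the eigenvector rows are the PC1 and PC2
coefficients quoted earlier for ln Cu and ln Mn, the two rows of scores are hypothetical, and
flip_axis is a made-up helper, not a PRIMER function.

```python
import math

def flip_axis(rows, axis):
    """Return copies of the rows with the sign of one column reversed."""
    return [[-v if j == axis else v for j, v in enumerate(row)] for row in rows]

def euclidean(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# PC1, PC2 coefficients quoted in the text for ln Cu and ln Mn.
eigenvector_rows = [[-0.378, 0.035], [0.213, -0.418]]
# Hypothetical PC1, PC2 scores for two samples.
scores = [[2.0, -2.4], [-5.2, -0.2]]

# Flip X then Flip Y, as in the text.
flipped_vectors = flip_axis(flip_axis(eigenvector_rows, 0), 1)
flipped_scores = flip_axis(flip_axis(scores, 0), 1)

distance_before = euclidean(scores[0], scores[1])
distance_after = euclidean(flipped_scores[0], flipped_scores[1])
```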
Now, in the Graph>Special menu, remove the superimposed vectors by unchecking the 'Show
variable vectors' box, and join the points along the transect with (✓Overlay trajectory>Trajectory
numeric factor: Site#) - if the factor doesn't exist, create or import it, as at the bottom of page 80.
Note that this PCA ordination is very similar to the MDS plot at the top of page 109: both are using
essentially the same measure of 'dissimilarity' (Euclidean distance) on much the same data, and the
fact that the samples largely lie on a 2-d plane in the 11-d space makes it easy for both methods to
display an accurate 2-d picture - the greater sophistication of MDS is not needed here. If they had
both used the transformed environmental variables then the match would have been even closer
(site 6 is a little closer to the other sites in the MDS plot because ranked variables were used there).

[Figure: right-click graph menu and the Graph>Special dialog for the PCA plot (Plot type: 2D
scatter, 3D scatter, 2D bubble; Trajectory: Overlay trajectory, Trajectory numeric factor: Site#;
Show variable vectors), with the 2-d PCA plot (PC1 v PC2) of the Clyde environmental variables
and the transect trajectory overlaid.]

More interesting is the fact that the PCA (or MDS) of the abiotic variables is an excellent match to
the MDS of the assemblage (page 81), and this observation motivates the BEST routine of section
11. (Note that a PCA of the biota is poor by comparison, since it implicitly uses Euclidean distance
rather than an assemblage-based coefficient such as Bray-Curtis - and it actually fails to display a
convincing species gradient even though there patently is one here. Choice of relevant similarity is
much the most important decision to make in multivariate analysis - a point seen in section 13.)

Of the other options on the Graph>Special menu, overlaying groups from a CLUSTER run (which
to be consistent must use Euclidean distance) is no different than for MDS ordination, page 80, and
bubble plots likewise are executed in just the same way as on pages 83 to 85. Bubble plots are not
really needed in order to judge the contribution of individual abiotic variables to a PCA derived
from all of them - the vector plot provides that information correctly (and simultaneously) for all
variables. This is because the relationship of a single variable to the PC axes has to be a simple
linear one, by definition of PCA. (That is very different from an MDS, where the relation of single
species to directions in MDS space derived from all species will often be totally non-linear and not
even monotonic, explaining why PRIMER does not offer vector plots for MDS - they can be
highly misleading.) There could be some point, however, in relating different sets of variables by
PCA bubble plots, e.g. individual species bubbles superimposed on an abiotic PCA.
Multiple 2-d & 3-d plots

The main difference on the Graph>Special menu for PCA is that any higher-dimensional pair of
the calculated PC's is available to select for the x, y axes of the plot, so a plot of PC4 against PC5,
say, is doable (if not often sensible to interpret!). As with MDS, use of Tools>Duplicate when the
active window is a plot will allow multiple copies to be displayed on the screen (e.g. with Window
>Tile Horizontal, having taken Window>Close All Windows and clicked on the series of plots to
redisplay them). They need to be saved one at a time, however. The three 2-d plots of PC1 v PC2,
PC1 v PC3, PC2 v PC3 give, arguably, a more accurate way of publishing a 3-d plot, but the 3-d
PCA graph in PRIMER is certainly the better way to view the structure on screen. As with MDS,
this can be zoomed and continuously rotated, both from icons on the tool bar (equivalently,
Graph>Zoom In, Graph>Rotate Axes), and flipped on all three axes (Graph>Flip X or Y or Z),
but the MDS option of Rotate Data is not offered, since the relationship of points to axes is
meaningful and fixed. Any 3-d configuration can be chosen, note, so it is perfectly possible to select
Graph>Special>(Plot type•3D scatter) & Axes>(X axis: PC2) & (Y axis: PC3) & (Z axis: PC4).

[Figure: Graph>Special dialogs (Plot type: 2D scatter, 3D scatter, 2D bubble; Axes: X axis, Y axis,
Z axis) and right-click graph menus (Data labels & symbols, Special, General, Pointer, Zoom In,
Zoom Out, Flip X, Flip Y, Flip Z) for the 2-d and 3-d PCA plots.]

Interpreting PCA v MDS pairwise plots

Another subtle distinction from MDS is that only a single PCA graph window is produced initially,
allowing a choice between displaying a 2-d or 3-d scatter plot. This is because - as previously
noted - the PC algorithm generates just one solution, with up to as many PC's as requested: a 2-d
PCA is just the first two axes of the 3-d PCA, and so on. With MDS, on the other hand, the 2-d and
3-d plots are entirely separate solutions and thus held in different windows. It is perfectly possible,
starting from a 3-d MDS graphics window, to take Graph>Special>(Plot type•2D scatter) and
generate the three pairwise plots: MDS1 v MDS2, MDS1 v MDS3, MDS2 v MDS3. As remarked
above, this might be a preferable way for static viewing of the 3-d solution, rather than an arbitrary
projected view of the 3-d box. But, unlike PCA, do not expect the MDS1 v MDS2 plot from this to
be the same as the purely 2-d MDS solution! They mean very different things and the purely 2-d
MDS solution will always be the better representation of the original relationships in high-d space.

It is clear for the Clyde environmental (or biotic) data that a 2-d ordination is perfectly adequate (%
variance explained is high for the PCA, stress is low for the MDS). The plots show how little
absolute variation there is on the third axis (another good reason for preserving the aspect ratio, as
PRIMER does for all ordinations - a distance of 0 to 2 units is the same on all PC axes, note).

Save and close the clwk workspace (which will be needed again shortly).
PCA of data on biomarkers

55 ->

An example where a 3-d plot is marginally more necessary is given by the biomarker data first met
on page 53. Re-open the brbmwk workspace (or, if not available, open brbm.pri from the
C:\Examples v6\Biomark directory). The variables need first to be normalised, with Analyse>
Pre-treatment>Normalise variables, before running the PCA, since they are on different measurement
scales. (You might also wish to log(1+x) transform the EROD variable before the normalisation,
see page 105 - the other variables don't look as skewed as that, and there is not much you can do
about the binary variable N-ras! Alternatively, you could transform by ranking the variables, as
suggested on page 53 - see also 108 - but don't get confused by the reversal in the vector plot that
this causes, low ranks corresponding to high variable values). It is rather easy to overlook the
normalisation step when running PCA, and the analysis here would be fairly disastrous without it,
since the PC's are simply hijacked by the variables with the largest numbers. In other cases, where
there is a common measurement scale, the normalisation step may not be needed at all - a good
example would be the Particle Size Analysis from Danish sediments, on page 42.
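The way unnormalised variables on large scales 'hijack' the PC's can be seen in a toy case. The
sketch below uses two simulated, hypothetical variables (not the biomarker data) that differ only
in measurement scale, and compares the PC1 coefficients with and without normalisation. It is a
NumPy illustration under those assumptions, not PRIMER's routine; first_pc is a made-up helper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
big = 1000.0 * rng.normal(size=n)     # hypothetical variable on a large scale
small = rng.normal(size=n)            # hypothetical variable on a small scale
data = np.column_stack([big, small])

def first_pc(matrix):
    """Coefficients of PC1 from the covariance matrix of the columns."""
    values, vectors = np.linalg.eigh(np.cov(matrix, rowvar=False))
    return vectors[:, np.argmax(values)]

# Without normalisation PC1 is almost entirely the large-scale variable.
raw_pc1 = first_pc(data)

# After normalisation both variables carry equal weight on PC1.
normalised = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
norm_pc1 = first_pc(normalised)
```

With only two normalised variables the PC1 coefficients are always equal in size (about 0.707
each), which makes the contrast with the raw analysis easy to see.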
Take Analyse>PCA and this time send (✓Scores to worksheet); 5 PC's should be enough since
there are only 11 variables. On the Graph>Data labels & symbols menu, then choose Symbols>
✓Plot>(✓By factor: Site) and turn off the labels. The 2-d PCA shows the separation of fish
biomarker responses in the 5 areas, with sites 5 and especially 3 in the direction of decreasing
lysosomal stability and pinocytosis, and increasing levels of oxyradicals, size of lipid vacuoles etc
(indicating stress on the organism); what tends to separate site 7 and particularly 9 from site 6 are
increasing EROD and Tubulin, decreased Ubiquitin and Endoplasmic reticulum etc. The
coefficients (eigenvectors) in the results window also show that N-ras only tends to come out in the
higher PC's. The eigenvalues show that 3 PC's is enough to capture over 70% of the total
variability (a good target figure), so it is worth a look at the 3-d plot with Graph>Special>Plot
type•3D scatter. This certainly separates the sites clearly but the extra 10% of explained variation
c.f. 2-d does not change the interpretation to any extent.

[Figure: PCA results window (Eigenvalues, Eigenvectors) and the 2-d PCA plot (PC1 v PC2) titled
'Bremerhaven IOC workshop - biomarkers', with symbols by the factor Site.]

133 <- (Re-)save the workspace as brbmwk and close it.

Re-open the Clyde workspace clwk in C:\Examples v6\Clydemac, and in particular the transformed
environmental matrix clevt (log of metals, C, N).
Missing data estimation

The definition of missing data was given on page 23. It refers to variables (usually
environmental-type variables) that are not recorded for some samples; it does not refer to designs
which were intended to be balanced but for which some replicate samples have been 'lost', for all
variables. (The latter are not generally a problem to handle in PRIMER, since balanced replication
is not required for most of the simple testing that PRIMER is able to carry out.)

Many of the routines, including PCA, require the user to enter a complete matrix, with no missing
values. At a simple level, it is fairly clear why this should be so: for the trivial 2-variable case in
which PCA was introduced in Chapter 4 of the Methods manual, imagine 'losing' one of the
variable values for one of the samples. What is now that sample's contribution to the total
variance? How can it be projected perpendicularly to the 'best-fitting line' through the points? How
can that first PC axis be determined at all without knowing the contribution of this sample, and so
on? Moving to an MDS ordination might be thought to ease the problem: we then only need to
worry about (Euclidean) distances between pairs of samples. But again this is problematic: for each
pair of samples we could retain only those variables which are present for both (known as pairwise
deletion of missing data, rather than 'listwise' deletion), but now the Euclidean distance between
samples for which there are many pairs of variables is inevitably - and artefactually - larger than
the distance for pairs of samples left with fewer complete variables. The same will be true for many
other measures since, to some degree, all (dis)similarities are sensitive to the size of the space they
are calculated over. There is a simple answer to this problem of course: remove (listwise) as few
variables and samples as possible, in some judicious balance, such that a complete matrix is left.
The routines Tools>Check, Select>Samples>(•No missing values) or Select>Variables>(•No
missing values) will help with this. When there are large blocks of missing data - a subset of the
variables were simply not recorded at a large group of sites - then this is likely to be the most
realistic option. In other situations, where there is very little missing data, it can seem very wasteful
of valuable resources - a whole sample has to be deleted because one variable is missing, or a
whole variable deleted because it was not measured for one sample. In this circumstance,
PRIMER6 provides an alternative - estimation of the missing data point(s).
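The artefact of pairwise deletion is easy to demonstrate with made-up numbers. In the sketch
below (hypothetical samples, with None marking a missing cell), samples b and c differ from sample
a by the same amount on every variable they have, yet c appears closer to a simply because two of
its variables are missing. The function pairwise_euclidean is an illustrative helper, not a PRIMER
routine.

```python
import math

def pairwise_euclidean(a, b):
    """Euclidean distance over variables present (not None) in both samples.

    Returns the distance and the number of variables actually used.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    distance = math.sqrt(sum((x - y) ** 2 for x, y in shared))
    return distance, len(shared)

sample_a = [0.0, 0.0, 0.0, 0.0]
sample_b = [1.0, 1.0, 1.0, 1.0]      # complete
sample_c = [1.0, 1.0, None, None]    # two cells missing

d_ab, used_ab = pairwise_euclidean(sample_a, sample_b)   # uses 4 variables
d_ac, used_ac = pairwise_euclidean(sample_a, sample_c)   # uses only 2 variables
```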
EM algorithm assumptions

Tools>Missing is designed to operate only on matrices for which: a) assumptions of multivariate
normality can be made; b) there are many fewer variables than samples, so that there are enough
data values to be able to estimate the parameters representing means, variances and correlations of
all the variables, with reasonable stability; c) there are rather few missing data points (each of those
is a new parameter that needs estimating also); d) the data points are thought of as 'missing at
random', rather than missing because they were so extreme that they could not be recorded; e) in
the current implementation, the samples are assumed to be unstructured, rather than, for example, a
series of replicates from a set of a priori defined groups.
These are the assumptions that most of the methods of PRIMER are trying to get away from, of
course! But that is mainly because they are completely impossible to satisfy for assemblage data;
they may be much more realistic for continuous, environmental-type data (including, for example,
morphometric variables). The estimation technique that PRIMER uses is the standard statistical
method under these conditions, namely the EM (expectation-maximisation) algorithm. It is rather
tricky (and dangerous!) to give guidelines for when the method will prove acceptable, but you do
have some help from the algorithm. Firstly, if you set it an impossible problem (far too many
parameters to estimate for the number of data points you have) then it should fail a convergence
threshold and display an error message ('max number of iterations exceeded'). Secondly, when it
does converge, it is also able to provide an approximate standard deviation for its estimate of each
missing value. If this is large then there has clearly been insufficient information to pin down a
likely value for the missing cell. As a rough rule-of-thumb, you should not expect to be estimating
more than about 5% of your data points if your analysis is to retain any credibility(!), and you
should have enough samples n compared with (selected) variables p and missing cells m, so that
there is a half-decent number of data points per estimated parameter, DpP = n/[(p+3)/2 + (m/p)]
(around 7 is sometimes cited, in general contexts). When this criterion is far from being met using
the whole matrix, you may be able to take a piecemeal approach, selecting just a small set of the
most relevant variables to drastically reduce p. The method is clearly only going to provide you
with something useful if there are variables that correlate fairly well with the one containing the
missing data, so that it has some basis for the prediction. Draftsman Plot will work on datasheets
with missing cells, so you can use this (and its correlation table) to select out good subsets of
variables for estimating each missing data cell. Use of Tools>Missing should not be seen as an
automatic process therefore - you must expect to have to work hard to justify any data points that
you are making up! In the end, common sense is the best guide here, as always. Look at each
estimated value - they are always displayed in the worksheet in red - and compare it with the range
of values from the other samples for that variable. Does it look 'reasonable', or has something
clearly gone wrong with the fitting routine? If all appears well, then it does have the objective
credibility of being the maximum likelihood estimate of that cell, and not just some subjective
value that you wish it was! Also, look at the standard deviation (σ) of the estimate in the results
window and try sensitivity analysis. Add or subtract up to 2σ from each of the estimated cell values
at random, and re-run your PCA (or MDS, ANOSIM etc). Whatever you estimate for the missing
values may make no difference to the outcome, if they are within a reasonable range of the other
data - you then have a very credible analysis.
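For readers who want to see the idea behind EM estimation of missing cells, the following is a
minimal textbook-style sketch under multivariate normality, written in Python with NumPy. It is
not PRIMER's implementation: the function name em_impute, the absolute (rather than proportional)
convergence test and the toy data are all assumptions made for illustration, each sample is assumed
to have at least one recorded value, and the approximate standard deviations that PRIMER reports
are not computed.

```python
import numpy as np

def em_impute(data, max_iter=1000, tol=1e-6):
    """Textbook EM estimates of missing cells (NaN) under multivariate normality."""
    x = np.array(data, dtype=float)
    n, p = x.shape
    missing = np.isnan(x)
    # Start by filling each missing cell with the mean of its variable.
    mu = np.nanmean(x, axis=0)
    x[missing] = np.take(mu, np.nonzero(missing)[1])
    sigma = np.cov(x, rowvar=False, bias=True)
    for _ in range(max_iter):
        previous = x.copy()
        extra = np.zeros((p, p))   # conditional covariance of the missing cells
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            # E-step: regress the missing variables on the observed ones.
            slope = sigma[np.ix_(m, o)] @ np.linalg.pinv(sigma[np.ix_(o, o)])
            x[i, m] = mu[m] + slope @ (x[i, o] - mu[o])
            extra[np.ix_(m, m)] += sigma[np.ix_(m, m)] - slope @ sigma[np.ix_(o, m)]
        # M-step: update the mean and covariance from the completed data.
        mu = x.mean(axis=0)
        centred = x - mu
        sigma = (centred.T @ centred + extra) / n
        if np.max(np.abs(x - previous)) < tol:
            return x
    raise RuntimeError("max number of iterations exceeded")

# Hypothetical data: the second variable is exactly 2*first + 1, one cell removed.
first = np.arange(1.0, 11.0)
table = np.column_stack([first, 2.0 * first + 1.0])
table[4, 1] = np.nan
completed = em_impute(table)
```

In this artificial case the two variables are perfectly correlated, so the estimate should recover
the removed value almost exactly; with real data the estimate is only as good as the correlations
that support it.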
117 -> For the transformed Clyde environmental matrix, clevt, take a copy with Tools>Duplicate and
from this remove two cells at random: double click on say (S4, ln Cu) and hit the delete key, then
(S7, ln Pb) and delete again. Both cells will now be displayed as 'Missing!'. Analyse>Draftsman
Plot shows that the normality assumptions are probably not too bad - see page 112 - but the DpP
criterion for the whole matrix fails badly (n = 12, p = 11, m = 2, so DpP = 1.7) and we should not
trust the outcome even if Tools>Missing converges. The correlation matrix output with the
draftsman plot shows, however, some very high correlations between, for example, Cu, Pb and Zn,
which ought to give a better basis for prediction than the whole matrix. So, select out just these
three variables (highlight them then Select>Highlighted), and Tools>Missing produces credible
missing data estimates of 4.18 (S4, ln Cu) and 5.26 (S7, ln Pb), c.f. the original 4.31 and 5.17. Note
that DpP = 3.3, which is still some way from respectability, but clearly is capable (sometimes at
least) of producing useful results. When they pass further perusal (redraw the draftsman plot for
these 3 variables), the estimates need to be individually copied (click in the cell and Ctrl-C) and
pasted back into the full matrix (Ctrl-V at the cursor). Of course the process is more automatic in
less borderline cases, with larger n, when the full matrix can be input to Tools>Missing.
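The two DpP values quoted in this example can be checked directly from the rule-of-thumb formula
DpP = n/[(p+3)/2 + (m/p)]. One way to read the formula (an interpretation, not stated in this
form in the text) is as the n x p data values divided by the number of parameters to estimate:
p means, p(p+1)/2 variances and correlations, and m missing cells. The function name below is
hypothetical.

```python
def data_points_per_parameter(n, p, m):
    """DpP = n / [(p + 3)/2 + m/p] for n samples, p variables, m missing cells."""
    return n / ((p + 3) / 2 + m / p)

# Whole Clyde environmental matrix: 12 samples, 11 variables, 2 missing cells.
whole_matrix = data_points_per_parameter(n=12, p=11, m=2)

# After selecting just three variables (Cu, Pb, Zn).
three_variables = data_points_per_parameter(n=12, p=3, m=2)
```

Rounded to one decimal place these give 1.7 and 3.3, matching the values in the text.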

[Figure: Tools>Missing dialog (options: Zero (biotic); EM (environmental), Max iterations: 1000),
the worksheet with the estimated cells, a draftsman plot of the selected variables (axes include
ln Cu and ln Zn), and the results window:

Data type: Environmental
Sample selection: All
Variable selection: 1,5,7
Algorithm: EM
Maximum iterations: 1000
Minimum change (proportion): 1E-06

Missing values
Sample   Variable   Estimate   Sigma
S4       ln Cu      4.1806     0.29702
S7       ln Pb      5.262      6.8799E-2]
11. Biota-Environment

11. Linking assemblage to environment (BEST: Bio-Env, LINKTREE)


BEST rationale

The main rationale for the Analyse>BEST procedure in PRIMER6 is to find the 'best' match
between the multivariate among-sample patterns of an assemblage and that from environmental
variables associated with those samples. The extent to which these two patterns match reflects the
degree to which the chosen abiotic data 'explains' the biotic pattern. This leads naturally to the idea
of searching over subsets of the abiotic variables for a combination which optimises that match,
namely the best explanatory variables - see Chapter 11 of the Methods manual for details of the
method. The concept is a more general one (see also Chapter 16), and BEST can equally be used to
find: subsets of taxa which best match a fixed environmental array (e.g. vulnerable and opportunist
species characterising a known impact gradient); subsets of biota which best match a different
biotic matrix (e.g. key coral species 'structuring' a reef fish community) or even the same biotic
matrix (e.g. a small subset of species, perhaps chosen from a set of easily-identified taxa, which
generates the same multivariate sample pattern as would the full assemblage). Parallel applications
for different data types can also be envisaged, e.g. a subset of tissue chemical concentrations that
best 'explain' a suite of biomarkers, or conversely, a subset of biomarkers that best identify a body
burden contaminant gradient, a subset of geomorphological variables that best characterises an
existing classification of rivers or coasts, a small set of morphometric or genetic/molecular
measures that is as effective as a larger set in discriminating two putative species, and so on.
Bio-Env v. BVStep

BEST amalgamates the BIO-ENV and BVSTEP procedures of PRIMER5 (hence BEST = Bio-Env
+ Stepwise) since they have an identical purpose - to search for high rank correlations between a
secondary, fixed sample similarity matrix (typically from a species assemblage) and resemblance
matrices generated from different variable subsets of a primary matrix (the active window, usually
a transformed and normalised suite of environmental variables presumed to include those 'driving'
the assemblage structure). The only difference in operation is that BIO-ENV carries out a full
search of all possible combinations of variables from the primary datasheet, whereas BVSTEP
caters for the common situation in which there are too many variables to do an exhaustive search,
and a forward-stepping and backward-elimination stepwise procedure is necessary to arrive at a
(possibly) optimal set. Within Analyse>BEST, these two main alternatives become the Method•
BIOENV and Method•BVSTEP choices with associated BIOENV and BVSTEP tabs; features on
the General tab are shared for the two methods.
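The full-search idea behind BIO-ENV can be sketched in a few lines. The version below is a
simplified illustration in plain Python, not PRIMER's routine: it assumes Euclidean distance on
already-normalised variables, uses the ordinary Spearman coefficient (ranks with ties averaged),
and all function names and the toy data are hypothetical.

```python
import math
from itertools import combinations

def euclidean_distances(rows, columns):
    """Distances between all pairs of samples, using only the given columns."""
    return [math.sqrt(sum((a[c] - b[c]) ** 2 for c in columns))
            for a, b in combinations(rows, 2)]

def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den

def full_search(env_rows, fixed_dissimilarities):
    """Try every non-empty subset of variables; return (best rho, best subset)."""
    n_vars = len(env_rows[0])
    best = None
    for size in range(1, n_vars + 1):
        for subset in combinations(range(n_vars), size):
            rho = spearman(euclidean_distances(env_rows, subset), fixed_dissimilarities)
            if best is None or rho > best[0]:
                best = (rho, subset)
    return best

# Hypothetical 4 samples x 3 variables; the fixed matrix is built from variable 0 alone.
env = [[0.0, 5.0, 1.0], [1.0, 1.0, 7.0], [3.0, 4.0, 2.0], [6.0, 2.0, 3.0]]
fixed = euclidean_distances(env, (0,))
best_rho, best_subset = full_search(env, fixed)
```

The number of subsets doubles with each extra variable, which is why a stepwise search such as
BVSTEP becomes necessary for large variable sets.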
General & BioEnv tab choices
Since the active window is the data sheet from which variable selections will be made, the other
worksheet to be specified must be the fixed resemblance matrix to compare with these selections.
The two matrices must unambiguously refer to a common set of samples, otherwise no matching is
possible. They do not need to contain the same number of samples: sample labels which are shared
between the two will be extracted and the matching carried out only on this common sample set.
Alternatively, if they have no sample labels in common, but they do have the same number of
samples, an opportunity is given to relax the strict label matching and pair up the samples in their
order of appearance (it is quite common to want to use different sample labels for environmental
than species matrices but, if so, the onus is on the user to make sure that the samples are in exactly
the same order in the two matrices). The next area on the BEST dialog concerns choices on the
(primary) Sample data: Resemblance and Select variables buttons. The default Resemblance will
depend on the datasheet type. In the absence of any other guide this will be Bray-Curtis for biotic
data and Euclidean distance for environmental data (which should already be normalised before
entry to BEST). If, on the other hand, that data matrix has already had a resemblance calculated
from it, PRIMER6 may utilise that knowledge to set different defaults. In any case, you should
always click the Resemblance button to ensure that you are getting the choice you wish.
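The sample-matching rule described above can be pictured with a short Python sketch. It is an illustration only, not PRIMER's own code: the function name match_samples and the allow_order_match flag (standing in for the option the dialog offers to the user) are hypothetical.

```python
def match_samples(primary_labels, fixed_labels, allow_order_match=False):
    """Pair up samples of the active sheet with those of the fixed matrix.

    Returns (i, j) index pairs: i indexes the primary sheet, j the fixed matrix.
    """
    fixed_index = {label: j for j, label in enumerate(fixed_labels)}
    # Keep only the samples whose labels occur in both matrices
    common = [(i, fixed_index[label])
              for i, label in enumerate(primary_labels) if label in fixed_index]
    if common:
        return common
    # No shared labels: pairing by order of appearance is only possible
    # when the sample counts agree and the user has agreed to it
    if allow_order_match and len(primary_labels) == len(fixed_labels):
        return [(i, i) for i in range(len(primary_labels))]
    raise ValueError("the two matrices have no samples that can be matched")


shared = match_samples(["S1", "S2", "S3"], ["S3", "S1", "S9"])
by_order = match_samples(["a", "b"], ["x", "y"], allow_order_match=True)
```

With order matching, nothing checks that the two sheets really list the same sites in the same order, which is why the onus stays with the user.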
The Select variables button gives a selection dialog with three panes. The default is for all
variables present in the active matrix to be displayed in the Available: pane. These will then be
picked and dropped in all combinations. Variables moved to the Force exclusion: pane will never
enter any of the combinations considered (e.g. you might choose to exclude a variable which is
very highly correlated with another in the list). Those variables in the Force inclusion: pane will be
included in every combination (e.g. you might know that a particular environmental variable is
causal for the assemblage, and therefore always want to include it when considering whether
adding other variables improves the 'explanation').
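As a rough picture of what the three panes of the Select variables dialog imply for the search, the following Python sketch enumerates the combinations that would be tried. The function candidate_subsets is hypothetical, and it assumes that forced-in variables count towards the maximum number of trial variables, which the text does not state.

```python
from itertools import combinations


def candidate_subsets(variables, force_include=(), force_exclude=(), max_trial=5):
    """Yield every variable combination honouring forced inclusion and exclusion."""
    forced = tuple(v for v in variables if v in force_include)
    free = [v for v in variables
            if v not in force_include and v not in force_exclude]
    for size in range(1, max_trial + 1):
        n_free = size - len(forced)
        if n_free < 0 or n_free > len(free):
            continue  # this combination size cannot be built
        for extra in combinations(free, n_free):
            yield forced + extra


subsets = list(candidate_subsets(["Sal", "DO2", "PO4", "TotP"],
                                 force_include=["Sal"],
                                 force_exclude=["TotP"],
                                 max_trial=3))
```

Here Sal appears in every combination and TotP in none, so only four combinations remain to be matched.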
Moving to the right of the BEST dialog on the General tab, the area dealing with Rank correlation
method offers three choices: •Spearman, •Weighted Spearman or •Kendall, defined in equations
(11.3) to (11.4) of the Methods manual. These are the measures of agreement between the two
resemblance matrices, e.g. biotic and abiotic, and rank correlations (ρ) are calculated by matching
element to element. The logic is that if the true driving abiotic variables (say) are selected, and two
sites have very similar suites of values for these, then the assemblages will also be very similar
(and vice-versa), so the triangular matrix elements should rank in the same order. Ranks are
appropriate not only because of their central role in PRIMER, underlying a successful non-metric
MDS ordination and the hypothesis testing procedures in ANOSIM and RELATE, but also because
the two resemblance matrices may use entirely different coefficients (Euclidean and Bray-Curtis).
Whilst the above logic then leads one to expect a monotonic relation between their values, there is
no reason to expect that relationship to be linear, so ordinary correlation will be less effective.
There is little to choose between the three rank coefficients in practice - any could be used.
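The element-to-element matching can be sketched in Python for the simple Spearman case. This is an illustrative sketch rather than the Methods manual's exact definition: the helper names are hypothetical, the two small matrices are invented, and distances are negated so that agreement between a similarity and a distance matrix gives a positive ρ.

```python
def lower_triangle(matrix):
    """Unravel the elements below the diagonal, row by row."""
    return [matrix[i][j] for i in range(1, len(matrix)) for j in range(i)]


def ranks(values):
    """Average ranks: tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def matrix_rank_correlation(similarity, distance):
    """Spearman correlation between matching elements of the two triangles."""
    sims = lower_triangle(similarity)
    dists = [-d for d in lower_triangle(distance)]  # high distance = low similarity
    return pearson(ranks(sims), ranks(dists))


# Invented 4-sample similarity matrix and a distance matrix that is a
# monotonic, but non-linear, function of it
similarity = [[100, 80, 60, 20],
              [80, 100, 70, 30],
              [60, 70, 100, 50],
              [20, 30, 50, 100]]
distance = [[(100 - s) ** 2 for s in row] for row in similarity]

rho = matrix_rank_correlation(similarity, distance)
linear_r = pearson(lower_triangle(similarity),
                   [-d for d in lower_triangle(distance)])
```

The rank correlation reaches 1 because the relation is perfectly monotonic, whereas the ordinary correlation on the raw values falls short of 1 because the relation is not linear.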
The Results area on the General tab simply determines the quantity of output to the BEST results
window. Results detail: Brief or Normal for a Bio-Env run will output just the Max number of best
results: 10 (by default), namely the 10 combinations of variables giving the highest ρ, in decreasing
order (irrespective of the number of variables in the combination). Results detail: Detailed is
preferable, initially, because it outputs the ordered decreasing values of ρ for all variable
combinations, organised into groups of the same number of variables. Thus, all single variables are
listed, then pairs, then triples etc, giving in three columns: the number of variables, the matching
coefficient ρ, and then the variables used. It finishes with the best 10 (or whatever) overall results, as
before. (The distinction between Brief and Normal is used only in BVStep, where there is an extra
level of results from different random starts of the search procedure - see section 13).
The final dialog area on the General tab, headed Permutations, is deferred until the end of this
section. Moving to the BIOENV tab, there is one entry, a choice of Max number of trial variables: 5
(by default). This limits the search to ≤5 variables at a time, and this is frequently increased, where
feasible. It is not set to the maximum number of variables possible because the total number of
combinations in an exhaustive search could be very large: for p variables there are 2^p - 1, and a
practically realistic limit has to be about p = 17 (a hundred thousand combinations).
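The size of the search is easily checked. The short Python sketch below (the function name n_subsets is hypothetical) counts the non-empty combinations of p variables, with and without a cap on the number of trial variables.

```python
from math import comb


def n_subsets(p, max_trial=None):
    """Number of non-empty variable combinations, optionally capped in size."""
    top = p if max_trial is None else min(p, max_trial)
    return sum(comb(p, k) for k in range(1, top + 1))


full_17 = n_subsets(17)        # 2**17 - 1 = 131071
capped_17 = n_subsets(17, 5)   # at most 5 variables at a time: 9401
full_10 = n_subsets(10)        # 2**10 - 1 = 1023
```

For 17 variables the default cap of 5 trial variables reduces the search from 131071 combinations to 9401.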

The emphasis of this section is on matching subsets of environmental variables to assemblage
patterns. The number of abiotic variables is often <17 or, if not, should probably be pruned before
running BEST, so only a full search (BIOENV) will be illustrated now. BVSTEP could be run in
much the same way on a larger set, but the reason this is likely to prove unattractive is that, with so
many abiotic variables, it is inevitable that they will be strongly inter-correlated. There are then a
plethora of equally good solutions and a rather unfocussed interpretation. Deletion of all but one of
a highly mutually-correlated set of variables and/or prior reduction to one representative of each
different type of environmental variable, may be desirable, just as in multiple linear regression (see
the discussion in Chapter 11 of the Methods manual). In many of the other applications - e.g. when
the primary matrix is of species variables and an a priori selection defeats the point of the analysis
- the stepwise form (BVSTEP tab) will be essential, and such an example is seen in section 13.
(Messolongi diatoms & abiotic data)
A study of diatom assemblages (abundances of 193 species) at 17 sites in the lagoons of Messolongi,
Aitoliko and Kleissova in Eastern Central Greece was undertaken by Danielidis DB (1991),
Ph.D. thesis, Univ Athens. At each site, a suite of 11 water-column data was also recorded: Temp,
Salinity, DO2, pH, PO4, Total P, NH3, NO2, NO3, Inorganic N and SiO2. These community and
environmental arrays are in C:\Examples v6\Messoldi\msda and msev. This is not an impact
scenario, but an ecological study of how the diatom communities relate to water-column variables.
In a new workspace, File>Open the diatom abundances msda and abiotic variables msev. With
Analyse>Pre-treatment>Transform (overall), square-root transform the abundances, then take
Analyse>Resemblance>•Bray-Curtis similarity (on samples) to give Resem1, and Analyse>MDS
to give Graph1. With the abiotic variables, Analyse>Draftsman Plot shows that a log transform
would be desirable on PO4, TotP, NH3, NO2, NO3, Inorganic-N and SiO2, but Temp, Sal, DO2 and
pH can remain untransformed. Carry out the transform by highlighting (not selecting) the columns
and using Tools>Transform (individual)>(Expression: log(V)) & (✓Rename variables) to give
Data2, for which a re-run of Draftsman Plot (take also ✓Correlations to worksheet) shows the
distributions have greatly reduced right-skewness. Two of the variables, log(PO4) and log(TotP),
are seen to be strongly collinear, and it will make sense to drop one of them in the BEST run (they
are, in effect, the same variable). You can pick out which are the strongly correlated variables by
inputting Resem2, the correlation matrix output by Draftsman Plot, to Select>Samples>•Values
>0.95 (and potentially repeat with Values<-0.95, though there are none of the latter here).
This will display only those rows and columns of the resemblance matrix with a value >0.95
somewhere, just log(PO4) and log(TotP) in this case. Return to Data2, the transformed environmental
array, and normalise it with Analyse>Pre-treatment>Normalise variables, then File>
Rename Data to msevtn, which is the transformed, normalised environmental matrix. The among-sample
relationships, in terms of these 10 abiotic variables, can then be seen either by Analyse>
Resemblance>•Euclidean distance, and Analyse>MDS, or directly on msevtn take Analyse>PCA
(turning off the vector plot, by Graph>Special and unchecking the 'Show variable vectors' box).
As expected in this case, since both are based on Euclidean distance, the two ordination methods
give very similar 2-d plots (you may need to Graph>Flip to match them up with each other and
the biotic MDS), but more remarkable is the near-perfect match of biotic and abiotic analyses - the
193-species diatom community is highly predictable from knowledge of these 10 water-column
variables (the PCA results window or vector plot will allow interpretation of the axes).
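The pre-treatment steps for the abiotic variables (log transform, screening for collinear pairs, normalising) can be sketched in Python as follows. This is only an illustration: the data values are invented, not those of msev; log(V) is assumed here to be the natural logarithm; and normalising is assumed to mean subtracting the mean and dividing by the sample standard deviation (divisor n-1).

```python
import math


def log_transform(column):
    return [math.log(v) for v in column]


def normalise(column):
    """Centre on the mean and scale by the sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((v - mean) ** 2 for v in column) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in column]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def strongly_correlated(columns, threshold=0.95):
    """List the variable pairs whose correlation exceeds the threshold."""
    names = list(columns)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if pearson(columns[a], columns[b]) > threshold]


# Invented values for five sites
raw = {"Sal": [30, 5, 22, 9, 18],
       "PO4": [10, 20, 40, 80, 160],
       "TotP": [12, 25, 47, 90, 170]}
transformed = {"Sal": raw["Sal"],
               "log(PO4)": log_transform(raw["PO4"]),
               "log(TotP)": log_transform(raw["TotP"])}

collinear = strongly_correlated(transformed)
normalised = {name: normalise(col) for name, col in transformed.items()}
```

In this toy case only the log(PO4), log(TotP) pair is flagged, mirroring the situation described for the lagoon data.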

[Figure: screenshots of the Messolongi workspace: MDS of the diatom assemblage (square-root transform, Bray-Curtis similarity); MDS of the water column variables (Euclidean distance); the transformed and normalised water column data sheet; the Draftsman Plot of the transformed water column variables; and the correlation worksheet, in which log(TotP) against log(PO4) gives 0.98253.]

In fact, the match is even better with fewer variables: with active matrix as msevtn, run
Analyse>BEST>(•BIOENV) & (Resemblance matrix: Resem1) & (Resemblance>Measure•Euclidean
distance) & (Select variables>[move log(TotP) to the Force exclusion box]) & (Rank correlation
method•Spearman) & (Results detail: Detailed), leaving out the Permutation test, and
finally on the BIOENV tab>Max num of trial variables: 10. The results window, BEST1, shows
that ρ is optimised (at 0.88), for the 5 variables, 2: Sal, 3: DO2, 5: PO4, 10: Inorg-N, 11: SiO2, and
slowly decreases beyond that, as more variables are added. (Note that TotP is retained in the
numbering scheme though it never features in the solutions because of the forced exclusion). The
best 3-variable solution (Sal, PO4, Inorg-N) does nearly as well (ρ = 0.84), and on grounds of
parsimony might be preferred.
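For readers who want to see the logic of the full search laid out, here is a minimal Python sketch of a Bio-Env style exhaustive search with Spearman ρ and Euclidean distance. It is a sketch under stated assumptions, not PRIMER's implementation: the function names are hypothetical, the toy data are invented, and the environmental columns are assumed to be already transformed and normalised as required.

```python
from itertools import combinations


def ranks(values):
    """Average ranks: tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def bioenv(similarity, env, max_trial=5, exclude=()):
    """Return (rho, subset) for every variable subset, best match first."""
    pairs = list(combinations(range(len(similarity)), 2))
    sims = [similarity[i][j] for i, j in pairs]
    names = [v for v in env if v not in exclude]
    results = []
    for size in range(1, min(max_trial, len(names)) + 1):
        for subset in combinations(names, size):
            # Euclidean distance between each pair of samples on this subset
            dist = [sum((env[v][i] - env[v][j]) ** 2 for v in subset) ** 0.5
                    for i, j in pairs]
            # Negate so that agreement with similarities gives positive rho
            results.append((spearman(sims, [-d for d in dist]), subset))
    results.sort(key=lambda item: -item[0])
    return results


# Toy data: the biotic similarities are driven entirely by variable "a"
a = [0, 1, 2, 3, 4]
env = {"a": a, "b": [2, 0, 3, 1, 2]}
similarity = [[100 - 20 * abs(x - y) for y in a] for x in a]

results = bioenv(similarity, env)
best_rho, best_subset = results[0]
```

In this toy case the single variable "a" matches perfectly and adding "b" only degrades the match, the same behaviour (on a small scale) as the slow decline in ρ once superfluous variables are added.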
[Figure: the Analyse>BEST dialog (General tab, with Method, Rank correlation method, Resemblance matrix (fixed), Sample data, Results and Permutations areas), the Select variables dialog with log(TotP) moved to the Force exclusion pane, and the BEST1 results window listing No.Vars, Corr. and Selections, headed by the best result: 5 variables, 0.882, selections 2,3,5,10,11.]
Global BEST match test
The question of statistical significance testing on the results of the Bio-Env (or BVStep) procedure
naturally arises. Section 13 describes Analyse>RELATE, a non-parametric form of Mantel test.
For any two independently-derived resemblance matrices, defining the relationships among the
same set of sample labels, one can use permutations to test the null hypothesis of 'no agreement in
multivariate pattern'. The measure of agreement is the rank correlation coefficient ρ (discussed
above) between the corresponding elements of the two triangular arrays, with ρ = 0 representing the
null hypothesis. The ρ values that it is possible to observe by chance, if the null hypothesis is really
true, can be generated by randomly permuting one set of sample labels relative to the other (thus
destroying any real link) and recalculating ρ, over many random permutations. RELATE could
therefore be applied to testing agreement between an assemblage and the full set of environmental
variables for the same sites (though not for some of the other linkage problems mentioned earlier,
such as between the full assemblage and a subset of conspicuous species, since the independence
clause is violated - any subset of species will bear some relation to the full set). It is important to
realise, however, that RELATE cannot be applied to the subset of environmental variables that
result from the Bio-Env procedure: these have been selected precisely to maximise the matching
coefficient ρ with the assemblage similarities. Even where there is no real match, the optimum ρ
produced by Bio-Env will inevitably be >0. What is needed is a test which allows for this selection
bias, and this is given in PRIMER (a new feature in v6) by the global BEST match permutation
test, accessed at the bottom of the Analyse>BEST dialog box, on the General tab. The idea is
simple: randomly permute one set of sample labels relative to the other, then run through the full
BIOENV or BVSTEP procedure to generate the best match ρ. Another permutation of the labels is
then generated and the whole BIOENV procedure repeated, and so on (for 99 times by
default, because of the intense amount of computation, but preferably more). This produces 99
values of ρ in a histogram, which must represent the null hypothesis case. The real ρ is compared
with these, as for any of the PRIMER permutation tests: if it is larger than all of them, then the
null hypothesis can be rejected at p<1%.
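The essential point of the global test, that the whole selection procedure is repeated inside every permutation, can be sketched in Python. Everything here is illustrative: the function names are hypothetical, the data are invented, the search is the simple exhaustive one with Spearman ρ and Euclidean distance, and the significance level uses the same (1 + count)/(1 + permutations) convention as the other PRIMER permutation tests.

```python
import random
from itertools import combinations


def ranks(values):
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def best_rho(similarity, env, order):
    """Best rho over all variable subsets, with env samples relabelled by order."""
    pairs = list(combinations(range(len(similarity)), 2))
    sims = [similarity[i][j] for i, j in pairs]
    names = list(env)
    best = -1.0
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            dist = [sum((env[v][order[i]] - env[v][order[j]]) ** 2
                        for v in subset) ** 0.5 for i, j in pairs]
            best = max(best, spearman(sims, [-d for d in dist]))
    return best


def global_best_test(similarity, env, n_perms=99, seed=0):
    rng = random.Random(seed)
    n = len(similarity)
    observed = best_rho(similarity, env, list(range(n)))
    null = []
    for _ in range(n_perms):
        order = list(range(n))
        rng.shuffle(order)  # break any real link between the two matrices
        null.append(best_rho(similarity, env, order))  # full search every time
    n_ge = sum(1 for r in null if r >= observed)
    return observed, null, 100.0 * (1 + n_ge) / (1 + n_perms)


# Invented data: similarities driven by variable "a" only
a = [0, 1, 2, 3, 4, 5]
env = {"a": a, "b": [3, 1, 4, 1, 5, 9]}
similarity = [[100 - 15 * abs(x - y) for y in a] for x in a]

observed, null, sig_percent = global_best_test(similarity, env)
```

Because each null value is itself the maximum over all subsets, the null distribution is shifted upwards from zero, which is precisely the selection bias a plain RELATE test would ignore.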

For the lagoon diatom study, re-run Analyse>BEST on the transformed and normalised abiotic
variables, active window msevtn, with fixed matrix the assemblage similarities Resem1, as above,
only this time take (✓Permutation test)>(Number of permutations: 999) & (✓Plot histogram). On a
slow machine, or with a larger number of samples or variables than here, you will probably need to
reduce the number of permutations to 499, or 199, or 99. The latter is adequate if the result is clear
cut, but results in a much less smoothed histogram, and you will wish to calculate more in borderline
cases. (Remember that you can use the interrupt button on the Tool Bar to stop an execution that is
clearly going to take too long, or you can multi-task, carrying on with other PRIMER activities whilst
this runs in the background). It is always wise to run BEST without the permutation test first, so
that you know how long 99 permutations will take (= 99 runs of BEST). On the resulting
histogram, take Graph>Special to change the bin size (see page 68) and then the More button to
get to the usual Graph Options X axis tab to change the default scaling. The real value of ρ is
displayed as a dotted line on the histogram. The test adds a small section to the results window,
BEST2, headed Global Test, whose format is as for the ANOSIM test, section 12. It gives the real
value of ρ and its % significance level, 100×(1+(no. of permuted ρ ≥ observed ρ))/(1+no. of perms).
The real ρ (0.88) is well to the right of the upper tail of the null distribution, p<0.1% (i.e. p<0.001).
Note also that the mean of the histogram is not zero but around ρ = 0.2. The strong selection
pressure, over a large number of combinations of variables, is able to produce an artefactual match
up to about ρ = 0.4 or even 0.5 in this case, though there is no question that the null hypothesis is
rejected here - such a good match of water column data to diatom assemblages as displayed on
page 123 could not be due to chance, of course.
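The % significance level formula quoted above is simple enough to check by hand, or with a few lines of Python. The function name is hypothetical and the permuted values below are placeholders, chosen only to give the required counts.

```python
def significance_percent(observed, permuted):
    """100 x (1 + no. of permuted values >= observed) / (1 + no. of perms)."""
    n_ge = sum(1 for value in permuted if value >= observed)
    return 100.0 * (1 + n_ge) / (1 + len(permuted))


# No permuted value reaches the observed rho of 0.88
with_999 = significance_percent(0.88, [0.2] * 999)   # 0.1%
with_99 = significance_percent(0.88, [0.2] * 99)     # 1%
# 49 of 999 permuted values are at least as large as the observed one
borderline = significance_percent(0.30, [0.35] * 49 + [0.2] * 950)  # 5%
```

This shows why the number of permutations limits the smallest significance level that can be reported: 99 permutations cannot give less than 1%, 999 cannot give less than 0.1%.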
[Figure: the Analyse>BEST dialog with the Permutation test box checked; the resulting BEST histogram of permuted Rho values for 'Water column transformed & normalised, matched to diatoms' (x axis from -0.1 to 0.9); and the Global Test section of the results window: Sample statistic (Rho): 0.882; Significance level of sample statistic: 0.1%; Number of permutations: 999 (Random sample); Number of permuted statistics greater than or equal to Rho: 0.]

Linkage trees - rationale
Another new technique is introduced in the area of linking sample patterns based on assemblage
data to a suite of environmental (or other) explanatory variables. The well-established statistical
procedure of 'classification and regression trees (CART)' was further developed in an ecological
context by De'ath G (2002, Ecology 83: 1105-1117), termed 'multivariate regression trees (MRT)'.
PRIMER 6 implements a modification of this, in a form which is consistent with the non-metric
philosophy underlying the rest of the package. The connection with 'regression' is minimal (and
confusing) so the more descriptive term 'linkage trees' is used by PRIMER for its variation of the
procedure, accessed by running Analyse>LINKTREE. We have already seen two techniques for
linking assemblage patterns to abiotic variables, simple bubble plots (pages 83-85) and the above
Bio-Env procedure. Bio-Env has the advantage of looking at abiotic variables in combination,
trying to identify a subset which is sufficient to 'explain' all the biotic structure capable of
explanation, and the matching procedure takes place in the full high-dimensional space. But on its
own, it falls short of full interpretation, because it does not demonstrate which variables take high
or low values for which samples. Bubble plots give the latter but are only satisfactory where the
low-dimensional MDS has acceptable stress as an approximation to the full high-d biotic pattern.
Linkage trees fill this gap: they can take the subset of abiotic variables identified by Bio-Env, and
use them to describe how best the assemblage samples are split into groups, by successive binary
division (in the full high-d space). Each division is characterised by a threshold on a single
environmental variable, e.g. group 1 communities have Salinity<23ppt, whereas group 2 have
Salinity>26ppt (and there are no samples with salinity between 23 and 26). Group 1 and 2 samples
are then each divided into two, by a different threshold on the same abiotic variable, or more likely
by different abiotic variables. The end point is a divisive clustering of the biotic samples where
each cluster has an interpretation in terms of a sequence of inequalities on the environmental
variables, e.g. for the lagoon diatoms, the cluster consisting of samples 13, 14, 15 has Salinity<23,
54<PO4<82 and Inorg-N<965 (see below), and they are the only sites to meet those conditions.

Non-metric, non-linear, non-additive
The LINKTREE algorithm has a number of features that fit well with the PRIMER approach.
Firstly, each successive division of the biotic samples into two groups (which can be of unequal
sizes) is chosen to maximise the ANOSIM R statistic, which measures the degree of separation of
those groups in high-d space (see Chapter 6 of the Methods manual for ANOSIM R definition). An
ANOSIM test is not carried out (that would not be valid) but R has a useful role as a non-parametric
measure of multivariate difference between groups in its own right, rather than just as a test
statistic. Not all possible binary divisions are looked at (there are ~2^16 possibilities for just the
initial division of the 17 sites of the lagoon diatom study!); the problem is constrained to just those
divisions that correspond to a threshold condition on an environmental variable (so for 3 variables
there are at most 3×16=48 ways to divide 17 samples into two groups). Secondly, the procedure is
truly non-metric, not just on the assemblage resemblances but also on the abiotic variables. A
(monotonic) transform of the environmental variables can make no difference to the outcome of
LINKTREE, since all that is being used is how a criterion like Inorg-N<965 or >1380 splits up the
samples (again there are no samples with Inorg-N between 965 and 1380). That division is
unchanged under transformation, just becoming log(Inorg-N)<log(965) or >log(1380) for example.
Thirdly, the way the different abiotic variables are combined in the overall partitioning of the biotic
samples is clearly non-linear but is also non-additive. This is in contrast with Bio-Env, which has
quite general applicability - it is non-metric and can certainly accommodate non-linear responses
of the community to driving environmental variables - but does make an implicit assumption that
their effects are additive. For example, if high PO4 was an important variable in separating lagoon
diatom communities but only in low salinity environments, with equally large variation in PO4
having no effect on the biota in high salinities, then this would clearly reduce the value of the
Bio-Env matching coefficient ρ. Such interactions are one explanation for the failure to get a good
match (along with high 'noise' in measurement, failure to measure the right abiotic variables,
communities structured by competition not forcing external variables etc). LINKTREE does not
attempt a holistic explanation in the way that Bio-Env does, however, and is clearly capable of
demonstrating that salinity is important for internal structuring of one group of samples but not for
another group (with similar salinity range). A disadvantage of LINKTREE for interpretation is
that the 'explanation' of the biotic structure is local, and thus inevitably piecemeal, and there are
likely to be many different abiotic conditions that will 'explain' the same division in assemblages,
unless the abiotic variable set is drastically pruned before entering the routine. An advantage is that
it is geared towards prediction, and not just interpretation: for a new site within the study area, with
known environmental data, the set of inequalities may determine in which group of sites one would
expect the diatom community to fall.
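A single LINKTREE division can be sketched in Python as below. This is a simplified illustration with hypothetical function names and invented data, not PRIMER's code. It takes ANOSIM R in its usual form, R = (mean between-group rank - mean within-group rank) / (M/2), where M is the number of sample pairs and rank 1 is the most similar pair; it ignores the Min group size, Min split size and Min split R options.

```python
from itertools import combinations


def ranks(values):
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def anosim_r(dissim, group_a, group_b):
    """ANOSIM R for two groups; dissimilarities are re-ranked within this subset."""
    pairs = list(combinations(sorted(group_a | group_b), 2))
    rk = ranks([dissim[i][j] for i, j in pairs])
    between = [r for r, (i, j) in zip(rk, pairs) if (i in group_a) != (j in group_a)]
    within = [r for r, (i, j) in zip(rk, pairs) if (i in group_a) == (j in group_a)]
    mean_b = sum(between) / len(between)
    mean_w = sum(within) / len(within)  # needs at least one within-group pair
    return (mean_b - mean_w) / (len(pairs) / 2)


def threshold_splits(env, samples):
    """Yield (variable, low_max, high_min, low_group, high_group) for each gap."""
    for name, values in env.items():
        distinct = sorted({values[s] for s in samples})
        for lo, hi in zip(distinct, distinct[1:]):
            low = frozenset(s for s in samples if values[s] <= lo)
            yield name, lo, hi, low, frozenset(samples) - low


def best_split(dissim, env, samples):
    """The threshold split maximising ANOSIM R."""
    return max(threshold_splits(env, samples),
               key=lambda split: anosim_r(dissim, split[3], split[4]))


# Invented data: five samples whose dissimilarities follow salinity
sal = [5, 6, 7, 30, 32]
env = {"Sal": sal, "PO4": [1, 9, 2, 8, 3]}
dissim = [[abs(x - y) for y in sal] for x in sal]

name, lo, hi, low, high = best_split(dissim, env, list(range(5)))
r_value = anosim_r(dissim, low, high)

# Three variables with 17 distinct values each give at most 3 x 16 = 48 splits
env17 = {"v1": list(range(17)),
         "v2": [x * x for x in range(17)],
         "v3": [-x for x in range(17)]}
n_candidates = len(list(threshold_splits(env17, list(range(17)))))
```

The split found reads as Sal<7 (>30) in the notation of the LINKTREE text pane. Because only the ordering of values within a variable is used, a monotonic increasing transform such as a log would give the same groups.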
LINKTREE on lagoon diatom study
Continue with the lagoon diatom study in directory C:\Examples v6\Messoldi, making msev the
active window (since the linkage tree is the same whether the data is transformed or not, it aids the
interpretation to use the original variable scales). Highlight the best three-variable combination
picked out by the Bio-Env run (bottom of page 123): Sal, PO4, In-N, and select them with Select>
Highlighted. Take Analyse>LINKTREE>Resemblance matrix (fixed): Resem1, the latter being
the Bray-Curtis similarities for the square-root transforms of the diatom abundances in msda. It is
this biotic matrix on which ANOSIM R statistics are calculated, for different divisions of its 17
sites, and the maximum R selected. Defaults can be taken for other entries on the LINKTREE
dialog box. Min group size: 1 and Min split size: 4 determine that a group of size 3 or 2 will not be
divided but that groups of size 4 or more are allowed to be split in any way, so that splitting one
sample off from an existing group is permissible. Min split R: 0 is unlikely to come into play - a
set of samples will not be divided into two groups if the best division does not give a positive
ANOSIM R. Leave the SIMPROF test box unchecked for now; an example is given later.

[Figure: the LINKTREE dialog (Resemblance matrix (fixed): Resem1; Min group size; Min split size; Min split R; SIMPROF test box), the linkage tree for 'Messolongi lagoons: water column data' with B% on the y axis (0 to 100), and the LINKTREE1 results window. The text pane beneath the tree lists the splits:]
A: R=0.72; B%=91; Sal<22.5(>26.2) or In-N>158(<112)
B: R=0.76; B%=85; PO4<82.5(>322)
C: R=0.82; B%=66; In-N>1.38E3(<962) or Sal<8.38(>9.5)
D: R=0.78; B%=34; PO4<26.8(>53.5)
E: R=0.73; B%=40; PO4<13.5(>15.3)
F: R=0.75; B%=21; Sal<35.3(>44)
G: R=1.00; B%=14; In-N<101(>112)

The output is a tree diagram (Graph9) and a results window (LINKTREE1). The bottom part of the
plot is a text pane which repeats the information in the results window in slightly more condensed
form. The first split (A) in the assemblage data is between sites 1-4, 6, 12-15 (left hand side of
MDS plot, Graph1) and 5, 7, 8-11, 16, 17 (right hand side) - a very natural divide in the ordination
(though remember that the procedure works in the high-dimensional space not the 2-d MDS). This
has ANOSIM R = 0.72. It is characterised by low or high salinity (Salinity<22.5 to the left, and
>26.2 to the right). Note that the inequalities in the text pane and results window are always in this
order - branch to the left first, branch to the right follows, in brackets. Alternatively, the same split
of samples is obtained by choosing high or low Inorg-N (In-N>158 to the left and <112 to the
right). R is the same whichever of the two variables is used, since they give the same split of the
biotic data, of course! There is no basis to choose between the two sets of conditions (both can
'explain' that biotic division), so both are reported. Now, moving down the left of the tree, the next
split (B) divides sites 1-4, 6, 13-15 from site 12, with an R of 0.76, on the basis of PO4 (phosphate
levels are very large at site 12). Then C splits 2, 6 from 1, 3, 4, 13-15 at R = 0.82, again with two
possible explanations (and again fairly convincingly on the MDS), and so on. The process ends up
with 8 terminal nodes: 8 groups of sites, each of which has an 'explanation' in terms of a set of
abiotic inequalities. Note that the ANOSIM R values have no tendency to decline or increase as
you move down the tree - R is rescaled (similarities re-ranked) for each new subset of samples. An
absolute measure of group differences is given by B%, which uses only the ranks from the original
similarity matrix Resem1, and calculates the average of the between-group ranks as a % of the
largest rank in the original matrix. This is large for split A (B%=91%) and declines (split B at 85%,
C at 66% etc) as groups get closer together, in absolute terms; B% provides the y-axis for the tree.
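The B% measure can be sketched in Python along the lines just described. The function name is hypothetical and the data invented. The sketch assumes that ranks are assigned over all sample pairs of the original matrix, with rank 1 for the most similar pair, so that (unlike R) nothing is re-ranked when moving down the tree.

```python
from itertools import combinations


def ranks(values):
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def b_percent(dissim, group_a, group_b):
    """Average between-group rank as a % of the largest rank in the full matrix."""
    pairs = list(combinations(range(len(dissim)), 2))
    rank_of = dict(zip(pairs, ranks([dissim[i][j] for i, j in pairs])))
    between = [rank_of[(min(i, j), max(i, j))] for i in group_a for j in group_b]
    return 100.0 * (sum(between) / len(between)) / max(rank_of.values())


# Invented data: five samples, dissimilarity = difference in a single variable
sal = [5, 6, 7, 30, 32]
dissim = [[abs(x - y) for y in sal] for x in sal]

first_split = b_percent(dissim, {0, 1, 2}, {3, 4})   # well separated groups
lower_split = b_percent(dissim, {0}, {1, 2})         # a split within the first group
```

The first split gives B% = 75 and the later one B% = 25, so the measure declines down the tree as the groups being separated get closer together.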

SIMPROF test in LINKTREE
Low values of B% correspond to samples which are rather close together on the MDS plot and the
question naturally arises as to whether these samples should be split at all - is there any evidence
that the biological assemblages differ between the sites 5, 7, 11, 16, 17? If not, then we should not
be searching for an environmental variable that distinguishes two subgroups within them. This is
answered by the SIMPROF test, described on pages 64-68. The test is no different than when used
with Analyse>CLUSTER: a profile of the biotic resemblances, in rank order, is compared with
profiles from randomly permuting the species values across these 5 samples, separately for each
species, and recalculating the resemblance profile (repeated many times). The statistic π measures
departure of the real profile from the mean of all random profiles, and this is compared with the
range of values it takes when random profiles are compared with this mean profile. A large real π
implies significance, e.g. if it is larger than all but 49 of the 999 random profiles then homogeneity
of the assemblages for those 5 sites would be rejected at the p<5% level, and LINKTREE carries
on and splits the group. The test is carried out before every binary split and a branch stops dividing
once the test is non-significant. SIMPROF π and p values are written to the results window.
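The structure of the SIMPROF calculation can be sketched in Python as follows. This is a loose illustration rather than PRIMER's routine: the function names, the numbers of permutations and the data are invented, π is taken here to be the summed absolute deviation of a profile from the mean permuted profile, and the toy data contain no empty samples (for which Bray-Curtis would be undefined).

```python
import random
from itertools import combinations


def bray_curtis(x, y):
    """Bray-Curtis similarity (0 to 100) between two samples."""
    return 100.0 * (1 - sum(abs(a - b) for a, b in zip(x, y)) / (sum(x) + sum(y)))


def profile(samples):
    """Similarities among all pairs of samples, in rank order."""
    return sorted(bray_curtis(a, b) for a, b in combinations(samples, 2))


def permuted(samples, rng):
    """Shuffle each species' values across the samples, independently."""
    columns = [list(col) for col in zip(*samples)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def simprof(samples, n_mean=200, n_test=199, seed=0):
    rng = random.Random(seed)
    sims = [profile(permuted(samples, rng)) for _ in range(n_mean)]
    mean_profile = [sum(vals) / n_mean for vals in zip(*sims)]

    def pi(prof):
        return sum(abs(a - b) for a, b in zip(prof, mean_profile))

    observed = pi(profile(samples))
    # A fresh set of permuted profiles gives the null distribution of pi
    null = [pi(profile(permuted(samples, rng))) for _ in range(n_test)]
    n_ge = sum(1 for v in null if v >= observed)
    return observed, 100.0 * (1 + n_ge) / (1 + n_test)


# Invented abundances: rows are samples, columns are species
samples = [[10, 1, 1, 9], [9, 2, 1, 10], [11, 1, 2, 8],
           [1, 10, 9, 1], [2, 9, 10, 1], [1, 11, 8, 2]]
pi_observed, sig_percent = simprof(samples)
```

A small significance level suggests that there is structure among the samples, so a further split could be justified; a large one suggests that the group should be left undivided.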
Re-run Analyse>LINKTREE on active sheet msev, with resemblance matrix Resem1 again, this
time taking (✓SIMPROF test). Look at the entries on the SIMPROF tab, but you should not need
to change any of the defaults. Note that it is a test on the biotic data not the environmental, so the
program has to search back in the Explorer tree for the (transformed) data matrix Data1 from which
the biotic similarity matrix Resem1 was calculated, to put in (Data sheet: Data1). It is this matrix
whose rows are permuted, and the Resemblance button should give the sensible default of
Bray-Curtis similarity on the randomly permuted arrays, because that was what was computed to give
Resem1. You may need to reduce the permutations for much larger data problems than this one, or
run LINKTREE without the SIMPROF option and then do a selective test on just one or two key
splits near the bottom of the tree, directly using Analyse>SIMPROF on Data1 (see page 67).
The plot now displays only those divisions for which the SIMPROF test is significant, e.g. 5, 7, 11,
16, 17 do not differ and are not split (π = 0.95, p<33.8%, see results LINKTREE2) but 1, 3, 4, 13-15
do differ and are split (π = 2.27, p<0.7%). The tree is rotated in relation to the previous one, and
this is clearly arbitrary. It can be rotated to match by clicking on a horizontal line in the plot, just as
with dendrograms. The text pane readjusts itself when you do this (the first inequality always refers
to the left hand branch) but the results are already written and cannot avoid now being inconsistent.
[Figure: the SIMPROF tab of the LINKTREE dialog (Data sheet, Resemblance button, numbers of permutations and simulations, Sig level (%)), the linkage tree for 'Messolongi lagoons: water column data' showing only the significant splits (B% on the y axis), and the LINKTREE2 results window, in which each split is reported with its SIMPROF Pi and Sig(%) values as well as R and B%, e.g. D->(1,3,4),(13-15): Pi: 2.27, Sig(%): 0.7, R: 0.78, B%: 33.9, PO4<26.8(>53.5). The text pane beneath the tree lists the splits:]
A: R=0.72; B%=91; Sal>26.2(<22.5) or In-N<112(>158)
B: R=0.76; B%=85; PO4>322(<82.5)
C: R=0.82; B%=66; In-N<962(>1.38E3) or Sal>9.5(<8.38)
D: R=0.78; B%=34; PO4<26.8(>53.5)
E: R=0.73; B%=40; PO4<13.5(>15.3)
Missing data in LINKTREE
Finally, note that LINKTREE is able to tolerate some missing data in the abiotic matrix. The
localised form of its conclusions lends itself to analysing whatever data is available locally. Try
taking a copy of msev with Tools>Duplicate and deleting the PO4 value at site 13 (it will be
replaced by Missing!). Re-run LINKTREE (without SIMPROF this time) and note that it accepts
the missing data. When it tries to divide sites 1, 3, 4, 13-15, a criterion using PO4 can no longer be
used since the value for site 13 is missing. The division uses Sal instead and gives a slightly different result at
the bottom of one of the branches. But the other main branch has no missing data and is unaffected.
PRIMER 6 sidesteps missing data in this way, fully automatically, but this is not to say that missing
data is unimportant - if crucial variables are missing, interpretation is inevitably compromised.

12. ANOSIM

12. 'Analysis of Similarities' and species contributions (ANOSIM, SIMPER)

ANOSIM introduction
The series of ANOSIM (analysis of similarities) tests, accessed through Analyse>ANOSIM, operate
on a resemblance matrix as the active worksheet and carry out an approximate analogue of the
standard univariate 1- and 2-way ANOVA (analysis of variance) tests. For example, they allow a
test of the null hypothesis that there are no assemblage differences between groups of samples
specified by the levels of a single factor (a 'one-way layout', e.g. of different times or treatments or
sites). It is crucially important that the group specification is made prior to seeing the data:
ANOSIM is not a valid test of differences between groups generated by a cluster analysis, or other
inspection of the data, otherwise the argument becomes circular. ANOSIM also caters for a
'two-way crossed' layout, for example testing the null hypothesis that there are no differences between
treatments (factor A), allowing for the fact that there may be site differences (factor B), in a case
where all treatments are replicated at each site. Two-way crossed designs are symmetric, so the
procedure can be reversed to give also a test for the hypothesis of 'no site differences' given that
there may be treatment differences - the routine gives both sets of tests automatically. A third
option ('two-way nested' layout) is when the two factors are hierarchical: perhaps a top-level factor
of treatment differences (control v polluted areas), with a second factor of different sites within
each treatment condition, and representative replicate samples from each site. A test can be carried
out for significant differences between sites within a treatment, but at the next hierarchical level up,
the primary interest would be in testing for assemblage differences due to treatment. This compares
apparent treatment differences against assemblage variation among sites, within a condition, rather
than among sample replicates, within a site. (For such a study, of nematodes in the Clyde estuary,
see Fig. 6.6 in the methods manual.) A final possibility is a different style of test catering for the
special case of a 2-way crossed layout when there are no replicates (or perhaps no genuine
replicates and it is wise to average the 'pseudo-replicates' for each of the two-factor combinations).
This was a separate routine, ANOSIM2, in v5 but is incorporated within ANOSIM in v6.
These routines are all permutation/randomisation tests, making a minimum of assumptions and
consistent with the philosophy of the PRIMER routines that the primary information on
relationships between samples is summarised in the ranks of the resemblance matrix - the basis for the
preferred ordination technique of non-metric MDS. The tests apply to any resemblance matrix, so
are equally effective at testing for assemblage change on biotic similarities, environmental change
on Euclidean or other distance matrices, change in biomarker responses, particle size distributions
etc. They are also relatively easy to understand and the user is urged to read Chapter 6 of the
methods manual (page 6-2 onwards) and study the options and examples there, in order to
appreciate the interpretation of the outcomes, most of which appear as tables in the results window.
1-way layout (WA fish diet example)  92 →
Return to the W Australian fish diet study, introduced on page 37 and for which the MDS plots
were seen on pages 86 to 88. The workspace C:\Examples v6\WAfdiet\WAwk should contain the
sample-standardised, root-transformed, Bray-Curtis similarity matrix, probably named Resem1 (if
not, recreate it from the data sheet WAfd). This represents the similarity in relative gut contents of
a total of 68 samples (each a pool of 5 fish) from 7 fish species, indicated by sample labels starting
with A to G and abbreviated species name in the factor species. Note that the number of replicates
- the pools of fish from each species - is very variable (from 3 to 16), and page 87 also notes the
large differences over species in the variability of diets. Assumptions of balanced replication and
the equivalent of 'homogeneity of variance' in ANOVA, are clearly not met here, but the ANOSIM
test does not require such assumptions for its validity. (Approximate balance in replication is still a
good idea because it enhances the sensitivity of the test, and comparable multivariate dispersion
within each group makes interpretation slightly simpler. ANOSIM here tests the hypothesis that
there are no differences between fish species in the composition of their diets; this null hypothesis
can be rejected either because two species require different food sources or because one has a much
more variable diet than the other, though they may predate on some of the same items.)
From the active window Resem1, take Analyse>ANOSIM>(Design•One way>Factor A: species)
& (✓Pairwise tests to worksheet), with the other options left at their defaults (e.g. maximum of 999
permutations). Three windows are created. On top is the histogram of the permutation distribution
of the ANOSIM test statistic, R, under the null hypothesis that there are no differences in diet
among the 7 fish species (groups). This is centred around zero - if there are no differences in diet
then the average rank resemblance among and within groups will be much the same, and R will be
near zero. It can rise a little above (or below) zero by chance, when there are no differences
between the fish diets, but the histogram shows that, for the 999 permutations computed, it never
gets larger than about R = 0.15. The true value of R for these data is also shown, as a dotted line,
namely R = 0.42, and this is clearly much larger than any of the 999 permuted values, so the null
hypothesis is rejected at a significance level of at least 1 in 1000 (p<0.001, or as the PRIMER
output prefers it: p<0.1%).
The same information is repeated in the results window ANOSIM1 under the heading Global Test,
namely the overall observed R statistic of 0.421, its significance level (p<0.1%), how many
permutations were computed in order to determine this (999), and how many of those permutations
gave an R value as large, or larger, than the observed R of 0.421 (none). The total number of
distinct ways of dividing the 68 samples amongst the 7 fish species (keeping the same number of
replicates for each species) is extremely large and is therefore not displayed. In other cases, with
few replicates, this third row will give the exact number of possible permutations, and if this is less
than the specified 'Max permutations' in the ANOSIM dialog box, then R will be evaluated for all
possible permutations. Setting Max permutations: 9999 will simply increase the significance level
here to p<0.01%. (It is clear from the histogram, Graph5, that almost irrespective of how many
permutations are chosen, a value as large as R = 0.42 will not be obtained by chance, so the
significance level for the global test of 'no differences' can be made arbitrarily small.)
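
The global test described above can be mimicked outside PRIMER. The following Python sketch is not PRIMER's own code. It assumes the standard definition of the statistic given in Chapter 6 of the methods manual, R = (rB - rW) / (M/2) with M = n(n-1)/2, where rB and rW are the average rank dissimilarities between and within groups. The function names, the use of average ranks for ties, the fixed random seed and the toy data are all illustrative assumptions.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in order[start:end + 1]:
            ranks[k] = shared
        start = end + 1
    return ranks


def anosim_r(pairs, ranks, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    between = [r for (i, j), r in zip(pairs, ranks) if groups[i] != groups[j]]
    within = [r for (i, j), r in zip(pairs, ranks) if groups[i] == groups[j]]
    m = len(pairs)  # M = n(n-1)/2 pairs of samples
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)


def anosim_one_way(dissim, groups, max_perms=999, seed=1):
    """Return (observed R, significance level in %) for a one-way layout."""
    n = len(groups)
    pairs = list(combinations(range(n), 2))
    ranks = average_ranks([dissim[i][j] for i, j in pairs])
    r_obs = anosim_r(pairs, ranks, groups)
    rng = random.Random(seed)
    labels = list(groups)
    as_large = 0
    for _ in range(max_perms):
        rng.shuffle(labels)  # random reallocation of samples to groups
        if anosim_r(pairs, ranks, labels) >= r_obs:
            as_large += 1
    # A random subset of permutations: the observed arrangement is counted too.
    return r_obs, 100 * (1 + as_large) / (1 + max_perms)


# Toy data (hypothetical): six samples on a line, in two well separated groups.
points = [0, 1, 2, 10, 11, 12]
dissim = [[abs(a - b) for b in points] for a in points]
r, p = anosim_one_way(dissim, ["A", "A", "A", "B", "B", "B"])
print(f"R = {r:.3f}, p = {p:.1f}%")
```

The sketch takes a dissimilarity matrix; a similarity matrix such as Bray-Curtis would first need converting (e.g. 100 minus similarity) so that rank 1 is the most similar pair.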

[Screenshot: the ANOSIM dialog box (Design: One way; Factor A: species; Max permutations: 999), the histogram of the null distribution of R for 'Diets of 7 nearshore fish species from WA', and the results window ANOSIM1:]

Global Test
Sample statistic (Global R): 0.421
Significance level of sample statistic: 0.1%
Number of permutations: 999 (Random sample from a large number)
Number of permuted statistics greater than or equal to Global R: 0

Pairwise comparisons
The table that follows in the results window gives the pairwise comparisons. For each pair of fish
species (groups), the first data column is of pairwise R statistics. These are again a difference of
average rank dissimilarities between and within the two groups, scaled so that R varies between
roughly 0: there are no differences, and 1: all dissimilarities between gut contents of different fish
species are larger than any dissimilarity among samples within either species. The second column
gives the statistical significance for a test of R = 0 (again a percentage, note, so that p<0.1% means
< 1 in 1000 chance). The number of possible permutations follows, then the number actually
computed (999 in most cases because the number of possible permutations is usually much larger
than this). The final column gives the number of R values from the permutations that exceed (or
equal) the real R in the first column, from which the significance in column 2 is calculated. [There
needs to be a slight difference in this computation depending on whether all possible permutations
are evaluated: thus row 1: A. ogilbyi v S. schomb., R = 0.763, p < 100(1+0)/(1+999) = 0.1%,
whereas row 10: S. schomb. v S. robust., R = 0.951, p = 100(1/286) = 0.3%. The second is clearest:
the observed value of 0.951 is the most extreme in 286 permutations and thus has probability 1 in
286 of occurring by chance. In the first case, we do not observe the real value of 0.763 in our
randomly chosen set of 999 permutations, but that does not make the probability p = 0/999 = 0. We
know there exists one permutation which would give R at least 0.763 - the real configuration - and
we have looked at 1000 permutations overall (the 999 random plus the real one) so the probability
is < 1 in 1000.]
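
The two ways of computing the significance level described in the bracketed note can be written as a few lines of Python. This is a minimal sketch of the arithmetic only; the function name and its arguments are hypothetical and not part of PRIMER.

```python
def significance_percent(n_as_large, n_perms, all_permutations):
    """Significance level (%) from the count of permuted R >= observed R.

    all_permutations=True: every distinct permutation was evaluated, and the
    observed arrangement is already one of them, so p = 100 * t / T.
    all_permutations=False: a random subset was evaluated, so the observed
    arrangement is added to both counts, giving p = 100 * (1 + t) / (1 + T).
    """
    if all_permutations:
        return 100 * n_as_large / n_perms
    return 100 * (1 + n_as_large) / (1 + n_perms)


# Row 1 of the pairwise table: 0 of 999 random permutations reach the observed R.
row_1 = significance_percent(0, 999, all_permutations=False)
# Row 10: the observed R is the only one of all 286 permutations that large.
row_10 = significance_percent(1, 286, all_permutations=True)
print(round(row_1, 1), round(row_10, 1))  # 0.1 0.3
```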
Interpreting these pairwise tables must be done with care. The significance level is very dependent
on the number of replicates in the comparison. For example, row 4: A. ogilbyi v S. bassen., p<0.2%
(your value may differ slightly because each time the program is run a different random set of
permutations is generated). This appears highly significant, but the R value is small, at
0.206. The test tells us that these two species probably do not have exactly the same diet (the
hypothesis R = 0 can be rejected) but the R value tells us that the diets are strongly overlapping and
barely differ (R is close to zero). This can happen, just as in ordinary univariate statistics, because
the number of replicates is large for the two groups, giving 145 million possible permutations -
biologically trivial differences can still be statistically significant when 'power' is large. In total
contrast, row 17: P. jenynsii v S. robust., p = 2.9%, still significant but only just (at the 5% level),
has an observed R of 1.0, the largest possible value, showing completely different diets. Such a
large value of R does not give a small value of p because there are only 35 possible permutations
(few replicates in both groups). Which is therefore the most useful column to interpret? It has to be
the R values rather than the p values. R is largely not a function of the number of replicates (i.e.
possible permutations) but an absolute measure of differences between two (or more) groups in the
high-dimensional space of the data, whereas p is always hijacked by the sample size. It is for this
reason that PRIMER does not implement a Bonferroni-type correction on its pairwise significance
levels - it gives an illusion of certitude which is not justified. The global test of any differences
between groups is important: if the null hypothesis is not rejected then the user has no licence to
even look at the pairwise comparisons. If the global test strongly suggests that there are differences
worth examining, the large pairwise R values indicate where the major differences are to be found.
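
The dependence of p on replication comes directly from the number of distinct ways of reallocating samples to two groups. The sketch below reproduces the 'Possible Permutations' column of the pairwise table. It rests on two assumptions that are not stated in the text: the replicate numbers per species (16, 10, 5, 4, 14, 3 and 16, which sum to the 68 samples) are inferred from that column, and the count is halved when the two groups are of equal size because swapping the group labels leaves R unchanged. The function name is hypothetical.

```python
from math import comb


def pairwise_permutations(n1, n2):
    """Distinct ways of splitting n1 + n2 samples into groups of size n1 and n2."""
    ways = comb(n1 + n2, n1)
    # With equal group sizes, exchanging the two labels gives the same split.
    return ways // 2 if n1 == n2 else ways


# Replicate numbers inferred from the table (an assumption, see above).
replicates = {"A.ogilbyi": 16, "S.schomb.": 10, "A.elongat.": 5, "P.jenynsii": 4,
              "S.bassen.": 14, "S.robust.": 3, "S.vittata": 16}

for a, b in [("P.jenynsii", "S.robust."), ("S.schomb.", "S.robust."),
             ("A.ogilbyi", "S.bassen."), ("A.ogilbyi", "S.vittata")]:
    print(a, "v", b, pairwise_permutations(replicates[a], replicates[b]))
```

With only 35 possible permutations, the smallest attainable significance level is 100/35 = 2.9%, however large R may be.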
[Screenshot: results window ANOSIM1, pairwise tests for the 7 fish species]

Pairwise Tests
                                  R  Significance      Possible        Actual  Number >=
Groups                    Statistic       Level %  Permutations  Permutations   Observed
A.ogilbyi, S.schomb.          0.763           0.1       5311735           999          0
A.ogilbyi, A.elongat.         0.698           0.1         20349           999          0
A.ogilbyi, P.jenynsii         0.762           0.1          4845           999          0
A.ogilbyi, S.bassen.          0.206           0.2     145422675           999          1
A.ogilbyi, S.robust.         -0.162          73.1           969           969        708
A.ogilbyi, S.vittata          0.463           0.1     300540195           999          0
S.schomb., A.elongat.         0.409             1          3003           999          9
S.schomb., P.jenynsii         0.745           0.1          1001           999          0
S.schomb., S.bassen.          0.502           0.1       1961256           999          0
S.schomb., S.robust.          0.951           0.3           286           286          1
S.schomb., S.vittata          0.241           0.5       5311735           999          4
A.elongat., P.jenynsii        0.919           0.8           126           126          1
A.elongat., S.bassen.         0.637           0.1         11628           999          0
A.elongat., S.robust.             1           1.8            56            56          1
A.elongat., S.vittata         0.271           1.6         20349           999         15
P.jenynsii, S.bassen.         0.616           0.1          3060           999          0
P.jenynsii, S.robust.             1           2.9            35            35          1
P.jenynsii, S.vittata         0.459           0.3          4845           999          2
S.bassen., S.robust.         -0.081          64.4           680           680        438
S.bassen., S.vittata          0.125             3     145422675           999         29
S.robust., S.vittata          0.387           0.6           969           969          6

Other 1-way ANOSIM options
Checking the (✓Pairwise tests to worksheet) box has also sent the above R values to the worksheet
Resem2 in triangular format, which could be a useful layout for tabulating ANOSIM results in a
publication. More subtly, Resem2 can be regarded as a resemblance matrix (of distance-type) in its
own right - the higher the value of R the greater the separation of replicates from two groups in the
high-dimensional (prey) space. Inputting this to an MDS plot will display the relationships between
these 7 groups, and can be seen as a type of 'means plot'. An alternative 'simple' means plot is to
take the average across replicates before transforming, calculating Bray-Curtis between these mean
diet samples and then ordinating by MDS. (But there are also two other possibilities for a direct
means plot! Transform the data first and then average before entering into the multivariate analysis,
or average the dissimilarities rather than the data. These mean dissimilarities between groups can
be extracted from the SIMPER results window - see below - and manually entered into a triangular
matrix.) The result will be 'means plots' with slightly different emphases: in the case of the matrix
of R values this highlights group separations, i.e. adjusting differences by within-group dispersion.
[Screenshot: worksheet Resem2 holding the pairwise R values in triangular format, and the MDS 'means plot' of the 7 fish species derived from it.]

Other options within the ANOSIM routines include the ability to manipulate the histogram for the
global R statistic by rescaling axes, titles etc (Graph>Data labels & symbols menu) and changing
bin widths (Graph>Special menu), as for any other histogram plot in v6 (see page 68). There is
also a check box in the ANOSIM dialog to send (✓R values to file). You will be prompted to
supply a .txt file name which will hold a simple list (one number to a line, in simple ASCII text) of
the R values for the 999 (or however many) permutations carried out for the global test. This would
allow the null distribution histogram to be replotted in some other statistical or graphical software.
Note that both the plotted histogram and the listed R values refer only to the global test of no
differences among any of the groups. If you require a histogram for a specific pairwise comparison
then you will need to select out that pair of groups and re-run ANOSIM, selecting either externally,
by Select>Samples on the original resemblance matrix, or internally, using the A Levels button on
the ANOSIM dialog box. Both lead to the usual 'levels selection' choices, see pages 91, 92. For a
pairwise test, it makes no difference to the R value (or its significance) whether the results are read
from the above pairwise table or recalculated with just those groups selected, so this would only be
useful: a) if you required the pairwise histogram, or b) if a test for a specific subset of three groups,
four groups etc was needed. For example, a relevant a priori hypothesis concerns whether there are
detectable dietary differences between the three congeneric Sillago fish species (schomburgkii,
bassensis and vittata). After testing this, save and close the workspace WAwk.
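
As an illustration of re-using the exported R values in other software, the following Python sketch reads a file in the layout described above (one number per line, plain ASCII), counts the values per histogram bin and recomputes the significance level. The function names and the default bin width are assumptions made for this example, not PRIMER features.

```python
import math
from collections import Counter
from pathlib import Path


def load_r_values(path):
    """Read one permuted R value per line, skipping any blank lines."""
    text = Path(path).read_text(encoding="ascii")
    return [float(line) for line in text.splitlines() if line.strip()]


def bin_counts(values, bin_width=0.05):
    """Histogram counts; bin k covers k*bin_width up to (k+1)*bin_width."""
    return Counter(math.floor(v / bin_width) for v in values)


def percent_at_least(values, observed_r):
    """Significance level (%) when the values are a random subset of permutations."""
    hits = sum(v >= observed_r for v in values)
    return 100 * (1 + hits) / (1 + len(values))
```

For a file of 999 permuted values, none of which reaches the observed R, `percent_at_least` returns 0.1, matching the Global Test output. The counts from `bin_counts` can be passed to any plotting package to redraw the null distribution.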

[Screenshot: the 'Select levels for factor A' dialog box reached from the A Levels button of the ANOSIM dialog, the null distribution histogram for 'Diets of three Sillago species', and the results window:]

Sample statistic (Global R): 0.261
Significance level of sample statistic: 0.2%
Number of permutations: 999 (Random sample from a large number)
Number of permuted statistics greater than or equal to Global R: 1

1-way layout (Biomarkers example)  118 →
ANOSIM applies equally well to data on environmental, biomarker or morphometric variables,
which might be transformable to approximate normality; it is then a robust alternative to standard
multivariate (MANOVA) tests such as Wilks' lambda. The inevitable slight loss in power of the
non-parametric test, if the data really were multivariate normal (and with few variables in relation
to samples) is more than compensated for by its robustness, general applicability, and lack of
subsidiary assumptions such as constant variance-covariance structures.
Re-open the biomarker workspace brbmwk. The suite of 11 biomarkers (transformed, normalised),
measured for 10 independent pools of fish tissues from each of 5 N Sea sites, was earlier subjected to PCA.
This was calculated directly from the normalised datasheet, Data3 (derived from brbm) so an
appropriate resemblance matrix to input to ANOSIM has not yet been generated. Do this on Data3
with Analyse>Resemblance>(Analyse between•Samples) & (Measure•Euclidean distance) and
input the resulting Resem4 into Analyse>ANOSIM. The results window, ANOSIM1, bears out the
clear pattern of significant, and generally large, separation of the biomarker responses at all sites
S3, S5, S6, S7 and S9, seen in the PCA plots of page 118. Save and close the workspace brbmwk.
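
For readers who want to see what the resemblance step computes, here is a minimal Python sketch of normalising variables and then taking Euclidean distances between samples. In the workspace, Data3 is already normalised, so only the distance step is needed there. The function names and the tiny data set are hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption that has not been checked against PRIMER.

```python
import math


def normalise(rows):
    """Subtract each variable's mean and divide by its standard deviation."""
    n = len(rows)
    columns = list(zip(*rows))  # one tuple per variable
    means = [sum(col) / n for col in columns]
    sds = [math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))  # assumed divisor
           for col, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def euclidean_matrix(rows):
    """Square matrix of Euclidean distances between samples (rows)."""
    return [[math.dist(a, b) for b in rows] for a in rows]


# Hypothetical data: 3 samples (rows) by 2 variables on very different scales.
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
distances = euclidean_matrix(normalise(data))
print([round(d, 3) for d in distances[0]])
```

Normalising first puts variables measured in different units on a common scale, so that no single variable dominates the distances.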

[Screenshot: the ANOSIM dialog box (Design: One way; Factor A: site), the null distribution histogram for 'Biomarkers normalised', and the results window ANOSIM1:]

Global Test
Sample statistic (Global R): 0.829
Significance level of sample statistic: 0.1%
Number of permutations: 999 (Random sample from a large number)
Number of permuted statistics greater than or equal to Global R: 0

Pairwise Tests
                 R  Significance      Possible        Actual  Number >=
Groups   Statistic       Level %  Permutations  Permutations   Observed
3, 5         0.726           0.1         92378           999          0
3, 6         0.986           0.1         92378           999          0
3, 7         0.998           0.1         92378           999          0
3, 9         0.991           0.1         92378           999          0
5, 6         0.397           0.1         92378           999          0
5, 7         0.962           0.1         92378           999          0
5, 9   (illegible)           0.1         92378           999          0
6, 7         0.765           0.1         92378           999          0
6, 9          0.95           0.1         92378           999          0
7, 9         0.718           0.1         92378           999          0

2-way crossed ANOSIM (Tasmanian crabs study)  94 →
An example of a 2-way crossed layout was introduced on page 24, for meiofaunal communities in
sediment patches either disturbed or undisturbed by soldier crabs ('treatments' D/U, factor A),
across four areas of Tasmanian sandflat ('blocks' 1-4, factor B), with two replicates for each of the
8 combinations. The setting up of these factors was described in section 2 (pages 29 to 36) and the
workspace required was last saved on page 94, as tawk in directory C:\Examples v6\Tasmania. [If
this does not exist, read in tana(.pri) to a new workspace, look at the two factors Treatment and
Block (Edit>Factors) so you understand the sample structure, take a fourth-root transform of tana
(Analyse>Pre-treatment>Transform (overall)), and create a Bray-Curtis similarity Resem1 (with
Analyse>Resemblance).] Run Resem1 through Analyse>MDS (first, if necessary, clearing any
selections or highlights left in place in the earlier workspace), and note the way the samples split
fairly convincingly between the effects of the different areas of sandflat (blocks, roughly across the
page) and disturbed or undisturbed (treatments, roughly up the page). There are so few replicates,
however, that this is by no means clear-cut and needs testing. Also the stress is quite high (about
0.16) so the picture may be misleading, and the test needs to be in the full-dimensional space (as
ANOSIM tests always are). Two-way crossed ANOSIM is carried out for the null hypotheses of
'no treatment effect', allowing for the fact that there may be differences between blocks, and also
(symmetrically) for 'no block effect' allowing for the fact that there may be treatment effects. For
details of the test construction see Chapter 6 of the methods manual which analyses the same study
(Fig. 6.7) but for the full meiofaunal data (nematodes and copepods combined), for which the
outcome is even clearer. Here, with the similarities from the nematode data, Resem1, as the active
sheet, take Analyse>ANOSIM>(Design•Two way crossed (with replicates)>Factor A: Treatment)
& (Factor B: Block). Again, note that we could have restricted the analysis to use only some of the
levels of either factor, with the A Levels or B Levels buttons, though this is not appropriate here.
Graph3, for the test of treatment effect, shows a typical null distribution for the ANOSIM test
statistic (in this case, the average of the one-way ANOSIM R statistics for comparing treatments
within each block) when there are few replicates. With only 81 distinct permutations permitted
(these correspond to all ways of simultaneously exchanging the four replicates within each block),
the range of values that the statistic can take when there is no treatment effect is not at all smooth.
This demonstrates why the null permutation distribution has to be recreated for each new data set
and cannot rely on standard tables or distributional forms. Nonetheless, the results in ANOSIM1
show that the observed statistic for testing treatment differences (global R = 0.813) is the largest
obtainable for the 81 permutations, so gives significance level of 1 in 81 (p = 1.2%). This would
normally be considered sufficient to cast doubt on the null hypothesis of 'no treatment effect'. If
there were more than two treatments, the global test would be followed by pairwise comparison of
treatments, exactly as for the 1-way ANOSIM output.

[Screenshot: MDS plot for 'Tasmanian nematodes', the ANOSIM dialog box (Design: Two way crossed (with replicates); Factor A: Treatment; Factor B: Block; Max permutations: 999), the null distribution histogram for the treatment test, and the results window:]

TESTS FOR DIFFERENCES BETWEEN Treatment GROUPS
(across all Block groups)
Global Test
Sample statistic (Global R): 0.813
Significance level of sample statistic: 1.2%
Number of permutations: 81 (All possible permutations)
Number of permuted statistics greater than or equal to Global R: 1

The second plot, Graph4, is much smoother because there are 11025 possible permutations of the
replicates across blocks within each treatment, corresponding to the null hypothesis of no block
effect, and a random subset of 999 of them have been evaluated. The observed average R of 0.854
is again the most extreme in the 1000 permutations evaluated, giving significance level < 1 in 1000
(p<0.1%). There is little merit in then considering the pairwise tests of Block1 v Block2, Block1 v
Block3 etc. The key thing to have established is that there are natural changes in the nematode
assemblage across the sandflat, so that removing this block factor from the test for treatments was
worthwhile (the MDS shows that a 1-way design in which block-to-block changes become part of
the replicate variability would largely fail to pull out the treatment effect). Individual block
differences are not of interest, but if they were, the Pairwise Tests table in ANOSIM1 shows that
all blocks are well separated from each other (all R values large). Note that none of these pairwise
comparisons have enough replicates (and thus permutations) to allow a sensible significance test. In
all cases the observed configuration is the most extreme permutation, best separating the two
blocks, but with only 9 permutations, that gives significance of only p = 11.1%. It is not logical to
conclude that there are no differences between any pair of blocks when the global test has just
shown that there are massive and highly significant differences amongst all four blocks! As with
the discussion on 1-way ANOSIM, attention needs to focus mainly on demonstrating significance
of the global R (otherwise pairwise comparisons should not be looked at) and then the pairwise R
values themselves to ascribe the main differences found by the global test (here, between all pairs).
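
The permutation counts quoted for the crossed design (81, 11025 and 9) can be checked with a short calculation. The Python sketch below assumes that samples are reallocated separately within each level of the other factor, and that exchanging equal-sized groups does not count as a new permutation. These assumptions reproduce the figures in the text but are not taken from PRIMER's code, and the function names are hypothetical.

```python
from collections import Counter
from math import factorial, prod


def distinct_groupings(group_sizes):
    """Ways of splitting sum(group_sizes) samples into unlabelled groups."""
    ways = factorial(sum(group_sizes))
    for size in group_sizes:
        ways //= factorial(size)  # order of samples within a group is irrelevant
    for repeats in Counter(group_sizes).values():
        ways //= factorial(repeats)  # swapping equal-sized groups changes nothing
    return ways


def crossed_permutations(sizes_per_stratum):
    """Product of the groupings within each level of the other factor."""
    return prod(distinct_groupings(sizes) for sizes in sizes_per_stratum)


# Treatment test: 4 blocks, each holding 2 disturbed + 2 undisturbed samples.
treatment_test = crossed_permutations([[2, 2]] * 4)
# Block test: 2 treatments, each holding 2 samples from each of 4 blocks.
block_test = crossed_permutations([[2, 2, 2, 2]] * 2)
# Pairwise block test: 2 treatments, each holding 2 samples from each of 2 blocks.
pairwise_block_test = crossed_permutations([[2, 2]] * 2)
print(treatment_test, block_test, pairwise_block_test)  # 81 11025 9
```

With 9 permutations the smallest attainable significance level is 100/9 = 11.1%, which is why the pairwise block tests cannot reach 5% however well the blocks separate.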

[Screenshot: null distribution histogram for the block test, 'Tasmanian nematodes', and the results window:]

TESTS FOR DIFFERENCES BETWEEN Block GROUPS
(across all Treatment groups)
Global Test
Sample statistic (Global R): 0.854
Significance level of sample statistic: 0.1%
Number of permutations: 999 (Random sample from 11025)
Number of permuted statistics greater than or equal to Global R: 0

Pairwise Tests
                 R  Significance      Possible        Actual  Number >=
Groups   Statistic       Level %  Permutations  Permutations   Observed
1, 2          0.75          11.1             9             9          1
1, 3             1          11.1             9             9          1
1, 4             1          11.1             9             9          1
2, 3   (illegible)          11.1             9             9          1
2, 4             1          11.1             9             9          1
3, 4         0.625          11.1             9             9          1

2-way crossed ANOSIM for Danish sediment data  42 →
For an example of a 2-way crossed ANOSIM test in a very different context, return to the particle
size distributions from Danish sediments introduced on page 42 (workspace sedwk in C:\Examples
v6\Sediment; if not available, re-run the steps at the bottom of page 42). A distance measure such
as Euclidean (or Manhattan or Maximum distance etc, see page 57) is appropriate for defining the
resemblance between two distribution curves. Apply this to the untransformed, cumulated data
sheet, Data1, producing Euclidean distances Resem1 for the 30 sediment samples, which are from
3 sites (A, B, C) and 2 water-column depths (2m and 5m). Analyse>ANOSIM on Resem1, taking
the 2-way crossed option (with replicates) for the site and depth factors, gives perfect separation of
the depths (global R = 1, p<0.1%) and strong separation of the sites (global R = 0.80, p<0.1%). The
very large number of possible permutations means that these p values could be made almost
arbitrarily small, as is clear from the null distribution histograms. Pairwise site tests show,
however, that B and C are not well separated (R = 0.36, though still the most extreme of the 1000
permutations considered from the full set of 15876, thus p<0.1% again), as is also seen in the MDS.
[Screenshot: results window for the Danish sediment data, partly obscured by the MDS plot:]

TESTS FOR DIFFERENCES BETWEEN site GROUPS
(across all depth groups)
Global Test
Sample statistic (Global R): 0.803
Significance level of sample statistic: 0.1%
Number of permutations: 999 (Random sample from a large number)

Pairwise Tests
                 R  Significance      Possible
Groups   Statistic       Level %  Permutations
A, B         0.962           0.1         15876
A, C             1           0.1         15876
B, C         0.364           0.1         15876

TESTS FOR DIFFERENCES BETWEEN depth GROUPS
(across all site groups)
Global Test
Sample statistic (Global R): 1
Significance level of sample statistic: 0.1%

2-way nested ANOSIM (Calafuria macroalgae)
Subtidal rocky reefs, at c 10m depth, at the Calafuria station in the Ligurian Sea, N Italy, were the
subject of a clearance and recovery experiment by Airoldi L 2000 Mar Ecol Progr Ser 195: 81-92
(see also Clarke KR, Somerfield PJ, Airoldi L, Warwick RM 2006 J exp mar Biol Ecol; both
sources analyse a wider set of data than considered here). For 8 different times between October
1995 and September 1996 (factor A: 'treatment' with levels 1 to 8), rock patches were cleared
from three randomly chosen areas (factor B: 'area' with levels 1 to 24, the 3 areas differing for
each 'treatment', naturally). Three randomly chosen plots from each area (the replication level)
were then examined at the end of one year of recolonisation, and % cover recorded of 9 macroalgal
taxonomic categories. The design is therefore a 2-way nested layout, with factor A at the top level
and B 'nested' within A (replicates can be thought of as 'nested' within B). Note that when
defining the factor levels it does not matter to PRIMER whether the area levels are coded 1 to 24,
or 1 to 3 repeatedly for each of the 8 treatments. For the latter, when 2-way nested is selected under
Analyse>ANOSIM, and 'area' is specified as the nested factor, the routine will know that there is
nothing in common between area 1 in treatment 1 and area 1 in treatment 2. But it may help you to
code the area levels as 1 to 24, because the fact that area is nested not crossed with treatment will
then be clear. (To be crossed, all levels of factor A must occur in combination with all levels of
factor B - see the discussion of 2-way ANOSIM on page 6-7 onwards in the methods manual).
Open caac from C:\Examples v6\Calafur and examine its factors with Edit>Factors. A strong
transform is necessary to prevent the 'Algal turf' category from completely dominating, so
transform with fourth root and calculate Bray-Curtis similarities between samples, Resem1. The
primary interest is in whether there are differences in recolonised macroalgal communities a year
after clearance, depending on the time of year at which the clearance took place, i.e. a test of the
treatment factor. But it is important to choose the correct replication level for this test. At worst (or
some would argue, always) this is the level of variability immediately below 'treatment' in the
hierarchy, namely the areas not the plots, which are a level further down. So, one possibility is
simply to average the three plots within each area and carry out one-way ANOSIM on the
treatments (with 3 'replicate' areas per treatment). But what if there is absolutely no 'area effect',
i.e. plots in different areas are no more dissimilar from each other than plots in the same area? Then
it would seem reasonable to take 'plots' as the replication level for testing treatment effects, and the
much greater number of replicates will improve the sensitivity of that test.
On Resem1, Analyse>ANOSIM>Design•2-way nested (B within A)>(Factor A: Trt) & (Factor B:
Area) carries out a sequence of two tests. Firstly, it tests the null hypothesis that there is no area
effect. Nothing is assumed about treatment effects; these may or may not be present but need to be
removed, in exactly the same way as for the 2-way crossed ANOSIM (R values contrasting among-
and within-area rank dissimilarities are calculated separately for each treatment and averaged;
permutations are constrained to shuffle labels between plots only over areas within a treatment, not
across treatments, etc). Secondly, the routine always presumes that an area effect is present, so tests
the treatment effect by averaging the plots within areas, thus using areas as the replication level for
testing treatments, by a 1-way ANOSIM. (In fact, the averaging is done on the rank dissimilarity
matrix, which is then re-ranked for the 1-way ANOSIM - see the methods manual). If there is
demonstrably no area effect, so that the test can use all 9 plots as replicates for each treatment, then
this needs a separate run of 1-way ANOSIM, simply ignoring the area factor.
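
The first of these two tests can be sketched in Python. This is only an illustrative sketch, not PRIMER's own code: all function names are hypothetical, the statistic is the usual ANOSIM form R = (mean between-group rank - mean within-group rank) / (M/2) with M the number of sample pairs, and ranking the dissimilarities separately inside each treatment is an assumption based on the description above. It also assumes every treatment holds at least two areas, each with at least two plots.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in order[start:end + 1]:
            ranks[k] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def anosim_r(dist, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    return (sum(between) / len(between) - sum(within) / len(within)) / (len(pairs) / 2)


def area_r(dist, treatment, area):
    """Average over treatments of the R for 'area' computed inside each treatment."""
    r_values = []
    for t in sorted(set(treatment)):
        idx = [k for k, tk in enumerate(treatment) if tk == t]
        sub = [[dist[i][j] for j in idx] for i in idx]
        r_values.append(anosim_r(sub, [area[k] for k in idx]))
    return sum(r_values) / len(r_values)


def shuffle_within_treatments(treatment, area, rng):
    """Permute area labels among plots of the same treatment only."""
    shuffled = list(area)
    for t in sorted(set(treatment)):
        idx = [k for k, tk in enumerate(treatment) if tk == t]
        labels = [area[k] for k in idx]
        rng.shuffle(labels)
        for k, label in zip(idx, labels):
            shuffled[k] = label
    return shuffled


def area_test(dist, treatment, area, n_perm=999, seed=0):
    """Return (observed average R, permutation p-value) for the area effect."""
    rng = random.Random(seed)
    observed = area_r(dist, treatment, area)
    hits = sum(
        area_r(dist, treatment, shuffle_within_treatments(treatment, area, rng)) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)
```

Here `dist` is a full square dissimilarity matrix (list of lists) and `treatment` and `area` are label lists in the same sample order. The second test (treatments, with areas as replicates) would reuse `anosim_r` on a matrix averaged up to area level.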
Here the 2-way test for areas gives average R = -0.01 (p≈56%), and this near-zero R implies
absolutely no suggestion of an area effect, potentially making the 2-way test for treatments
(averaging up to area level) unnecessarily conservative. It still gives a strongly significant global R
of 0.60 but the conservatism is seen in the pairwise table, where all the comparisons are based on
only 10 permutations (3 areas for each treatment). If, as is justified here, we ignore the area effect,
the 1-way ANOSIM (with 9 plots for each treatment) gives a pairwise table with 24,310 permutations
for each comparison and some clear inferences on seasonal effects of time of clearance. E.g.
T7 and T8 differ the most strongly from other times, with most R values in excess of 0.8, whereas
pairs not involving these two times generally give R<0.4. Similar conclusions are apparent from the
MDS of the 72 plots, which has a low stress (0.09), allowing confident visual interpretation. If the
initial test for area had given R>0 however, and certainly if it had been significantly so, on what
will usually be a powerful test (= many permutations), then it would not be justifiable to ignore the
area effect and use plots as replicates: this would be non-conservative (pseudo-replication).
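
The permutation counts quoted above (10 and 24,310) can be checked with a short calculation. The sketch below is illustrative only; the function name is hypothetical, and it assumes that swapping the names of two equal-sized groups leaves the statistic unchanged, so that such splits are counted once.

```python
from math import comb


def distinct_permutations(n1, n2):
    """Number of distinct ways to split n1 + n2 labels into two groups.

    When the groups are the same size, swapping the two group names gives the
    same split, so the count is halved.
    """
    ways = comb(n1 + n2, n1)
    return ways // 2 if n1 == n2 else ways


print(distinct_permutations(3, 3))  # 3 areas per treatment: 10
print(distinct_permutations(9, 9))  # 9 plots per treatment: 24310
```

With only 10 possible permutations the smallest attainable significance level for a pairwise test is 10%, which is why the area-level pairwise table is so conservative.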

136

[Screenshot: factors sheet for caac with columns Label, Trt, Area and Plot (e.g. sample T1A1P1 has Trt T1, Area A1, Plot P1), and the ANOSIM dialog with Design set to Two way nested (B within A), Factor A: Trt]

TESTS FOR DIFFERENCES BETWEEN Area GROUPS
(across all Trt groups)
Global Test
Sample statistic (Global R): -0.009
Significance level of sample statistic: 56.5%
Number of permutations: 999 (Random sample from a large number)
Number of permuted statistics greater than or equal to Global R: 564

TESTS FOR DIFFERENCES BETWEEN Trt GROUPS
(using Area groups as samples)
Global Test
Sample statistic (Global R): 0.604
Significance level of sample statistic: 0.1%
Number of permutations: 999 (Random sample from a large number)
Number of permuted statistics greater than or equal to Global R: 0

Pairwise Tests
          R          Significance  Possible      Actual        Number >=
Groups    Statistic  Level %       Permutations  Permutations  Observed
T1, T2    -0.037     60            10            10            6
T1, T3    0.444      20            10            10
T1, T4    0.907      10            10            10
T1, T5    0.778      10            10            10
T1, T6    0.333      20            10            10
T1, T7    1          10            10            10
T1, T8    1          10            10            10
...

[Histogram: permutation distribution of R under the null hypothesis]

[Screenshot: ANOSIM dialog with Design set to One way, and the resulting 1-way ANOSIM output for Trt (global test significant at 0.1%, from 999 randomly sampled permutations; pairwise tests between treatments, each with 24310 possible permutations), alongside the MDS plot of the 72 plots]

137

ANOSIM test for the unreplicated 2-way layout

Returning to the 2-way crossed layout, a situation commonly met is when there is only one
observation for each combination of the two factors, e.g. one sample for each time at each site, or
one replicate in each block for each treatment. A treatments×blocks example is given in the
methods manual, Fig. 6.10, but the example shown here is of a sites×times study, from Fig. 6.12.
The Exe estuary nematode data was introduced on page 57, and used to demonstrate clustering and
MDS (see also page 80), but you should open here a different file from that study (in a new
workspace) of bi-monthly assemblage samples from sites 12 to 19 only: exnabi(.pri) under directory
C:\Examples v6\Exe. (It is these 6 seasonal samples, covering one year, which were averaged for
the previous dataset exna.) Due to the state of weather, tides etc, a small number of sites were not
sampled at some times of the year (4 site×time combinations are missing, leaving 44 rather than
6×8 = 48 samples). As before, pre-treat this data with a fourth-root transform and compute
Bray-Curtis similarities between samples, Resem1. The interest here is in whether or not there are
demonstrable differences between community composition at sites 12 to 19 and what, if any,
seasonal effects are present. There are two factors, Site and Time, and the layout is 2-way crossed,
because the same set of sites are returned to at each time, but there is only one replicate sample for
each site×time combination (at most) so the usual 2-way crossed ANOSIM option will not work. If
you try running it on Resem1 the results window says 'Groups too small' for both tests - there are
no replicates to permute!
Analyse>ANOSIM>Design•Two way crossed (no replicates)>(Factor A: Site) & (Factor B: Time)
runs a different style of permutation procedure, testing for a site effect by asking whether there is
evidence for commonality of the among-site pattern across the different times. For example, if the
MDS plots of sites, displayed separately for each time (Fig. 6.12 of the methods manual) show the
sites grouping in the same way, that must imply there are site differences. To put it the other way
round, under the null hypothesis that there are no site differences, the separate MDS plots for each
time will have no common pattern and look like random rearrangements of each other. In fact the
test operates, as with other ANOSIM tests, on the underlying resemblance matrix (ranks) rather
than the MDS plots, and calculates an average of all pairwise correlations (ρav) between the
among-site resemblance matrices for each time; ρav will be near zero if there are no site effects. It
then recomputes this ρav statistic for random permutations of the site labels at each time (since
under the null hypothesis there are no site differences), to obtain a null permutation distribution for
ρav, and thus a significance test. See Chapter 6 of the methods manual for more details on the
choices offered (by the tab on the ANOSIM dialog box) of rank correlation coefficient ρ to calculate
between two resemblance matrices; •Spearman is the best known and the default. Note that the
routine copes automatically with a small number of missing samples (as here) because for each pair
of times it can drop the sites which are not found in both configurations (called pairwise deletion of
missing values), without having to drop those sites from the whole matrix (called listwise deletion).
A satisfactory test needs a reasonable number of shared sites to be available for all pairs of times.
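
A minimal Python sketch of this procedure is given below, under stated assumptions: it is not PRIMER's implementation, all names are hypothetical, Spearman correlation is computed on the matrix elements shared by each pair of times (pairwise deletion), and each time's among-site matrix is stored as a dictionary keyed by a sorted pair of site labels. It assumes every pair of times shares enough sites for the correlation to be defined.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in order[start:end + 1]:
            ranks[k] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rho_av(by_time):
    """Average Spearman correlation over all pairs of times.

    by_time: one dict per time, {(site_a, site_b): dissimilarity}, site_a < site_b.
    Only site pairs present at both times are used (pairwise deletion).
    """
    rhos = []
    for a, b in combinations(by_time, 2):
        shared = sorted(set(a) & set(b))
        rhos.append(spearman([a[k] for k in shared], [b[k] for k in shared]))
    return sum(rhos) / len(rhos)


def relabel_sites(matrix, rng):
    """Randomly permute the site labels of one time's among-site matrix."""
    sites = sorted({s for pair in matrix for s in pair})
    shuffled = sites[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(sites, shuffled))
    return {tuple(sorted((mapping[a], mapping[b]))): d for (a, b), d in matrix.items()}


def site_test(by_time, n_perm=999, seed=0):
    """Return (observed rho_av, permutation p-value) for the site effect."""
    rng = random.Random(seed)
    observed = rho_av(by_time)
    hits = sum(
        rho_av([relabel_sites(m, rng) for m in by_time]) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)
```

Swapping the roles of the two factors (one among-time matrix per site) gives the corresponding test for a time effect.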
[Screenshot: ANOSIM dialog with Design set to Two way crossed (no replicates) and the rank correlation method tab (Spearman, Weighted Spearman), with a histogram of the null distribution of Rho]

TESTS FOR DIFFERENCES BETWEEN Time GROUPS
(across all Site groups)
Global Test
Sample statistic (Rho): 0.055
Significance level of sample statistic: 22.5%
Number of permutations: 999 (Random sample)
Number of permuted statistics greater than or equal to Rho: 224

138

The results (and Graph1) show firstly that there is a significant site effect (ρ = 0.36, p<0.1%).
However, since 2-way crossed layouts are symmetric, the procedure can be reversed to provide a
test for time effects. If the relationships amongst times show a common pattern over all sites (or
over sufficiently many of them to depart from random re-arrangement) then there must be seasonal
differences. The second test in the ANOSIM1 window (and Graph2) shows that this is not the case:
ρ is close to zero (0.06) and falls in the main body of the null distribution, implying no evidence for
a seasonal effect (this is not so surprising in a climatically mild region, given that generation times
of meiofauna are measured in days/weeks). We would therefore be justified here in strengthening
the testing procedure for sites by running a 1-way ANOSIM on factor Site, with the full set of 44
samples, i.e. treating the different times as the replicates. Now it is possible to obtain tests between
pairs of sites; such pairwise comparisons were not available with the 2-way crossed (no replicates)
analysis for obvious reasons (one cannot ask about commonality of pattern across 6 MDS plots, if
each consists only of two sites!). From the 1-way ANOSIM results (ANOSIM2) and the MDS of all
samples (Graph3) it is clear that most sites have significantly different and well-separated
assemblages (with the exception of 18, which is species-poor and has widely scattered 'replicates'
on the MDS, and 12 v 13 and 16 v 17, with low R values of 0.21 and 0.17 respectively).

[Screenshot: ANOSIM dialog with Design set to One way, and MDS plot of all samples (Transform: Fourth root; Resemblance: S17 Bray Curtis similarity) with a symbol key for sites 12 to 19]

Pairwise Tests
          R          Significance  Possible      Actual
Groups    Statistic  Level %       Permutations  Permutations
12, 13    0.206      1.3           462           462
12, 14    0.97       0.2           462           462
12, 15    0.76       0.2           462           462
12, 16    1          0.2           462           462
12, 17    0.881      0.2           462           462
12, 18    0.619      0.2           462           462
...

[Note that this one-way test, taking times as 'replicates', might have been equally useful if we had
found a time effect in the previous 2-way ANOSIM. The situation is not the same as described
earlier for the 2-way nested design, where ignoring factor B, when it is really present, leads to a
non-conservative (pseudo-replicated) test of factor A. The crossed design is all-important here: in
the MDS above, the variability in community composition within each site is seen to be made up of
the true replication 'error' plus the differences through the year (bi-months A, B, C, ...). If the true
time differences are small in relation to site differences, then a 1-way ANOSIM test of sites could
still be significant. If it is, we are perfectly justified in rejecting the null hypothesis of 'no site
effects'; the test is now conservative, so that if we get significance, all is well. The problems arise
when we don't reject the null hypothesis! Now, we have no idea whether that is because there are
no genuine site differences, or whether those differences exist but have been completely obscured
by a large time effect. To see this, imagine that the above MDS has time points closely bunched
within each site, as above, but running in the same directional sequence A, B, C, ... for each site.
This is a time effect, but an attempt to test for this time effect by a 1-way ANOSIM, treating the
sites as replicates, is doomed to failure. The large site differences have been 'pooled into the
residual' - to use univariate ANOVA terminology - inflating it to be greater than any time effects.
If site and time differences exist, but are of similar magnitude, then both the 1-way tests (using
sites as replicates for times, and times as replicates for sites) will fail to find effects. Herein lies the
point of the 2-way ANOSIM: it is able to remove the effects of one factor (e.g. time) when
considering significance of the other factor (e.g. site). Ideally there will be replicates to allow us to
do this properly. Where there are not, the special form of 2-way ANOSIM above is at least able to
provide an indirect, global 2-way test, though it too will not always work. For example, it depends
on the interaction between the two factors being small - see Chapter 6 of the methods manual.]
139
Contributions of variables to similarity (SIMPER)

Moving away from testing for differences between groups of samples to interpreting such
differences when they have been shown to exist, Chapter 7 of the methods manual (page 7-3
onwards) looks at the role of individual species in contributing to the separation between two
groups of samples, or the 'closeness' of samples within a group. This is implemented in the
'similarity percentages' routine (Analyse>SIMPER), which decomposes average Bray-Curtis
dissimilarities between all pairs of samples, one from each group (or decomposes all similarities
among samples within a group), into percentage contributions from each species, listing the species
in decreasing order of such contributions. It can be used effectively with bubble plots, to identify
which species abundances it may be useful to superimpose on the samples MDS plot (as circles of
differing sizes, see page 83). Note however that, as with the ANOSIM routine, SIMPER operates
on the dissimilarities themselves (in their high-dimensional relationship) and not on the
approximation represented by a 2-d ordination, so is capable of providing an interpretation for
established group structures when these are not accurately represented by 2-d bubble plots (high
MDS stress). Analyse>SIMPER also now caters (in v6) for environmental variables (or suites of
biomarkers, morphometric measurements, particle size distributions etc) which may be better
analysed with a resemblance matrix of Euclidean distance rather than Bray-Curtis, the simple sum
across variables, which makes up (squared) Euclidean distance, being decomposed into variable
contributions in just the same way. A further enhancement (from v5) is the inclusion of a 2-way
crossed layout for the group structures, e.g. when asking which species (or environmental variables)
best discriminate two times for data taken across several sites, the only dissimilarities considered
are those across the two times within each of the sites, thus removing site effects from the time
comparison.
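
The Euclidean case is the simplest to illustrate. The sketch below splits the squared Euclidean distance between a single pair of samples into its per-variable terms; the function name, variable names and values are hypothetical, and SIMPER itself averages such terms over many pairs of samples rather than working on one pair.

```python
def squared_euclidean_contributions(x, y, names):
    """Split the squared Euclidean distance between two samples by variable.

    Assumes the two samples differ in at least one variable.
    """
    terms = {name: (a - b) ** 2 for name, a, b in zip(names, x, y)}
    total = sum(terms.values())
    percent = {name: 100 * t / total for name, t in terms.items()}
    return total, terms, percent


# Hypothetical (already normalised) environmental values for two samples.
total, terms, percent = squared_euclidean_contributions(
    [1.0, 4.0, 2.0], [2.0, 1.0, 2.0], ["salinity", "depth", "silt"]
)
print(total, percent)
```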
Species discriminating two groups

Close any existing workspace and re-open the Bristol Channel zooplankton workspace, bcwk, in
C:\Examples v6\BCzoo. If this does not exist, re-run the cluster analysis on bcza, from page 65,
taking 4th-root transform (Data1) and Bray-Curtis similarities (Resem1). Include the SIMPROF
option in Analyse>CLUSTER, creating a factor (SprofGps) whose levels (a, b, c, d) are the four
groups that are identified as significantly different by the SIMPROF tests (see also pages 68 and 74).
We shall repeat the SIMPER analysis of Table 7.1 of the methods manual, to list those species which
primarily contribute to the difference between the first two of these groups (a: sites 1-8, 10, 12,
termed 'true estuarine' by Collins & Williams, the original authors of this study; and b: sites 9, 11,
13-27, 29, termed 'estuarine/marine'). With either the full transformed data sheet Data1 as the active
window - or with just a selection of it, to groups a and b (Select>Samples>•Factor levels>Factor
name: SprofGps>Levels>..., see page 91/92) - take Analyse>SIMPER>(Design•One way>Factor A:
SprofGps) & (Measure•Bray-Curtis similarity) & (✓List only higher-contributing variables>Cut-off
percentage: 80). The restriction to a cut-off of 80% is probably unnecessary here, where there
are only 24 species, but can be useful to avoid long tables listing all species, however small their
percentage contribution to the average dissimilarity between the two groups.
[Screenshot: Bristol Channel zooplankton data sheet, the Analyse menu, the Edit>Factors dialog and the SIMPER dialog with Design set to One way, Measure set to Bray-Curtis similarity, List only higher-contributing variables checked and Cut-off percentage 80]

Groups a & b
Average dissimilarity = 59.53

                           Group a   Group b
Species                    Av.Abund  Av.Abund  Av.Diss  Diss/SD  Contrib%  Cum.%
Eurytemora affinis         3.85      0.38      7.74     2.74     13.00     13.00
Centropages hamatus        0.00      3.55      7.29     1.65     12.24     25.24
Calanus helgolandicus      0.80      3.79      6.84     1.70     11.50     36.74
Acartia bifilosa           2.81      5.33      5.69     1.41     9.56      46.29
Temora longicornis         0.39      2.91      5.55     1.69     9.32      55.62
Pseudocalanus elongatus    2.65      4.18      4.70     3.13     7.90      63.52
Paracalanus parvus         0.00      1.59      3.30     0.78     5.55      69.07
Pleurobrachia pileus juv   1.46      0.49      3.09     1.12     5.19      74.25
Sagitta elegans juv        0.20      1.48      2.88     1.55     4.83      79.09
Sagitta elegans            0.61      1.22      2.06     1.30     3.45      82.54

140

In the results window SIMPER1, find the table comparing Groups a & b (see above). The average
of the Bray-Curtis dissimilarities between all pairs of sites (one in group a, the other in b) is 59.5,
and this is made up of 7.74 from E. affinis, 7.29 from C. hamatus, 6.84 from C. helgolandicus etc,
given in the third data column of the table. The E. affinis contribution is 13.0% of the total of 59.5,
C. hamatus gives 12.2% of this total, etc (column 5), and these percentages are cumulated in
column 6, until the cut-off of >80% is reached. Column 4 is the ratio of the average contribution
(column 3) to the standard deviation (SD) of those contributions across all pairs of samples
making up this average. A good discriminating species is one which contributes relatively
consistently to that distinction, for all pairs of sites, i.e. with a low SD and thus a higher ratio (e.g.
P. parvus, with ratio 0.78, contributes to the difference between groups a and b but inconsistently).
Most emphasis should be on the order in which the species are displayed however, namely their
decreasing contribution to the between-group dissimilarity (column 3). Columns 1 and 2 aid the
interpretation by also giving the average abundance (or biomass, cover etc) for each species in each
of the two groups. E. affinis therefore declines strongly in abundance in the more saline conditions,
whereas C. hamatus appears in good numbers, having been absent in the true estuarine group, etc.
Note that the scale here is that of the input data sheet, Data1, namely 4th-root transformed, and the
means in columns 1 and 2 are calculated on these transformed abundances, as is relevant to the
dissimilarity calculations on which the multivariate conclusions are based. Back-transforming for
C. hamatus gives a change from 0 to 159 (3.55⁴) on the original abundance scale (numbers per m³).
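
The columns of this table can be mimicked with a short Python sketch. It is a sketch under assumptions rather than PRIMER's code: the function names are hypothetical, species i is taken to contribute 100|y1i - y2i| / Σ(y1 + y2) to the Bray-Curtis dissimilarity between two samples (so the terms sum to the dissimilarity itself), and the SD is computed as a sample standard deviation, which may not match the routine's exact convention.

```python
from statistics import mean, stdev


def bc_terms(y1, y2):
    """Per-species terms of the Bray-Curtis dissimilarity (in %) between two samples."""
    denom = sum(y1) + sum(y2)
    return [100 * abs(a - b) / denom for a, b in zip(y1, y2)]


def simper_between(group_a, group_b, species):
    """Average each species' term over all pairs (one sample from each group).

    Returns (average dissimilarity, rows), where each row is
    (species, Av.Diss, Diss/SD, Contrib%), sorted by decreasing Av.Diss.
    """
    per_pair = [bc_terms(ya, yb) for ya in group_a for yb in group_b]
    rows = []
    for i, name in enumerate(species):
        contribs = [terms[i] for terms in per_pair]
        avg = mean(contribs)
        sd = stdev(contribs) if len(contribs) > 1 else 0.0
        rows.append((name, avg, avg / sd if sd else float("inf")))
    total = sum(avg for _, avg, _ in rows)
    rows.sort(key=lambda row: row[1], reverse=True)
    return total, [(name, avg, ratio, 100 * avg / total) for name, avg, ratio in rows]
```

The inputs are lists of (transformed) abundance vectors, one per sample. The back-transformation quoted in the text is simply `3.55 ** 4`, about 158.8, which rounds to the 159 given above.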

Species typifying a group

Earlier in the results window, tables are given of the contributions of each species to the Bray-
Curtis similarity within each of the groups (see Chapter 7 of the methods manual for the formula).

Group a
Average similarity: 66.27

Species                    Av.Abund  Av.Sim  Sim/SD  Contrib%  Cum.%
Eurytemora affinis         3.85      19.31   3.06    29.13     29.13
Pseudocalanus elongatus    2.65      14.68   5.42    22.16     51.29
Acartia bifilosa           2.81      12.16   1.90    18.35     69.64
Polychaete larvae          1.06      3.91    1.25    5.90      75.54
Pleurobrachia pileus       0.97      3.44    0.91    5.19      80.73

Group b
Average similarity: 65.29

Species                    Av.Abund  Av.Sim  Sim/SD  Contrib%  Cum.%
Acartia bifilosa           5.33      16.29   4.78    24.95     24.95
Pseudocalanus elongatus    4.18      12.23   1.79    18.74     43.69
Calanus helgolandicus      3.79      9.99    2.26    15.30     58.99
Centropages hamatus        3.55      6.99    1.26    10.71     69.69
Temora longicornis         2.91      6.20    1.36    9.50      79.19
Sagitta elegans juv        1.48      2.98    1.10    4.56      83.75
The average Bray-Curtis similarity between all pairs of sites in the 'true estuarine' group is 66.3,
made up mainly of contributions from just 3 species: E. affinis (19.3, i.e. 29.1%), P. elongatus
(14.7, i.e. 22.1%), A. bifilosa (12.2, i.e. 18.4%), with a cumulative contribution of about 70% of the
total similarity of 66.3 (again the list is truncated when 80% is reached). These species can be
described as typical of Group a (they also have a consistently large presence because the ratio of
their contribution to its SD, across the within-group similarities, is relatively high). A. bifilosa and
P. elongatus are also typical of the estuarine & marine Group b (which has a within-group average
similarity of 65.3, comparable with Group a), which is why they do not head the list of species
which are contributing most to the discrimination between Groups a and b. Close bcwk.
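
The within-group tables follow the same idea applied to similarity. The sketch below is illustrative only, with hypothetical function names; it assumes the standard identity that Bray-Curtis similarity between two samples equals 200 Σ min(y1i, y2i) / Σ(y1 + y2), so that species i contributes 200 min(y1i, y2i) / Σ(y1 + y2). The exact formula used by the routine is the one given in Chapter 7 of the methods manual.

```python
from itertools import combinations


def bc_similarity_terms(y1, y2):
    """Per-species terms of the Bray-Curtis similarity (in %) between two samples."""
    denom = sum(y1) + sum(y2)
    return [200 * min(a, b) / denom for a, b in zip(y1, y2)]


def simper_within(group, species):
    """Average each species' similarity term over all pairs of samples in a group.

    Returns (average similarity, rows), each row being (species, Av.Sim, Contrib%),
    sorted by decreasing Av.Sim.
    """
    per_pair = [bc_similarity_terms(a, b) for a, b in combinations(group, 2)]
    avg = [sum(t[i] for t in per_pair) / len(per_pair) for i in range(len(species))]
    total = sum(avg)
    table = sorted(zip(species, avg), key=lambda row: row[1], reverse=True)
    return total, [(name, s, 100 * s / total) for name, s in table]
```

As a check on the percentages in the Group a table, 19.31 is 29.1% of 66.27.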

SIMPER for the 2-way crossed layout

Re-open the Tasmania nematode study, workspace tawk in C:\Examples v6\Tasmania, last seen on
page 135, where a 2-way crossed ANOSIM test demonstrated there was a significant difference
between the nematode assemblages seen within (Disturbed) and outside (Undisturbed) sand patches
subject to the burrowing activities of soldier crabs. It does not follow, of course, that the burrowing
activity is causal in changing the nematode community, since this is an observational study rather
than a manipulative experiment (see Chapter 12 in the methods manual) but it is of interest to
identify the main nematode species whose abundance changes in association with the disturbance.
A standard (1-way) SIMPER specifying the disturbance factor, with two levels D and U, over the
141

16 samples, would decompose the average of all 64 dissimilarities into species contributions (each
of the 8 D samples with each of the 8 U's). This confuses, however, differences between disturbed
and undisturbed samples with differences between locations across the sandflat ('blocks', B1 to
B4). A 2-way (crossed) SIMPER avoids this, removing the block effect by calculating only
dissimilarities between D and U samples from the same block (4 possibilities from each block) and
averaging these across the blocks (means of 16 dissimilarities therefore, not 64). These are then
decomposed into their species contributions, as before, as also are the similarities for the two D
replicates for each block (mean of 4 similarities, therefore), with the same for the U replicates. The
symmetry of 2-way crossed designs dictates that decompositions can also be done in the other
direction, removing treatment effects in order to determine which species primarily contribute to
block differences, and these tables are also provided, though of less interest in this context.
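
The counts of 64 and 16 dissimilarities follow directly from which sample pairs are allowed. A small Python sketch (hypothetical function name and sample labels, not the labels used in the tana file) makes the restriction explicit:

```python
from itertools import product


def cross_pairs(samples, within_block):
    """List (D, U) sample pairs; optionally only those sharing a block.

    samples: list of (label, treatment, block) tuples.
    """
    d = [s for s in samples if s[1] == "D"]
    u = [s for s in samples if s[1] == "U"]
    return [(a[0], b[0]) for a, b in product(d, u) if not within_block or a[2] == b[2]]


# Hypothetical labels: 4 blocks, each with 2 disturbed and 2 undisturbed samples.
samples = [(f"B{b}{t}{r}", t, b) for b in range(1, 5) for t in "DU" for r in (1, 2)]
print(len(cross_pairs(samples, within_block=False)))  # 1-way: all D-U pairs, 64
print(len(cross_pairs(samples, within_block=True)))   # 2-way crossed: same block only, 16
```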
If an existing tawk workspace is unavailable, open tana into a new workspace and pre-treat with a
4th-root transform, running Analyse>SIMPER on the transformed datasheet. Take Design•Two
way crossed>(Factor A: Treatment) & (Factor B: Block), choosing Bray-Curtis and cut-off 80%.
Of course, SIMPER must operate with active sheet as the data matrix rather than the (Bray-Curtis)
resemblance matrix Resem1, which underlies the MDS seen on page 134 for example, since it
needs to recalculate all the individual species terms that make up the final dissimilarities. The
resulting table in SIMPER1, headed Groups D & U, shows that the dissimilarity between disturbed
and undisturbed samples is not large, even removing the block effect (average of 41.3, but this
compares with mean dissimilarities within the D and U groups of 100-68.6 = 31.4 and 100-73.2 =
26.8, respectively, from the tables headed Group D and Group U). The main part of the table also
demonstrates that the D v U difference is a sum of small contributions from a rather large number
of species. Hypodontolaimus sp B heads the list but this is not because it is the most abundant
species overall (Hypodontolaimus sp A generally has larger densities, as do others that are further
down the list). To some extent, however, it is inevitable that no one species can dominate the
contribution, because of the severe 4th root transformation. (The ultimate in severe transformation,
reducing to presence/absence, will inevitably lead to a SIMPER table involving many species.)

[Screenshot: SIMPER dialog with Design set to Two way crossed, Factor A: Treatment, List only higher-contributing variables checked and Cut-off percentage 80]

Groups D & U

                           Group D   Group U
Species                    Av.Abund  Av.Abund  Av.Diss  Diss/SD  Contrib%  Cum.%
Hypodontolaimus sp B       0.13      1.81      3.44     1.64     8.32      8.32
Onyx sp                    1.08      1.65      2.28     3.59     5.52      13.83
Hypodontolaimus sp A       3.17      2.17      2.10     1.21     5.07      18.90
Axonolaimus sp             1.29      2.32      1.99     1.54     4.81      23.71
Desmodora sp B             0.00      0.86      1.61     1.22     3.90      27.62
Leptonemella sp            0.13      0.68      1.56     1.58     3.78      31.40
Praeacanthonchus sp        0.46      1.11      1.52     1.12     3.67      35.06
Daptonema sp               0.81      1.15      1.49     0.95     3.61      38.68
Promonhystera sp           1.32      0.60      1.47     1.06     3.55      42.22
Nannolaimoides sp A        0.69      1.38      1.44     1.10     3.49      45.72
Odontophora sp             0.38      0.86      1.37     1.12     3.32      49.04
Viscosia sp                0.42      0.65      1.35     1.08     3.26      52.30
...

Finally, note that though the SIMPER routine can work well for pairwise comparison of well-defined
groups of samples, it does not cater for a more continuous pattern of among-sample
relationships. A complementary approach is given later in the next section which poses a subtly
different question: not 'which species contribute to a difference between groups A and B?' but 'can
we find a subset of species which between them are able to reconstruct the full high-dimensional
sample pattern based on all species?'. The latter uses the BVSTEP option under the BEST routine.

142
13. Further matching of multivariate patterns (RELATE, 2STAGE, BEST, MVDISP)
RELATE on The Bio-Env routine in Section 11 introduced the concept of measuring how closely related two
-.f" resemblance sets of multivariate data are, for a matching set of samples, by calculating a rank correlation
~

"'
~
~
matrices coefficient (Spearman' s p, Kendall etc) between all the elements of their respective (dis )similarity
matrices. Thus, if the among-sample relationships agree, in exactly the same way in both data sets
(e.g. the two closest samples are 3 and 5, the next two closest are 7 and 15, ..., and the furthest
apart are 6 and 11), then the rank correlation ρ = 1, a perfect match. (These element-by-element
correlations of two similarity matrices are known as Mantel coefficients in the statistical literature,
though Mantel, working in epidemiology, defined them with standard Pearson correlations, i.e.
linear relationships, which are less flexible than rank correlations for our current purposes.) The
two resemblance matrices to be compared in this way need not be of biotic and environmental data
respectively, but can come from any source: biotic compared with biotic, abiotic with abiotic,
biotic with a 'model' matrix, etc. It is only necessary that they refer to matching sample labels.
PRIMER performs the calculations by the Analyse>RELATE routine, with the active window as one
of the resemblance matrices to be compared. In fact, RELATE allows the user either to supply the
second matrix as another triangular resemblance sheet (the general case) or to specify one of two
special cases of simple model matrices, which the routine then constructs for itself. The first is
referred to as seriation, where the data are compared to a linear sequence, either in space or time, i.e.
the matching coefficient ρ assesses the extent to which samples follow a simple trend, with
adjacent samples being the closest in species composition, samples two steps apart the next closest,
and so on, with assemblages from the first and last samples differing the most. Another model
option offered is cyclicity, with the sample relationships thought of as matching those of distances
between points placed equidistantly around a circle. A possible context could be monthly samples
taken over a full year. With a seasonal signal one might expect adjacent months to be the most
similar, months two steps apart less similar and so on, but as the year progresses the assemblage
structure gradually returns to that at the start of the year (Dec and Jan are only 1 step apart, not 11).
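
The matching coefficient described above can be sketched outside PRIMER in a few lines of Python. This is only an illustration, not PRIMER's own code: the function names (average_ranks, matrix_rho, seriation_model, cyclicity_model) are hypothetical, both matrices are assumed to be dissimilarities held as full square lists, and the samples are assumed to be in the same order in each. It shows that only the rank order of the elements matters, and that the cyclic model places Dec and Jan 1 step apart where the linear model places them 11 apart.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def matrix_rho(d1, d2):
    """Spearman correlation between matching off-diagonal elements of d1 and d2."""
    pairs = list(combinations(range(len(d1)), 2))
    x = [d1[i][j] for i, j in pairs]
    y = [d2[i][j] for i, j in pairs]
    return pearson(average_ranks(x), average_ranks(y))


def seriation_model(n):
    """Linear sequence: model distance is the number of steps apart."""
    return [[abs(i - j) for j in range(n)] for i in range(n)]


def cyclicity_model(n):
    """Equi-spaced points on a circle: steps apart, going the shorter way round."""
    return [[min(abs(i - j), n - abs(i - j)) for j in range(n)] for i in range(n)]


line = seriation_model(6)
squared = [[v * v for v in row] for row in line]  # same rank order, non-linear scale
rho_same_order = matrix_rho(squared, line)        # a perfect match despite the scale

dec_jan_linear = seriation_model(12)[0][11]
dec_jan_cyclic = cyclicity_model(12)[0][11]
```

If one sheet holds similarities and the other distances, the sign of this raw correlation reverses, so the sketch leaves it to the user to put both matrices on the same footing first.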

Model matrix construction

Model matrices corresponding to more complicated structures need first to be constructed by the
user and then entered in the same way as any other resemblance matrix to be matched to the active
sheet. There are at least three ways of obtaining such model matrices. Firstly, they can be read in
(or typed in) directly as a triangular matrix, e.g. as an existing physical distance matrix between the
sampling points. The idea in that case would be to see whether dissimilarity in species composition
matches the geographical layout of the samples: the closer the samples, the more similar the
assemblages. Secondly, they can be derived from simple x (or x,y or x,y,z) co-ordinates of the
sample points by running this 1-variable (or 2- or 3-variable) data sheet through Analyse>
Resemblance, choosing Euclidean distance. For example, if simple seriation (e.g. a linear annual
time trend) was not already catered for as a special case in Analyse>RELATE, it could be handled
by creating an n-samples by 1-variable data sheet, of the numbers 1, 2, 3, ..., n, and calculating
Euclidean distances between these samples, producing a lower triangular matrix with 1's down the
diagonal, 2's down the next off-diagonal, ..., down to n - 1 in the lower left corner. Similarly, a
model 'distance' matrix corresponding to a monthly seasonal cycle could have been constructed by
typing in the x,y co-ordinates of the numbers on a clock face, and again calculating (non-
normalised) Euclidean distance between these points (this will not create simple integers 1, 2, 3, ...
but they will be in the correct rank order to each other, which is all that matters for the rank
correlation coefficient used in RELATE). For a geographical layout, simply enter the metric form
of lat/long co-ordinates of the sample sites as an 'environmental' matrix, in effect, and take
Euclidean distance again. Thirdly, PRIMER v6 makes it easy to construct model matrices directly
from specified factors, using Tools>Model Matrix, which is entered when the active sheet is (say)
the biotic matrix to be compared with the model. An example given below is of seriation with
replication, namely four groups of samples considered to be at points 1, 2, 3, 4 along a line (thus
dissimilarity between groups 1 and 2 is less than that between 1 and 3, or 2 and 4, and dissimilarity
between groups 1 and 4 is greatest of all). This cannot be handled directly by the seriation option in
RELATE because that is only appropriate to single samples at each space (or time) point, whereas here
there are replicates in each group, considered to be at distance 0 from each other. Tools>Model
Matrix, specifying a numeric factor with appropriate levels 1, 2, 3, 4, will create the correct model.
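
The second route above (co-ordinates run through Euclidean distance) can be checked with a short Python sketch. The names euclidean_matrix, line_model and clock_model are hypothetical, and the clock face is assumed to be a unit circle with month 12 at the top. The sketch is not a substitute for Analyse>Resemblance: it only shows the structure of the two model matrices the text describes.

```python
import math


def euclidean_matrix(points):
    """Full symmetric matrix of Euclidean distances between co-ordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


# Simple seriation: an n-samples by 1-variable sheet holding 1, 2, 3, ..., n.
n = 6
line_points = [(float(k),) for k in range(1, n + 1)]
line_model = euclidean_matrix(line_points)

# Monthly cycle: x,y co-ordinates of the twelve numbers on a clock face.
clock_points = [
    (math.sin(2 * math.pi * m / 12), math.cos(2 * math.pi * m / 12))
    for m in range(1, 13)
]
clock_model = euclidean_matrix(clock_points)

# Distance from month 12 to months 1..6, i.e. 1 to 6 steps round the circle.
# These are chord lengths, not integers, but they increase with the step count.
chord_by_step = {s: round(clock_model[11][s - 1], 4) for s in range(1, 7)}
```

In line_model the first sub-diagonal holds 1's and the lower left corner holds n - 1. In clock_model the distance between months 12 and 1 equals that between months 1 and 2, which is the cyclic property wanted.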

13. Further matching

RELATE hypothesis test

A permutation test can be applied to the matching coefficient ρ between any two resemblance
matrices which are independently derived and have (at least some) sample labels which can be
matched up. As pointed out where the RELATE test was first referred to (page 124, in the context
of testing for a significant match between biotic composition and a suite of environmental
variables), it would not be appropriate to use RELATE on two matrices derived from the same
data, e.g. by different transformation or aggregation level on the same set of species abundances.
Under the null hypothesis that there is no relation whatsoever between the two similarity matrices,
ρ will be approximately zero. Its null distribution either side of zero can be obtained by randomly
permuting, many times, one (or both) sets of sample labels and recalculating ρ, to build up a
frequency histogram with which the true value of ρ can be compared. The last example in Chapter
15 of the methods manual describes this procedure in the context of the following data set.
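
The permutation procedure just described can be sketched as follows. This is a hedged illustration rather than PRIMER's implementation: the function names, the fixed random seed and the gradient positions are all hypothetical, and the significance level is taken as (number of permuted values greater than or equal to the observed ρ, plus 1) divided by (number of permutations plus 1). That convention is an assumption, though it agrees with the 0.1% (0 of 999) and 26% (259 of 999) figures quoted for the Phuket examples in this section.

```python
import random
from itertools import combinations


def ranks(values):
    """Average ranks (tied values share the mean of the ranks they span)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def rho(d1, d2, labels=None):
    """Spearman correlation of matching elements; labels re-orders d2's samples."""
    n = len(d1)
    labels = list(range(n)) if labels is None else labels
    pairs = list(combinations(range(n), 2))
    x = ranks([d1[i][j] for i, j in pairs])
    y = ranks([d2[labels[i]][labels[j]] for i, j in pairs])
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def significance(n_ge, n_perm):
    """Assumed convention: the observed arrangement counts as one of the set."""
    return (n_ge + 1) / (n_perm + 1)


def relate_test(d1, d2, n_perm=999, seed=0):
    """Return observed rho, count of permuted rho >= observed, significance."""
    rng = random.Random(seed)
    observed = rho(d1, d2)
    labels = list(range(len(d1)))
    n_ge = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute the sample labels of the second matrix
        if rho(d1, d2, labels) >= observed:
            n_ge += 1
    return observed, n_ge, significance(n_ge, n_perm)


# Hypothetical samples along a gradient, compared with a seriation model.
positions = [0.0, 0.8, 2.1, 2.9, 4.4, 5.0, 6.3, 7.7, 8.2, 9.9]
d_obs = [[abs(a - b) for b in positions] for a in positions]
d_model = [[abs(i - j) for j in range(10)] for i in range(10)]
observed, n_ge, level = relate_test(d_obs, d_model)
```

Taking more permutations lowers the smallest significance level that can be reported, which is why the text suggests 9999 permutations when a more precise level is needed.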

Seriation on Phuket coral transects

Data on cover of a coral-reef assemblage (37 species) on 12 line samples, at equi-spaced positions
down an onshore-offshore gradient (transect A) from Phuket Island, was introduced on page 103.
Open the previously saved workspace kpAwk, or re-read into a new workspace the data sheet
kpcAl of the four early years, and note (Edit>Factors) that the 48 samples are categorised by two
factors, year (83, 86, 87, 88) and position (1 to 12). Select only the twelve 1983 samples
(Select>Samples>•Factor levels>Factor name: year>Levels>...), transform with square root and
calculate Bray-Curtis similarities, renaming the resemblance matrix 1983 (with File>Rename
Resem). Go back to kpcAl and select out the 1987 samples and generate their similarities,
renamed 1987. Run Analyse>MDS on each one, and note that the 1983 data collapses into two
points, an outlier (position 1 on the transect) and the remaining samples. This can often happen
when one sample is almost devoid of species, as here (look at the original data matrix), so draw a
box around the remaining samples on the MDS plot and take Graph>MDS subset (page 88). For
both this revised (1983) and the other (1987) MDS plot, join the points in order of their transect
position (Graph>Special>Trajectory ✓Overlay trajectory>Trajectory numeric factor: position), and
label (not symbol) the points by position, remove the history box, adjust titles, fonts etc, to get:

[Figure: MDS plot 'Phuket coral cover transect A, 1983' (2D Stress: 0.1), with points labelled by transect position and joined in order.]

The steady turnover in coral assemblage structure in 1983, down the onshore-offshore transect
(seriation), has largely disappeared in 1987, at the height of the sedimentation impact from nearby
dredging for a deep-water port. This is reflected in the RELATE statistic, ρ, declining from 0.65 in
1983 to 0.19 in 1987 (e.g. with resemblance 1983 as the active sheet, take Analyse>RELATE>
Secondary Data•Result of seriation, and the defaults for other choices). The histogram and results
for 1983 show that the observed ρ is greater than any of the 999 simulated values, so the null
hypothesis of 'no tendency to seriation at all (ρ ≈ 0)' is decisively rejected (p<0.1%, though smaller
P values could clearly be obtained in this case, by taking more permutations). The 1987 value is
much more in the body of the null distribution however, and there is thus no clear evidence of any
serial structure (p<7% approximately, though if a more precise significance level is required, repeat
with at least 9999 permutations). Given the paucity of data at the top of the transect, it might be
thought advisable to remove transect position 1 from both years before computing ρ, and this does
accentuate the conclusions (ρ = 0.75 in 1983, p<0.1%, and ρ = 0.05 in 1987, p<30%).


[Figure: the Analyse>RELATE dialog (Secondary Data options •Result of seriation, •Result of cyclicity and •Resemblance/model matrix; Rank correlation method options Spearman, Weighted Spearman and Kendall; Plot histogram and Rho values to file check boxes; Max permutations), with the histogram and results window for Phuket transect A (without position 1), 1983: selection 2-12, secondary data Result of seriation, sample statistic (Rho) 0.749, significance level of sample statistic 0.1%, number of permutations 999, number of permuted statistics greater than or equal to Rho 0.]

RELATE test on two biotic arrays

Given the breakdown of the serial gradient structure for 1987, is it now the case that the pattern of
change down the transect has nothing at all in common with that for 1983? To answer that question
requires a further run of RELATE, but of the two assemblage similarity sheets 1983 and 1987
against each other, rather than in comparison with a model matrix. With the active window as one
of these, say 1983, and reinstating the full set of 12 transect positions for both of the resemblance
matrices (using Select>All and Edit>Clear Highlight), take Analyse>RELATE>(Secondary Data
•Resemblance/model matrix: 1987). You will get a warning message, indicating that sample
labels in the two sheets could not be matched. This issue was raised earlier, on page 102. Normally,
PRIMER v6 takes label matching seriously. When linking separate data sheets, as in RELATE or
BEST (or ABC plots in Section 15), the sample order need not be the same in the two matrices:
provided (some of) the sample names are identical, the correct match will take place. However, it is
here inconvenient to have to rename both sets of labels (currently 83A1, 83A2, ... and 87A1,
87A2, ...) to a common set (A1, A2, ...), especially because the data were extracted from a larger
array, where v6 expects that the sample labels will be unique! So, this warning message provides
an over-ride (press OK) which allows you to skip label matching, and RELATE will pair up the
samples in the existing order in both sheets. (The option will not be offered if the two similarity
matrices are not the same size. Instead an error message will tell you that "No labels matched.
Cannot match labels, even relaxed", and the routine will need to be run again, with the same
number of samples selected. It is then your responsibility to make sure they are in the same order!)

[Figure: RELATE of Phuket transect A, 1983 v 1987, showing the Analyse>RELATE dialog and the PRIMER warning 'No labels matched, skip matching and take same order as worksheet selections?' with OK and Cancel buttons.]

The results do indeed show that the assemblage patterns down the shore in the two years are
completely unrelated. The observed match of only ρ = 0.08 is exceeded by 259 of the 999
permutations under the null hypothesis (p<26%), the null hypothesis (as always) being that there
is absolutely no match in spatial pattern (ρ = 0). Omitting the nearshore sample, position 1, from
both series, makes little difference to the conclusion, ρ now dropping still further to 0.02 (p<42%).
Save the workspace kpAwk, which will be returned to shortly, to extend the idea of comparing
spatial patterns across two years to comparison across many pairs of years (2nd stage analysis).

'Seriation with replication' test

Return to the macrofaunal data set from the Ekofisk oilfield, with workspace ekwk last saved in
directory C:\Examples v6\Ekofisk on page 85. If this is not available, re-open the species data sheet
ekma and redo the steps at the bottom of page 31, i.e. recreate the factor Dist#, which is the
numeric form of the four groups of sites A to D. These are defined a priori as different distance
ranges from the oilfield centre (in several directions), which are roughly logarithmically spaced
(1=D: <250m; 2=C: 250m-1km; 3=B: 1-3.5km; 4=A: >3.5km). Then square root transform the data,
calculate Bray-Curtis similarities (Resem1) and recreate the MDS plot of page 82.

One important analysis required here is a hypothesis test demonstrating conclusively that the
assemblages at sites in these four distance ranges differ, as the MDS clearly appears to indicate. A
one-way ANOSIM test (Section 12) will do this (on Resem1 take Analyse>ANOSIM>Design•One
way>Factor A: Dist#) and the null hypothesis of 'no differences between groups 1 to 4' is
decisively rejected (global R = 0.54, p<<0.1%). However, in more borderline cases, this may not be
the most powerful procedure, since it tests the null hypothesis against a general alternative of
'groups 1 to 4 differ, in some unspecified way'. In practice we may prefer to test against a more
specific alternative of 'groups 1 to 4 form an ordered sequence of assemblage change, away from
the oilfield centre'. Such a test is carried out by a run of RELATE on Resem1, comparing with a
model matrix for seriation with replication, constructed using Tools>Model Matrix, as shown
below. (Somerfield PJ, Clarke KR & Olsgard F 2002 J Anim Ecol 71: 581-593 explore the greater
power of this test in relation to ANOSIM, and draw the analogue with standard univariate tests: in
this context ANOSIM is the equivalent of 1-way ANOVA and RELATE is analogous to a simple
regression, with replication at each value of the explanatory variable.) The counterpart to improved
power is, of course, decreased generality. A departure from the null hypothesis in which the groups
1 to 4 have different assemblages, but not in an ordered sequence with distance, might fail to be
detected by the RELATE test but give a significant ANOSIM result. This would certainly be the
case if, say, communities close to and distant from the oil rig were very similar but intermediate
distances produced different assemblages (though this seems an implausible situation to cater for).

With Resem1 as the active sheet, take Tools>Model Matrix>Type•Seriation (factor as distance) &
Factor A: Dist#. (Note that the factor Dist, splitting the sites into alphabetic levels D, C, B, A, will
not work here, because 'distances' cannot generally be calculated between names.) A model matrix
is generated, which you should rename Model dist, having blocks of zeros down the diagonal (sites
within a distance group are considered zero distance apart), then off-diagonal blocks of 1's then 2's
then 3's (sites in groups 1 and 2 are 1 unit apart, in groups 1 and 3 are 2 units apart etc). With
Resem1 as the active sheet, run Analyse>RELATE>Secondary data•Resemblance/model matrix:
Model dist, giving ρ = 0.62, thus providing very clear evidence of group differences (p<<0.1%),
and with the large ρ confirming the strongly ordered gradient of change away from the oilfield in
the high-dimensional assemblage space, seen in the lower-dimensional MDS approximation.
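
The block structure of such a 'seriation with replication' model can be seen in a small Python sketch. The function name seriation_model and the factor values are hypothetical (three sites per group, purely for display), and the absolute difference in factor level is assumed as the model distance, in line with the description of Model dist above.

```python
def seriation_model(levels):
    """Model distance between two samples = |difference in numeric factor level|."""
    return [[abs(a - b) for b in levels] for a in levels]


# Hypothetical numeric factor in the spirit of Dist#: three sites in each of
# four distance groups.
dist_number = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
model_dist = seriation_model(dist_number)

# Print the lower triangle: zero blocks next to the diagonal, then 1's, 2's, 3's.
for i in range(1, len(model_dist)):
    print(" ".join(str(v) for v in model_dist[i][:i]))

# Alphabetic levels cannot be differenced, which is why a factor of names
# cannot be used for this type of model.
try:
    seriation_model(["D", "C", "B", "A"])
    names_work = True
except TypeError:
    names_work = False
```

Replicates within a group sit at distance 0 from each other, which is the feature the simple seriation option in RELATE cannot represent.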

Another possible test would ignore the (somewhat arbitrary) distance group structure altogether and
carry out a RELATE test of the biotic similarities to the inter-site distance matrix calculated from
the individual (logged) distances of each site to the oilfield centre. (The univariate analogue would
now be a regression without replication of the explanatory variable values.) The raw distances are
given by the first column in the environmental data sheet ekev, with variable label 'Distance'.
Select just this column, transform it with Tools>Transform (Individual)>Expression: log(V) (see
page 105 for individual transforms), then Analyse>Resemblance>(Measure•Euclidean distance)
to obtain a distance matrix Continuous dist (ignore the warning about normalising, irrelevant for a
single variable!). Now re-run Analyse>RELATE on the Resem1 biotic similarities, matched to
Continuous dist. The result is rather similar to the other two analyses, perhaps displaying an even
stronger gradient of assemblage change with (logged) distance, ρ = 0.67 (p<<0.1%).

[Figure: MDS for macrofauna at 39 Ekofisk sites; the Analyse>ANOSIM dialog (Design•One way, Factor A) with the histogram for the ANOSIM test on factor Dist#; and the Analyse>RELATE dialog (Secondary data•Resemblance/model matrix: Model dist, Rank correlation method•Spearman, Max permutations: 999) with its histogram of permuted Rho values.]


Note that the Continuous dist resemblance matrix in the last example could equally well have been
created by copying the log(Distance) values from the environmental data sheet into a factor under
the biotic Resem1 and then entering that into Tools>Model Matrix>•Seriation (factor as distance),
specifying the log(Distance) factor as Factor A. The point about the •Seriation option under the
Model Matrix dialog box is that the factor levels do not need to be equally spaced, or in any
particular order. This contrasts with the (Secondary Data•Result of seriation) option in RELATE
which is much less general since it assumes the samples are equally-spaced and in sequential order.

Other Model Matrix options

There are three other options when creating model matrices under the Tools menu. With Resem1 as
the active window, take Tools>Model Matrix>(Type•Unordered groups) & (Factor A: Dist),
where Dist is the factor that defines the four distance groups alphabetically: D, C, B, A. The
resulting triangular array (call it Unordered gps), has 0's between sites in the same group, but 1's
between all sites in different groups (rather than the 1's, 2's, 3's of the ordered groups, in Model
dist). In fact, an Analyse>RELATE test between Resem1 and Unordered gps now has a very close
affinity with the 1-way ANOSIM test on Resem1 (it is formally the same test, though with a
slightly different form of test statistic, in simple balanced cases) and gives ρ = 0.40 (p<0.1%). The
benefit of moving from an unordered ANOSIM-type test to an ordered (grouped) RELATE test in
this case is therefore clear: the ρ statistic increases from 0.40 to 0.62 (though both are significant).
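
The difference between the unordered and ordered group models can be made concrete with a few lines of Python. The function names and the two-sites-per-group factors are hypothetical, chosen only to keep the matrices small. The 0/1 coding follows the description of Unordered gps above.

```python
def unordered_groups_model(groups):
    """0 between samples in the same group, 1 between samples in different groups."""
    return [[0 if a == b else 1 for b in groups] for a in groups]


def ordered_groups_model(levels):
    """Seriation-type model: |difference in numeric factor level|."""
    return [[abs(a - b) for b in levels] for a in levels]


dist = ["D", "D", "C", "C", "B", "B", "A", "A"]  # hypothetical: two sites per group
dist_number = [1, 1, 2, 2, 3, 3, 4, 4]           # matching numeric form

unordered_gps = unordered_groups_model(dist)
model_dist = ordered_groups_model(dist_number)

# One site in group D compared with a site in D, C, B and A in turn.
unordered_row = [unordered_gps[0][j] for j in (1, 2, 4, 6)]
ordered_row = [model_dist[0][j] for j in (1, 2, 4, 6)]
```

The unordered model treats every other group as equally different, so it works with group names. The ordered model carries the extra information about sequence, which is what the RELATE test on Model dist exploits.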

[Figure: Tools>Model Matrix dialog (Type•Unordered groups, Factor A), the resulting Unordered gps model matrix of 0's and 1's, and the RELATE histogram of permuted Rho values.]

The other two Model Matrix options are: •Cyclicity (factor as cycles) and •Euclidean 2D. The
latter simply calculates, for example, distance between samples in a geographic layout when the x,y
co-ordinates of the sample points are not held in a separate (environment-type) data sheet but as
numeric factors in the biotic data. The corresponding model for sample locations in a 1D layout is
simply the first (•Seriation) option, or equivalently of course, one could set up a Factor B with
the same level (1) for all samples and take the •Euclidean 2D option.

Cyclicity tests

The •Cyclicity option needs a numeric factor over the range (0, 1), representing the distances round
a circle, where 0 and 1 are at the same point (or think of these as the angles at which those points
are set, ranging over 0 to 1, not 0 to 360). Thus the artificial example of resemblance data Resem1
in C:\Examples v6\Testcyc, for 12 months in one season, can be tested for a monthly annual cycle
by creating a numeric factor Month# with values 1/12, 2/12, 3/12, ..., 1. When the biotic matrix
Resem1 is entered to Tools>Model Matrix>(Type•Cyclicity) & (Factor A: Month#), the model
distances Resem2 result, and can be compared with Resem1 using RELATE on two resemblance
matrices (the third option), as earlier. This will exactly mimic the second option (•Result of
cyclicity) operating on Resem1 for this simple equi-spaced case, but again the explicit creation of a
model matrix is much more general, allowing for non-equally spaced time points. (Note that the
created model matrix Resem2 has been put into MDS to view its structure - it is a good idea in these
modelled cases to constrain the MDS with Kruskal fit scheme•2, see page 77).


Another option can be demonstrated for this test case: the equivalent of seriation with replication
seen on page 146, but where the replicates are now from samples in a cyclic rather than linear
arrangement. For the demonstration, suppose that the 12 samples can be split into three replicates
in each of 4 seasons (W, Sp, S, A), as given by the factor Season. To use Model Matrix>(Type
•Cyclicity) this requires a numeric form, given by Season#, with values in the range 0 to 1. The
values of 0, 0.25, 0.5, and 0.75, repeated as appropriate, will achieve the equi-spacing of seasons
(Winter could equally well have been represented by 1 not 0), and the MDS plot from the model
matrix displays the grouped cyclic structure to which the biotic Resem1 is then RELATEd.
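
The two cyclic models described above, for Month# and for Season#, can be sketched in Python. The function name cyclicity_model is hypothetical, and the model distance is assumed to be the distance round a circle of circumference 1, taking the shorter way round. Whether PRIMER uses exactly this form is not stated here, but any form with the same rank order gives the same ρ. The allocation of months to seasons (Dec to Feb as Winter, and so on) is also an assumption for the demonstration.

```python
from fractions import Fraction


def cyclicity_model(positions):
    """Distances round a circle of circumference 1, by the shorter route."""
    def arc(a, b):
        d = abs(a - b) % 1      # 0 and 1 are the same point on the circle
        return min(d, 1 - d)    # go the shorter way round
    return [[arc(a, b) for b in positions] for a in positions]


# Month#: 1/12, 2/12, ..., 1 for Jan to Dec (exact fractions avoid rounding).
month_number = [Fraction(k, 12) for k in range(1, 13)]
month_model = cyclicity_model(month_number)

# Season#: Winter 0, Spring 0.25, Summer 0.5, Autumn 0.75, for Jan to Dec.
q = Fraction(1, 4)
season_number = [0 * q] * 2 + [1 * q] * 3 + [2 * q] * 3 + [3 * q] * 3 + [0 * q]
season_model = cyclicity_model(season_number)

# Coding Winter as 1 instead of 0 gives the same model distances.
winter_as_one = [Fraction(1) if s == 0 else s for s in season_number]
same_model = cyclicity_model(winter_as_one) == season_model
```

In month_model, Dec and Jan are 1/12 apart and the largest distance is 1/2 (e.g. Jan to Jul). In season_model, replicates within a season are at distance 0, and Winter is as close to Autumn as it is to Spring.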

[Figure: factor sheet for the test data (monthly over one year) with columns Label, Month#, Season and Season# (Winter 0, Spring 0.25, Summer 0.5, Autumn 0.75); the cyclic model matrix; its MDS plot showing the grouped cyclic structure; and the RELATE histogram of permuted Rho values.]

Rationale for 2nd stage MDS

As seen above, the ρ statistic, which rank correlates the elements of two similarity matrices, can
provide a very useful and succinct summary of the extent of agreement between two ordinations
(or, to be more precise, of agreement in the high-dimensional multivariate data underlying these
low-dimensional plots). Often, many such pairwise comparisons are made; for example, a single
set of data may first be aggregated to a range of taxonomic levels (species, genus, family, ...), then
analysed under a range of pre-treatments: standardisations (none, by species or samples, and by
maximum or total); other taxon weightings (e.g. dispersion weighting); then transformations (none,
square root, 4th root, log, pres/abs), etc. Many ordination plots result and it is valid to ask how
much the multivariate pattern changes as a result of these various decisions. What is the important
choice? Is it whether the data is only identified to family rather than species level, or is the
difference this makes completely dwarfed by the changes resulting from choosing to look at
common to mid-abundance species (none or root transform) or concentrating more on the less-
common species (4th root or pres/abs)? Or is it the choice then of a resemblance coefficient (from
the 40 or so in Section 4!) that really dictates the conclusions? It can be difficult, and arbitrary, to
assess this just by looking at the range of different ordinations produced, but at least we can exploit
the ρ statistic to give quantification of the agreement in multivariate pattern for any pair of choices.

When there are many choices, even a set of ρ values between pairs does not become a succinct
enough description (considering only two types of choice, there are 20 different ordinations from 5
transformations and 4 taxonomic levels, thus 190 ρ values between them!). The key step here is to
realise that ρ itself can be regarded as a similarity measure, taking values near 1 if two multivariate
patterns are highly similar and near zero if they bear no relation to each other. So, the triangular
matrix of ρ coefficients between all pairs of ordinations can be entered into the MDS routine, to
obtain what PRIMER calls a 2nd stage MDS plot (an MDS of MDS's, in effect!). Again, there is
no reason to assume linearity of the ρ coefficient in its translation into distances in a 2-d ordination
plot, so the usual non-metric MDS seems most appropriate (essentially based on the rank orders of
the ρ values, therefore catering naturally for negative ρ - these just become patterns that are even
less like each other than random re-arrangements, and in practice large negative values are not
observed). The 2nd stage plot therefore gives a succinct summary in a 2-d picture, usually with
small stress, of the relationship between the multivariate sample patterns under various choices.
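
The construction of a 2nd stage matrix can be sketched as below. This is an illustration under stated assumptions, not the 2STAGE routine itself: the function names are hypothetical, each resemblance matrix is supplied as a flat list of its lower-triangle values with samples in the same order, and the three 'choices' are invented numbers for four samples. The last line simply confirms the count of pairs quoted above for 20 ordinations.

```python
import math
from itertools import combinations


def ranks(values):
    """Average ranks (tied values share the mean of the ranks they span)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def rho(x, y):
    """Spearman rank correlation between two lists of matching elements."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den


def second_stage(triangles):
    """rho between every pair of resemblance matrices, keyed by their names."""
    names = list(triangles)
    return {(a, b): rho(triangles[a], triangles[b]) for a, b in combinations(names, 2)}


# Invented lower triangles (6 values = 4 samples) under three analysis choices.
base = [10, 40, 30, 70, 60, 20]
triangles = {
    "A": base,
    "B": [math.sqrt(v) for v in base],  # same rank order as A
    "C": [1, 1, 0, 1, 1, 0],            # a coarser pattern with ties
}
stage2 = second_stage(triangles)
n_pairs_for_20 = math.comb(20, 2)
```

The dictionary stage2 plays the role of the triangular matrix of ρ values, which would then be ordinated by non-metric MDS. Choices A and B give ρ = 1 because only rank order enters the calculation.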

(Morlaix macrofauna, Amoco-Cadiz oil spill)

The routine which carries out this operation (Analyse>2STAGE) is illustrated by a further data set,
that of benthic macrofaunal assemblages in the sediments of the Bay of Morlaix, sampled at
roughly 3-monthly intervals over the period April 1977 to February 1982, covering the period of
the Amoco-Cadiz oil tanker wreck in March 1978. (The data are from Dauvin J-C 1984, Doctoral
thesis, Univ Pierre et Marie Curie, Paris.) The spill occurred some 40 km from the Bay itself but
oil slicks reached this coastline and there is a clear signal of marked change in community structure
in the sampling periods after the spill, with a gradual recovery over the next 3 years towards (but
not reaching) the starting assemblage - see the MDS below. There are 21 sampling times (A-U)
over the 5 years, with the oil impact occurring between samples E and F.

In a new workspace, File>Open all the files in C:\Examples v6\Morlaix - you can do this in one
operation by highlighting them all (click on the first, shift-click on the last; ctrl-click adds or
subtracts from the highlighted list, as in normal Windows practice), and then pressing the Open
button. Most of the sheets are similarity matrices computed from the original data sheet mxma of
21 sampling times and 257 species, using either species (mxs*), genus (mxg*) or family (mxf*)
level data, and then subjected to a range of transformations (0: none, 1: square root, 2: 4th root, 3:
log(1+x), 4: reduction to pres/abs), giving 15 similarity matrices in all. However, one of them
(mxf3) is omitted and left for the user to construct: with mxma as the active data sheet, take Tools>
Aggregate>(Aggregation worksheet: mxmagg) & (From level: species) & (To level: family). Note
that the aggregation matrix mxmagg is purely numeric in this case, giving a data sheet Data1 with
(family) numbers for variables, but it does not need to be - the gfagg sheet for the Groundfish data
(first seen on page 36) is a more typical example, using taxon names. On Data1, take Analyse>
Pre-treatment>Transform (overall)>Transformation: Log(X+1), then Analyse>Resemblance,
using Bray-Curtis similarity - rename the result mxf3. Run this through Analyse>MDS and do the
same for another resemblance matrix, e.g. species-level root transform, mxs1, joining the time
points (Graph>Special>✓Overlay trajectory, with factor: time), and visually comparing the plots.
In spite of the lack of replication which would allow formal statistical testing, there is much data at
each time point (several pooled samples) and this clearly demonstrates, in both plots, a pattern of
strong change after the spill and partial recovery, dominating a smaller-scale seasonal signal.

2nd stage MDS plots

Now select one of the similarity matrices as the active window (mxs0, say) and take Analyse>
2STAGE>(Data•Multiple matrices) & (Other resemblance matrices: ✓mxs4, ✓mxf0, ✓mxf1, ...)
making sure to select all 14 similarity check boxes. Choose the default Rank correlation method•
Spearman and the routine returns a 2nd stage resemblance sheet of matrix correlations ρ, all of
which are positive, with some very close to 1 (e.g. species and genus level under no transform, or
4th root and log transform for the same taxon level, etc), indicating complete robustness of the
conclusions to those particular choices. Title (Edit>Properties) and rename the sheet as mx2st.
The rows/cols may not be in a very logical order for easy viewing, but you can re-order them using
Edit>Sort>•By labels (or creating a factor which gives the desired order and sorting by that). It
might also be useful to create a more helpful label as a factor: taxon-transf, with levels sp-none, sp-
root, sp-4th, sp-log, sp-p/a, gn-none, ..., gn-p/a, fm-none, ..., fm-p/a (you could use the Combine
trick on page 60/61). Then on mx2st, run Analyse>MDS and label the points with taxon-transf.
(The plot shown below is also zoomed to a wider, shorter rectangular boundary, a useful way of
generating tidier-looking plots, see page 87/88). The main conclusions are that transform choice
and taxonomic level tend to have 'orthogonal' effects (transforms across the page, taxon levels up
the page); that transform choice generally makes a larger difference to the outcome than taxon level
(the exception being between 4th root and log, which are more-or-less equivalent) but that the
differences between taxonomic levels increase with the severity of the transformation. The latter
was to be expected, since untransformed analysis tends to be dominated by 3 or 4 abundant species:
if these are in different genera or families then their contribution is unchanged by aggregation. The
methods manual lists work by Olsgard, Somerfield & Carr 1997, 1998, 2000, exploring this further.
Save the workspace as mxwk, for use again shortly, and File>Close Workspace.

[Figure: Analyse>2STAGE dialog (Data•Multiple matrices, with check boxes for the other resemblance matrices), the resulting second stage matrix for the Morlaix analyses (Correlation, -1 to 1), and its 2nd stage MDS plot (2D Stress: 0.02) with points labelled by taxon and transform (sp-none, sp-4th, sp-log, sp-p/a, gn-4th, gn-p/a, fm-none, fm-root, fm-log, ...).]


2STAGE to compare resemblance coefficients

More recently (Clarke KR, Somerfield PJ, Chapman MG 2006, 'On resemblance measures for
ecological studies, ...', J exp mar Biol Ecol), the technique of 2nd stage plots has also been used to
examine the effects of different coefficient choices on a samples analysis, scaling this in relation to
the effects of differing transformation (and, by extrapolation, taxonomic level). Re-open the Clyde
dumpground workspace clwk in C:\Examples v6\Clydemac or, if unavailable, the biomass data
clmb, for which the MDS, based on square-root transformed data (Data1) and Bray-Curtis
similarity, was seen on page 81. For the same transformed sheet, Data1, calculate a wide range of other
(dis)similarities and distances from Analyse>Resemblance>(Measure•More (tab))>More..., see
page 46. E.g. from •Similarity P/A: S1 Simple matching, S8 Sorensen (i.e. Bray-Curtis P/A), S13
Kulczynski P/A and S26 Faith; from •Similarity quantitative: S15 Gower, S18 Kulczynski and
Canberra similarity (excl 0-0); from •Distance: D7 Manhattan, D10 Canberra metric, D16 Chi
squared distance; from •Others: Binomial Deviance (scaled); and back on the Main tab: •Euclidean
distance, and also its form when the data sheet Data1 has first been normalised (with Analyse>
Pre-treatment>Normalise variables, page 39, having first taken out the all-blank species with
Select>Variables>•Use those that contribute at least 0.01%). With one of these resemblances as
the active sheet, Analyse>2STAGE>Data•Multiple matrices, ticking the check boxes for all the
rest, and run MDS on the resulting 2nd stage matrix, also looking at individual MDS plots for some
measures with differing effect, in comparison with the contaminant gradient (running MDS on the
normalised Euclidean data clevtn from page 113).
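To make the idea of a 2nd stage matrix concrete, the following is a minimal Python sketch, not PRIMER's own code, of correlating several resemblance matrices pairwise. It assumes that every matrix has already been put in the same sense (all dissimilarities here), that Spearman rank correlation is the matching coefficient, and that only the below-diagonal entries are used. The function names and the toy matrices A, B and C are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def lower_triangle(matrix):
    """Unravel the below-diagonal entries of a square matrix into a list."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]


def second_stage(matrices):
    """Return {(name_a, name_b): rho} for every pair of named matrices."""
    tri = {name: lower_triangle(m) for name, m in matrices.items()}
    return {(a, b): spearman(tri[a], tri[b]) for a, b in combinations(tri, 2)}


# Toy dissimilarity matrices for 4 samples (hypothetical values).
A = [[0, 1, 2, 3], [1, 0, 4, 5], [2, 4, 0, 6], [3, 5, 6, 0]]
B = [[v * v for v in row] for row in A]                   # same rank order as A
C = [[7 - v if v else 0 for v in row] for row in A]       # reversed rank order
rho = second_stage({"A": A, "B": B, "C": C})
```

Each entry of `rho` plays the role of one cell of the 2nd stage matrix, so coefficients that order the among-sample resemblances in the same way come out close to 1, whatever their absolute scale.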
[Figure: 2STAGE dialog (Data: Multiple matrices, with other resemblance matrices such as Simple, B-C PA, Kul PA, Faith and Gower ticked), the resulting correlation matrix 'Clyde 2nd stage coeffs', the 2nd stage MDS of the coefficients, and 1st stage MDS plots of the 12 sites for selected coefficients (including Gower and Euclidean).]

Conclusions on comparing resemblance coefficients

Clarke, Somerfield & Chapman 2006, J exp mar Biol Ecol, discuss this analysis (and that for
several other data sets) in more detail, but to pick out just four general points:
a) These 2nd stage plots have common features, irrespective of the actual data set, e.g. coefficients
which are in what they term the 'Bray-Curtis family' (including quantitative measures: S17, S18
& Ochiai (quant), matched by pres/abs measures: S8, S13, S14; also Canberra similarity (exc 0-0))
tend always to cluster on the 2nd stage plot, i.e. produce similar multivariate conclusions and
radically differ from Euclidean distance, even more so when the latter is normalised.
b) Choice of coefficient is much more crucial to a multivariate analysis than transformation (which
itself is more important than taxonomic level - see earlier); this is apparent here by noting the
relative proximity of the Bray-Curtis and Bray-Curtis P/A (Sorensen) points, and the Kulczynski
and Kulczynski P/A points, on the 2nd stage plot (the first of the pair uses a mild square root, and
the second is on presence/absence data - the most severe transform possible).
c) The inference of similarity from joint absences for coefficients such as Euclidean distance, S15
Gower etc, has a dramatically adverse effect on their performance in describing gradients of
assemblage change where there is a turnover of species (i.e. pres/abs data is informative); this is
clear from the above (1st stage) MDS based on Euclidean distance, which places site 6, at the
centre of the dumpground, close to the extreme ends of the transect, 1 and 12, when 6 has no
species in common with either! Similarity is deemed higher because they share absent species. The
radical effect of counting (or not) joint absences is also clear here from: the separation of the
Canberra metric from Canberra similarity (the only difference is an adjustment for double zeros,
page 47 & 48), and the way the plot splits left, right (counts 0-0, ignores 0-0), with the Faith
coefficient intermediate since it counts joint absences, but with less weight than joint presences.
d) Another key feature which separates out the behaviour of coefficients is whether they implicitly
or explicitly standardise (or normalise), and whether over samples or species. Chi-squared distance
does both, removing all differences in total abundance between samples and also having a divisor
of the total abundance of each species across all samples: low density species can be given very
heavy weight, leading to problematic behaviour. Normalised Euclidean and Gower also have a
species (but not sample) standardisation, giving rare and common species equal weight.

Save the Clyde workspace clwk for later use, and close it.
2STAGE from a single similarity matrix

A very different way of using 2nd stage matrices is best accessed through the alternative entry
option in the dialog box for 2STAGE, namely to specify a single similarity matrix with factors
defining a 2-way crossed layout of samples (e.g. of sites and times), and allow 2STAGE to select
out the sub-matrices on which to calculate the second-stage correlations. To motivate this, return to
the Phuket coral analysis of page 145, in which the spatial pattern of assemblage change down an
onshore-offshore transect was compared for two years, 1983 and 1987 (see MDS plots on page
144). The Spearman correlation between the two Bray-Curtis similarity matrices underlying
these profiles was only ρ = 0.08, indicating a poorly matching sequence, the conclusion being that
the sedimentation from dredging for a deep-water port in 1986 and 87 had disrupted the normal
spatial pattern of the assemblages. In fact, that study has data from 13 years (over the period 1983
to 2000), including a further potentially disruptive event in 1998, a prolonged period of low sea
levels from a high-pressure anomaly, increasing the frequency of desiccation. If the transect
patterns for all years are now matched, pairwise, a correlation matrix of ρ values is produced,
which is the second stage matrix. These 'similarities' between years can be input to MDS or
clustering to give a succinct visual summary of the inter-annual changes, not of the community as
such (i.e. not of the average assemblage, or the assemblage at one fixed point on the transect - that
would be a 1st stage MDS) but of the internal pattern of assemblage change running down the
shore. Years which are anomalous in terms of their spatial pattern should stand out as outliers on
this 2nd stage MDS or 2nd stage cluster analysis. If the inter-annual differences do not disrupt
internal spatial structuring but simply, for example, increase the abundance of all species down the
transect in some years, relative to others, then the 2nd stage plot will show nothing whatsoever -
that type of signal will be seen in a (1st stage) plot of yearly changes in the average community
over the whole shore. In a sense, what the 2nd stage plot does is to remove 'main effects' of years
(to use familiar univariate terminology) and concentrate on 'interactions', the changes in the
internal spatial gradient for some years compared with others. This example is carried out below
and discussed in more detail in Clarke KR, Somerfield PJ, Airoldi L, Warwick RM 2006.
'Exploring interactions by second-stage community analyses'. J exp mar Biol Ecol.

Open the workspace kpAwk, of coral cover for the Ko Phuket transect A, in C:\Examples v6\
Phuket, first met on page 103. If the workspace saved then (or later) is not available, read in kpcA1
(years 1983-88) and kpcA2 (1991-2000), and Tools>Merge them, taking the defaults, to produce
the full inter-annual series kpcA. This has 156 samples, in a 2-way crossed design split into 13
years, with 12 positions along the onshore-offshore transect (look at the factors year and position
with Edit>Factors). Create the full similarity matrix from kpcA under the same conditions as
previously: Bray-Curtis on square-root transformed data, renaming it kpB-C. Then take Analyse>
2STAGE>Data•Single matrix with sample groups>(Outer factor: year) & (Inner factor: position)
to produce the 2nd stage matrix, renamed kp2st say. On this, Analyse>CLUSTER and Analyse>
MDS, drawing clusters on the MDS at (say) a resemblance (ρ) of 0.2, from Graph>Special>
✓Overlay clusters (see page 80). Contrast this 2nd stage plot with the (1st stage) MDS of years,
from averaging across the transect positions: with kpcA as the active sheet, take Tools>Average>
(Samples•Averages for factor: year) & (Variables•No averaging). On the resulting data sheet of 53
species by 13 years (yr avge say), repeat the square root transform and Bray-Curtis sample
similarity calculations, and run MDS. Although testing is not possible in this case (but it can be, for
some 2nd stage analyses, see below), it is clear that this 1st stage ('year main effect') matrix is less
sensitive in picking up the potential impacts (sedimentation years 86 and 87, desiccation 98) than
the 2nd stage, concentrating solely on the consistency in spatial pattern over years ('interactions').
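The single-matrix route can be sketched in Python as follows. This is an illustration under stated assumptions rather than PRIMER's implementation: the resemblance matrix is a full square list of lists, each combination of outer and inner level occurs exactly once (no replicates), and Spearman correlation is used. The function names and the toy coordinates are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def second_stage_from_single(resem, outer, inner):
    """Correlate the inner-factor pattern between every pair of outer levels.

    resem    : full square resemblance matrix over all samples
    outer[i] : outer-factor level of sample i (e.g. year)
    inner[i] : inner-factor level of sample i (e.g. transect position)
    """
    outer_levels = sorted(set(outer))
    inner_levels = sorted(set(inner))
    where = {(o, n): k for k, (o, n) in enumerate(zip(outer, inner))}
    profiles = {}
    for o in outer_levels:
        # Sample indices for this outer level, in a fixed inner-level order,
        # so that the sub-matrices line up entry by entry.
        ids = [where[(o, n)] for n in inner_levels]
        profiles[o] = [resem[ids[a]][ids[b]]
                       for a in range(len(ids)) for b in range(a)]
    return {(p, q): spearman(profiles[p], profiles[q])
            for p, q in combinations(outer_levels, 2)}


# Toy example: 3 'years' x 4 'positions', distances from 1-d coordinates.
coords = {83: [0, 1, 2, 3], 86: [10, 12, 14, 16], 87: [0, 3, 1, 2]}
outer = [year for year in coords for _ in range(4)]
inner = [pos for _ in coords for pos in range(1, 5)]
x = [v for year in coords for v in coords[year]]
resem = [[abs(a - b) for b in x] for a in x]
rho_years = second_stage_from_single(resem, outer, inner)
```

In the toy data, years 83 and 86 differ in overall level and scale but share the same internal gradient, so they match perfectly, whereas year 87 has a disrupted ordering of positions and correlates poorly with both. This mirrors the point that a 2nd stage analysis ignores 'main effects' and responds only to changes in the internal pattern.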
[Figure: 2STAGE dialog (Data: Single matrix with sample groups; outer factor: year) and Tools>Average dialog (Samples: Averages for factor year), with plots for Phuket coral cover, transect A, 1983-2000: 2nd stage cluster of transect profiles (group average), 2nd stage MDS of transect profiles over years (2D stress 0.107), and 1st stage MDS of the year averages (2D stress 0.12).]


2STAGE for time series and repeated measures

In the context of a 2-factor design, PRIMER makes a 2nd stage matrix very simple to produce but
it is less easy to understand what it represents! Think carefully about the options: the factors must
divide the data into a 2-way layout with no replicates in each cell; the inner factor specifies the
patterns to match, positions 1 to 12, and the outer the factor to display, the years. Note that, because
of the symmetry of two-way crossed designs, these could be reversed, thus matching the patterns of
inter-annual time series at each position on the shore. This removes the 'main effect' of differences
in (time-averaged) assemblages going down the shore, and concentrates on anomalous shore
positions, which have a different time series pattern. In fact, this is arguably the more useful
application for Analyse>2STAGE: comparing time profiles at different locations. Sometimes there
is also a natural hypothesis testing framework, which even extends to cover 'repeated measures'
designs, usually considered problematic, even in univariate studies. The Clarke et al (2006) paper
referred to on page 153 discusses two such examples, an inter-annual time series at different
locations in Tees Bay, UK, and a recolonisation experiment on macroalgae in the Ligurian Sea,
some of the data from which was analysed on pages 136 and 137.

In the Tees Bay years×locations study, the locations had a further structure of geographic areas, and
sites within areas, all of which were returned to in the same month over 22 years. The patterns
being compared here are the separate time series at each location, producing a 2nd stage matrix
with a 1-way structure of areas and (replicate) sites within areas, which can be tested for area
differences by (2nd stage) ANOSIM. Note that this is not then a test for different assemblages in
different areas - such differences are to be expected because of the geographic range. Instead it
tests whether the inter-annual variations are the same across the region or whether some areas
show a different temporal pattern (e.g. close to the Tees estuary, where changes in contaminant
impact occurred over that period); 1-way ANOSIM on a 2STAGE matrix establishes this point.

The Calafuria macroalgal recolonisation experiment monitored the same physical rock patches over
one year, having first cleared the (subtidal) rockface. Replicate patches were tracked for 8 different
'treatments', namely different times of year for the clearance. The 2STAGE analysis matches the
recolonisation patterns of all replicates and a 1-way ANOSIM on the 2nd stage matrix tests
whether different 'treatments' give different recolonisation profiles (which they do). The individual
time points in the recovery sequence cannot be assumed independent, since the same rock patch is
returned to bi-monthly: this is 'repeated measures'. But the 2nd stage analysis treats that
inter-dependent time sequence of recovery as a single experimental unit, in effect. It becomes a single
point on the 2nd stage MDS plot and a single replicate in the 2nd stage ANOSIM, independent of
other replicates (other rock patches), and thus gives a fully valid test. An equally valid alternative
would have been to throw away the intermediate recovery times and just analyse the assemblages
at one year after clearance (which is the data seen on page 136, which also introduces a lower level
to the design, of patches within areas, under the different treatments). In fact, the 2STAGE analysis
is more incisive here because it allows the whole recovery profile to be assessed rather than solely
its end point (but different hypotheses are being tested, and both are of interest in their own right).

Ideas for other BEST applications

Another situation employing rank correlation (ρ) between two resemblance matrices was met in the
BEST (Bio-Env) routines of section 11, where the fixed similarity matrix gave the among-sample
relationships from the full set of species and the active data matrix contained environmental
variables. Subsets of the latter variables were taken, and the among-sample distances computed for
each subset and correlated with the biotic similarities, the search being for a variable set that
maximises ρ. However, there is nothing in the construction of BEST which limits its use to (fixed)
species similarities and (active) environmental matrices. Either or both of the fixed and active
sheets could be from biotic or abiotic samples - the user needs only to specify a resemblance
coefficient which is appropriate for the type of data in the active worksheet. A number of
possibilities can be envisaged. In what might be termed 'Env-Bio', subsets of species could be selected
which best characterise the environmental gradient defined by a specified set of abiotic variables,
or best match a simple model structure, e.g. the seriation distance matrix for n equally-spaced
points on a line, which has 1's down the diagonal, 2's down the immediate off-diagonal, ..., up to
n-1 in the bottom left corner. An example of the latter could be the Phuket corals data for, say,
1983 (see the MDS on page 144), where the question would be 'which species, in combination,
best characterise the demonstrated onshore-offshore gradient?'. Or for samples which have an a
priori group structure (unordered), a relevant 'model' distance matrix is displayed on page 148, and
an Env-Bio analysis in that case would search for subsets of species which, in combination, best
characterise a defined set of groups. This is then a type of SIMPER generalisation that looks at all
the groups at once rather than pairs of groups. (It is equivalent to maximising the ANOSIM R
statistic, PRIMER's preferred measure of group separation in high-dimensional space - a similar
use of R was seen in the linkage tree routine of section 11. It needs to be stressed again that having
selected an optimal species set in this way, it is totally invalid to re-test the groups with a simple
ANOSIM test! The strong selection bias effect is allowed for, however, in the global BEST test of
page 124, so that could be used to justify interpreting the optimal species subset in this case). A
further generalisation would allow ordering on the groups, e.g. with the model distance matrix
shown on page 147. There the idea would be to select the subset of species which best
characterised the ordered group structure of community change away from the oilfield (though a fixed
resemblance matrix of the actual distance of sites from the oilfield centre is an alternative and
possibly more relevant model for identifying species delineating the impact gradient in that case).
A similar use of variable selection to best match a priori ordered groups was given by Valesini F et
al 2003. Est Coast Shelf Sci 57: 163-177, under what might be termed an 'Env-Env' scenario, since
the variables were beach morphology characteristics, and thus the active matrix required a
distance-based resemblance calculation, such as normalised Euclidean. Other natural applications of this
type might include the selection of biomarkers to best display a given impact gradient determined
by tissue chemistry, the selection of morphometric measurements to best characterise known
species or sub-species categories (unordered groups or ordered clines) etc, again supplemented by
the Global BEST test of page 124, to allow for the selection bias when testing overall significance
of the 'explanation' (but see the important reservations expressed in Chapters 11 and 12 of the
methods manual on the extent to which such correlative-type links of species to environmental
variables, biomarkers to tissue contaminants etc, are ever demonstrated to be causal).
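The two simplest 'model' matrices mentioned above can be written down directly. The Python sketch below is illustrative only: the seriation matrix follows the description in the text (distance |i - j| between equally-spaced points, shown here as a full square matrix with zeros on the main diagonal), while the unordered group matrix assumes the usual convention of 0 within a group and 1 between groups, which is an assumption here since the matrix of page 148 is not reproduced in this section. Both function names are hypothetical.

```python
def seriation_matrix(n):
    """Model distances for n equally-spaced points on a line: d(i, j) = |i - j|."""
    return [[abs(i - j) for j in range(n)] for i in range(n)]


def group_matrix(groups):
    """Model distances for unordered groups: 0 within a group, 1 between
    groups (assumed convention for illustration)."""
    return [[0 if a == b else 1 for b in groups] for a in groups]


# 5 points on a line: 1's next to the main diagonal, up to n-1 = 4 in the corner.
line_model = seriation_matrix(5)

# 6 samples in three a priori groups.
group_model = group_matrix(["A", "A", "B", "B", "C", "C"])
```

Either matrix could then stand in for the fixed resemblance matrix, with species subsets from the active sheet selected to match it as closely as possible.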

BVStep stepwise selection

There is one fundamental problem with applying BEST (Bio-Env) in many of the above scenarios:
the number of variable combinations from the active matrix that must be considered in a full search
increases exponentially with the number of variables. For p variables, there are (2^p - 1)
combinations, and this is prohibitive for p more than about 16 (c. 65,000 combinations). Searching across
all subsets of species from a typical community matrix will therefore usually prove impossible. The
•BVSTEP option under Analyse>BEST instead carries out a stepwise search: the best single
variable is selected (maximising the matching coefficient, ρ); this is retained and the best variable
to add to this is selected (maximising ρ); these two are retained and a third variable is added, and so
on, resulting in a declining number of combinations to be considered at each step. This is called
forward selection. BVStep also carries out backward elimination: starting with all variables
included, the one that decreases ρ least, when omitted, is dropped from the set, and this elimination
process repeated. In fact, as is common with stepwise procedures elsewhere (e.g. in multiple linear
regression), BVStep implements both forward and backward steps successively, so that after each
addition of a variable by forward selection, the current set of variables is scanned to see if any of
the other variables can now be eliminated. (The analogy with stepwise multiple regression is not
perfect, note, because there the residual sums of squares always decreases as more variables are
added - here the ρ value may go up or down, giving a natural optimisation). It follows, however,
from the fact that only a small fraction of the possible combinations are considered, that the routine
can become trapped in a non-optimal maximum, just as PRIMER's other major search routine,
non-metric MDS, can get trapped in a local minimum of the stress function (page 75). The solution
to this problem is the same as for MDS: repeat the search from a different starting position. So, the
user can specify, under the BVSTEP tab, how many random restarts are required (5 is the default
but more are desirable if they are not computationally prohibitive). Each restart is from a different,
randomly chosen, combination of the variables (6 of them, by default, though this number should
be varied experimentally, experience so far suggesting that small not large numbers are preferable).
Chapter 16 of the methods manual gives more detail on the operation of the forward/backward
stepping algorithm and its application to the Morlaix oil-spill data, which is also described below.
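The alternation of forward and backward steps, and the use of random restarts, can be sketched generically in Python. This is a simplified illustration and not the BVStep algorithm as implemented in PRIMER: it accepts any `score` function of a variable subset (in BVStep this would be ρ between the fixed matrix and the resemblances computed from the subset), adds a variable only if the score rises, drops one only if the score rises, and stops when neither helps. The names `stepwise_search`, `best_of_restarts` and the toy integer `weights` are hypothetical.

```python
import random


def n_subsets(p):
    """Number of non-empty subsets of p variables: 2^p - 1."""
    return 2 ** p - 1


def stepwise_search(variables, score, start=()):
    """Alternate forward selection and backward elimination to raise score."""
    current = set(start)
    best = score(current) if current else float("-inf")
    improved = True
    while improved:
        improved = False
        # Forward step: try adding each unused variable, keep the best one.
        trials = [(score(current | {v}), v) for v in variables if v not in current]
        if trials:
            s, v = max(trials)
            if s > best:
                current.add(v)
                best, improved = s, True
        # Backward steps: drop variables while doing so raises the score.
        dropped = True
        while dropped and len(current) > 1:
            dropped = False
            s, v = max((score(current - {u}), u) for u in current)
            if s > best:
                current.remove(v)
                best, dropped, improved = s, True, True
    return sorted(current), best


def best_of_restarts(variables, score, n_restarts=5, n_start=6, seed=0):
    """Repeat the search from random starting subsets and keep the best."""
    rng = random.Random(seed)
    size = min(n_start, len(variables))
    runs = [stepwise_search(variables, score, rng.sample(variables, size))
            for _ in range(n_restarts)]
    return max(runs, key=lambda run: run[1])


# Toy score (hypothetical): each variable contributes a fixed integer weight.
weights = {"a": 5, "b": 3, "c": 2, "d": -2}
toy_score = lambda subset: sum(weights[v] for v in subset)

from_empty = stepwise_search(list(weights), toy_score)
from_d = stepwise_search(list(weights), toy_score, start=["d"])
```

With this additive toy score the surface has a single optimum, so every start reaches the same subset; with a rank correlation as the score, different starts can end at different local maxima, which is the reason for the random restarts.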

Species sets 'explaining' the overall pattern

The main application area for the BVStep routine introduced by Clarke KR, Warwick RM 1998,
Oecologia 113: 278-289, is what might be termed 'Bio-Bio', namely searching for subsets of
species whose resemblance matrix best matches that of another (fixed) set of species. One can
envisage this used on different faunal (taxonomic- or trophic-based) groups to elucidate potential
interactions but the most obvious context is when the two biological matrices are from the same
data. That is, the fixed similarity matrix is computed from the full set of species, and the active
datasheet, from which species are selected, is the same full species data. Now, the idea is not to
maximise ρ, since it can always be made equal to 1 by choosing a subset which is the full set of
species, but to find the smallest possible subset of species which, in combination, describe most of
the pattern in the full data set. 'Most' in this context is taken to be a conventional, and somewhat
arbitrary, ρ>0.95. Once ρ gets to about this level, two multivariate patterns (e.g. as seen in 2-d
ordinations) are effectively indistinguishable, and would not lead to different interpretations.
The procedure can be thought of as a generalisation of the SIMPER approach (page 140) to the
case of continuous multivariate patterns, rather than a clearly-defined clustering of samples. For
example, in the Morlaix MDS of the time series of 21 samples (page 151), SIMPER could be run
on, say, three groups of times: before and immediately after the oil-spill, and the partial recovery
phase, to identify all species contributing to the dissimilarity between each pair of those groups.
The BVStep procedure, however, asks a subtly different question, namely, is there a subset of
species which between them 'account' for the whole continuous pattern: the structure of initial
seasonal cycle, a period of marked and sustained change following the oil-spill, then a gradual
recovery with the re-establishment of the seasonal cycle? Not only does this provide a more holistic
answer than SIMPER (and, importantly, one that can be applied whatever the chosen resemblance
matrix), it is also more parsimonious in identifying indicator species: if several species are
contributing to the pattern in exactly the same way, BVStep will only need to select one of them,
whereas SIMPER will identify all as contributing something to the average between-group
dissimilarity. A natural corollary is then to ask whether the identified set of species is the only subset
which is capable of 'accounting' for this multivariate impact, recovery and seasonal pattern (i.e.
would constitute a good set of indicators for this time series). In other words, is the same pattern
reinforced in the matrix over several sets of species (what might be termed structural redundancy)?

BVStep on Morlaix oil-spill data

Open the workspace mxwk from the data first met on page 150, directory C:\Examples v6\Morlaix,
or just open mxma, the abundance datasheet of 21 sample times by 257 species, since that is all that
is required here. Clarke & Warwick 1998 reasoned that many of these species were sufficiently rare
(about half have totals across all samples in single figures) that the problem could be scaled down a
little by working with the 'most important' 125 (see page 93). Thus, Select>Variables>(•Use
n-most important where n is 125) on mxma, and subject it to 4th root transform, naming the outcome
mxr4rt say. (A severe transform seems the best choice, otherwise counts of tens of thousands in a
few species would dominate.) Produce the MDS ordination from Bray-Curtis similarities on this
reduced, transformed data, calling the resemblance matrix mxr4rtB-C. This is the fixed matrix in
Analyse>BEST which operates on active datasheet mxr4rt, searching for the smallest possible
subset of the 125 species that effectively contains (to within ρ>0.95) the same among-sample
information as mxr4rtB-C. It is clear that the full enumeration of possibilities in the •BIOENV
option would never be possible (2^125 species combinations!) so the stepwise option of •BVSTEP is
necessary. Even with the halving of the species numbers, it must be appreciated that many of these
125 species will be highly inter-correlated, and it is inevitable that many marginally different
combinations of species will do an almost equally good job as indicators of the full data set (see the
comments about linking biotic and abiotic variables on page 122). It is desirable therefore to start
the search from several random subsets (perhaps 20 or more), and look at all the output results, if
only to appreciate that we are very far from being in a position of a single 'correct' answer!
Nonetheless it is interesting to see that the detailed MDS based on 125 species can be reproduced
almost perfectly by several competing selections of only 9 species, as follows.

BVStep starting and stopping

On mxr4rt, Analyse>BEST>(Method•BVSTEP) & (Resemblance matrix (fixed): mxr4rtB-C) &
(Results detail: Detailed), taking the defaults for all other entries (e.g. Spearman correlations, the
suggested Bray-Curtis similarity, all 125 species 'Available' for selection, and the permutation test
options ignored - clearly a test of ρ = 0 makes no sense in this context and is invalid when the same data
are being used in both matrices). On the BVSTEP tab, take Starting variables•Fixed. Note that an
alternative to the default of starting from no variables included, and then introducing them
one-by-one (forward stepping), is to include all of them initially, so that the routine will begin by dropping
species (backward elimination). This is achieved on the BVSTEP tab, under (Starting variables•
Fixed)>Variables, and moving all species across from Available to Include. The other
item (Stop! criteria) on this tab can usually be left at the defaults - this terminates the search when
adding the best new species to the current set gives ρ≥0.95, and dropping any of them then reduces
ρ to <0.95, or if the increase in ρ when adding the best new species is not more than 0.001 (i.e. the
run is considered to have converged at an optimum ρ which is less than 0.95).
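The two numeric thresholds in the stopping rule can be expressed as a small predicate. The Python sketch below is a simplification under stated assumptions: it checks only whether ρ has reached the threshold (0.95 by default) or whether the latest gain is no more than the minimum increase (0.001 by default), and it does not model the accompanying check that no variable can be dropped without falling below the threshold. The function name and argument names are hypothetical.

```python
def search_should_stop(rho_before, rho_after, rho_threshold=0.95, min_gain=0.001):
    """Return True once rho reaches the threshold or the last gain is too small."""
    reached = rho_after >= rho_threshold
    stalled = (rho_after - rho_before) <= min_gain
    return reached or stalled


# Going from 9 to 10 species in a forward search: 0.947 -> 0.952 crosses 0.95.
stop_at_ten = search_should_stop(0.947, 0.952)

# Going from 8 to 9 species: 0.942 -> 0.947 is still below 0.95 and still improving.
stop_at_nine = search_should_stop(0.942, 0.947)
```

Raising `rho_threshold` would ask for a closer match at the cost of a larger species subset, while `min_gain` guards against long runs that creep upwards without ever reaching the threshold.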

[Figure: BEST dialog for mxr4rt, 'Morlaix (125 species) 4th root counts' (Method: BVSTEP (stepwise search); Resemblance matrix (fixed): mxr4rtB-C), the BVSTEP tab (Starting variables: Fixed; Stop! criteria Rho > 0.95, Delta rho < 0.001), and the Morlaix MDS (all 125 sp), 4th root, Bray-Curtis.]

Steps
No.Vars  Corr.  Selections
   1     0.729  58
   2     0.776  35,58
   3     0.796  35,58,81
   4     0.846  12,35,58,81
   5     0.893  12,35,58,81,97
   6     0.914  1,12,35,58,81,97
   7     0.936  1,12,35,58,75,81,97
   8     0.942  1,12,13,35,58,75,81,97
   9     0.947  1,12,13,35,58,75,81,97,110
  10     0.952  1,12,13,24,35,58,75,81,97,110

Note that the results are always given in terms of the row (variable) number in the active matrix.
This is the reduced species set in this case, so the list at the start of the output window should be
consulted to translate these into original species numbers (thus if 125 appeared in the optimal set,
this would refer to original species number 244; use of species/variable names rather than numbers
is therefore usually preferable throughout PRIMER, to avoid any risk of confusion).
BVStep from random starts

Now re-run the routine, and on the BVSTEP tab take: Starting variables•Random selection>(Num
of trial variables: 6) & (Num of restarts: 10). This will begin from a randomly chosen 6 species
from the 125, and Results detail: Detailed will allow you to see the alternating backward
elimination then forward stepping phases in the Results window. In the run shown below, the best
solutions (fewest species for ρ≥0.95) do slightly better than for the above fixed start, which - as it
happened - used all forward selection (this was not a constraint, but will often occur when
matching a matrix to itself). Each of the 10 restarts gives a different solution (and you will obtain
different ones again, since a different random number seed will be used in every new run); many
more than 10 would be needed to be sure that there is not a solution with fewer than 9 species. But
to set out on an exhaustive search here misses the main point, that the impact and seasonal structure
in the above MDS (which, importantly, is largely 'signal', because of the large sample sizes - we
are not 'chasing noise' here), is capable of being reproduced by a small set of just 9 or 10 species.

Show this by selecting out your optimal species subset from mxr4rt (most easily by copying, Ctrl-C,
and pasting, Ctrl-V, the string of species numbers, separated by commas, from the BEST Results
window to the Select>Variables>•Variable numbers box), and produce the sample MDS on
Bray-Curtis similarities. That this is not the only small subset of species capable of generating this
pattern is obvious from the plethora of solutions, and the further analysis in Chapter 16 of the
methods manual, and the Clarke and Warwick 1998 paper. In particular, they re-run the BEST
routine, excluding this first subset of 9 species from the 125 in mxr4rt, but again matching to the
mxr4rtB-C similarities, and by repeated exclusions show that further 'peels' of 11, 14, 18 etc
species can be found which essentially reproduce the same multivariate patterns, indicating a high
level of structural redundancy in the matrix. To reproduce their first exclusion step, the easiest way
of removing the 9 species is to Select>All on the currently selected mxr4rt, which will leave these
9 species highlighted, then Edit>Invert Highlight and Select>Highlighted (and perhaps Tools>
Duplicate, changing the title), then run BEST as before but on this reduced set of 116 species.
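
The matching and 'peeling' idea can be sketched outside PRIMER. The following is a minimal
illustration only, not the BEST routine itself: it uses a forward-only greedy search (BEST alternates
forward and backward steps, with random restarts), and the function names and the tiny data matrix
are hypothetical. It assumes a 4th-root transform, Bray-Curtis dissimilarities and a Spearman rank
correlation between the subset and full-matrix resemblances.

```python
from itertools import combinations

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity (0 to 100) between two samples."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    # Two blank samples are treated as identical here (an assumption of this sketch).
    return 100.0 * num / den if den else 0.0

def dissimilarities(data, species):
    """4th-root transform the chosen species columns, then list all sample pairs."""
    rows = [[row[j] ** 0.25 for j in species] for row in data]
    return [bray_curtis(rows[i], rows[k])
            for i, k in combinations(range(len(rows)), 2)]

def ranks(values):
    """Ranks from 1 upwards; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(u, v):
    """Spearman rank correlation; returns 0.0 if either list is constant."""
    ru, rv = ranks(u), ranks(v)
    mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
    cov = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
    su = sum((a - mu) ** 2 for a in ru) ** 0.5
    sv = sum((b - mv) ** 2 for b in rv) ** 0.5
    return cov / (su * sv) if su and sv else 0.0

def forward_select(data, candidates, target, rho_stop=0.95):
    """Add one species at a time until rho reaches rho_stop or stops improving."""
    chosen, best, remaining = [], -1.0, list(candidates)
    while remaining and best < rho_stop:
        scored = [(spearman(dissimilarities(data, chosen + [s]), target), s)
                  for s in remaining]
        rho, s = max(scored)
        if rho <= best:
            break
        chosen.append(s)
        remaining.remove(s)
        best = rho
    return sorted(chosen), best

# Hypothetical toy matrix: 4 samples (rows) by 3 species (columns).
data = [[1, 1, 0], [16, 16, 0], [81, 81, 0], [256, 256, 0]]
everything = [0, 1, 2]
target = dissimilarities(data, everything)          # the full-matrix pattern
first, rho1 = forward_select(data, everything, target)
rest = [s for s in everything if s not in first]    # the 'peel': exclude, then re-run
second, rho2 = forward_select(data, rest, target)
print(first, round(rho1, 3), second, round(rho2, 3))
```

In this toy case two species carry the same pattern, so the second 'peel' matches the full matrix as
well as the first, which is the structural redundancy described above in miniature.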

[Screenshot: BEST Results window, with the Select>Variables dialog]

Stepwise trace (one restart):
No.Vars  Corr.  Selections
      6      -  23,57,60,69,111,117
      5  0.575  23,60,69,111,117
      4  0.615  23,69,111,117
      3  0.651  23,69,117
      4  0.780  23,69,82,117
      5  0.859  6,23,69,82,117
      6  0.897  6,12,23,69,82,117
      7  0.918  6,12,23,46,69,82,117
      6  0.919  6,12,23,46,82,117
      7  0.933  6,12,23,46,82,109,117
      8  0.943  5,6,12,23,46,82,109,117
      9  0.949  5,6,12,23,46,82,99,109,117
     10  0.954  5,6,12,23,46,58,82,99,109,117
      9  0.956  5,6,12,23,46,58,82,99,109

Best results
No.Vars  Corr.  Selections
      9  0.956  (selection not legible in source)
      9  0.953  1,6,12,23,35,59,81,110,124
      9  0.953  1,6,12,23,59,65,75,81,110
      9  0.953  6,46,58,75,82,92,97,109,115
     10  0.954  6,58,74,81,97,98,103,104,109,110
     10  0.952  1,12,13,24,35,58,75,81,97,110
     10  0.951  12,23,35,43,59,74,75,81,98,109
     11  0.953  8,12,23,24,36,37,46,58,75,81,97

[Figure: MDS ('Best' 9 species), 4th root B-C]


Multivariate dispersion MVDISP

The only multivariate routine not so far met is Analyse>MVDISP, applied to a resemblance matrix
from samples with a simple group structure (1-way layout). This gives a description of relative
multivariate variability within each of the groups in a single ordination or, to be more precise, in
the full-dimensional space of the rank similarity matrix underlying that ordination. (As such it is
not a matching of multivariate patterns and doesn't really belong in this section!) The concept is
straightforward and discussed in detail in Chapter 15 of the methods manual, so only an example
will be given here. Tables of the dispersion sequence of all groups (eq 15.4) and the index of
multivariate dispersion (IMD), comparing pairs of groups (eq 15.2), are output to the results
window, and describe differing dispersion across groups on the basis of dissimilarity (or any other
resemblance measure) within groups - between-group dissimilarities are not used. (For the special
cases of Bray-Curtis and Euclidean distance, an alternative might be to run Analyse>SIMPER on
the transformed data sheet and look at the headings to the first set of tables, each of which gives the
average similarity of all pairs of replicates within that group.) The term multivariate dispersion
rather than variance is used because the relationship between the univariate variance of the original
variables and the dispersion in 'resemblance space' (and its low-dimensional ordinations) can be
far from linear, depending on the choice of resemblance measure. For example, similarity measures
in the quantitative Bray-Curtis family (see the top half of page 153) are driven partly by the
presence/absence structure of the data, as well as the magnitude of counts from species which are
always present, and this inevitably involves a non-linear transformation of original variable scales.
Similarly, something as simple as normalisation, used in a Euclidean distance analysis of
environmental variables, will remove any direct link between variance on the original measurement
scales and dispersion in the multivariate space. Any statement about relative dispersion, therefore,
must be contingent on specifying the resemblance measure used. Clarke et al 2006 (see top of page
152) show the radically different conclusions that would be reached, for an Indonesian reef study
(see section 15), about dispersion among transects before and after a coral bleaching event, under
Chi-squared, Bray-Curtis and Euclidean-based analyses - with Bray-Curtis being intermediate.

(Mesocosm experiment, Solbergstrand copepods)

The example used here is a simple 1-way design: three mesocosm treatments of Control (C), Low
(L) and High (H) dose of organic enrichment applied to the surface of 12 intact sediment cores,
taken from the same location into a mesocosm system, and randomly allocated to the treatments (4
replicates in each). Chapter 15 of the methods manual analyses the resulting meiofaunal (nematode
plus copepod) communities in the sediment cores after several weeks' exposure, but here open just
the datasheet of copepod counts, slpa in directory C:\Examples v6\Solberg. With square root
transform and Bray-Curtis similarities Resem1, plot the MDS, and note the apparently much larger
dispersion within the High dose treatment (as well as the obvious differences between treatments,
which would be validly tested by 1-way ANOSIM). This is borne out by Analyse>MVDISP>
(Factor name: Treatment) on slpa (which operates on the high-dimensional similarities not the
ordination space). The dispersion sequence of 0.56, 0.84, 1.60 for L, C, H shows that the average
rank dissimilarity is almost three times higher within H than L (comparable dispersions result in a
sequence of 1's), and the pairwise comparisons show that all the lowest dissimilarities (within a
group) are in L and all the highest in H (thus IMD = -1). The result is of limited usefulness since a
test of these dispersion differences is not possible (without linear modelling of some sort).
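
As a rough illustration of what the two MVDISP tables summarise, the sketch below computes a
dispersion sequence and a pairwise index from a dissimilarity matrix. It is written from the
description above and is an assumption about the detail, not PRIMER's code: the sequence is taken as
each group's mean within-group rank dissimilarity divided by the weighted mean over groups, and the
index as 2(r_a - r_b)/(N_a + N_b), where r is the mean rank after re-ranking only the within-group
dissimilarities of the two groups and N the number of such dissimilarities. Equations 15.2 and 15.4
of the methods manual are the authority, including for the sign convention. All names and the toy
matrix are hypothetical.

```python
from itertools import combinations

def ranks(values):
    """Ranks from 1 upwards; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def within_group_pairs(groups):
    """Map each group label to the list of sample pairs inside that group."""
    pairs = {}
    for i, j in combinations(range(len(groups)), 2):
        if groups[i] == groups[j]:
            pairs.setdefault(groups[i], []).append((i, j))
    return pairs

def dispersion_sequence(D, groups):
    """Mean within-group rank dissimilarity per group, scaled to a weighted mean of 1."""
    all_pairs = list(combinations(range(len(groups)), 2))
    rank_of = dict(zip(all_pairs, ranks([D[i][j] for i, j in all_pairs])))
    within = within_group_pairs(groups)
    mean_rank = {g: sum(rank_of[p] for p in ps) / len(ps) for g, ps in within.items()}
    total = sum(len(ps) for ps in within.values())
    k = sum(len(ps) * mean_rank[g] for g, ps in within.items()) / total
    return {g: mean_rank[g] / k for g in mean_rank}

def imd(D, groups, a, b):
    """Pairwise index: +1 if every within-a dissimilarity exceeds every within-b one."""
    within = within_group_pairs(groups)
    r = ranks([D[i][j] for i, j in within[a] + within[b]])
    na, nb = len(within[a]), len(within[b])
    return 2 * (sum(r[:na]) / na - sum(r[na:]) / nb) / (na + nb)

# Hypothetical 6-sample example: tight group L, loose group H.
groups = ["L", "L", "L", "H", "H", "H"]
inside = {(0, 1): 1, (0, 2): 2, (1, 2): 3, (3, 4): 10, (3, 5): 11, (4, 5): 12}
n = len(groups)
D = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        D[i][j] = D[j][i] = float(inside.get((i, j), 5))  # between-group value: 5
sequence = dispersion_sequence(D, groups)
imd_HL = imd(D, groups, "H", "L")
print(sequence, imd_HL)
```

With equal group sizes the sequence averages to 1, so values well above or below 1 flag groups that
are more or less dispersed than the rest.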

Group  Dispersion
L      0.561
C      0.842
H      1.596
14. Biodiversity measures and tests (DIVERSE, TAXDTEST)

Input/output for diversity

PRIMER computes an extensive set of univariate diversity measures, covering most of the standard
indices used in ecology. The active sheet is a data matrix for which the chosen indices are
calculated for every sample. The measures are selected by ticking check boxes, so any combination of
them can be computed in one run, and the results output either to the results window in a tabular
format (which can be copied to the clipboard and pasted directly into Excel) or as a samples-by-
variables matrix in a second worksheet. The latter can be saved, as with any data matrix, in text or
Excel format, for input into a standard statistics package that will perform univariate ANOVA etc.

Presentation of diversity information

The facility to send the indices to a new worksheet also allows some interesting possibilities for
further presentation, including multivariate analysis. For example, the indices can be superimposed,
one at a time, on an MDS plot for the full species assemblage data (treat the diversity matrix like an
environmental variables data file) or input the diversity matrix to a multivariate analysis itself
(again treat the indices as an environmental array and calculate normalised Euclidean distances
between samples for an MDS, or run a PCA). This will show the between-sample relationships
obtained from the full range of diversity information extracted, and can be contrasted with the usual
ordination exploiting the matching of species identities between samples (which is generally found
to be more sensitive, since it exploits more of the available information). A PCA for a set of
diversity indices can also demonstrate how many genuinely different axes of information they have
captured (i.e. how many PC axes explain most of the variability), since many of the standard
measures are really just some weighted combination of two features: the total number of species
(richness) and the extent to which the total abundance is spread equally amongst the observed
species (evenness). An 'inverse' MDS plot, based on resemblances between indices (variables)
shows which measures are essentially equivalent. These analyses can be a useful counterbalance to
the inclination to proliferate indices by calculating yet further variations of the same information.

Taxonomic distinctness

One of the distinctive features of PRIMER is its inclusion of a suite of biodiversity measures based
on the relatedness of the species within a sample, e.g. the average 'distance apart' of any two
species or individuals chosen at random from the sample (termed average taxonomic distinctness).
This is usually defined from a Linnaean tree (though could be phylogenetically, genetically or even
functionally-based) and requires availability of an aggregation file relevant to the data sheet. It
provides an additional dimension of information to that obtainable from the abundance distribution
alone: as an average measure its construction makes it independent of the number of species, and it
therefore has much better statistical sampling properties than richness-related estimators when
sampling effort is non-comparable across samples. (This should be seen as a major sphere of
application: uncontrolled studies over wide spatial or temporal scales, where classic diversity
measures can be misleading). Several papers give details, e.g. Clarke KR & Warwick RM 1998 J
Appl Ecol 35: 523-531 and Warwick RM & Clarke KR 2001 Oceanog Mar Biol Ann Rev 39: 207-
231; see also Chapter 17 of the methods manual. In just the same way as for the classic indices,
PRIMER calculates a range of taxonomic-related measures (including the 'PD' of Faith DP 1992,
Biol Conserv 61: 1-10), accessed through check boxes on the Analyse>DIVERSE menu. These
can be separated into quantitative indices (e.g. Δ, Δ*) and those which depend only on a species list
(superscript +). The latter divide into average measures (e.g. Δ+, Λ+) which have the 'independence
of sampling effort' property, and total measures (e.g. SΔ+, SΦ+) which are alternative definitions of
'richness', combining the number of species with relatedness information. For two of the P/A
measures, a hypothesis testing structure can be erected, to compare observed Δ+, average (and Λ+,
variation in) taxonomic distinctness with that 'expected' from a master regional list, and this is
handled in Analyse>TAXDTEST, accessed only when the active window is an aggregation file.

Standard indices calculated

The range of possibilities is illustrated with the macrobenthic abundance file clma.pri from the
Clyde sludge dump-ground study, directory C:\Examples v6\Clydemac. Analyses so far have
mainly used the abiotic and biomass matrices, and the previous workspace may have become
cluttered, so open a new one for clma. Without pre-treatment, take Analyse>DIVERSE>(✓Results
to worksheet). Look at the options on the first 5 tabs, taking only ✓S, ✓d, ✓J', ✓H, ✓α, ✓H' (base e),
✓1-λ', ✓ES(n) with n values: 15, 30, 45 (there is no special significance to the grouping of
indices under tabs, except that the last two deal with taxonomic-relatedness measures, seen later).
The abundance of the ith species is denoted by $N_i$ (i = 1, 2, ..., S) and, divided by their sum (N), is
denoted $P_i$ (i = 1, 2, ..., S). The first 5 tabs (where ✓'s denote the default selections) are:
Other
    ✓Total species: S
    ✓Total individuals: N
    ✓Species richness (Margalef): $d = (S-1)/\log_e N$
    ✓Pielou's evenness: $J' = H'/\log_e S$
    Brillouin: $H = N^{-1} \log_e\{N!/(N_1!\,N_2! \cdots N_S!)\}$
    Fisher's α statistic
Shannon
    ✓$H' = -\sum_i P_i \log(P_i)$, where the logs are to the base e
    H' as above but for logs to the base 2
    H' as above but for logs to the base 10
Simpson
    $\lambda = \sum_i P_i^2$
    $1-\lambda = 1 - \sum_i P_i^2$
    $\lambda' = \{\sum_i N_i(N_i-1)\}/\{N(N-1)\}$
    ✓$1-\lambda' = 1 - \{\sum_i N_i(N_i-1)\}/\{N(N-1)\}$
Hill numbers
    $N1 = \exp(H')$
    $N2 = 1/\sum_i P_i^2$
    $N_\infty = 1/\max_i\{P_i\}$
    $N10 = N1/S$
    $N10' = (N1-1)/(S-1)$
    $N21 = N2/N1$
    $N21' = (N2-1)/(N1-1)$
Rarefaction (Sanders/Hurlbert)
    ESn, the 'expected' number of species from n individuals (n ≤ N)
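
For readers who want to check a value by hand, several of the listed indices can be computed for a
single sample as below. This is a minimal sketch using the formulas as listed, with natural logs;
the function name and the example counts are hypothetical, and the rarefaction line uses the usual
Hurlbert expression ES(n) = Σ_i [1 - C(N-N_i, n)/C(N, n)], which is assumed here to correspond to the
ESn option.

```python
import math

def diversity(counts, n_rarefy=()):
    """Selected indices for one sample of integer counts (zero entries are ignored)."""
    counts = [c for c in counts if c > 0]
    S, N = len(counts), sum(counts)
    P = [c / N for c in counts]
    H = -sum(p * math.log(p) for p in P)                    # Shannon H' (base e)
    out = {
        "S": S,
        "N": N,
        "d": (S - 1) / math.log(N) if N > 1 else float("nan"),   # Margalef
        "J'": H / math.log(S) if S > 1 else float("nan"),        # Pielou
        "H'": H,
        "1-lambda'": (1 - sum(c * (c - 1) for c in counts) / (N * (N - 1))
                      if N > 1 else float("nan")),               # Simpson
        "N1": math.exp(H),                                       # Hill number N1
        "N2": 1 / sum(p * p for p in P),                         # Hill number N2
    }
    for n in n_rarefy:
        if n > N:
            raise ValueError("ES(n) needs n <= N for every sample")
        # Expected number of species in a random draw of n individuals.
        out[f"ES({n})"] = sum(1 - math.comb(N - c, n) / math.comb(N, n) for c in counts)
    return out

# Hypothetical perfectly even sample: four species with 10 individuals each.
result = diversity([10, 10, 10, 10], n_rarefy=(1, 40))
print(result)
```

The `n <= N` check mirrors the restriction noted in the list: ES(n) can only be requested for n no
larger than the smallest sample total.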
[Screenshot: DIVERSE dialog tabs, and the resulting worksheet of indices (S, d, J', Brillouin, Fisher,
H'(loge), 1-Lambda', ES(15), ES(30), ES(45)) for sites S1 to S12]

Multivariate analysis of diversities

For the diversity (variables) by samples matrix, Data1, Analyse>Draftsman Plot>(✓Correlations
to worksheet) shows that none of the indices is badly behaved (highly skewed, dominated by
outliers, strongly curvilinear relationships etc) so no transforms seem called for. Data1 will need
normalising, however, before entry to Analyse>PCA since the various indices are on different
scales. On the configuration plot from PCA, turn off the variable vectors (on the Graph>Special
menu) and overlay the transect trajectory using the Site# factor (also on the Graph>Special menu).
Site 6 is the dumpground centre, with sites 1 and 12 at the extremities of the transect, and this
combined set of diversity indices clearly displays the strong, simple gradient of effect, in a rather
similar way to the full multivariate analysis of the original species data (you might like to carry out
the latter, with a fairly severe transformation and Bray-Curtis similarities). The agreement is a
consequence of the severity of the impact. The 'meta-analysis' of Chapter 15 of the methods
manual shows this to be the most severe of the contaminant studies examined there, but the manual
(Chapter 14) also shows that such agreement is untypical, diversity measures being less likely to
detect biological change for more intermediate-level disturbances. The PCA results (eigenvalues)
also make it clear that rather little is to be gained by calculating ten diversity indices instead of two
or three: over 83% of the total variation in the 10 indices is accounted for by the first principal
component, and 97% (i.e. all of it, in effect) by the first two PC's. The coefficients (eigenvectors)
show that the simple left to right gradient in the main axis (PC1) of the PCA plot is a roughly
equally weighted combination of all measures (evenness + richness), both declining near the
dumpground, whereas the second axis strongly contrasts the two main diversity components: PC2
is effectively (evenness - richness). This simplicity should come as no surprise, given the very high
correlations between indices evident from the draftsman plot, and from the correlation matrix
Resem1 that was output by its creation.
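
The link between normalising, the correlation matrix and the '% variation' column can be seen in a
few lines. The sketch below is an illustration under stated assumptions, not PRIMER's PCA: it
normalises each variable, forms the correlation matrix, and finds only the leading eigenvalue by
power iteration (which can fail if the starting vector happens to be orthogonal to PC1). For
normalised variables the eigenvalues sum to the number of variables, so the percentage explained is
100 x eigenvalue / (number of variables). Function names and data are hypothetical.

```python
import math

def normalise(column):
    """Subtract the mean and divide by the standard deviation (n - 1 divisor)."""
    n = len(column)
    m = sum(column) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in column) / (n - 1))
    return [(x - m) / sd for x in column]

def correlation_matrix(columns):
    """Pearson correlations between variables, via their normalised values."""
    z = [normalise(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def leading_component(R, iterations=500):
    """Largest eigenvalue and its eigenvector of a symmetric matrix, by power iteration."""
    p = len(R)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(R[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v

# Hypothetical indices for four samples: two tell the same story, one does not.
columns = [[1, 2, 3, 4], [3, 5, 7, 9], [1, -1, -1, 1]]
R = correlation_matrix(columns)
eigenvalue, pc1 = leading_component(R)
percent = 100.0 * eigenvalue / len(columns)
print(round(eigenvalue, 3), round(percent, 1))
```

Two perfectly correlated indices collapse onto one axis here, the same effect (in miniature) that
lets the first PC absorb most of the variation in the ten Clyde indices.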

[Screenshot: Analyse menu and PCA results for the Clyde diversity indices (data worksheet Data2)]

Eigenvalues
PC  Eigenvalues   %Variation  Cum.%Variation
 1  8.36          83.6        83.6
 2  1.39          13.9        97.5
 3  0.166          1.7        99.1
 4  (illegible)    0.6        99.7
 5  1.92E-2        0.2        99.9

[Figure: Clyde, PCA of diversity indices]

A final, revealing plot can be produced from Resem1, by ordinating the variables. Technically, it
first needs transforming before it can be considered a similarity matrix: there is a small, negative
correlation between S and J'. It is zero in effect, in this case, but other situations might produce
large negative correlations, e.g. between equitability and dominance measures, and they should also
imply similarity (of variables). As seen on page 67, submitting Resem1 to Tools>Transform>
(Expression: 100*ABS(V)) will achieve the conversion to a similarity matrix, and Analyse>MDS
then generates the variables ordination plot shown below, in which the relative distances apart of
the indices exactly reflect the rank order of their pairwise correlations (note that MDS stress is
effectively zero). The configuration is largely linear, the extremities corresponding to 'pure'
richness (S) and evenness (J'), with other measures being a mix of these two components (the
points have been more descriptively labelled using the Edit button on the Data labels & symbols
menu, which is equivalent to Edit>Indicators from the main menu, then Add an indicator name.
The MDS plot has also been 'rectangular zoomed', see page 87, to change the border to a more
relevant shape for this strongly linear plot, though without changing the aspect ratio of the points!).
Values of n = 15, 30 and 45 were chosen for the rarefaction indices ES(n) because larger values are
not permissible, the site with lowest abundance having only 46 individuals. (To see this, submit
clma to Analyse>Pre-treatment>Standardise>(Standardise•Samples) & (By•Total) & (✓Stats to
worksheet), giving sample totals in the secondary output worksheet.) The fact that the 'expected
species numbers' ES(n) are considerably closer to being evenness measures than the richness
indices that their name implies (correlations of about 0.9 with J' and 0.98 with H', compared with
about 0.3 with S) results from the lack of ecological realism in their underpinning model. This
assumes that individuals arrive randomly and independently into the sample, and hence the process
can be reversed in rarefaction, by randomly and independently excluding them. This does not
correspond to the reality of a heavily 'clumped' spatial distribution seen for many species (the issue
tackled in dispersion weighting, page 40). Save the workspace (clwk2) for use in the final section.
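
The effect of the 100*ABS(V) expression is easy to verify on a small example. The sketch below
computes Pearson correlations between a few variables and converts them to similarities on a 0 to
100 scale, so that a strong negative correlation counts as high similarity. The function names and
index values are hypothetical, not taken from the Clyde data.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def variable_similarities(columns):
    """Similarity 100*|r| for every pair of variables (the 100*ABS(V) conversion)."""
    names = list(columns)
    return {(a, b): 100.0 * abs(pearson(columns[a], columns[b]))
            for i, a in enumerate(names) for b in names[i + 1:]}

# Hypothetical index values for four samples.
indices = {
    "evenness": [0.9, 0.7, 0.5, 0.3],
    "dominance": [0.1, 0.3, 0.5, 0.7],   # mirror image of evenness
    "S": [12, 30, 18, 25],
}
sims = variable_similarities(indices)
for pair, value in sims.items():
    print(pair, round(value, 1))
```

Evenness and dominance are perfectly negatively correlated in this toy set, yet they come out as
maximally similar variables, which is the intended behaviour for an ordination of indices.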
[Figure: MDS ordination of the Clyde diversity indices, with points labelled by index (S, Margalef d,
Pielou J', Brillouin H, Fisher, Shannon H', Simpson, ES(15), ES(30), ES(45)); 2D stress: 0.01]

(Bermuda macrofauna)

Soft-sediment macrofaunal assemblages (along with meiofauna and biomarker suites) were studied
at 6 sites in Hamilton Harbour, Bermuda (labelled H2, H3, H4, H5, H6, H7) during an international
IOC workshop on the effects of pollutants in sub-tropical waters (Addison RF & Clarke KR, eds
1990, J exp mar Biol Ecol 138). There were 4 replicates at each site, giving a data matrix of 24
samples from 64 macrofaunal species in the v6 format file bdma in directory C:\Examples v6\
Bermuda. These data will be used to illustrate computation of another diversity index, not now
widely used (the validity of its assumptions being questionable for most assemblages) but which
has been available in PRIMER since its earliest versions and therefore retained for consistency.


Caswell's neutral model

Analyse>CASWELL generates V statistics for the Caswell neutral model, and is discussed in
Chapter 8 of the methods manual. It is essentially a comparison of Shannon diversity H' with the
value it would be expected to take, conditional on the observed number of species S and individuals
N, under some simple model assembly rules for the community, which are 'ecologically neutral', in
the sense defined by Caswell H 1976, Ecol Monogr 46: 327-354. The normalised form of H'
(subtract the modelled mean and divide by the modelled standard deviation) is the V statistic,
positive values of V implying greater diversity than 'neutrality' and negative values lesser. (There
is an F test of departure from V = 0, though this is not very convincing because it also depends on
the rather unrealistic assumptions of the neutral model for typical assemblages). The algorithm
implemented here is due to Goldman N & Lambshead PJD 1989, Mar Ecol Prog Ser 50: 255-261.
Recreate the Caswell example in Chapter 8 of the methods manual, for the Bermuda macrofauna
worksheet bdma by firstly summing across the replicates, to increase the sample size, with
Tools>Sum>(Samples•Sums for factor: site) & (Variables•No summing). This is justified because
there are equal numbers of replicates at each site; Tools>Average would not be appropriate for a
Caswell calculation because the entries are no longer real (integer) counts. Note that V could
alternatively be calculated for each replicate, exactly as for the other diversity measures above, and
this would allow the usual type of ANOVA tests based on variance estimates from replication,
rather than the (less reliable) internal variance estimate from the neutral model. On the summed
datasheet Data1 take Analyse>CASWELL>(✓Results to worksheet), and the V values for each
site (and the accompanying test calculations) are found in the resulting Data2 worksheet, which can
be manipulated, saved etc as with any other data matrix. Sites H3 and H4 are seen to have H' well
below 'expectation' under the neutral model (V statistics of -5.4, -4.5 respectively). Close the
workspace; it will not be needed again.
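
The two mechanical steps described here, summing replicates within a factor level and normalising
H', can be sketched as follows. This is an illustration only: the neutral-model mean and standard
deviation come from the Goldman & Lambshead algorithm, which is not reproduced, so they are passed
in as plain numbers. The function names and all values are hypothetical.

```python
def sum_by_factor(samples, factor):
    """Sum count vectors over samples sharing a factor level (counts stay integers)."""
    totals = {}
    for counts, level in zip(samples, factor):
        if level not in totals:
            totals[level] = [0] * len(counts)
        totals[level] = [t + c for t, c in zip(totals[level], counts)]
    return totals

def v_statistic(h_observed, h_expected, sd_expected):
    """V = (observed H' - neutral-model mean) / neutral-model standard deviation."""
    return (h_observed - h_expected) / sd_expected

# Hypothetical replicates: rows are samples, columns are species.
samples = [[1, 0], [2, 3], [4, 4]]
site = ["H2", "H2", "H3"]
summed = sum_by_factor(samples, site)

# Hypothetical neutral-model values, standing in for the real calculation.
v = v_statistic(h_observed=1.0, h_expected=2.0, sd_expected=0.5)
print(summed, v)
```

Summing keeps every entry a whole number, whereas averaging 4 replicates would generally give
fractions, which is why averaging is unsuitable as input to a calculation based on counts.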

[Screenshot: Tools>Sum dialog (Samples: Sums for factor: site; Variables: No summing)]

Range of relatedness indices calculated

Returning to the standard diversity indices compared on page 164, to obtain a measure which steps
outside the species abundance distribution (and which could strike out along a different axis to the
linear richness-evenness combinations shown in the indices MDS), it would be helpful to introduce
further attributes of the assemblage composition. One possibility is to mix biomass and abundance
data, as in ABC curves (section 15); another is to introduce information on the relatedness of the
species in each sample, as discussed on page 161. These indices are accessed through the final two
tabs on Analyse>DIVERSE, namely Taxdist and Phylogenetic. The nomenclature comes from
the original papers on these topics (Warwick and Clarke's 'taxonomic diversity' and 'taxonomic
distinctness' indices, and Faith's 'phylogenetic diversity'), and does not imply that either set of
indices is more appropriate to taxonomic or phylogenetic hierarchies. Other hierarchies (e.g.
genetic, functional) could be equally appropriate but note that PRIMER is restricted to representing
the hierarchy by an aggregation file, with fixed (though not necessarily equally-spaced) levels.

The relatedness indices are all denoted by upper case Greek symbols, with superscript + if calculated
from species lists. For definitions, and extensive discussion, see Chapter 17 of the methods manual.
Taxonomic distinctness
    Quantitative:
        Taxonomic diversity: Δ
        Taxonomic distinctness: Δ*
    Presence/absence:
        Average taxonomic distinctness (AvTD): Δ+
        Total taxonomic distinctness (TTD): SΔ+
        Variation in taxonomic distinctness (VarTD): Λ+
Phylogenetic diversity
    Presence/absence:
        Average phylogenetic diversity (AvPD): Φ+
        (Total) phylogenetic diversity: SΦ+ (Faith's 'PD')

Specifying aggregation sheets

If any of these taxonomic distinctness measures are chosen, the Taxonomy button under either the
Taxdist or Phylogenetic tabs should be taken, and an aggregation worksheet, which is already in
the workspace, specified under 'Aggregation data' (if only one has been read in, it will be the
default). This is a look-up table which gives the 'family tree' of all species (or whatever the lowest
taxonomic level represents) and allows calculation of the path length (/weight) between every pair
of species. Such matrices are a distinct worksheet type within PRIMER, containing character data
as entries (if numbers, they are treated as characters), and with a different extension (*.agg) when
saved as PRIMER format files. The aggregation data could simply be a tree constructed for just
those species in the current data matrix or it could be a wider and more comprehensive 'master list'
for those faunal groups. The species (or other variable) names used in the data worksheet must find
an exact match somewhere in the first column of the aggregation sheet (or if using a higher
taxonomic level in the aggregation matrix as the variable names for the data then this must be
specified in 'Current level of sample data'). The species do not need to occur in the same order in
the two sheets since, as elsewhere, PRIMER v6 uses strict label matching. Page 98 has some useful
tips on checking aggregation arrays for consistency, and potential mis-spellings, with Tools>
Check. There are also options to use only part of the taxonomic tree, by starting from genus level
(say), in effect treating all species in the same genus as if they are the same species. This is not a
routine requirement but could potentially be used if the identifications are very patchy to species
level, but reliable to genus level. To achieve it, specify in the Taxonomy (Data) dialog box: Use
links>(From level: Genus). Similarly, the tree could be compressed at the top level(s) so that, for
example, no greater distance is assumed between two species in different phyla than between two
species in different classes in the same phylum - specify Use links>(To level: Class).

~
Weighting of tree step lengths

The other box in this Taxonomy dialog can be used to alter the weights given to the various branch
lengths in the tree (and includes the previous compression at the top or bottom of the tree as a
special case, with those step lengths set to zero). By taking the Weights•User specified>Weights
button, the default lengths are displayed: equal steps are assumed, and any values placed here will
always be standardised, subsequently (and automatically), so that the longest path in the tree is set
to 100. Thus a change to step lengths of 2 for all categories would not alter the values of any of
the indices, but a change to decreasing step lengths of 6 (species to genus), 5 (genus to family), 4
(family to order) etc. could be worth exploring because it would put relatively more weight on the
shorter branch lengths between species (of which there are fewer) rather than leaving much of the
emphasis on the longer branch lengths (because there are many). One logical basis for altering the
step lengths from their default would be to make them depend on the change in the number of taxa
in the master list when making that step - the smaller the change in the number of taxa, the shorter
the step length. This has the merit of consistency if, for example, one were to interpolate a
taxonomic level (subfamily, say) which was entirely superfluous, so that there were as many
subfamilies as families in the master list: the taxonomic distinctness indices would then remain
unchanged. The detail is given in Clarke KR & Warwick RM 1999, Mar Ecol Prog Ser 184: 21-29,
and their weighting scheme can be implemented here by taking (Weights•Taxon richness) in the
Taxonomy (Data) dialog box.
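
How an aggregation table turns into path lengths, and why rescaling every step by the same factor
changes nothing, can be shown with a small sketch. It is an illustration under assumptions rather
than PRIMER's implementation: each species maps to its taxa from the lowest level upwards, the path
length between two species is the cumulative step length up to the first level they share, scaled so
that the longest path is 100, and pairs sharing no listed level are also given 100. Average
taxonomic distinctness Δ+ is the mean path length over all species pairs and Λ+ is the variance of
those path lengths about Δ+. The species, taxa and function names are hypothetical.

```python
from itertools import combinations

def path_length(taxa_a, taxa_b, steps):
    """Scaled path length (longest = 100) up to the first level two species share."""
    total = sum(steps)
    climbed = 0.0
    for step, a, b in zip(steps, taxa_a, taxa_b):
        climbed += step
        if a == b:
            return 100.0 * climbed / total
    return 100.0  # no shared level listed: treat as the longest path (assumption)

def avtd_vartd(species, aggregation, steps=None):
    """Return (Delta+, Lambda+) for a species list, given per-level step lengths."""
    n_levels = len(aggregation[species[0]])
    steps = steps or [1.0] * n_levels          # default: equal steps
    w = [path_length(aggregation[a], aggregation[b], steps)
         for a, b in combinations(species, 2)]
    delta_plus = sum(w) / len(w)
    lambda_plus = sum((x - delta_plus) ** 2 for x in w) / len(w)
    return delta_plus, lambda_plus

# Hypothetical aggregation table: species -> (genus, family, order).
aggregation = {
    "sp a": ("G1", "F1", "O1"),
    "sp b": ("G1", "F1", "O1"),
    "sp c": ("G2", "F1", "O1"),
    "sp d": ("G3", "F2", "O1"),
}
species = list(aggregation)
equal = avtd_vartd(species, aggregation)                 # steps 1, 1, 1
doubled = avtd_vartd(species, aggregation, [2, 2, 2])    # same after standardising
decreasing = avtd_vartd(species, aggregation, [6, 5, 4])
print(equal, doubled, decreasing)
```

Doubling every step leaves both indices unchanged because of the standardisation to a longest path
of 100, whereas the decreasing steps 6, 5, 4 shift weight towards the lower levels of the tree and
so alter the values.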

Taxonomic distinctness (groundfish surveys)

The example used to illustrate these indices is the NW European shelf beam-trawl surveys of
groundfish assemblages (93 species in 277 samples, divided into 9 sea areas). Further details of the
study are on page 50, with the aggregation sheet checked for consistency on page 98. If you do not
have a saved workspace gfwk in directory C:\Examples v6\Grdfish, open a new one with
abundance matrix gfa and aggregation sheet gfagg. In this case, the data and aggregation matrices
have been constructed to have the same full set of species, in the same order, in both files. With gfa
as the active sheet, run Analyse>DIVERSE and on the Taxdist and Phylogenetic tabs, check (✓) all
the quantitative and presence/absence options Δ (=delta), Δ*, Δ+, SΔ+, Λ+ (=lambda+), Φ+ (=phi+)
and SΦ+, taking also ✓Results to worksheet. Under the Taxonomy button, take all the defaults,
namely: (Aggregation data: gfagg) & (Current level of sample data: Species) & (Use links>From
level: Species & To level: Class) & (Weights•User specified), with the Weights left on their values
of step lengths of 1 between all levels. Include also the number of species (S) and an evenness
measure such as Pielou's J', from the Other tab. As with the earlier diversity example, look at the
correlation between these indices by Analyse>Draftsman Plot. (To obtain the plot below, the axis
scales have been switched off by unchecking the Show scales box from the Graph>Special menu.)

[Figure: Groundfish, NW European shelf: draftsman plot of diversity + relatedness indices, with
screenshots of the DIVERSE Taxdist and Phylogenetic tabs and the Taxonomy dialog]

In the draftsman plot, note particularly the first column of plots, which set each index against the
number of species, S. These bear out the general observations of Clarke KR & Warwick RM 2001,
Mar Ecol Prog Ser 216: 265-278, and Chapter 17 of the methods manual, that:
a) total phylogenetic diversity PD (SΦ+) and total taxonomic distinctness TTD (SΔ+) are dominated
   by S (which will be strongly influenced by the differing sampling effort for the 277 rectangles);
b) an attempt to correct for this by using average PD (Φ+) is unsuccessful, there still being a strong
   correlation with S (only negative now), but it is successful for average taxonomic distinctness
   AvTD (Δ+) and variation in taxonomic distinctness VarTD (Λ+), Clarke & Warwick showing that
   (mechanistic) independence of Δ+ and Λ+ from S is to be expected on theoretical grounds;
c) quantitative taxonomic diversity (Δ) retains a strong element of the evenness component from
   the species abundance distribution, i.e. is strongly correlated with Pielou's J', Shannon H' etc. In
   fact, Δ can be thought of as the product of Simpson diversity and a relatedness index, thus
   quantitative taxonomic distinctness Δ* = Δ ÷ (Simpson) more nearly represents pure relatedness,
   and is seen to be much less positively correlated with evenness;
d) the quantitative (Δ*) and pres/abs (Δ+) forms of AvTD, though positively correlated (≈ 0.5), are
   not highly so, suggesting (as other evidence does) that they capture somewhat different aspects
   of relatedness and are both worth examining when quantitative data exist;
e) because of their use of the taxonomic tree structure, the taxonomic distinctness measures capture
   an axis of variation in the samples not reflected by the standard diversity measures (this can be
   seen by repeating the PCA, and the MDS variables ordination, of pages 163-4 for a combination
   of the above relatedness indices with the classic measures S, d, J', H, α, H' and 1-λ').
The worksheet of diversity indices for each sample could now be saved in text or Excel format,
along with the factors that define the 9 sea areas, ready for input to a standard univariate statistics
c ~
~
package, to calculate ANOVA's, means and 95% Cl's (see Rogers et al 1999, reference page 50). @~,
~
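The relationship in point (c) can be checked numerically. The sketch below is not PRIMER code: it assumes the standard pairwise definitions of Δ, Δ*, Δ+ and Λ+ from the taxonomic distinctness literature, equal step lengths scaled so that the longest path is 100, and a made-up four-species sample. The function names and the data are hypothetical.

```python
from itertools import combinations


def path_weight(tax_a, tax_b):
    """Path length between two species on a 0-100 scale (equal step lengths).

    Each classification is a tuple read upwards from the species name,
    e.g. (species, genus, family, order).
    """
    top = len(tax_a) - 1
    for step, (a, b) in enumerate(zip(tax_a, tax_b)):
        if a == b:
            return 100.0 * step / top
    return 100.0  # no shared level: treat the pair as joined at the top of the tree


def relatedness_indices(counts, taxonomy):
    """Return Delta, Delta*, Simpson (1 - lambda'), Delta+ and Lambda+ for one sample.

    Needs at least two species present and at least two individuals in total.
    """
    present = [sp for sp, n in counts.items() if n > 0]
    pairs = list(combinations(present, 2))
    weights = [path_weight(taxonomy[i], taxonomy[j]) for i, j in pairs]
    products = [counts[i] * counts[j] for i, j in pairs]

    n_total = sum(counts[sp] for sp in present)
    individual_pairs = n_total * (n_total - 1) / 2
    weighted = sum(w * p for w, p in zip(weights, products))

    delta_plus = sum(weights) / len(pairs)
    return {
        "delta": weighted / individual_pairs,          # quantitative taxonomic diversity
        "delta_star": weighted / sum(products),        # quantitative taxonomic distinctness
        "simpson": sum(products) / individual_pairs,   # Simpson index in its 1 - lambda' form
        "delta_plus": delta_plus,                      # AvTD, presence/absence
        "lambda_plus": sum((w - delta_plus) ** 2 for w in weights) / len(pairs),  # VarTD
    }


# Hypothetical sample: names, classification and counts are invented for illustration.
taxonomy = {
    "sp_a": ("sp_a", "genus1", "family1", "order1"),
    "sp_b": ("sp_b", "genus1", "family1", "order1"),
    "sp_c": ("sp_c", "genus2", "family1", "order1"),
    "sp_d": ("sp_d", "genus3", "family2", "order1"),
}
counts = {"sp_a": 40, "sp_b": 5, "sp_c": 3, "sp_d": 2}
indices = relatedness_indices(counts, taxonomy)
for name, value in indices.items():
    print(f"{name:12s} {value:8.3f}")
```

Under these definitions Δ equals Δ* multiplied by the Simpson index, which is why dividing Δ by Simpson removes most of the evenness signal. When every count is 1, Δ, Δ* and Δ+ coincide.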
Tests for taxonomic distinctness
Wide-ranging biogeographic studies, and particularly historic data, are often restricted to simple
species lists. Even where quantitative information exists, it is rarely from sampling protocols that
have been standardised with respect to sampling effort over the whole data. Where sampling is so
exhaustive that the asymptote of the species-area curve is approached, then it may be valid to
compare diversity status by the length of these lists (species richness S), but this is not often the
case (in marine science, certainly). As is well known, S is heavily sampling effort dependent so, if
sampling effort is variable and unknown, any valid statements about diversity appear problematic.
However, the two relatedness measures discussed earlier, average taxonomic distinctness (AvTD,
Δ+) and variation in taxonomic distinctness (VarTD, Λ+), are not only computed from simple
species lists, with the added knowledge of their Linnaean (or other) classification, but also possess
a robustness to the varying number of species S in the lists. To be more precise, in different-sized
sublists generated by random sampling from a larger list (simulating the action of sampling with
variable effort) their mean values are unchanged. This suggests that it is valid to compare Δ+ (or
Λ+) over historic time or biogeographic space scales, under conditions of variable sampling effort.
(Note that the indices are average not total measures, and orthogonal to species richness - along a
third PC diversity axis, would be one way of thinking of it - and therefore an addition to S, rather
than a substitute for it, in cases where sampling effort is controlled and S can be validly compared.)
Furthermore, a test can be constructed for the null hypothesis that a species list from one locality
(or time) has the same taxonomic distinctness structure as the 'master' list (e.g. of all species in that
biogeographic region) from which it is drawn. This is again by simple randomisation: given there
are s species observed in a particular sample, make repeated drawings at random of s species from
the master list and compute Δ+ for each drawing, building up a histogram and a 95% probability
range of values of Δ+ expected under the null hypothesis, with which the true Δ+ can be compared.
Values below the lower probability limit suggest a biodiversity that is 'below expectation'. This
can be carried out for a range of sublist sizes and the limits plotted against s, to give a 95% 'funnel'
of expected values (the funnel arises from uncertainty being greater for smaller sublists). This can
be repeated for VarTD (Λ+), giving a second set of histograms and 95% funnel. Together, the true
Δ+ and Λ+, and the simulated values obtained by drawing their number of species from the master
list, can be plotted on a single (x,y) scatter plot. Probability regions (contours, referred to as
'ellipses') covering 95% of the simulated values can now be constructed, repeated for a range of
sample sizes, and the true (Δ+, Λ+) compared with their appropriate contour.
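The randomisation just described can be sketched in a few lines. This is an illustration only, not the TAXDTEST implementation: the master list is an invented, balanced classification, the percentile limits use a crude index-based rule, and all function names are hypothetical.

```python
import random
from itertools import combinations


def path_weight(tax_a, tax_b):
    """Path length between two species on a 0-100 scale (equal step lengths)."""
    top = len(tax_a) - 1
    for step, (a, b) in enumerate(zip(tax_a, tax_b)):
        if a == b:
            return 100.0 * step / top
    return 100.0


def avtd(species, taxonomy):
    """Average taxonomic distinctness (Delta+) of a species list."""
    weights = [path_weight(taxonomy[i], taxonomy[j]) for i, j in combinations(species, 2)]
    return sum(weights) / len(weights)


def simulated_limits(master, taxonomy, s, n_draws=1000, level=95.0, rng=None):
    """Lower limit, mean and upper limit of Delta+ over random sublists of size s."""
    if rng is None:
        rng = random.Random(0)
    values = sorted(avtd(rng.sample(master, s), taxonomy) for _ in range(n_draws))
    tail = (100.0 - level) / 200.0
    lower = values[int(tail * (n_draws - 1))]
    upper = values[int((1.0 - tail) * (n_draws - 1))]
    return lower, sum(values) / n_draws, upper


# Hypothetical master list: 3 orders x 2 families x 2 genera x 3 species, in one class.
taxonomy = {}
for o in range(3):
    for f in range(2):
        for g in range(2):
            for k in range(3):
                name = f"sp{o}{f}{g}{k}"
                taxonomy[name] = (name, f"gen{o}{f}{g}", f"fam{o}{f}", f"ord{o}", "class0")
master = sorted(taxonomy)
master_avtd = avtd(master, taxonomy)

# One set of limits per sublist size: plotted against s, these form the 'funnel'.
rng = random.Random(42)
funnel = {s: simulated_limits(master, taxonomy, s, rng=rng) for s in (5, 10, 20, 30)}
for s, (lower, mean, upper) in funnel.items():
    print(f"s={s:2d}  lower={lower:6.2f}  mean={mean:6.2f}  upper={upper:6.2f}")

# A made-up sample of five species, all from one family, compared with its limits.
observed = avtd(["sp0000", "sp0001", "sp0002", "sp0010", "sp0011"], taxonomy)
print("observed Delta+:", observed, " below lower limit:", observed < funnel[5][0])
```

The simulated mean stays close to the Δ+ of the full master list whatever the sublist size, while the interval narrows as s grows, which is the funnel shape described above.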


TAXDTEST (on European groundfish)
Further theoretical details and discussion can be found in Chapter 17 of the methods manual, which
also presents analyses for the groundfish data, whose workspace gfwk should still be open from
page 167. These taxonomic distinctness tests (on pres/abs data, only) are carried out by Analyse>
TAXDTEST, accessible only when an aggregation sheet is the active window. This is the 'master
list' (the Master taxonomy button on the General tab) from which random subsets of species will
be drawn, in order to construct the probability histograms, funnels or 'ellipses'. It is also the default
aggregation sheet used in calculating the real Δ+ and Λ+ for any specific set of samples, to super-
impose as points on the simulated funnels or ellipses (✓Use Sample data>Taxonomy•Use master).
However, with (Taxonomy•Specify different>Taxonomy), a different aggregation sheet could be
supplied for the sample data calculation. This would normally be quite unnecessary because the
species relatedness needed for any particular sample can be drawn from the master taxonomy: as
noted earlier, there is no necessity for the sample data matrix to contain all the same species in the
same order as the aggregation sheet - it is just necessary that all the species can be found in the
master list. However, it could be valid to place data from one region (or geological time), with its
own aggregation matrix, on an 'expected' funnel from an entirely different region (or time), with a
different master list, and v6 now caters for this. The Taxonomy buttons in both cases give a dialog
exactly as on page 167, allowing upper and lower compression of the taxonomic tree and path
weights which can be altered from equal weighting of steps through the levels.

Histograms for one sublist size
As an example, consider just the first of the 277 groundfish samples, the site (quarter-rectangle) S1;
highlight and select just this column from gfa with Select>Highlighted. (gfa is a density matrix not
presence/absence, but TAXDTEST will automatically convert it to P/A data - as does DIVERSE
when computing Δ+, Λ+ etc of course). With the aggregation sheet gfagg as the active window, take
Analyse>TAXDTEST>General>(Plot type•Histogram) & (Max random selections: 1000), with
defaults for the Master taxonomy button, and ✓Use Sample data>(Worksheet: gfa)>(Taxonomy•
Use master). There is no need at this stage to consider the options on the S Range tab, since the
routine will automatically establish that there are S = 19 species in the proffered single data column
and will produce 1000 random draws of 19 species from the master list gfagg of 93 species. These
are displayed as a histogram (in fact two histograms, for Δ+ and Λ+) with the real value of Δ+ (or
Λ+) for that data column indicated by a dashed line, as usual, and significance levels in the results.

[Figure: TAXDTEST dialog, General tab (Master taxonomy button; Max random selections: 1000;
✓Use Sample data with Worksheet: gfa and Taxonomy•Use master), the two histograms of simulated
Δ+ and Λ+, and the results window, whose text follows; '?' marks a digit that is illegible here]

Global Test
Number of species: 19
Number of permutations: 1000 (Random sample from a large number)

Sample statistic (Delta+): 75.906
Significance level of sample statistic: 4.4%
Number of permuted statistics greater than or equal to sample: 981
Number of permuted statistics less than or equal to sample: 21

Sample statistic (Lambda+): 233.535
Significance level of sample statistic: 75.3%
Number of permuted statistics greater than or equal to sample: 376
Number of permuted statistics less than or equal to sample: 62?

If you submit several columns of data by mistake at this stage, the error message 'Only one sample
must be selected for histogram' will result. If you wish to generate histograms of 'expected' Δ+ (or
Λ+) values for a fixed sample size (S = 20, then 30, etc), without reference to a particular data
sample, then uncheck the Use Sample data box on the General tab of the TAXDTEST dialog. You
will then need to take the S Range tab and specify, for example, S value (no sample data): 20. (For
any histogram, particularly the latter, you may wish to rescale the x axis to a shorter or longer range
with e.g. Graph>Data labels & symbols>X axis>✓Specify scale>X min: 70 & X max: 90 etc).
'Funnels' for a range of sublist sizes
The histogram for sample S1 (from sea area 9) shown above indicates that its AvTD (Δ+) is
'below expectation' or at least on the borderline of statistical significance for being so (p ≈ 0.04),
whereas VarTD (Λ+) is within the 'expected range'; the test details are given in the results window.
However, it is impractical to produce detailed histograms for each of the 277 samples, so a
preferable option is just to view the 95% lower and upper limits for a range of sample sizes S, using
(Plot type•Funnel) on the General tab, and a set of samples specified under ✓Use Sample data. So,
select out all sea area 9 (E Central N Sea) and sea area 1 (Bristol Channel) samples from gfa (using
Select>Samples>•Factor levels>Factor name: area>Levels>..., moving all except 9 and 1 to the
Available box), and run Analyse>TAXDTEST again, on gfagg. With (Plot type•Funnel) and
(✓Use Sample data>Worksheet: gfa) it will also be necessary to take the S Range tab and
(✓Specify S range)>(Min S: 5) & (Max S: 35), say, to span the spread of S values on the display.
The other option here, S ratio (funnel): 1.2, just determines how many S values are calculated in the
intervening range (5 to 35), S values stepping up by multiples of 1.2 by default (then rounded), thus
S = 5, then 6 (= 5 x 1.2) etc. Finally, the default gives 95% intervals (2.5% of simulations fall above
the upper limit and 2.5% below the lower limit) but this can also be changed, on the S Range tab.
The results and funnel plots for Δ+ and Λ+ are shown below and indicate that, whilst area 1 samples
are within 'expected' ranges for average taxonomic distinctness, area 9 samples have reduced
diversity (AvTD is the more easily interpretable of the two indices, since it measures the 'breadth'
of the assemblage for a given number of species). Rogers et al 1999, see page 50, discuss possible
reasons for this. Note that (in another improvement over v5) these plots have been tidied up with
Graph>Data labels & symbols, by removing the labels and adding symbols for factor area,
changing titles and also symbol sizes, and types/colours with the Key button, as for any other plot.
The probability limits could be further smoothed by running with Max random selections: 10000
(General tab), but they will continue to show 'kinks' for small S, because it is a discrete variable.
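To see how few distinct S values a ratio of 1.2 produces, the stepping rule can be written out. This is a guess at the arithmetic, based only on the description 'multiples of 1.2 ... (then rounded)': it assumes each value is the rounded product of Min S and a power of the ratio, with repeated values dropped. PRIMER's exact rounding rule is not stated, and the function name is hypothetical.

```python
def funnel_s_values(min_s, max_s, ratio=1.2):
    """S values from min_s up to max_s, stepping by a constant ratio and then rounding."""
    values = []
    k = 0
    while True:
        s = round(min_s * ratio ** k)
        if s > max_s:
            break
        if not values or s != values[-1]:  # skip repeats produced by rounding
            values.append(s)
        k += 1
    return values


print(funnel_s_values(5, 35))  # Min S: 5, Max S: 35, S ratio (funnel): 1.2
```

Under this assumption the range 5 to 35 is covered by eleven sublist sizes, closely spaced at the narrow end of the funnel where the limits change fastest.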

[Figure: TAXDTEST dialog (Plot type•Funnel; ✓Use Sample data, Worksheet: gfa, Taxonomy•Use
master; ✓Specify S range with Min S: 5 and Max S: 35) and funnel plots against Number of species,
the first titled 'Groundfish: average taxonomic distinctness', with symbols for areas 1 and 9]

[Results window, first rows; '?' marks an entry that is illegible here]

Sample   S    Δ+ Value   Sig %   Λ+ Value   Sig %
S1       19   75.91      4.4     233.54     ?
S2       20   77.37      10.0    261.50     24.0
S3       ?    71.43      0.4     177.08     55.7
S4       19   74.39      0.6     146.26     15.2
S5       17   72.35      ?       166.56     61.9

Using taxon frequency in simulations
Another new feature in v6 is that the simulation of random draws from the 'master list', to generate
histograms, funnels or ellipses, can be constrained to match the probabilities of occurrence of each
species, as observed in a large set of samples defining those taxon frequencies. Thus certain species
are picked more often in the random subsets, because they are observed to be present more often in
real samples of this type. The simulated mean and range of (say) AvTD values generated in this
way could be argued to give a more realistic yardstick for assessing the observed AvTD. They are
simply produced in the Frequency Data tab on the TAXDTEST dialog, by (✓Use taxon frequency
data) and entering a presence/absence data matrix which has a reasonable number of samples from
the full set of species present in the master taxonomy, on which the simulations are based.
An illustration can be constructed here by first duplicating the selected parts of gfa (areas 1 and 9)
with Tools>Duplicate, renaming this gfa1+9, then clearing the selection on gfa (with Select>All),
and creating a pres/abs matrix from it with Pre-treatment>Transform (overall)>(Transformation:
Presence/absence), renaming this new sheet gfaPA. Now run Analyse>TAXDTEST on the active
window gfagg, again with (Plot type•Funnel) and the default taxonomy and S Range tab as before,
but with (✓Use Sample data)>(Worksheet: gfa1+9) on the General tab, and on the Frequency
Data tab take (✓Use taxon frequency data)>(Worksheet: gfaPA). The same tidying up operations
as previously, produce the plot shown below. Of course, the real Δ+ values are unchanged - they
are not a function of assumptions made about the relevant master list to simulate from, or whether
to carry out simple random or frequency-based simulations. (Indeed, if you are not convinced that
Δ+ or Λ+ should be compared with any master list, you can simply use the taxonomic indices in the
same way as other diversity measures, e.g. analysing replicate samples across groups by ANOVA.)
The frequency-based simulated mean is no longer exactly independent of the sub-list size s, but the
increase with s is seen to be slight here, on the scale of the probability limits, and the conclusions
would be rather similar to those for simple random sampling. Generating frequencies from the full
data, then used to assess part of the same data, may or may not be questionable; its implications are
not really clear. For now, it would be wise to treat the frequency-based approach as experimental.
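One way to picture a frequency-based draw is as weighted sampling without replacement. The manual does not say how PRIMER weights the draws, so the scheme below is an assumption: each successive pick is made with probability proportional to the species' occurrence frequency among those not yet chosen. The data and function names are invented.

```python
import random


def occurrence_frequencies(pa_rows):
    """Proportion of samples in which each species is present (rows of 0/1 values)."""
    return {sp: sum(row) / len(row) for sp, row in pa_rows.items()}


def frequency_based_draw(frequencies, s, rng):
    """Draw s distinct species, each pick weighted by occurrence frequency.

    Assumes at least s species have a non-zero frequency.
    """
    pool = list(frequencies)
    weights = [frequencies[sp] for sp in pool]
    chosen = []
    for _ in range(s):
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx))  # remove the pick so it cannot be drawn again
        weights.pop(idx)
    return chosen


# Hypothetical presence/absence matrix: five species over ten samples.
pa_rows = {
    "common":     [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "frequent":   [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    "occasional": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    "scarce":     [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "rare":       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
frequencies = occurrence_frequencies(pa_rows)

rng = random.Random(1)
tally = {sp: 0 for sp in pa_rows}
for _ in range(2000):
    for sp in frequency_based_draw(frequencies, 2, rng):
        tally[sp] += 1
print(tally)  # widespread species appear in far more of the simulated sublists
```

Replacing the simple random draw in a randomisation test with a draw of this kind gives frequency-based limits; the observed Δ+ values themselves are unaffected.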

[Figure: TAXDTEST dialog (Plot type: Histogram, Funnel or Ellipse; Master taxonomy; Max random
selections; ✓Use Sample data) and funnel plot 'Groundfish: AvTD (frequency-based simulated
limits)' against Number of species, with symbols for areas 1 and 9]

'Ellipses' for joint values of (Δ+, Λ+)
The final option is to consider Δ+ and Λ+ in combination, by plotting 95% probability contours for
their joint distribution, under the null hypothesis of simple random (or frequency-based) selection
from the master species list. Optionally, pairs (Δ+, Λ+) from a real sample data matrix can be
superimposed. There may be some advantage in looking at both measures simultaneously because
departures from 'expectation' may reveal themselves as, say, lowish Δ+ and highish Λ+ values,
neither of which was significant on its own, but in combination outside the joint (Δ+, Λ+) contours,
for which Δ+ and Λ+ might be negatively correlated. (The contours are referred to as 'ellipses'
because they are calculated by approximating the full simulation distribution by a bivariate normal
distribution in a transformed space. This works well - see Chapter 17 of the methods manual).
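The bivariate normal approximation can be illustrated with a short check of whether a point lies inside a 95% contour. The sketch leaves out the transformation step mentioned above (its form is not given here) and uses invented simulated values, so it shows the principle rather than reproducing TAXDTEST. It relies on the fact that, for a bivariate normal, the squared Mahalanobis distance follows a chi-squared distribution on 2 degrees of freedom, whose quantile at probability p is -2 ln(1 - p).

```python
import math
import random


def inside_contour(point, simulated, level=0.95):
    """True if point lies within the bivariate-normal contour fitted to simulated pairs."""
    n = len(simulated)
    mean_x = sum(x for x, _ in simulated) / n
    mean_y = sum(y for _, y in simulated) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in simulated) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in simulated) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in simulated) / (n - 1)

    # Squared Mahalanobis distance, using the explicit inverse of the 2 x 2 covariance matrix.
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean_x, point[1] - mean_y
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

    return d2 <= -2.0 * math.log(1.0 - level)  # chi-squared quantile on 2 df


# Invented stand-ins for simulated (Delta+, Lambda+) pairs at one sublist size.
rng = random.Random(7)
simulated = [(rng.gauss(80.0, 3.0), rng.gauss(250.0, 40.0)) for _ in range(500)]

print(inside_contour((80.0, 250.0), simulated))  # a point near the centre
print(inside_contour((60.0, 250.0), simulated))  # a point with very low Delta+
```

In practice one contour is fitted per sublist size, and each observed pair is judged against the contour for its own number of species.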

With gfagg as the active window, adding the sample data points for areas 1 and 9, take Analyse>
TAXDTEST>General>(Plot type•Ellipse) & (✓Use Sample data>Worksheet: gfa1+9), again with
the default taxonomy options. Run the simulations under simple random sampling, i.e. uncheck the
Use taxon frequency data box on the Frequency Data tab, which was ticked for the previous
example. On the S Range tab, take (✓Specify S range)>(Min S: 10 & Max S: 30), with S interval
(ellipse): 5, which will plot 'expected' 95% contours for s = 10, 15, 20, 25, 30. You should again
remove the labels from the superimposed data points, by Graph>Data labels & symbols, and
change the symbol types/colours for the two areas, as previously. (Note that if you had changed the
symbol types under the Key button on the Edit>Factors menu when the active sheet was the data
matrix gfa1+9, rather than using the Key button from the previous plot window delta+ freq-based,
then the new symbol types would automatically have been used for the present plot. Look at the
Explorer tree and you will see why the new types are not carried across from one plot to the other -
they are not on a common branch connected to gfa1+9, so the information will not propagate back
and forwards again.) Also on the presentation side, note that all the usual plotting features are
available here, such as zooming in to see the details of particular points (re-instate the labels to
display the number of species in each sample, in parentheses). In addition, the Contour tab on the
Graph Options dialog (seen on page 80 for drawing cluster contours) also operates here, with a Key
button allowing line styles and fill colours to be changed, and the fill to be suppressed.

[Figure: TAXDTEST dialog (General tab with Plot type•Ellipse and Max random selections: 1000;
S Range tab with ✓Specify S range, Min S: 10, Max S: 30, S interval (ellipse): 5 and Contour %: 95)
and the resulting 'ellipse' plot, with Delta+ on the x axis]

The idea is that, for each sample, one visually interpolates between the contours for the two s
values that straddle its observed number of species S, and determines whether that point is inside or
outside its 95% contour of joint 'expectation' (a Bonferroni-type correction could be used for the
probability limits, or you should just bear in mind in interpreting the plot that 1 in 20 of the points
will fall outside 95% limits, under random draws!). The conclusion here is again of a lower than
expected average taxonomic distinctness (but mid-range VarTD) for area 9, and this is distinct
from area 1, which has 'normal' AvTD (and possibly higher VarTD). For some general thoughts on
interpretation of Δ+ and Λ+ see Clarke & Warwick 2001, Warwick & Clarke 2001, and for this data
(including other sea areas), see Rogers et al 1999 and Chapter 17 of the methods manual.


15. Species curves (Geometric Class, Dominance and Species-Accumulation Plots)


Range of diversity curves
PRIMER plots a range of what might be termed 'diversity curves', under the Analyse>Geometric
Class Plot and Dominance Plot menus, obtained when the active window is a data matrix. These
display evenness and richness components in a more continuous way than is achieved by a single
index and, in the case of ABC curves, incorporate both abundance and biomass components of the
assemblage. Features new to v6 include: a greater ease of generating multiple ABC plots (one for
each matching sample in abundance and biomass arrays); the ability to identify factor levels on the
plots by symbols/line types (and all the other general plotting improvements); and a structure for
testing dominance plot differences over sites/times/treatments etc, using the Analyse>DOMDIS
routine to generate a triangular matrix of between-curve distances which can be input to ANOSIM.
In addition, an enhanced richness (S) estimator suite is provided in Analyse>Species-Accum Plot,
which (as elsewhere) supplements the plot by sending key statistics to a new worksheet, making it
easy to export information from PRIMER to other plotting or univariate statistical software.

Geometric class plots
These are essentially multiple frequency polygons, plotted on a single graph, for each sample in the
active window, which needs to be a taxon (species) by samples array of genuine counts. If you
wish to plot a single curve for a pool of all samples in the data matrix, or multiple curves each of
which is for the pool of samples in a particular group, then you should first submit the data matrix
to Tools>Sum (to sum all the samples in an array you will need to create a factor which has just
one level across all samples, simply using the Edit>Fill Down>Value operation when you have
created a new factor from the Edit>Factors>Add menu, see page 31). The y axis of the curve is
the number of species that fall into a set of geometric (x 2) abundance classes. That is, the plot
gives the number of species represented in the sample by a single individual (class 1), 2 or 3
individuals (class 2), 4-7 individuals (class 3), 8-15 individuals etc. (This is termed the 'species
abundance distribution' in statistical ecology, and there is much early literature on fitting it by
distributions such as the truncated log-normal, proposed on rather unconvincing theoretical
grounds. Fisher also fitted it to the log series distribution, which is the derivation of the Fisher
α diversity index calculated by Analyse>DIVERSE, see page 162). It has been suggested that
impact on assemblages changes the characteristic form of this species abundance distribution,
lengthening the right tail: some species become very abundant and some rarer species disappear.
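The x 2 classes are simple to compute for integer counts. The following is an illustrative sketch, not PRIMER's code, with hypothetical function names and made-up counts. It uses the fact that a positive integer with b binary digits lies between 2^(b-1) and 2^b - 1, so the class number is just the bit length of the count.

```python
from collections import Counter


def geometric_class(count):
    """Class 1 = 1 individual, class 2 = 2-3, class 3 = 4-7, class 4 = 8-15, ..."""
    return count.bit_length()


def geometric_class_profile(counts):
    """Number of species in each geometric class, for one sample of integer counts."""
    tally = Counter(geometric_class(n) for n in counts if n > 0)
    if not tally:
        return []
    return [tally.get(c, 0) for c in range(1, max(tally) + 1)]


sample = [1, 1, 2, 3, 4, 7, 8, 15, 16, 0]  # invented counts for ten species (one absent)
print(geometric_class_profile(sample))
```

Plotting this profile against class number, one line per sample, gives the frequency polygons described above; a longer right tail shows up as non-zero entries in the higher classes.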
Close the existing workspace (it is not needed again), and re-open the Clyde dumpground macro-
fauna data, clwk2, created on page 164, or just open the abundance matrix clma from C:\Examples
v6\Clydemac. The plot will be too cluttered with all 12 transect samples displayed, so contrast just
two sets of pooled samples: the outer (1, 2, 11, 12) and inner (5, 6, 7, 8) sites - the pools need to be
the same size for unbiased comparison. For clma create a new factor with two levels corresponding
to these groups, using Edit>Factors>Add (leave the entries for the other sites blank), and run
Tools>Sum on clma, specifying this factor. On the resulting data sheet, Analyse>Geometric Class
Plot. The right shift of the abundance distribution at the inner sites is clearly seen. Close the
workspace; it will not be needed again.

[Figure: Factors dialog (Add, Combine, Rename, Reorder, Delete, Key and Import buttons) showing
the new factor, with level Outer for samples S1, S2, S11 and S12, level Inner for S5 to S8, and the
remaining samples left blank]
Dominance curves
'Dominance plot' is the convenient generic name for a family of curves also known as 'ranked
species abundance plots', which can be computed for abundance, biomass, % cover or other biotic
measure representing quantity of each taxon. For each sample, or pooled set of samples, species are
ranked in decreasing order of (say) abundance. Their relative abundance (i.e. percentage of the total
abundance in the sample) is plotted against the increasing rank (x axis), the latter on a log scale.
The y axis can consist either of relative abundance or cumulative relative abundance, the former
therefore always decreasing and the latter always increasing. The cumulative plot is often referred
to as a k-dominance plot. There is a third possibility, a partial dominance curve, in which the y axis
is the abundance of each species relative to the total of its own abundance plus that of all other less-
abundant species. The idea of the latter is to ameliorate the way standard dominance curves tend to
be dictated by the most abundant species, by looking at the dominance pattern of the remaining
assemblage having removed the most abundant species, then the next most abundant, etc.
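The three y-axis definitions can be set side by side in a short calculation. This sketch follows the verbal definitions above for a single sample; the function name and the abundances are hypothetical, and it is not the Dominance Plot routine itself.

```python
def dominance_curves(abundances):
    """Ordinary, cumulative (k-dominance) and partial dominance values, in rank order."""
    ranked = sorted((a for a in abundances if a > 0), reverse=True)
    total = sum(ranked)

    ordinary = [100.0 * a / total for a in ranked]

    cumulative = []
    running = 0.0
    for value in ordinary:
        running += value
        cumulative.append(running)

    # Partial dominance: each species relative to itself plus all less-abundant species.
    partial = []
    remaining = total
    for a in ranked:
        partial.append(100.0 * a / remaining)
        remaining -= a

    return ordinary, cumulative, partial


ordinary, cumulative, partial = dominance_curves([5, 50, 15, 30])  # invented abundances
for rank, values in enumerate(zip(ordinary, cumulative, partial), start=1):
    print(rank, *(f"{v:6.1f}" for v in values))
```

The ordinary values can only fall with rank and the cumulative values can only rise, whereas the partial values are free to move either way, which is what lets the partial curve show structure among the lower-ranked species.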

A further possibility is to put dominance curves for abundance and biomass, separately calculated,
onto the same plot. This is referred to as an Abundance-Biomass Comparison (ABC) curve. A
number of published studies have demonstrated a characteristic change in the relative position of
these curves under disturbance, particularly for organic enrichment of marine, soft-sediment
macrobenthic communities. The method is due to Warwick RM 1986, Mar Biol 92: 557-562.
Further applied detail on dominance plots is given in Chapter 8 of the methods manual.
(L. Linnhe macrofauna time-series)
Macrobenthos in soft sediments of a site in Loch Linnhe, Scotland were monitored by Pearson TH
1975, J exp mar Biol Ecol 20: 1-41, over the period 1963-73, recording both species abundance and
biomass. The data here are pooled to a single sample for each year, with an assemblage of 115
species. Starting in 1966, pulp-mill effluent was discharged in the vicinity of the site, with the rate
increasing in 1970 and reducing in 1972. Abundances are in lnma, and matching biomass data
(total biomass of each species for each of 11 years) in lnmb, in directory C:\Examples v6\Linnhe.

k-dominance, ordinary & partial plots
Open lnma (and lnmb) in a new workspace, and with lnma as the active window, take Analyse>
Dominance Plot>(Mode•Single worksheet) & (Plot type•Cumulative). This produces a single plot
of the k-dominance curves for all 11 years, the earlier years being seen to have higher evenness
(lower dominance), by their placement lower down the y axis, and higher species richness by their
'reach' along the x axis, before the full 100% abundance is attained. In contrast, the years of
highest effluent release are characterised by low equitability (curves move up the plot) and lower
richness. The latter tends to be under-emphasised on classic k-dominance curves since the default is
to plot species ranks on a log scale. This can be removed by unchecking the Log scale box on the X
axis tab of the Graph Options dialog, accessed by Graph>Data labels & symbols, but typically
that would underemphasise the equitability component, so a log scale is usually preferred. Note that
most of the usual plot features, such as a rectangular zoom (changing the aspect ratio) are available.
[Figure: Analyse menu and Dominance Plot dialog (Mode: Single worksheet (ABUNDANCE or
BIOMASS), or Abundance and Biomass data (ABC) with the active worksheet as ABUNDANCE),
the k-dominance plot 'Loch Linnhe macrofauna' with one curve per year, 1963 to 1973, against
Species rank on a log scale, and the X axis tab of the Graph Options dialog (Specify scale, Log
scale, Reverse) with the same curves replotted against Species rank on a linear scale]

To note the effect of switching from cumulative to ordinary/partial dominance curves, first select
just the years 1966 (the last year before effluent discharges began) and 1971 (one of the two peak
years of discharge), by highlighting those columns in lnma and Select>Highlighted. Perform three
separate runs of Analyse>Dominance Plot, with Plot type options of •Ordinary, •Cumulative and
•Partial, then close all windows except the three graphs and Window>Tile Vertical to obtain the
displayed desktop. Note how the cumulative plot emphasises the greater dominance of the two
most abundant species in 1971 (once high % values are reached they cannot decrease!), but the
partial plot also picks out an unevenness for that year in the dominance structure of species lower
in the rank order, an undisturbed partial dominance curve typically looking more like that for 1966.

[Figure: PRIMER desktop with three dominance plots tiled vertically (Ordinary, Partial and
Cumulative) for the years 1966 and 1971, each plotted against Species rank on a log scale]

Abundance-Biomass Comparison curves
ABC curves plot both abundance and biomass k-dominance lines on the same plot, and have been
interpreted in the literature as indicating an 'undisturbed' community if the biomass curve is above
the abundance curve, 'gross disturbance' if the abundance curve lies above the biomass curve and
'moderate disturbance' if the two lines largely intersect. This is based on the observation that for
climax communities of soft-sediment macrobenthos the biomass dominants are large-bodied, but
do not dominate the abundance, and are amongst the more susceptible species to environmental
impact, whereas gross disturbance (especially from organic enrichment) is often characterised by
large numbers of individuals of a few, small-bodied 'opportunist' species. There is evidence that
the phenomenon extends to other faunal groups (e.g. terrestrial) but its universality is questionable.
Restore the full set of samples for the Loch Linnhe macrofauna data sheet lnma by Select>All (and
Edit>Clear Highlight if you wish, though the latter is unnecessary, since all routines - with the
exception of Tools>Transform (individual) - operate on the current selection, not the highlights).
In PRIMER v6, the 11 ABC plots - one for each sample (year) - can now be generated
automatically by a single run of Dominance Plot. The active sheet must be the abundance file
(lnma) and the biomass file (lnmb here) must be available in the workspace as the secondary sheet.
An attempt to run the routine with the matrices the other way round (or to run it on environmental
data etc) will provoke one or more warnings (e.g. 'Primary data not Abundance'), though not
usually an outright error. It is always advisable to check that Data types have been correctly
defined, e.g. when reading in external files from Excel, and change them if necessary with
Edit>Properties; you will then get the benefit of sensible defaults and warnings if you make an
unexpected choice. As with all routines that use a secondary sheet, v6 will use strict matching of
sample labels to combine the information from the two data sources, though with the usual option
to relax this if the two matrices have the same number of samples, when it will assume that they are
in the same order. (Thus, if lnma had remained a selection of just years 1966 and 1971 then, even if
the full lnmb matrix were to be entered as the secondary file, the ABC plots would be produced
only for those two years - the routine always starts from the primary matrix, and pulls out matching
samples from the secondary array.) Note that the species do not need to match for the ABC routine,
so the matrices can have different numbers of variables. In fact this is quite likely, since biomass
may be unmeasurable for the smallest-bodied organisms counted. The key point to appreciate is
that the dominance curves for biomass and for abundance will almost certainly not rank the species
in the same way: the biomass dominant will not be the abundance dominant. This is an integral part
of the method, and thus no variable matching is necessary for this particular routine.
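
The sample-matching rule described above can be pictured with a short sketch. This is only an
illustration in Python, not PRIMER's own code: the function name match_secondary, its strict
flag and the year labels used with it are all hypothetical.

```python
def match_secondary(primary_labels, secondary_labels, strict=True):
    """For each primary sample, return the position of its partner in the secondary sheet.

    Hypothetical helper: mimics the rule that the routine starts from the primary
    (abundance) sheet and pulls out matching samples from the secondary (biomass) sheet.
    """
    if strict:
        # Strict matching: pair samples by label, whatever their order.
        position = {label: i for i, label in enumerate(secondary_labels)}
        missing = [label for label in primary_labels if label not in position]
        if missing:
            raise KeyError(f"no secondary sample for: {missing}")
        return [position[label] for label in primary_labels]
    # Relaxed matching: only sensible when both sheets hold the same number of
    # samples, which are then assumed to be in the same order.
    if len(primary_labels) != len(secondary_labels):
        raise ValueError("relaxed matching needs equal numbers of samples")
    return list(range(len(primary_labels)))
```

With a primary selection of two years and a fuller secondary sheet, e.g.
match_secondary(["1966", "1971"], ["1963", "1966", "1968", "1971"]), only the two selected
years are paired up, in line with the parenthetical remark above.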

So, on lnma, take Analyse>Dominance Plot again, this time with (Mode•Abundance and Biomass
data, ABC)>(Biomass worksheet: lnmb) & (✓Output W statistics to worksheet) & (Max number of
plots: 11) & (Plot type•Cumulative). Eleven ABC plots will result, along with a data matrix of the
W statistic for each sample. W measures the extent to which the biomass curve lies above the
abundance curve (positive values are 'expected' for the undisturbed condition, negative values for
impacted samples), and is a convenient single index to report, if presenting large numbers of ABC
plots is an impossibility. W values are also given in the results file but sending them to a worksheet
is convenient for exporting to external statistical software, for univariate tests etc. The juxtaposition
of abundance and biomass data in W is found to capture a different diversity 'dimension' than the
classic axis of richness-evenness seen on page 164. Finally, if you have started off the run with
only the lnma window displayed, it is possible to view all the plots together, neatly, as follows.
Click on the results window (Dominance Plot5) in the Explorer tree to bring its window to the
front, and close it; do the same with lnma, leaving 12 windows open (Graph5 to Graph15 and
Data1); Window>Tile Vertical will now display all 12 in the correct order, by columns - though
they need to be printed or saved, in vector (.emf) or bitmap (.bmp/.jpg etc) format, one at a time.
(In the plot shown below, the default symbols and lettering sizes were increased, before Dominance
Plot was run, by Tools>Options>Graphs>(Symbol Size: 200) & (Fonts Scale: 200) and
these defaults remain in action for all plots until changed, even persisting on exit from PRIMER).
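
The text above describes W only in words. As a rough illustration, the sketch below assumes the
form usually quoted from Clarke (1990): W is the sum over species ranks i of (B_i - A_i), divided
by 50(S - 1), where A_i and B_i are the cumulative percentage abundance and biomass at rank i
and S is the number of species. It further assumes that both lists cover the same S species, all
present in the sample; how PRIMER treats unequal species lists is not covered here. The function
names are hypothetical.

```python
def cumulative_dominance(values):
    """Cumulative % dominance, with species ranked in decreasing order of value."""
    ranked = sorted(values, reverse=True)
    total = sum(ranked)  # assumes at least one non-zero value
    curve, running = [], 0.0
    for v in ranked:
        running += v
        curve.append(100.0 * running / total)
    return curve


def w_statistic(abundance, biomass):
    """Assumed form: W = sum_i (B_i - A_i) / (50 * (S - 1)) for a single sample."""
    if len(abundance) != len(biomass) or len(abundance) < 2:
        raise ValueError("need the same S >= 2 species in both lists")
    s = len(abundance)
    a_curve = cumulative_dominance(abundance)
    b_curve = cumulative_dominance(biomass)
    # Each curve is ranked separately: the biomass dominant need not be
    # the abundance dominant, so no species matching is done here.
    return sum(b - a for a, b in zip(a_curve, b_curve)) / (50.0 * (s - 1))
```

For example, w_statistic([40, 30, 30], [90, 5, 5]) is positive (biomass curve above abundance
curve), whereas w_statistic([80, 15, 5], [10, 30, 60]) is negative.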

[Figure: Tools>Options>Graphs dialog (symbol and font settings) and Dominance Plot dialog, with
mode choices Single worksheet (ABUNDANCE or BIOMASS) or Abundance and Biomass data (ABC)
(Active worksheet is ABUNDANCE), a BIOMASS worksheet box, Output W statistics to worksheet,
Max number of plots, and plot types Ordinary, Cumulative or Partial]

15. Species curves
[Figure: PRIMER desktop with the 11 ABC plots for the Loch Linnhe macrofauna (one per year, each
annotated with its W value) and the W worksheet, tiled in 12 windows]

The ABC plots follow the pattern (seen in other indicators also) of initial stability (biomass over
abundance curve), then a switch over of the curves in the period of effluent discharges from 1966,
increasing in 1970, with apparent recovery in 1973 after discharges decreased in 1972. Close the
Linnhe workspace; it will not be needed again.

Testing for k-dominance curves

Testing for differences in ABC curves for group structures of sites/times/treatments etc, where
there are replicate samples within each group, is probably best accomplished by using the W index.
This is computed for every replicate and the set of W values is treated like any other diversity
measure, by exporting to univariate software for ANOVA. A different approach is needed for
k-dominance curves, because of the lack of an internal comparison of curves to generate a univariate
statistic. Single cumulative curves now need to be compared across replicates, both within a group
and between groups. Clarke KR 1990, J exp mar Biol Ecol 138: 143-157 suggests a solution here,
which v6 now implements in the Analyse>DOMDIS routine. This starts from an active sheet of a
single species×samples array (abundances, say, though it could equally be biomass or area cover,
as in the example that follows), then calculates, separately for each sample, the cumulative relative
abundances of species ranked in decreasing order, exactly as for the k-dominance plots. The
'distance apart' of every pair of cumulative curves (samples) is now computed, using Manhattan
distance D1 (see page 46), and the routine generates a triangular matrix which is simply a
dissimilarity matrix for all pairs of samples. As such, it can be entered into the multivariate PRIMER
routines in just the same way as for any other dissimilarity matrix. (Indeed, the possibility of
inputting pairwise distances between curves - growth curves, particle-size curves etc - to
multivariate analysis was discussed on page 56.) In particular, a run of Analyse>ANOSIM on this
distance matrix will produce a significance test for the differences between groups. Replicate
curves across groups that tend to be further apart from each other than replicates within groups will
give ANOSIM R > 0, and this is tested by permutation as usual. In fact, there is a choice of two
'curve separation' statistics offered by the dialog box in DOMDIS, namely Manhattan distance and
a modification of it, which is the default: (✓Log weighting of species ranks). This multiplies the
absolute difference between the curves at the ith point on the x axis (the ith ranked species) by
log(1 + 1/i), which successively downweights the contributions from the lower ranked species. It
reflects the fact that k-dominance curves are usually plotted with a log scale on the x axis (of
ranks), and it approximates to the visually-observed area between the two curves. The unweighted
form would be relevant if plots are preferred without a logged x scale (as at the top of page 175).
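
The curve-separation measure just described can be sketched in a few lines of Python. This is a
minimal illustration under stated assumptions, not the DOMDIS implementation: it assumes both
samples are columns of the same species×samples array (so the two curves have equal length,
zeros included), that curves are expressed as cumulative percentages, and that the log weight at
rank i is log(1 + 1/i), the width of the step from rank i to rank i+1 on a logged x axis. The
function names are hypothetical, and any scaling constant PRIMER may apply is ignored.

```python
import math


def k_dominance(values):
    """Cumulative % relative abundance, species ranked in decreasing order."""
    ranked = sorted(values, reverse=True)
    total = sum(ranked)  # assumes the sample is not all zeros
    curve, running = [], 0.0
    for v in ranked:
        running += v
        curve.append(100.0 * running / total)
    return curve


def curve_distance(sample_a, sample_b, log_weighting=True):
    """Manhattan distance between two k-dominance curves, optionally log-weighted."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must come from the same species list")
    curve_a, curve_b = k_dominance(sample_a), k_dominance(sample_b)
    distance = 0.0
    for i, (ya, yb) in enumerate(zip(curve_a, curve_b), start=1):
        # Lower-ranked species (large i) get progressively smaller weights.
        weight = math.log(1.0 + 1.0 / i) if log_weighting else 1.0
        distance += weight * abs(ya - yb)
    return distance


def distance_matrix(samples, log_weighting=True):
    """Full symmetric matrix of pairwise curve distances (zero diagonal)."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = curve_distance(samples[i], samples[j], log_weighting)
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

The lower triangle of distance_matrix(...) plays the role of the resemblance matrix that is then
passed to a test such as ANOSIM.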

(Tikus Is coral cover)

Data on coral communities at a site in Tikus Island, Thousand Islands, Indonesia, over the years
1981, 83, 84, 85, 87 and 88, were reported by Warwick RM, Clarke KR & Suharsono 1990, Coral
Reefs 8: 171-179. Ten replicate transects were examined in each year, and the data is of percentage
cover of the 58 coral species identified, file tick in directory C:\Examples v6\Corals. As an
illustration here, only the differences between two of the years will be used: 1981 and 1983, before and
after the major El Niño of 1982/3. [The data is of wider interest, however, as an example where
multivariate analysis is highly dependent on choice of dissimilarity coefficient: Euclidean and
chi-squared distance produce ordination plots which are almost the reverse of each other, with
Bray-Curtis somewhere in between. Also, the zero-adjustment to Bray-Curtis on page 44 has a
significant role in properly displaying the denuded communities immediately post-El Niño. See Clarke
KR et al 2006 J exp mar Biol Ecol for a comparative analysis using 2STAGE, as on page 152.]

Open tick in a new workspace, and note (Edit>Factors) that the factor Year divides the data into
the 6 years (10 replicates in each). Select just 1981 and 1983 data, using Select samples from the
right-click menu when the cursor is over the data matrix, then (•Factor levels)>(Factor name:
Year)>Levels..., leaving just 81 and 83 in the Include box. Analyse>Dominance Plot, with the
defaults of (•Single worksheet) and (•Cumulative) plot type, will generate the k-dominance curves
for all selected samples, on a single plot. Show the group structure, of 10 replicates from each of
two years, by Graph>Data labels & symbols, switching on Symbols: ✓Plot>✓By factor: Year. To
test for significance of the (rather obvious!) differences between the curves in the two years, run
Analyse>DOMDIS>(✓Log weighting of species ranks) on the active sheet tick. This generates a
resemblance matrix Resem1 defining the distance between all pairs of curves, which is input to
Analyse>ANOSIM>(Design•One way) & (Factor A: Year). The R statistic is 0.46, easily larger
than that for any of the 999 random permutations under the hypothesis of no year difference, thus
p<0.1% (and the null histogram shows that it could have been much smaller for more simulations).
The R value is depressed somewhat by one 'outlying' replicate in 1981 (sample Y81R10) which is
much more dominated, and less species rich, than the other 9 transects, but there is no justification
for leaving it out of the analysis. Close the workspace; it will not be needed again.

[Figure: Select samples dialog (factor Year, with levels 81 and 83 included), the plot 'Tikus coral
k-dominance curves' against species rank, the ANOSIM dialog (Factor A: Year) and the null
histogram of R]

Global Test
Sample statistic (Global R): 0.461
Significance level of sample statistic: 0.1%
Number of permutations: 999 (Random sample from 92378)
Number of permuted statistics greater than or equal to Global R: 0

(Loch Creran contiguous macrofauna cores)

The final set of data in the Tutorial is of a benthic study by Gage JD & Coghill GG (1977, in: Coull
B (ed), 'Ecology of marine benthos', Univ S Carolina Press, Columbia) at a single site (C-12) in
Loch Creran, Scotland, involving 256 contiguous cores, arranged along a single transect. Small
cores were used to examine local-scale dispersion ('clumping') properties of sediment macrofauna.
The data matrix of 67 species by 256 samples is the file crma in directory C:\Examples v6\Creran;
open this into a new workspace. It will be used here to illustrate the final type of curvilinear plots
available in PRIMER, species accumulation curves.

Species accumulation plots

The Analyse>Species-Accum Plot routine plots (and lists) the increasing total number of different
species observed (S), as samples are successively pooled (often referred to as the 'Sobs' curve).
This is accessed when the active sheet is any species×samples matrix (though for all except the
Chao1 index, see later, which requires genuine counts, only the presence/absence structure is used).
There are 3 options for the order in which samples in the data sheet are successively amalgamated:
Sample order•Original, •User defined or •Permute. The first case simply takes samples in their
label order in the worksheet and the second specifies the order by a Select samples button. This
provides an Ordered Selection dialog (see pages 31, 61), in which samples can be moved up or
down with ↑ or ↓ buttons (alternatively, re-order the original labels with Edit>Sort, using a
factor). The third (default) option is to enter samples in random order; this being carried out 999
times (or whatever specified) and the resulting curves averaged, giving a smoothed S curve.
The analytical form of this mean value of the accumulation curve (over all permutations, in effect)
was given by Ugland K, Gray JS & Ellingsen K 2003, J anim Ecol 72: 888-897, and is computed
by Analyse>Species-Accum Plot>(Indices✓UGE). [It is the counterpart of the analytical form
given by Hurlbert SH 1971, Ecology 52: 577-586, for the Sanders rarefaction curve met under
Analyse>DIVERSE, which gave expected numbers of species for subsets of individuals from a
single sample (whereas here we are talking about subsets of samples from a data matrix).]
Although, for large numbers of permutations, the ✓UGE curve will always lie on top of that from
(Sample order•Permute) & (Indices✓S), it is included as a separate option so that the combination
of original sample order for S with the mean curve (UGE) can be taken. For samples that arrive in a
non-arbitrary space/time order, this allows comparison of the real accumulation curve with its
smoothed version: spatio-temporal heterogeneity will display as a jagged or stepped S curve.

Select the first 64 samples from crma using Select>Samples>(•Sample numbers: 1-64), and submit
it to Analyse>Species-Accum Plot>(Sample order•Original) & (Indices✓S✓UGE), turning off the
other indices. There is possibly some suggestion (though not much) of 'stepping' in the Sobs curve
but this would be hard to test for, and there are better ways of exploring spatial autocorrelation or
heterogeneity. However, it is clear that the accumulation curve is still rising and this naturally
raises the question as to how much larger S can get, with repeated sampling in this area.
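
The three kinds of accumulation curve discussed in this section can be illustrated with a short
Python sketch. It is not PRIMER's code: the function names are hypothetical, samples are
represented simply as sets of species names, and the analytical mean is written in the general
combinatorial form (the expected number of species in m samples drawn without replacement from n),
which is assumed here to be equivalent to the UGE curve.

```python
import random
from math import comb


def accumulation_curve(samples):
    """Sobs after each sample is pooled, taking samples in the order given."""
    seen, curve = set(), []
    for species in samples:
        seen |= set(species)
        curve.append(len(seen))
    return curve


def permuted_accumulation(samples, n_perm=999, seed=0):
    """Average of the Sobs curves over random orderings of the samples."""
    rng = random.Random(seed)
    order = list(samples)
    totals = [0] * len(order)
    for _ in range(n_perm):
        rng.shuffle(order)
        for k, s in enumerate(accumulation_curve(order)):
            totals[k] += s
    return [t / n_perm for t in totals]


def expected_accumulation(samples):
    """Exact mean curve over all orderings (assumed equivalent to the UGE form)."""
    n = len(samples)
    occurrences = {}
    for species in samples:
        for sp in set(species):
            occurrences[sp] = occurrences.get(sp, 0) + 1
    # A species found in c of the n samples is missed by a random subset of m
    # samples with probability C(n - c, m) / C(n, m).
    return [sum(1.0 - comb(n - c, m) / comb(n, m) for c in occurrences.values())
            for m in range(1, n + 1)]
```

Plotting accumulation_curve(samples) against expected_accumulation(samples) corresponds to the
comparison of original-order S with its smoothed version.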
[Figure: Species-Accum Plot dialog and the resulting plot 'Loch Creran macrofauna, contiguous
cores', showing the Sobs and UGE curves]

S estimators

So, the opportunity has also been taken in PRIMER v6 to include a number of S extrapolators -
attempts to predict the true total number of species that would be observed as the number of
samples tends to infinity (the 'asymptote' of the species accumulation curve), assuming that a
closed community is being successively sampled. (They should not be confused with the ✓UGE or
permuted ✓S curves which, like rarefaction indices, look backwards, at the expected behaviour of
S as samples are removed, and return simply the observed S at the end of the series). There is a
choice of 6 extrapolators, each of which is calculated as every new sample is added, so the result is
again a curve, of the evolution of the S predictor as sample size increases (though mostly one
would use the end point prediction as the best estimate of the asymptote). Where the samples are
entered in permuted orders, the predictions are again the average of the 999 estimators at each step.
These are non-parametric approaches, depending on simple functions of the number of species seen
only in 1 or 2 samples (Chao2, Jacknife1 and 2), or the number of species that have only 1 or 2
individuals in the entire pool of samples (Chao1), or the set of proportions of samples that contain
each species (Bootstrap). The only parametric model given, for historical reasons, is Michaelis-
Menten, but this is often the least useful estimator because (as here) it can generate estimates below
the final Sobs. The literature on S estimation is interesting but voluminous, and PRIMER does not
attempt a comprehensive approach. A good, and influential, summary for ecologists is Colwell RK
& Coddington JA 1994, Phil Trans Roy Soc B 345: 101-118, who detail the above estimators, and
an excellent software package for serious users in this area is Colwell's 'EstimateS' (http://
purl.oclc.org/estimates), which also contains the latest work of the influential Anne Chao.

So, finally, produce these estimates with Species-Accum Plot on samples 1-64 in crma, taking the
option to permute the entry order, and noting that they (and their analytical sd's) are sent to a
worksheet. How close are these predictions to Sobs when selecting samples 1-128, and then 1-256?
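
To make the ingredients of the non-parametric estimators concrete, the following Python sketch
computes them at the end point of a series of samples. It is an illustration only, using the
classic forms summarised by Colwell & Coddington (1994); whether PRIMER uses exactly these
variants (for example, a bias-corrected Chao formula) is not stated in the text, so treat the
details as assumptions. The function name and the fallback used when there are no doubletons or
duplicates are choices made here, and the parametric Michaelis-Menten fit is omitted.

```python
def s_estimators(counts):
    """End-point S estimates from counts[sample][species] (same species order per row).

    Assumed classic forms, with m = number of samples:
      Chao1     = Sobs + F1^2 / (2 F2)   F1, F2: species with 1 or 2 individuals in total
      Chao2     = Sobs + Q1^2 / (2 Q2)   Q1, Q2: species found in only 1 or 2 samples
      Jacknife1 = Sobs + Q1 (m - 1) / m
      Jacknife2 = Sobs + Q1 (2m - 3) / m - Q2 (m - 2)^2 / (m (m - 1))
      Bootstrap = Sobs + sum over species of (1 - p_k)^m, p_k = proportion of samples with k
    """
    m = len(counts)
    if m < 2:
        raise ValueError("need at least two samples")
    n_species = len(counts[0])
    totals = [sum(row[k] for row in counts) for k in range(n_species)]
    incidence = [sum(1 for row in counts if row[k] > 0) for k in range(n_species)]
    s_obs = sum(1 for q in incidence if q > 0)
    f1 = sum(1 for t in totals if t == 1)
    f2 = sum(1 for t in totals if t == 2)
    q1 = sum(1 for q in incidence if q == 1)
    q2 = sum(1 for q in incidence if q == 2)

    def chao(singles, doubles):
        if doubles == 0:
            # Assumed fallback (bias-corrected form) to avoid dividing by zero.
            return s_obs + singles * (singles - 1) / 2.0
        return s_obs + singles ** 2 / (2.0 * doubles)

    return {
        "Sobs": s_obs,
        "Chao1": chao(f1, f2),
        "Chao2": chao(q1, q2),
        "Jacknife1": s_obs + q1 * (m - 1) / m,
        "Jacknife2": s_obs + q1 * (2 * m - 3) / m - q2 * (m - 2) ** 2 / (m * (m - 1)),
        "Bootstrap": s_obs + sum((1.0 - q / m) ** m for q in incidence if q > 0),
    }
```

Calling the function on the first 1, 2, 3, ... rows of the matrix in turn would trace out the
kind of estimator curves described above, one value per added sample.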

[Figure: Species-Accum Plot dialog (sample order Permute, Max permutations: 999) and the resulting
plot 'Loch Creran macrofauna, contiguous cores', showing Sobs, Chao1, Chao2, Jacknife1, Jacknife2,
Bootstrap and MM curves against Samples, with the output worksheet of estimates and their SDs]

Indexes

Acknowledgements

We would like to thank our many colleagues around the world, too numerous to list, for being such willing
'guinea pigs', providing feedback on earlier versions of PRIMER 6. It has been a long time in gestation but
we hope you will feel it has been worth the wait! Special thanks are due to our colleagues at the Plymouth
Marine Laboratory (PML), particularly Paul Somerfield and Richard Warwick, for continued scientific
collaborations, which provide the spur for new PRIMER tools. In this respect we would also like to acknowledge
funding from the relevant UK government department (DEFRA) for research support, making it possible for
us to develop and publish new techniques in the open literature. We are equally grateful to the director, Nick
Owens, and many of the staff of our host laboratory, the PML, for continued encouragement of the 'spin-out'
company that PRIMER-E Ltd represents; to the Natural Environment Research Council of the UK for their
enlightened support for this venture on its inception in October 2000; and to Steve Hawkins and his staff at
the Marine Biological Association of the UK (MBA) for the closer ties that have developed there. KRC
acknowledges his positions as an honorary fellow of both the PML and the MBA.

Index to data sets

Numbers are pages on which the specified data sets are open and analysed; a number in bold indicates the
location of a brief introduction to that data, and a source reference. The list is in order of first appearance of
the data in the text, and the marginal box indicates the data sub-directory.

\Ekofisk   Ekofisk oilfield monitoring (N Sea), soft-sediment macrofauna abundance & contaminant variables:
           20, 21-24, 31-32, 38-39, 4J, 45, 49, 81-85, 146-148
\Tasmania  Tasmanian soldier crab disturbance study (Australia), sandflat nematode abundance:
           24-28, 29-36, 92-94, 100-102, 133-135, 141-142
\Grdfish   Groundfish trawl surveys (NW European shelf waters), fish abundance:
           36, 50-51, 62-64, 95-100, 167-172
\WAfdiet   W Australian coastal fish, dietary data, gut composition of prey categories:
           37-38, 86-88, 89-92, 129-132
\Fal       Fal estuary mudflats (SW England), copepod abundance:
           40-41
\Sediment  Danish sediments, particle size distributions:
           42, 135
\Biomark   Biomarkers in flounder tissues (Southern N Sea):
           53-55, 118, 133
\Exe       Exe estuary intertidal (SW England), nematode abundance & natural environmental data:
           57-61, 75-80, 138-139
\BCzoo     Bristol Channel plankton net hauls (W England), zooplankton abundance:
           65-68, 69-74, 140-141
\Clydemac  Clyde sludge dumpground monitoring (Scotland), macrofauna abundance/biomass & contaminant variables:
           80-81, 104-110, 112-117, 120, 152-153, 161-164, 173
\Phuket    Ko Phuket reefs under disturbance (Thailand), time series of coral transect cover:
           103, 144-146, 153-155
\Messoldi  Messolongi lagoon system (E Central Greece), diatom abundance & water-column environmental data:
           122-128
\Calafur   Calafuria colonisation experiments on subtidal rock (Ligurian Sea, N Italy), macroalgae cover:
           136-137
\Morlaix   Morlaix Bay, Amoco-Cadiz oil-spill (N coast of France), time series of macrofauna abundance:
           150-151, 157-159
\Solberg   Solbergstrand mesocosm, nutrient-enrichment experiments (Norway), copepod abundance:
           160
\Bermuda   Bermuda, Hamilton Harbour, sub-tropical macrofauna abundance:
           164-165
\Linnhe    Loch Linnhe pulp-mill effluent monitoring (Scotland), macrofauna abundance/biomass:
           174-177
\Corals    Coral transects at Tikus Island (Thousand Islands, Indonesia), coral percentage area cover:
           178-179
\Creran    Loch Creran contiguous cores on a single transect, soft-sediment macrofauna abundance:
           179-180

General index

Entries in bold denote items from the main menu, with sub-menu choices indicated after a > sign. (Entries that begin with
a > sign are already sub-menu or tab items). Numbers in bold indicate the main entry for that term, which is usually the
right place to start looking.

> for data matrices ........... 9, 2S, 26, 27, 35 ANOSIM Rstatistic ................. 16, 109, 129
for triangular matrices .................. 35, 54 insensitive to rep number.................. 131
3-d ordination plots.JO, 75, 8S, 86, 87, 117, interpreting R ............................ 130, 131
>A Leve!s ....................................... 132, 134
118 when few pcnnutations................ 135
>Add ................................................ 60, J64
when many pcrmutations............. 135
>B Levels ............................................... 134
pairwise R values .............. 130, 131, 136
>BIOENV ....................................... 121-23 A
converted to 'distances' ................ 131
>Bubble ..................................... 84, 85, 110
use in linkage trees ................... 126, 156
>BVSTEP ................................ 121, 156-58 ABC curves ...... 16, 145, 165, 173, 174, 175 ANOSIM test .•• 161 29, 56, 81, 129, 130-39
>Coinbine ........................................ 60, 151 Abiotic ........................... See Environmental I-way layout ..... 129, 133, J46, 160, 118
>Contour ......................................... 80, 172 Active sheet..... 13, 14, 29, 57, 97, 106, 121, ir.
-~
defining factor.............................. 129
>Create factor ............................ 60, 72, 79 143, 152, 154, 155, 175 pairwisc.tcsts:......... 16, 130, 133, 136
>Cumulate samples ................................ 42
..-- !'~~
Active window .. 14, 29, 43, 591 67, 75, 143, used on 2-way crossed data ......... 139
>Delete ..................................................... 30 150, 161, 169, 173 2-way crossed ........... 129, 133, 134, 135
>Dispersion weighting ..................... 40, 41
>Edit .......................................... 30, 60, 164
Add dummy variable ................... 44, 45, 49 defining factors .................... 133, 135 (~
Add Note ................................................. 73 null hypotheses .................... 129, 133 ~
>Fill Down ............................ 30, 31, 60, 80
>Font ........................................... 58, 59, 84
Adding worksheets with Merge ............. 101 pairwise tcsts ........................ 134, 135 -- •.·- .. ·~
>Frequency Data .................................. 171
Aggregate ................ See Tools>Aggregate
Aggregating data sh~et .............. 15, 91, 150
removes 'nuisance' factor ............. 135
...
- .;_~
>General... ........... 14, 16, 59, 121, 169, 170
>lmport ................................................... 34
>Kev ............... 33, 58, 62, 65, 86, 170, J72
Aggregation file ............... IS, 17, 36, SO, 95
can be numeric or alphabctie ............ 150
checking consistency .................. 98, 166
2-way crossed (no rcps) ............ 129, 138
defining factors ............................ 138
do I-way test if no effect ...........-139 ... .•
-~
~

··~
interactions must be small ........... 139
>Ke~·s font ............................................... 59 for taxon distinctness .. 49, 161, -166, 169 no pairwise tests possible ............ 139
>Le~els ................ 15, 68, 91, 140, 170, 178
·-~
in Excel ............................................... 22
>!\'lain .................................................... 152
>!\laster taxonomy ............................... 169
>More ............. 46, 51, 52, 53, 6&, 125, I 52
master list .................................... 97, 166
mi~ing entries .................................... 98
null hypotheses ............................ 138
rank correlation p.. statistic ......... 138
tolerates~ missing data ......... 138
' ..~
tree display .......................................... 99 2-way nested ............................. 129, 136 ~
>Normalise variables ....... 14, 39, 113, 115 v. indicator .................................... 36, 97 defining factors ............................ 136 .~
>Page ....................................................... 74
>Pick .............................................. lOS, 106
All zero samples/variables ... 44, 52, 99, 152
Analyse betw.:cn variables ...................... 52
do l-way test if no effect ............. 136
null hypotheses .................... 129, 136
... -~
.......
>Printer ................................................... 74 Analyse> ............................................ 16, 70
--~
presumes lower level effect ......... 136.
>Renaanc ................................................. 31 2STAGE ....... 16, 150, 152-55, 153, 178 ANOVA analogue .... 129, 133, 139, 146 ; .._ ..
>Reorder ................................................. 31 ANOSIM ..... 16, 129, l32-J8, 146, 173, l.~
>Resemblance ................................. 6S, 121
>S Range ............................... 169, 170, 172
>Scale ...................................................... 84
177
BEST ... 16, 121, 122-25, 145, lSS, 156,
dialog box
max permutations ................ 129, 130
pairwise tests to workshect .......... 131
.. :.~
••• I

>Select ....................................................:.34
>Select variables ........................... 121, 123
157, 159
CASWELL ...................................... 165
CLUSTER .. 14, S7, 58, 62, 65, 66, 128,
R values (permuted) to file .......... 132
selecting subsets of levels.... 132, 134
features
.. ~~
>Sl:\IPROF .................................... 6S, 128 140, 154 ~
any rank resemblances.129, 133, 138 ~-.
>Standardise ............. 13, 37, 42, 52, 86, 97 DIVERSE ...... 16, 97, 161, 162-64, 165, balance not cssential.................... 129 •
>Taxonomy ..................... Sl, 166, 167, 169 167, 173, 179 for a priori groups (factors) ......... 129 .~
..
>Titles ............................ 14, S9, 60, 65, 112 DOMDIS .................... 16, 173, 177, 178 in full-d (not MOS) space .... 133, 138
>Transform (overall) ..... 14, 38, 41, 43, 49 Dominance Plot ......... 16, 174, 175, 178 no homogeneity assumption 129, 133 • -~
>\\'eight variables .................................. 42
>Weights ................................. Sl, 166, 167
>X axis ....................... S9, 68, 125, 170, 174
Draftsman Plot.16, 108, lll, 112, 120,
122. 163, 167
Geometric Class Plot ................ 16, 173
permutation test (on labels) ......... 129
for k-dom curves (DOMDIS) ... 177, 178 ..- --~ ~
null distribution plot.129, 132, 135, 178
>Yaxis ..................................................... S9 LINKTREE ............... 16, 1:.?S, 126, 128 for pairwise R test ........................ 132 I
MDS ... 16, 75, 81, 86, 96, 108, 123, 144,
151, 154, 164
for Pav statistic .............................. 138 ·-·,.re.~
not smooth for few pcrms ............ 134
J MVDISP ..................................... 16, 160 real R (dashed line) ...................... 130 -·.. ~
•.:.. 4 ..,._
PCA ............ 16.113, 115, 118, 123, 163 results window .. 130, 133, 135, 136, 139
~
1-way layout. .................. 129, 133, 160, 178
of second stage matrix ...................... 1SS
Pre-treatment>
Cumulate samples ........................ 42
'groups too small' error ................ 138 - -.•- ~

•-
possible permutations .......... 130, 134 ••_.J·
Dlspea·slon weighting ............. 40, 41 significance level ...................... 130, 135
Normalise variablcs ...... 14, 39, 113, all v. subset ofpcrms ................... 130 .-_; ..Ail\
2 115, 118, 123
Standardise .... 13, 37, 42, 52, 86, 97,
cannot use standard tables ......,_.. 134
no BonfC&TOni correction ............. 131
-
164 sensitive to rep numbcr................ 131
2-d ordination plots.75, 77, 78-85, 88, 117, Transform (overnll) .. 14, 38, 41, 43,
118, 123 v. RELATE test ........................ 146, 148
49, 150, 171 v. SIMPROF tesL ........................ 64, 129
2STAGE ................See Analysc>lSTAGE
Weight variables .......................... 42 with undefined resemblances ............. 99
2-way layout
RELATE .... 15, 16, 1021 124, 143, 144- ANOSIMl (vS) ...................See ANOSIM>
crossed..... 24, 29, 92, 95, 129, 133, 134,
46, 148 AppleMac ................................................. 9
135, 144, 153
Resemblance .14, 39, 43, 45, 46, 49-53, Application areas ....................................... 8
nested ........................................ 119, 136
55, 123, 133, 143, 152 Arcacover••8, 103, 136, 144, 153, 174, 178
SIMPER ..................... 16, 140, 142, 160 mixed with counts ............................... 42
SIMPROF ...................... 16, 57, 67, 128 Aspect ratio
Species-Accum Plot ... 16, 173, 179, 180 change in zoom ..................... 10, 63, 174
TAXDTEST ......... 16, 161, 169, 170-72 preserved in ordination ... 72, 77, 87, 117
3-column format ANOSIM .............. See Analyse>ANOSIM

Indexes

Assemblages/communities See throughout! Bray-Curtis dissimilarity ................. 47, 140 Comparing resemblances
Asymptote (of species accum curve) ..... 180 Bray-Curtis 'family' of coeffs ........ 153, 160 automatic reversal (sim/dissim) ........ 109
Available/Include/Exclude list Bray-Curtis pres/abs .......................... 49, 50 biota v. env .......... 16, 109, 121, 124, 155
in selecting factors ........................ 32, 92 Bray-Curtis similarity ..... 11, 14, 38, 40, 42, coefficients ................ 109, 150, 152, 153
in selecting levels ........................ 91, 132 43, 49, 58, 62, 75, 86, 93, 96, 99, 108, Comparison v5 to v6 ... 9-11, 13, 14, 16, Ifs,
in selecting variables ................. 121, 123 111, 122, 128, 133, 136, 144, 152, 157, 27, 34, 51, 69, 77, 78, 85, 90, 100, 124,
Average ...................... See Tools>Average 178 125, 129, 140, 173
Average body mass ................................ 107 between species .................................. 52 Complete linkage ..................................... 57
Average distance ...................................... 46 with dummy species ............. 44, 99, 178 Composite factor .............................. 32, 151
Averaging Brief tour ................................................. 13 for averaging/summing ....................... 95
dissimilarities ............................ 131, 140 Brown et al 2001 .................................... 103 for selecting ........................................ 92
samples ..... 15, 32, 95, 96, 131, 153, 154 Bubble plots ....................... 82-85, 116, 140 Compressing taxonomic tree ................... 51
for (not across) a factor .......... 95, 154 BVStep ..............................See BEST match Configuration ....... See Graph Configuration
variables .............................................. 97 dialog
worksheets ................................ 106, 108 Contact
c address/phone/fax ................................ 7
e-mail/web site ...................................... 7
B Canberra metric ........................ 47, 152, 153 Contaminant concentrations 14, 17, 80, 105,
Canberra similarity ............ 44, 48, 152, 153 109, 111, 114, 119, 156
B% statistic ......................See Linkage trees Cao et al 1997.......................................... 56 Contour plots (AvTD/VarTD) .... 168, 171, 172
Backward elimination ............ 121, 156, 157 Cartouche ................................................. 12 Converting
Balance in replicates ........................ 96, 129 Cascade ................. See Window>Cascade correlation to similarity 52, 53, 108, 150
BEST .......................... See Analyse>BEST CASWELL ...... See Analyse>CASWELL similarity to dissimilarity ........ 15, 47, 50
BEST match ............... 121, 122-25, 155-59 Caswell 1976.......................................... 165 Co-ordinates of ordination
all subsets v. stepwise ............... 121, 156 Caswell neutral model ........................... 165 in results (pre-rotation) ........... 73, 76, 77
applications Chao 1 & 2 richness estimators ............. 180 save graph values (post-rotation) ....... 73
biomarkers v. contaminants . 121, 156 Check .............................. See Tools>Check Copy/Paste .............. See Edit>Copy/Paste
biota (subset) to environ ...... 121, 155 Check box ............................................... 12 Copying/pasting
biota (subset) to model ........ 155, 156 Checking data .............................................. 19, 120
biota (subset) to same biota . 121, 157 aggregation file ........................... 98, 166 factor contents ............................... 30, 34
biota to other biota ............... 121, 156 data matrix .................................... 15, 99 factors ................................................. 19
environ (subset) to biota ..... 121, 122, for missing values ....................... 99, 119 into dialog box ............................ 92, 159
123, 155 for negative values .............................. 99 results tables ............................ 16, 73, 77
structural redundancy........... 157, 159 resemblances ............................... 99, 100 text/plots to Note window .................. 73
dialog boxes Chi statistic .............................................. 56 Correlation coefficients
BVSTEP tab Chi-squared distance .. 47, 93, 152, 153, 160, Kendall rank........................ 52, 122, 138
fixed start (e.g. no vars) .......... 157 178 Pearson ........................................ 52, 113
num trial variables .......... 156, 158 Chi-squared metric ................................... 47 Spearman rank ..... 52, 53, 113, 122, 138,
random restarts ........ 156, 157, 158 Chi-squared similarity ............................. 48 150, 153
stop criteria ..................... 157, 158 Citations ..................................................... 8 Weighted Spearman rank ... 52, 122, 138
detail in results ..... 122, 123, 157, 158 Clarke & Warwick 1998a ........ 50, 156, 159 Correlations
fixed resemblance matrix .... 121, 123,
125, 155, 157
Clarke & Warwick 1998b ...................... 161
Clarke & Warwick 1999 ........................ 166
between diversity indices ......... 163, 167
between resemblance matrices 121, 124,
max number of results ................. 122
max trial vars (BioEnv) ....... 122, 123
Clarke & Warwick 2001 ................ 168, 172
Clarke 1990............................................ 111
143, 155
between variables ...43, 52, 94, 113, 120
method (BioEnv/BVStep).... 121, 157
features
Clarke et al 1993 ................................... 103
Clarke et al 2006a ................................... 40
converting to sims ......... 52, 53, 108, 164
to worksheet (D/man plot) 112, 113, 123
active sheet is sample data ... 121, 157 Clarke et al 2006b . 43, 44, 48, 50, 152, 178
Clarke et al 2006c .................. 136, 153, 155
Crossed factors ...... 24, 29, 92, 95, 129, 133,
causality never demonstrated! ..... 156 135, 144
generalises SIMPER Clark's divergence .............................. 47, 56 Cumulating samples .................. 42, 56, 135
to all groups at once ................ 156 Clear Highlight .. See Edit>Clear Highlight Cumulative dominance plot .. 174, 175, 177,
to continuous pattern .............. 157 Clearing 178
label matching ...................... 121, 145 desktop ...................... 14, 29, 49, 69, 117 Cumulative frequency curves .................. 56
ranks cater for different coeffs ..... 122 highlights .. 15, 45, 83, 89, 106, 145, 175 Cursor
remove collinear variables .. 113, 121,
122, 123
Close ................................... See File>Close
Close All Windows ......... See Window> ...
change to hand .................................... 78
change to pointer .................... 63, 78, 81
global test .................... 11, 124, 125, 156
not for dependent matrices .......... 157
Closing
session ........................................... 32, 45
change to zoom ................................... 63

current position ................................... 19
null distribution plot ............ 124, 125 windows .................... 14, 29, 49, 69, 117 row & column numbers .......... 22, 70, 98
number of permutations ....... 124, 125 workspace ......................... 15, 20, 24, 81 Cutting data .............................................. 19
real ρ on histogram ...................... 125 Clumping of individuals ...... 40, 41, 44, 164 CY coefficient .......................................... 56
significance level ................. 124, 125 CLUSTER......... See Analyse>CLUSTER Cyclicity ................................................. 143
rank correlation ρ ...... 122, 123, 126, 155 Clustering ......................... 11, 57, 58-66, 75 Czekanowski coefficient ................... 44, 47
results window .................. 123, 125, 157 binary divisive (constrained) ...... 57, 126
uses variable number not label .... 158 display on ordination ...... 10, 78, 79, 116
stepwise search (BVStep) . 122, 142, 156 hierarchical agglomerative ................. 57 D
forward/back ................ 156, 157, 158 linkage mode ..................... 57, 58, 62, 65
Bin size in histograms ...................... 68, 125
Binomial deviance ................................... 55
on second stage matrix ..................... 154
with SIMPROF tests ....... 57, 65, 66, 140
Data labels & symbols ....... See Graph> ...
Data matrix

Binomial deviance (scaled) .............. 56, 152
Bio-Env .............................See BEST match
Biogeography ......................................... 168
with undefined resemblances ............. 99
Collapse dendrogram ................... 14, 61, 62
reinstate sub-group ............................. 63
3-column format ............... 13, 25, 26, 27
aggregated to higher level .................. 97
checking ........................................ 15, 99
Biomarkers ... 8, 39, 43, 52, 53, 75, 94, 108, Collapsed MDS plot ................................ 88 direct entry .............................. 13, 18, 45
109, 111, 115, 118, 129, 133, 140, 156 Collinearity ...................... 94, 113, 121, 123 estimated values .................. 99, 119, 120
Bitmap output resolution ......................... 72
Blank samples .................................... 44, 99
Collins & Williams 1982 ......................... 65
Colwell & Coddington 1994.................. 180
Excel files ............... 13, 22, 23, 104, 176
large arrays ............................ 13, 27, 104
Combined cells in merging merging ................. 13, 27, 100, 103, 154
Blank species ............................. 52, 99, 152
entries summed ................................. 101 min/max (rounded) of variables ......... 85
Body-size distributions .......................... 111
Bootstrap richness estimator .................. 180 forces error ........................................ 101 missing values ................... 11, 23, 94, 99


mixed type ........................................... 42 geodesic distance ................................... 46 Delete>Rows/Columns ..................... 19


numeric entries ................................... 23 Hellinger ............................................. 47 Factors ..... 13, 29, 59, 70, 133, 136, 178
opening ...................................... 13, 21, 24 index of association ............................ 47 Factors>
orientation ............................. 13, 23, 104 Manhattan ....... 39, 46, 56, 135, 152, 177 Add ...................... 30, 31, 41, 80, 173
out of bounds values ............................ 99 maximum distance ...................... 56, 135 Cancel............................................ 30
presence/absence................................. 23 Minkowski .......................................... 46 Combine .................... 32, 60, 92, 95
properties dialog Wald (chi-squared) ............................. 55 Delete............................................ 30
data type .................... 18, 22, 27, 176 DIVERSE ........... See Analyse>DIVERSE Edit>Copy/Paste .................... 30, 34
description box ................... 13, 18, 22 Diversity curves ..................................... 173 Edit>Delete ................................... 30
history box ............................... 22, 27 Diversity indices .............................. 16, 161 Edit>Fill Down ... 30, 31, 60, 80, 173
number of rows & columns ........... 18 Brillouin H ....................................... 162 Import ....................... 34, 41, 80, 108
samples as columns/rows . 18, 22, 104 Caswell's V ....................................... 165 Key ........................................ 33, 172
title ........................................... 18, 22 evenness J' (Pielou) .. 161, 162, 164, 167, Rename ......................................... 31

row labels present on input ................ 22 168 Reorder ......................................... 31
selecting subsets .................................. 89 Fisher's α ................................... 162, 173 Indicators ..................... 13, 35, 42, 164
size constraints (lack of) ............... 25, 27 Hill numbers ..................................... 162 Insert>Row/Column ................. 19, 105
sparse................................................... 44 Margalef richness d .......................... 162 Invert Highlight .................. 15, 91, 159
text files ............................. 13, 25, 26, 27 multivariate analysis of............. 163, 168 Labels ....................... 13, 18, 42, 105, 107
text format separator .................... 25, 27 rarefaction ................. 162, 164, 179, 180 Move>Rows/Columns ...................... 19
transpose ......................... 13, 15, 27, 104 results to worksheet .......... 110, 161, 167 Properties 13, 15, 22, 27, 41, 43, 45, 50,
type ....... 9, 13, 17, 22, 23, 101, 111, 176 richness S ... 97, 161, 162, 164, 165, 167, 86, 104, 109, 176
unique name ........................................ 24 168, 174, 179 Sort ................................................... 151
v4 & v5 formats ................ 13, 17, 20, 27 Shannon H' ................ 162, 164, 165, 168 Sort>Rows/Columns .... 19, 26, 62, 100,
version number ................................... 25 Simpson 1-λ' ............................. 162, 168 103, 179
De'ath 2002 ........................................ 12S total abundance N ............... 97, 162, 165 Eigenvalues/vectors ....... 113, 114, 118, 163
Default DOMDIS .............. See Analyse>DOMDIS Ellipse plots ..................................... 10, 171
bubble scale/fill/border colours ........ 110 Dominance Plot ............... See Analyse> ... EM algorithm ............................. 11, 15, 119
font scale (global) ..................... 110, 176 Dominance plots E-mail .............................................................. 7
initial directory ......................... 21, 110 dialog box Enhanced (plus) metafile ......................... 72
labels ................................................. 107 biomass worksheet...................... 176 Entering data directly......................... 18, 45
resemblances ........................ 43, 45, 121 cumulative.................... 174, 175, 178 Environmental data ... 14, 16, 22, 23, 39, 80,
results page width ............................. 110 ordinary ................................ 174, 175 94, 109, 111, 119, 122, 129
symbol size/type/colour ...... 58, 110, 176 partial ................................... 174, 175 Environmental variables ....... 18, 29, 57, 83,
Delete ................................ See File>Delete single worksheet .................. 174, 178 104, 111, 112, 119, 133, 140, 155
Deleting correlating ........................................... 52
W stats to worksheet .................... 176
blank species ..................................... 1Sl factors on ..................................... 10, 173 Euclidean distance ........................ 45, 99
branches in Explorer tree .................... 71 match samples not species ................ 176 normalising ................................... 39, 99
cells in data sheet .............................. 120 testing ranked ............................... 108, 111, 113
data ...................................................... 71 DOMDIS ...................... 173, 177, 178 transforming 39, 104, 105, 111, 112, 123

factor entries ....................................... 30 log weighting of ranks ......... 177, 178 Equitability ............................ See Evenness
factors .................................................. 30 W for ABC curves ....................... 177 Error message examples
labels ................................................. 107 Dose-response relations ......................... 111 cannot match labels ............ 83, 107, 145
results .................................................. 71 dotNet environment ......................... 7, 9, SI combined cell in merging ................. 101
rows/columns ...................................... 19 Downweighting duplicate entry on read-in ................... 26
Dendrogram ........................... 14, 57, 58-66 dominant species ..................... 38, 40, 86 max iterations exceeded ................... 119
Dendrogram Options .................. See Graph erratic species................................ 40, 41 new cell in merging .......................... 103
Dendrogram dialog unreliably identified species ............... 42 only one sample needed.................... 170
Denuded samples ..................... 44, 144, 178 Draftsman Plot ................ See Analyse> ... Esc key ..................................................... 18
Description box ...................... 13, 18, 22, 43 Draftsman plots .............................. 108, 111 Estimated values .............................. 99, 120
Desktop .............................................. 17, 31 correlations to worksheets ........ 112, 163 EstimateS software package .................. 180
Detection limit ................................. 23, 105 Euclidean distance .... 11, 14, 39, 43, 45, 56,
for diversity indices .................. 163, 167
Dialog box ................................................ 12 for transform choice.......... 111, 112, 123 99, 104, 108, 111, 115, 116, 123, 133,
Diatoms .................................. 122, 125, 126 show (or omil) scales ................ 113, 167 143, 146, 153, 160, 178
Diel studies ....................... 8, 37, 86, 89, 129 with missing values .......................... 120 Evenness 161, 162, 164, 167, 168, 173, 174,
Discriminating species ................... 140, 157 176
Dropping rarer species ............................. 93
Dispersion weighting ........... 11, 40, 42, 149
before transforming ...................... 40, 41
Dummy variable ................................ 12, 49
for coefficients .................................... 44
Examples............................................ 8, 181
deviations from displayed ................... 76
stats to worksheet ................................ 41 value .............................................. 44, 45 finding ........................................... 13, 17
Dissim ............................ See Tools>Dissim following through ............................... 12
Dissimilarity coefficients ................... 14, 43
Duplicate .................. See Tools>Duplicate
Duplicate sample/var labels ..... 99, 101, 102 Excel
blank cells ..................................... 23, 99
Dissimilarity to similarity ........................ 47 Duplicating
Distance between curves . 56, 135, 173, 177, aggregation sheet ................................ 98 data input ............................ 9, 22, 23, 34
178 data sheet ..... 15, 105, 107, 120, 128, 159 factor input/output .............................. 34
Distance measures ........................ 14, 43, 46 file wizard ............................ 22, 23, 26
average distance .................................. 46
graphs (on existing branch) 82, 100, 117
resemblances ..................................... 100 size constraint ......................... 9, 27, 104
binomial deviance ............................... 55 selections (to save) .............................. 90 version needed ................................ 7, 22
binomial deviance (scaled) .......... 56, 152 starts new branch ............ 90, 92, 98, 100 Exit ........................................See File>Exit
Canberra metric .................. 47, 152, 153 Dynamic linking (avoided) ...................... 72 Explanatory variables (BEST match) .... 121
'chi' statistic ......................................... 56 Dynamic menus ................................. 29, 95 Explorer tree ........ 9, 14, 17, 29, 69-74, 176
chi-squared distance ..... 47, 93, 152, 153,
adding & saving notes ........................ 73
160, 178 back propagation....... 14, 69, 70, 90, 172
chi-squared metric .............................. 47 deleting branches .......................... 14, 71
chord distance ..................................... 46 E duplicating to start new branch .... 90, 92
coefficient of divergence .............. 47, 56 hiding .................................................. 70
CY ....................................................... 56 Edit> .................................................. 70, 95 item version number ..................... 25, 71
Czekanowski ................................. 44, 47 Clear Highlight .. 15, 36, 45, 83, 89, 175 renaming items ............................. 71, 73
Euclidean . 39, 45, 56, 99, 104, 108, 111,
116, 121, 133, 135, 143, 152, 153,
Copy/Paste ....................... 16, 19, 72, 73 rolling up/out branches ................. 71, 74
Exporting
178
Cut ...................................................... 19
factors to .xls/.txt files ........................ 34

plots to other graphics software .......... 72 Rename Graph ................................ 112 Graph Options dialog ......................... 57, 79
Extensions Rename Resem .......................... 71, 144 (overall) bubble scale ......................... 84
.agg ........................................ 17, 21, 166 Rename Results ................................. 71 axis log scale ....................... 57, 174, 177
.bmp/.jpg/.png/.tif........... 14, 17, 72, 176 Save Data As .................... 20, 27, 34, 90 axis rescale ........ 57, 59, 6S, 76, 125, 170
.cnv ...................................................... 35 Save Graph As ............................. 71, 73 axis scale size ...................................... 58
.csv .......................................... 17, 25, 26 Save Graph Values As .............. 73, 115 bubble fill/boundary colours .............. 85
.emf ................................. 14, 17, 72, 176 Save Resem As ................................... 54 bubble min/max scaling...................... 85
.pml ................................ 17, 21, 27, 101 Save Results As ................................. 73 bubble values font ............................... 84
.ppl .................................... 14, 17, 21, 72 Save Workspace .......................... 20, 74 contour fill ........................................ 172
.pri ................. 17, 20, 21, 29, 71, 90, 100 Save Workspace As .............. 15, 20, 32 contour line style/colour ....... 79, 80, 172
.pwk ................................... 15, 17, 20, 21 Fill down default settings (global) .................... 110
.rtf................................. 16, 17, 73 label number ................................. 32, 80 edit button ........................................... 60
.sid ............................... 14, 17, 21, 71 pattern ................................................. 30 edit main/sub/axis titles ... 57-59, 65, 76,
.sim ...................................................... 17 value ...................................... 31, 60, 173 84, 112, 132, 170
.spc ...................................................... 28 Finding examples ............................... 13, 17 font type/colour/size ... 57-59, 76, 82, 84
.txt .......... 17, 20, 25, 26, 34, 54, 73, 132 Fisher's α ........................................ 162, 173 greyed-out options ........................ 76, 85
.xls ............... 17, 20, 22, 23, 34, 54, 104 Flip ordination axes ...... 78, 81, 85, 96, 115, key button ................. 58, 62, 65, 86, 172
117, 123 key/sub title sizes linked ..................... 82
Flip X/Y .................... See Graph>Flip X/Y line width ...................................... 57, 59
Force in/exclusion (BEST) ........... 121, 123 moving levels in key ..................... 62, 65
F Forward stepping ................... 121, 156, 157 plot history box ....................... 57, 59, 82
Fourth root transform .... 38, 5S, 65, 75, 133, plot keys box ........................... 59, 84, 85
Factor ........................................... 13, 29-35 136, 150, 157 plot labels/symbols .... 14, 57, 58, 62, 65,
combinations Funnel plots.............................. 10, 170, 171 72, 75, 79, 81, 82, 118, 170, 178
for averaging .................................. 95 remove text pane ................................. 61
for selecting.................................... 92 remove titles .................................. 60, 82
defining levels for ANOSIM etc reverse vertical text ............................. 59
1-way layout ................................ 129 G symbol size/type/colour ... 57, 58, 76, 81,
2-way crossed ...... 133, 135, 153, 155 86, 112, 170, 172
2-way nested ................................ 136 Genetic/molecular analysis .......... S, SS, 121 text size relative ............................ 58, 82
dialog box ............................... 30, 60, 62 Geodesic metric ....................................... 46 Graph types ...... See Ordination, Clustering,
adding ................... 30, 31, 60, 80, 173 Geometric Class Plot ...... See Analyse> ... Null distribution, Similarity profile,
cancelling edit ............................... 30 Geometric class plots ............................. 173 Shepard, Dominance, Draftsman,
combining ............ 10, 32, 60, 92, 151 Geometric mean ....................................... 48 Linkage tree, Species accumulation,
filling down content ..... 30, 31, 60, 80 Global tests ............. See also ANOSIM test Geometric Class
import in workspace .. 10, 34, 80, 108 ANOSIM ........................................... 130 Graph> ................................................... 14
key ............................................ 33, 62 BEST match ...................... 124, 125, 156 Bubble tab .................................... 84, 85
renaming ........................................ 31 DOMDIS .......................................... 177 Contour tab ........................................ 80
re-ordering ....................... 31, 62, 151 new in v6 ..................... 11, 124, 156, 177 Data labels & symbols ... 14, 58, 59, 60,
features RELATE ................................... 124, 144 62, 65, 72, 75, 79, 85, 110, 112, 132,
alphanumeric content ............... 29, 80 Goldman & Lambshead 1989 ................ 165 164, 170, 178
behaviour under averaging ............ 9~ Gower similarity ...... 38, 42, 47, 48, 93, 152 Flip X/Y .............................. 78, 115, 117
factor or variable? .................... 31, S9 Graph Configuration dialog ..................... 79 General... ........................ 59, 76, 82, 110
forward/back propagation .. 34, 60, 70 2-d bubble plot ................ 79, 82-85, 116 MDS subset ................................ 88, 144
numeric 29, 31, 62, 80, 116, 144, 146 axes to display............... 79, 85, 116, 117 Pointer .......................................... 63, 78
unlimited length ............................. 29 bubble data values as labels .......... 84, 85 Rotate Axes ................................ 85, 117
unordered ................... 29, 31, 62, 156 bubble data variable ................ 82, 83, 84 Rotate Data .......................... 78, 85, 117
for averaging/summing ......... 95, 96, 173 bubble data worksheet ............ 82, 83, 84 Special ... 14, 60, 6S, 72, 79, 81, 82, 113,
from CLUSTER slice ....... 57, 60, 72, 79 greyed-out options .............................. 79 114, 116-18, 125, 132, 144, 154,
from SIMPROF groups ................ 65, 68 overlay clusters ................... 79, 116, 154 163, 167
import/export externally ............... 10, 34 overlay trajectory .. 79, 80, 116, 144, 150 Titles tab ................... 14, 59, 60, 6S, 112
importing v4 conversion file ............... 35 plot type .............................................. 79 X axis tab .............................. 59, 68, 125
in Excel ......................................... 34, 55 show/hide variable vectors 114, 116, 123 Y axis tab ............................................ 59
in plot file (.ppl) .................................. 72 slack% .......................................... 79, 80 Zoom In/Out ................ 63, 87, 112, 117
in text file ............................................ 34 Graph Dendrogram dialog Gray et al 1990 ........................................ 22
levels ................................................... 29 create factor from slice ..... 57, 60, 72, 79 Greyed-out options ...................... 76, 79, 85
saving .................................................. 29 fixed slice ................................ 57, 60, 79 Groundfish ................... 50, 62, 95, 167, 169
Factors ............................See Edit>Factors orientation ............................... 10, 57, 60 Group average linking ........... 57, 58, 62, 65
Faith 1992 .............................................. 161 Graph Draftsman dialog Growth curves ........................ 8, 43, 56, 177
Faith similarity ........................ 48, 152, 153 show scales ....................................... 113
Field et al J982 ........................................ 57 Graph Histogram dialog
File types .................................................. 17 bin size ................................ 6S, 125, 132
File> ......................................................... 70 Graph operations H
Close Workspace ......................... 15, 20 changing aspect ratio in zoom ............ 57
Delete Data......................................... 71 changing border by zoom ..... 87, 96, l 5 I Harmonic mean ........................................ 48
Delete Results .................................... 71 collapsing dendrograms .......... 57. 61, 62 Hellinger distance .................................... 47
Exit ............................................... 32,45 copy/paste to Note window ................ 73 Help> ......................................................... 7
Import> flipping ordinations .•. 78, 81, 85, 96, 115 C.:ontents>File types .................... 34, 55
PRIMER4 Conversion File ......... 35 opening v6 plot files ........................... 72 Hide/show vector plot (PCA) 114, 116, 123
PRIMER4 Species File ................ 28 printing ................................................ 74 Hierarchy ................................. 49, 161, 168
New ................................... 13, 18, 20, 4S re-sizing window .......................... 57, 58 Higher taxonomic levels ...... 15, 36, 97. 150
()pen 13.20,21,22-24.26,27, 72, 101 rotating dcndrograms .. S7, 61. 62, 6S, 73 Highlighting ................................. 89. 90-94
Page Setup ......................................... 74 rotating linkage trees ........................ 128 by clicking on labels ........................... 89
Print .................................................... 74 rotating ordinations ..... 73, 78, 85, 86, 96 clearing all .............. 36, 83, 89, 106, 175
Print Preview ..................................... 74 saving graph values .................... 73, 115 dehighlighting part of sheet ................ 89
Recent Items ...................................... 41 saving plots (& formats) ................. 9, 71 factor contents ..................................... 30
Recent Workspaces ........................... 31 subset selection by rectangle 87, 88, 144 factors ................................................. 30
Rename............................................... 14 text pane ........................................ 61, 62 for transform (individual) ... 89, 104, 106
Rename Data ......................... 41, 71, 96 zooming in/out .......... 57, 63, 87, 88, 172

Indexes

is not selection ............................. 15, 89 samples/variables ........................ 18, 105 Shepard diagram ............................ 75
samples and variables ......................... 90 sorting alphabetically by ... 100, 103, 151 slackness of contours ............... 79, 80
sequence of labels ............................... 89 uniqueness ....... 18, 34, 99, 102, 103, 145 features
toggled (on/off) ............................. 15, 89 Labels .............................. See Edit>Labels aspect ratio preserved ...... 77, 87, 164
whole sheet ................................... 15, 89 Large arrays ................................. 9, 27, 104 no vector plots (invalid)............... 116
Hill numbers ........................................... 162 Left-skewed distribution ........................ 111 rank order preserved .. 75, 76, 77, 109
History box ....................... 14, 22, 27, 45, 86 Legendre & Legendre 1998............... 14, 43 separate 2-d & 3-d plots .. 79, 85, 117
/{uurslon el al 2004 ................................. 31 Likelihood ratio statistics......................... 55 for subset of points
Hurlbert 1971 ........................................ 179 Line
width ................................................... 59 selected by rectangle .. 10, 87, 88, 144
Line style/colour .......................... 33, 79, 80 linking to clusters.......................... 78, 79
I Linkage options (in clustering) 'means' plots ...................................... 131
furthest neighbour ............................... 57 of variables .................... 52, 53, 161, 164
Icons ......................................................... 17 group average.................... 57, 58, 62, 65 on diversities ..................................... 161
in Explorer tree ................................... 70 nearest neighbour .......................... 57, 62 on higher taxonomic levels ................. 97
on Tool bar .............. 78, 87, 88, 112, 117 Linkage trees .................................. 125, 126 on model matrix ................................ 148
Import ............................. See File>Import dialog box on replicate means .............................. 96
Importing fixed resemblance matrix ............ 126 on tied ranks ................................ 77, 148
factors from .xls/.txt files .................... 34 min group/split sizes & split R .... 127 results window ........................ 15, 76, 86
factors within workspace ...... 34, 80, 108 SIMPROF tab .............................. 128 •• (iteration not converged) .......... 76
PRIMER 4 conversion file ................. 35 features lowest stress in restarts ............ 76, 77
PRIMER 4 species file........................ 28 binary divisive clustering ............ 126 second stage plot ....... 150, 151, 152, 154
Index of dispersion............................. 40, 41 divisions maximise R ... 126, 127, 156 stress value ........ 75, 76, 77, 85, 117, 133
Indicator ................................. 13, 19, 35-36 not metric/linear/additive............. 126 with undefined resemblances ..... 99, 100
for averaging/summing ....................... 97 prune variables for prediction...... 126 zooming in/out .............................. 87, 88
for labelling variable plots ................ 164 tolerates~ missing dara ......... 128
uses untransformed active sheet .. 126
MDS subset ........ See Graph>MDS subset
Measurement scales
import/export to external files ............ 35
in selection .......................................... 36 plot & text pane ................................ 127 common ................ 40, 42, 111, 115, 118
Indicators .................. See Edit>Indicators B%, mean across-group ranks ..... 127 common by ranking .................... 15, 108
rotate (text also changes) ............. 128 different ... 14, 39, 52, 113, 115, 118, 163
Infinite values ........................................... 99
results window .......................... 127, 128 Meiofauna ....... 24, 40, S7, 1S, 92, JOO, 133,
Initial directory ................................ 21, 110
with SIMPROF test .......................... 128 138, 141, 160
Input formats ............................................ 17
Linking biota & environment Memory ................................................ 7, 27
text files ............................................... 25
Insert ................................ See Edit>Insert BEST match .. 16, 116, 121, 122-25, 155 Menu contrasts ............................. 16, 70, 95
Inserting rows/columns .................... 19, 105 BEST v. bubbles v. LINKTREE ...... 125
bubble plots ......................... 83, 116, 125
Merging
automatic label matching .......... 100, 103
Installation ............................................ 7, 17
causality never demonstrated ........... 156 combining cells forces error ............. 101
Interrupt execution ............... 10, 51, 65, 125
Inverse transform ................................... 111 linkage trees .................. 11, 16, 125, 126 combining cells sums entries ............ 101
LINKTREE .... See Analyse>LINKTREE join (rename duplicates) ........... 101, 102
Invert Highlight ........... See Edit>Invert ...
Inverting highlights .................... 15, 91, 159 Log axis scale................... 57, 173, 174, 177 large arrays .......................................... 27
Log normal distribution ......................... 173 new cells force error ......................... 103
Izsak & Price 2001 .................................. 50

Log series distribution ........................... 173 new cells zero/missing ...................... 103
J
Log transform .. 38, 105, 111, 118, 122, 150 non-uniform species lists ... 9, 15, 18, 27,
103, 154
relax strict matching ................. 101, 102

M samples (for same variables) .... 100, 154
Jaccard similarity ............................... 48, 49 variables (for same samples) ... 100, 101,
Jacknife 1 & 2 richness estimators ........ 180 102, 106
Join (v5 routine) ............ See Tools>Merge Macroalgae ..................................... 136, 155
Macrofauna ... 17, 21, 80, 81, 106, 146, 150, Methods manual ................................... 7, 17
Joining data sheets .................. See Merging Michaelis-Menton richness estimator ... 180
Joining points 10, 29, 79, 80, 108, 116, 144 152, 157, 161, 164, 173, 174, 175, 179 Millar & Anderson 2004 ......................... 56
Joint absences Manhattan distance ..... 39, 46, 56, 135, 152,
177 Minimum path length .............................. 50
counted ...................................... 111, 153 Minkowski metric .................................... 46
Mantel-type tests ............ See RELATE tests
ignored ..............44, 47, 48, 56, 111, 153
treatment is crucial ............................ 153 Margalef's richness ................................ 162
Margin numbers ....................................... 12
Missing ........................ See Tools>Missing
Missing data
estimation .... 11, 15, 23, 94, 99, 119, 120

K
Matching coefficient ........... See RELATE ρ
Matching labels ...... 9, 18, 34, 100, 102, 107
assumptions (restrictive).............. 119
EM algorithm ............................... 119
biomass & abundance ....................... 176
biota & environment ................... 83, 121 sensitivity study (use a) ............... 120
k-dominance plot ............ 174, 175, 177, 178 in aggregation data ...................... 97, 166 use with great care! ...................... 119
Kendall 1970 ............................................ 52 in RELATE ............................... 143, 145 in aggregation files ............................. 98
Kendall rank correlation .......... 52, 122, 143 Maximum distance ........................... 56, 135 in linkage trees .................................. 128
pairwise v. listwise deletion ..... 119, 138
Key propagation ....................................... 33 MDS ............................. See Analyse>MDS
Key symbol/line/colour ............... 33, 58, 86 unbalanced designs ..................... 23, 119
MDS ordination ......................... 16, 75-88
Kruskal fit scheme ..................... 77, 78, 148
Kulczynski pres/abs ...........48, 50, 152, 153
2-d configuration ...... 15, 77, 78-85, 133 values ................ 11, 23, 94, 99, 119, 120
Misspelt species name ....... 97, 98, 103, 166
Kulczynski similarity ......... 44, 48, 152, 153


3-d configuration .............. 15, 85, 86, 87
Model matrices ................................ 11, 143
arbitrary rotation/flip/scale ..... 77, 78, 85
bubble plot ...................... 79, 82-85, 140 applications
cyclicity with replication ............. 149
changing 2-d border shape .. 87, 151, 164
L collapsed plot ...................................... 88 geographic layout ................ 143, 148
dialog boxes seriation with replication ..... 146, 156
simple cyclicity ............................ 143
Label 2-d plots from 3-d solution .......... 117

Kruskal fit scheme ........... 77, 78, 148 simple seriation .................... 143, 155
in plots applications
centring with/without symbol.. 79, 82 minimum stress ........................ 77, 81
independent of symbol ................... 58 number of restarts .................... 75, 76 seriation with replication ............. 143
matching .......9, 18, 34, 83, 97, 100, 102, overlay cluster contours ......... 79, 154 dialog box
107, 121, 143, 145, 166, 176 overlay trajectory ...... 79, 80, 81, 108, cyclicity................................ 148, 149
Euclidean 20 ............................... 148 ~- ~
relabelling duplicate names .............. 102 144
factor A/B ............................ 146, 148

seriation ........................................ 148 Other coefficients ................... 46, 52, 53, 55 Ochiai .......................................... 48, 153
unordered groups ................. 148, 156 Out of bounds values ............................... 99 Rogers & Tanimoto ............................ 48
Model Matrix ... See Tools>Model Matrix Outliers Russell & Rao ..................................... 48
Modifying plots ........ See Graph Options etc dominating PCA ....................... 111, 163 simple matching .......................... 48, 152
Morphometry 8, 43, 52, 111, 115, 119, 121, in environmental variables ....... 108, 111 Sorensen ........................ 48, 49, 152, 153
133, 140, 156 in k-dominance plot .......................... 178 Presence/absence transform 38, 49, 51, 142,
Move .................................. See Edit>Move in ordination ........................ 88, 144, 153 153, 169, 171
Moving rows/columns ............................. 19 Output formats ............................... 9, 17, 27 Pre-treatment ................... See Analyse> ...
Multinomial frequencies .......................... 56 Overlay Pre-treatment operations ...... 13, 37-42, 149
Multiple page print ............................. 9, 74 bubbles .................................... 79, 82-85 cumulating samples ...................... 10, 42
Multiple PRIMER desktops ... 17, 31, 72, 74 cluster contours ................... 79, 116, 154 dispersion weighting ..................... 40, 41
Multiple selections ................................... 92 trajectory . 79, 80, 81, 108, 116, 144, 163 normalising variables .... 14, 39, 113, 118
Multi-processor .................................... 9, 51 vector plot ................................. 113, 114 standardising ............... 10, 13, 37, 42, 86
Multi-tasking .......................... 9, 32, 51, 125 Overview .................................................... 8 transforming (overall) 38, 43, 49, 51, 62,
Multivariate dispersion (MVDISP) 65, 75, 81, 86, 108, 122, 150
dependence on resemblance ............. 160 weighting variables ....................... 10, 42
dispersion sequence (all groups) ...... 160 p PRIMER 4 files ...................... 17, 22, 27, 28
IMD (pairs of groups) ....................... 160 PRIMER 5 files .................................. 17, 27
Multivariate normality ...... 23, 99, 104, 111, Page Setup ............... See File>Page Setup PRIMER-E Ltd .......................................... 7
119, 133 Page width in results window ................ 110 Print ..................................... See File>Print
Multivariate regression trees/CART ...... See Printing
Parsimony .............................................. 123
Linkage trees large plot over multiple pages ............ 74
Partial dominance plot ................... 174, 175
MVDISP ............... See Analyse>MVDISP Particle size analysis . 8, 42, 43, 56, 75, 111, page set-up/print preview ................... 74
118, 129, 135, 140 plots/results/notes ............................... 74
PCA .............................. See Analyse>PCA Profiles (curves) ....................................... 56
N PCA ordination .......................... 16, 111-19 Properties .................. See Edit>Properties
% variance explained 113, 114, 118, 163
Properties dialog ............................ See Data
Negative values 2-d configuration ...... 113, 115, 117, 118 matrix/Resemblances
after normalising ......................... 39, 111 3-d configuration .............. 113, 117, 118 Pseudo-production ................................. 106
checked for ............................................ 99 arbitrary flip of axes ......................... 115 Pseudo-replicates ................... 129, 136, 139
in correlations, to sims ........ 52, 150, 164 bubble plot ................................ 116, 140
of W (ABC curves) ........................... 176 dialog boxes
Nested factors ................................. 129, 136 2-d/3-d toggle ......... 79, 115, 117, 118 R
New ....................................... See File>New max number of PC's ..................... 115
Non-metric MDS ........................ See MDS overlay trajectory ......................... 163 R statistic ..............See ANOSIM R statistic
Normalising variables . 14, 39, 99, 113, 123, plots of higher PC axes ................ 117 Radio button ............................................. 12
153, 160, 163 scores to worksheet .............. 115, 118 Random number seed ...................... 76, 158
after transforming ..... 104, 111, 113, 118 features Rank ................................ See Tools>Rank
automatic in correlation ...................... 52 aspect ratio preserved .................. 117 Rank correlations ..................................... 52
stats to worksheet .......................... 38, 39 axes as gradient summaries ......... 115 between resemblance matrices 121, 122,
to harmonise ranges .......................... 115 axes linear in variables .......... 113-16 124, 138, 143, 155
v. standardising ................................... 39 Euclidean distance implicit.......... 116 caters for different measures ....... 122
Notes expects correlated variables ......... 113 input to 2STAGE ................. 150, 153
added to Explorer tree ......................... 73 need for complete data ................. 119 ρ (rho) statistic ............. 122, 124, 138
printing ................................................ 74 sensitivity to outliers .................... 108 between variables
saved as .rtf files ................................. 73 on diversities ..................................... 161 avoids transform/normalising ........ 53
Null distribution plot results window .................. 113, 114, 115 Rank triangular matrix ............. 43, 109, 122
in ANOSIM 16, 129, 132, 134, 135, 138 eigenvals/vecs ...... 113, 114, 118, 163 Ranking resemblances ........................... 109
in BEST............................................. 125 PC scores ............................. 113, 115 always a distance rank ...................... 109
in RELATE ............................... 124, 144 Special menu shared with MDS . 79, 116 in ANOSIM .............. 109, 122, 129, 138
in SIMPROF ....................................... 68 vector lengths .................................... 114 in BEST ............................ 109, 122, 155
Null hypotheses .................. 16, 68, 124, 155 vector plot in 2-d ... 10, 79, 113, 114, 116 in LINKTREE ................................... 126
conservative tests .............................. 139 Pearson 1975 ......................................... 174 in RELATE ....................... 109, 122, 143
for 2-way crossed .............. 129, 133, 138 Pearson correlation .. 52, 112, 113, 163, 164 Ranking variables .............. 15, 53, 108, 115
for 2-way nested ........................ 129, 136 Permutation tests See ANOSIM, SIMPROF, common measurement scale ..... 108, 113
Numbering of resemblance coefficients .. 43 RELATE, BIO-ENV to give linearity ......................... 111, 113
Numbers in margins ................................. 12 Permuted values to file to give symmetry ...................... 111, 113
Numbers of variables v. samples ........... 119 for ANOSIM test (R) ........................ 132 to remove outliers ..................... 108, 113
for BEST match test (ρ) .................... 125 Rare species ..... 38, 42, 47, 52, 93, 153, 157
for RELATE test (ρ) ......................... 147 Rarefaction ..................... 162, 164, 179, 180
O for SIMPROF test (π) ......................... 68 Recent ............................... See File>Recent
Phylogenetic diversity (PD) ... 161, 165, 166 Rectangular zoom . 63, 87, 88, 96, 112, 151,
Ochiai pres/abs ................................ 48, 153 Pi (π) statistic ................. See SIMPROF test 164, 174
Pielou's evenness ........... 162, 164, 167, 168 RELATE .............. See Analyse>RELATE
Ochiai similarity ..........................•...48, 153
Plot options ..See Graph Options/operations RELATE test ................... 16, 102, 124, 144
Olsgard et al 1997/98/2000 ..................... 151
Open ................................... See File>Open Pointer ........................ See Graph>Pointer applications
Pointer icon on Tool bar .......................... 63 cyclicity........................................ 148
Opening seriation ........................................ 144
data sheets ............................. 21, 26, 100 Pointer return to
from rotate ..................................... 78, 81 seriation with replication ............. 146
multiple files ................... 17, 24, 80, 150 two biotic arrays .......................... 145
PRIMER 4 & 5 files ................... 27, 101 from zoom .................................... 63, 87
with actual distances ............ 146, 156
PRIMER 6 plot file ............................. 72 Poisson counts .................................... 40, 55
Pooling samples ................. 15, 95, 154, 173 dialog box
same file twice ..................................... 24 max permutations ....................... 144
workspaces .................................... 20, 74 Pooling taxonomically ...... See Aggregating rank correlation method ............... 144
Presence/absence data .. 8, 46, 153, 166, 169
Optimal mapping statistic ........................ 50
Presence/absence similarities secondary data ............. 144, ·~~' t 4B
Options ........................ See Tools>Options
Ordination ..................... See MDS or PCA
p values (permuted) to file ..... 7m1.44
Jaccard ................................................ 48 features .· )'t.>
Orient dcndrogram ....................... 14, 57, 60 nQ! for BEST variable su~i24
Orloci's chord distance .............................46 Kulczynski ..........................48, 152, 153

not for dependent matrices .......... 144 wrapped columns .............................. 110 deselecting .............. 89, 91, 94, 145, 175
null hypothesis is no match . 144, 146 Reverse (log) transform ......................... 111 in resemblance matrix ......................... 94
null distribution plot ............... 124, 144 Rho (ρ) statistic ..... See RELATE ρ statistic no missing values ........................ 94, 119
regression analogue .................. 146, 156 Richness estimators ................. 16, 173, 180 resemblance >/< threshold.......... 94, 123
results window .................................. 144 bootstrap............................................ 180 Selecting samples and variables ........ 90, 93

v. ANOSIM test ........................ 146, 148 Chao1/2 ..................................... 179, 180 Selecting variables ....................... 89, 90-94
RELATE ρ statistic.......... 124, 143, 144-59 jacknife 1/2 ........................................ 180 by highlighting .. 15, 39, 89, 93, 106, 120
Relatedness ...................................... 49, 165 Michaelis-Menton ............................. 180 by indicator levels ......................... 36, 93
Relative composition ......... 13, 37, 174, 177 results to worksheet .......................... 180 by most important
Relax strict label matching Right-click menu fixed % ................................... 93, 152
in ABC curves ................................... 176 over Explorer tree ......................... 71, 73 fixed number .......................... 93, 157
in BEST............................................. 121 over plot ........58, 60, 63, 78, 85, 88, 115 by number ............................. 15, 93, 159
in combining (transform) .......... 106, 107 over worksheet ...... 13, 18, 22, 43, 60, 90 deselecting .......... 36, 45, 83, 89, 91, 159
in merging worksheets .............. 101, 102 Right-skewed distribution ...... 111, 112, 123 no missing values .................. 23, 94, 119
in RELATE ....................................... 145 Rogers et al 1999 ............. 50, 168, 170, 172 Selection (available/include) dialog
Rename.......................... See File>Rename Rolling up/out tree branches ........ 71, 74, 99 for choosing factors ............................ 32
Renaming Rotate for choosing levels ...................... 91, 132
duplicates in merging ................ 101, 102 dendrogram ........... 14, 51, 61, 62, 65, 73 for choosing variables ....... 121, 123, 157
items in Explorer tree . 14, 41, 71, 73, 96, linkage tree ........................................ 128 for ordering samples ......................... 179
112, 144 ordination axes ...................... 85, 86, 117 Seriation ................. 143, 144, 146, 153, 155
variables (auto on transform) .... 104, 123 ordination data ............ 73, 78, 81, 85, 96 Setting defaults ......................... See Default
Repeated measures ........................... 56, 155 Rotate Axes ......... See Graph>Rotate Axes Shannon diversity .......... 162, 164, 165, 168
Resemblance ..See Analyse>Resemblance Rotate Data ........ See Graph>Rotate Data Shepard diagrams ........................ 10, 75, 76
Resemblances 11, 14, 17, 38, 43, 44-56, 57 Rotate icons (on Tool bar) ................. 78, 81 Show/hide vector plot (PCA) ........ 114, 163
between curves............................ 56, 135
between groups (from means, R) ...... 131
between variables ................................ 52 S
Similarity % contributions ...... See SIMPER
Similarity coefficients........................ 14, 43
Bray-Curtis .... 38, 42, 43, 49, 52, 93, 96,

checking ...................................... 99, 100


comparing matrices .. 109, 124, J38, 143,
99, 111, 116, 121, 133, 136, 144,
152, 153, 160, 178
Sample totals ............................................ 38
153, 155 Canberra (exc joint absences) .... 44, 48,
for environment-type data ................ 111
in Excel files ................................. 22, 54
in text files .......................................... 54
Samples as columns/rows? .. 18, 22, 27, 104
Sampling effort uncontrolled ......... 161, 168
Save....................................... See File>Save
152, 153
chi-squared .......................................... 48
Gower .•..•...•...... 38, 42, 47, 93, 152, 153

More button & tab ........................ 46, SS


Saving
Gower (exc joint absences) ................ 48 €:...~
data sheets ................... 15, 20, 27, 71, 74
not sample size independent ............. 119 factors .................................................. 29 Jaccard ................................................ 49
numbering scheme ..............................43 Kulczynski .................... 44, 48, 152, 153
graph values ................................ 73, 115
opening & saving formats ......•............ 54 Ochiai .......................................... 48, 153
plot (pixel) ............................. 14, 72, 176
out of bounds values ...............•........... 99 plot (PRIMER format) .................. 14, 72 presence/absence ................................ 46
propcnies dialog
plot (vector) .................... 14. 72, 77, 176
quantitative ......................................... 46 ~~
description box ............................... 43 Sorensen .............................................. 49
results window (.txt/.rtf) ..................... 73
history box ............................... 43, 86 selected data ........................................ 90
zero-adjusted Bray-Curtis ..... 44, 99, 178
resemblance type . 43, 45, 55, 86, 109 Similarity of size-class data ............... 42, 56
to Excel ............................................... 27
title ................................................. 43
workspace ......................... 20, 24, 71, 74
Similarity profile............ 11, 57, 64, 67, 128
ranking ........................................ 11, 109 Similarity slice on cluster ...... 51, 60, 72, 79
Schafer et al 2002 .................................... 37
saving whole matrix ............................ 54 Similarity to dissimilarity .................. 47, SO -~
Scores (PCA) to worksheet ........... 115, 118
transforming ................................ 11, 108 SIMPER ................ See Analyse>SIMPER
undefined ..............................44, 99, 100
Scrolling
factors dialog box ............................... 32 SIMPER procedure .... 16, 52, 140, 141, 142
Resemblances ......... See also (Dis)similarity 1-way layout ............. 140, 142, 156, 160
coefficients & Distance measures
in zoomed plot ...................... 63, 87, 112
Seasonality ............................. 143, 148, 149
2-way (crossed) layout ............... 11, 141 E·-~
Results to worksheet removes 'nuisance' factor ... 140, 142 ~~
Second stage matrix ........... 16, 150, 151-55
Caswell's neutral model .................... 16S dialog box
cluster analysis .......................... 1S3, 154
diversity indices ........ 110, 16 l, 162, 167 dialog box cut-off percentage ................ 140, 141 ~~·-~
richness cstimators .................... 173, 180 multiple matric:s ................. 150, 152 defining factors .................... 140, 142
Results window .............................. 9. 16, S1
outer/inner factors ................ 1S4, lSS
list only high contributors .... 140, 141 €.;~
copying tables into Excel... ..... 16, 73, 77 on Bray-Cunis or Euclidean .• I I, 140
rank correlation methoo ............... ISO
default page width ............................. J 10 single matrix with groups .... 153, I S4 features ..:.~
deleting ................................................ 71
for Aggrcgation ................................... 97
MOS features active sheet is data matrix............ 142
gives discriminating species ........ 140
e-··
.·~
cocff.>transform>taxon IS I, 1S3, 178
for ANOSIM ..................... 130, 133, 13S gives typifying species ........ 140, 141
for BEST ................................... 123, I S7
comparison with 'Isl stage' plot ... 154
in full.cl (not MOS) space ............ 140 ~·~
copes with repealed measures ..... 1SS
for CLUSTER ..................................... 66
for DIVERSE .................................... 110
deals with negative p ................... 150
'interaction plots' .......... 153, I 54, I55
results tables...................................... 141
by decreasing contribution .. 140, 141 ~.J~
for LINKTREE ................................. 127 Diss/SD or Sim/SD ...................... 141 :_~
summarises choices ............. 150, 153
for MOS .................................. 15, 76, 86 m:an (dis)sims ............. 131, 141, 160
for Merge .......................................... 100
testing by ANOSIM .......................... 1SS
Select> ............................................... 70, 89
mean (transformed) abund ... 141, 142 €·~!~
for MVDISP ...................................... 160 Simple matching similarity•...••..•..•.• 48, 152
All 15, 36, 45, 83, 91, 94, 159, 171, 175 SIMPROF .......... See Analyse>SIMPROF
for Normalise ............................ 113, 115
Highlighted ....... 15, 39, 89, 90, 92, 120,
for other Tool items ............................ 96 SIMPROF test .......... 16, 57, 64, 65-68, 128
for PCA ............................. 113, 114, 115 169, 175
π (pi) statistic .................. 64, 65-68, 128
for RELATE ..................................... 144 Samples ..... 15, 68, 91, 92, 94, 119, 123,
created groups on MDS ...................... 79
for SIMPER .............................. 110, 141 132, 140, 170, 178, 179
dialog boxes
for SIMPROF ...................................... 68 Variables . 23, 36, 91, 93, 119, 152, 157,
create factor ......... 65, 68, 72, 79, 140
for TAXDTEST ........................ 169, 170 159
number of perms ........ 64, 65, 68, 128
from Analyse/Tools not Edit/Select ... 95 Selecting samples ......................... 89, 90-94
output π's to (text) file ................... 68
icon in Explorer tree ........................... 70 by factor levels .... 15, 68, 91, 92, 94, 144
SIMPROF tab .................. 65, 67, 128
ordination co-ordinates ......... 73, 77, 115 by highlighting ................ 15, 89, 90, 175
stats to worksheet ........................... 67
printing ................................................ 74 by multiple factors ........................ 15, 92
underlying data sheet ....... 65, 67, 128
saved as text/rich text file ................... 73 by number ............................. 15, 92, 179
groups on dendrogram .................. 57, 65
selection list .................................. 89, 92 by rectangle on ordination .... 87, 88, 144
null distribution plot ........................... 68


results window .................................... 68 type/colour .................. 33, 57, 58, 76, 86 Dissim ..................................... 15, 47, 50
run directly ...................................... 57, 67 System requirements .................................. 7 Duplicate .... 15, 81, 90, 92, 98, 100, 116
run in clustering .............. 57, 65, 66, 140 Merge ... 15, 27, 100, 101, 103, 106, 154
run in linkage trees ...................... 57, 128 Missing ................... 15, 23, 99, 119, 120
significance level ...... 64, 65, 66, 68, 128 T Model Matrix ................ 143, 146, 148
similarity profile plot .................... 64, 67 Options ......................... 21, 95, 110, 176
v. ANOSIM test .................................. 64 Rank ................................................. 109
Tab-scparatcd ................... 16. 25, 27, 73, 77
Simpson diversity .......................... 162, J68 TAXDTEST ....See Analyse>TAXDTEST Rank variables .................... i5, 53, 108
Single linkage .................................... 57, 62 Taxon richnl!ss ................................. 50, 166 Stop Tasks .................................... 51, 95
Size constraints (lack of) •.......••••.•. 9. 2S. 27 Taxonomic dissimilarity .............. 11, 36, 46 !um .......•..•..•...••.•.••. 15, 95, 97, 165, 173
Skewed distribution ............................... 163 - ransform .......................... 53, 108, 164
r+ (gamma+, extends B+)............ 50, 5 I
left ..................................................... 111 Transform (individual' ..... 15, 39, 104,
right •.•.•....•..........•......•.••..•• 111. 112, 123 e+ (theta+, extends K+) ..................... 50 105-7, Ill, 112, 175
Slack% in cluster overlays ................ 79, 80 Taxonomic distinctness ..... 16, 161 165-72 Transpose ............................. IS, 27, 104
dialog boxes '
Sobs curve ...................................... 179, 180 Tree .................................................... 99
Someefield el al 1994 ............................... 40
aggregation data ........... 166, 167, 169 Transform (individual) ....... See Tools> •••
Someefield et al 2002 ............................. 146 current level of sample data.166, 167 Transform (overall) .........See Analvse> .. .
Sorensen similarity ...... 48, 49, SO, IS2, I 53 no sample data ............................. l 70 Transforming (individual) ... 39, t 04: I OS-7
Sort ...................................... See Edlt>Sort num random selections ........ 169, 170 all, if no highlights ............................ 104
Soning rows/columns .. 19, 26. 62. 100, 103 S rangclratiolinterval ........... l 70. 172 automatic variable renaming .... 104, 123
Spearman rank correlation 52, 53, 122, 123, second aggregation file ................ 169 before normalising ...............• IS, 39, 118
143, ISO, IS3 .. ste:ps by taxon richness ....•..... 50, 166 expression ............. 104, 105-8. 123, 146
weighted form ..................................... 52 stepsequal.. ...... 50, 51, 166, 167. 169 expression combining
Special ................•.....•. See Graph>Speclal taxon frequency worksheet .......... t 71 variables/samples etc ........... JOS, 106
Species abundance distribution ............• 173 use links fr1.1m/to l~vels ........ 166, J67 worksheets ................... 106, I 07, I 08
Species Accum PJot ......... See Analyse> ••• use sample data ...................... 169-72 help on operators/functions .............. 104
Species accumulation curves ..... J I, J6, 179 user specified weights .. 166, J67, J69
on highlighted data ................... 104, 123
asymptote .......................................... 180 features
to approx linearity ............................. 111
sample order (original/permute) 179, 180 average v. total ............. 161, 166, 168
to approx symmetry .......... 111, 112, 123
UGE (mean accumulation) ....... 179, 180 independence from S ... 161, 168, 171
Transforming (overall) ... 14, 38, 43, 49, 51,
Species contributions .............. See SIMPER P/A v. quantitative ....... 161, 166, 168
62, 65, 15, 86, 108, 122, 133
indices
Species ranks .......................... 174, 176, 177 after dispersion weighting .................. 41
Species reduction ............................... 5i, 93 AvTD ................. 36, 49, 161, 166-72
other TD/PD ......................... 166, 167
choices ................................ 38, 14~, 150
Species richness ..... 162, 164-72, 173, 174
VarTD ............................ 161, 166-72
Transforming resemblances ..... 53, 108, 164
176, 179 Transpose ................ See Tools>Transpose
Species similarities .................................. 52 path lengths (weights) .........49, 166, 169
Transposing data sheet. .............. 15, 27, 104
Speed gains from v5 ................ 9, 27, 51, 90 plots Tree ................................... See Tools>Tree
Square root transform ..... 38, 43, 62, 81, 86, ellipse ................................... 168, 171 Tree display for aggregation file ............. 99
funnel ................................... 168, 170
108, 111, 112, 122, 144, 150, 154, 160 Trend ............................................. 143, 146
Standardising .............................. 13, 37, 153 histogram ............................. 168, 169 Triangular matrix .... See also Resemblances
before transforming ............................ 38 testing with master list ................ 11, 161
correlation ........................... 43, 108, 113
by maximum ............................... 38, 149 aggregation is active sheet ........... 169
distance ....................................... 45, 177
by total ............ 37, 42, 86, 149, 153, 164 frequency-based ..................... 11, 171 lower ............................................ 43, 55
probability limits .................. 170, 172
stats to worksheet ........................ 38, 164 model ................................ 143, 146, 148
v. normalising ............................. 39, 153 results window ..................... 169, 170 pairwise ANOSIM R values ............. 131
variables .............................. 52, 149, 153 simple random ............... 168-70, 172 ranks .................................... 43, 109, 122
Starting PRIMER ..................................... 17 Taxonomic level ........ 97, 98, 149, 150, 166
saving & opening ................................ 54
Taxonomic tree ......... 49, 97, 161, 166, 168
Stats to worksheet .............................. 14, 38 p values ............................................. 150
dispersion weighting ........................... 41
rolling up/out branches ....................... 99
Typifying species ................... 140, 141, 157
normalising ................................. 39, 113 Taxonomy dialog ............. 51, 166, 167, 169
SIMPROF test ..................................... 67 Tests for structure ................. See ANOSIM,
SIMPROF, RELATE, BIO-ENV
standardising ................................. 38, 97
Text format input files ................. 25, 26, 34 u
W from ABC curves ......................... 176
Status bar Textual conventions
cartouche ............................................. 12 UGE (mean accumulation) curve .. 179, 180
hiding .................................................. 70
cascading menus ................................. 12 Ugland et al 2003 .................................. 179
row & column of cursor ................ 70, 98
check box ........................................... 12 Unbalanced designs ................... 23, 37, 129
Stebbing & Dethlefsen 1993 .................... 53
dialog box ........................................... 12 Undefined entries
Stepwise selection of variables ...... 121, 156 in factors ....................................... 95, 96
radio button ......................................... 12
Stop button ........................... 10, 51, 65, 125 in resemblance matrices ............. 99, 100
Stop Tasks .............. See Tools>Stop Tasks Tied ranks .......................... 52, 77, 108, 109
Tile ................................. See Window>Tile with ANOSIM .................................... 99
Stress value with CLUSTER .................................. 99
Tiling windows ..... 38, 49, 69, 82, 117, 175,
% contributions to ......................... 10, 76 with MDS .................................... 99, 100
3-d v. 2-d ....................................... 85, 86
176
Timing bar (on status bar) ....................... 51 Undo (not available) .............. 19, 30, 71, 95
construction ................................... 76, 77 Uninstall ..................................................... 7
Title ........................ See also Graph Options
decimal places ..................................... 77
check box ............................................ 22 Unique
lowest in restarts ..................... 76, 77, 86
data sheet ....................................... 22, 41 item name ............................................ 24
minimum box ................................ 77, 81 labels ..................... 13, 18, 102, 103, 145
Structural redundancy .................... 157, 159 forward/back propagation ................... 43
removing ............................................. 60 Unix ........................................................... 9
Sum .................................... See Tools>Sum Unpeeling triangular matrices ................. 54
resemblance ........................................ 43
Summary
analysis routines .................................... 8 sizes & fonts.................................. 14, 59
Tool bar
application areas ................................... 8
example data sets .............................. 181
hiding .................................................. 70 V
icons ............ 63, 78, 85, 87, 88, 112, 117
Summing samples .............. 15, 95, 165, 173
Tools v. Edit menus ................................. 95 Valesini et al 2003 ................................. 156
Summing variables .................................. 97
Tools> ............................ 15, 70, 95, 96-110 Variance explained ........ 113, 114, 117, 163
Symbols .................. See also Graph Options
Aggregate ....................... 15, 36, 97, 150 Vector plot (PCA) ............ 10, 113, 114, 116
centring with/without labels ... 79, 82, 84
Average ...... 15, 32, 95, 96, 97, 154, 165 View> ...................................................... 70
on dendrograms ............................. 10, 58
Check .............. 15, 98, 99, 100, 119, 166 Virtual Windows ........................................ 9
size .......................................... 58, 76, 81


Wilks' lambda ........................................ 133 planning .............................................. 74
Window minimising ................................ 70 recent ................................................... 31
Window resizing .................... 22, 32, 49, 77 saving .......................... 15, 20, 24, 71, 74
W statistic ....................................... 176, 177
Wald (chi-squared) coefficient ................ 55 Window>
Warning message examples Cascade .............................................. 69
(some) labels not matched .. 97, 107, 145 Close All Windows .... 14, 29, 49, 69, 82 Z
before delete ........................................ 71 Tile Horizontal ............................ 69, 82
duplicate labels ................................... 99 Tile Vertical ..... 38, 49, 69, 82, 175, 176 Zero fill in arrays ............................... 23, 27
Windows™
file already exists ................................ 20 Zero-adjusted Bray-Curtis ......... 44, 99, 178
no transform applied ..................... 43, 45
primary data not abundance .............. 176
clipboard ........................... 19, 72, 73, 77
Explorer ............................................... 17
Zoom In/Out .... See Graph>Zoom In/Out
Zooming
metafile (.emf) .......................... 9, 14, 72
to save workspace ......................... 20, 32 by rectangle ...... 63, 87, 88, 96, 112, 164
Warwick & Clarke 2001 ................ 161, 172 version .................................................. 7 174
Warwick et al 1986 ................................ 114 Worksheet ..... See Data/resemblance matrix changed aspect ratio ............. 10, 63, 174
Workspace ........................................... 9, 17
Warwick et al 1990 .......................... 24, 178 dendrogram ................................... 10, 63
Water-quality metrics ............................ 115 closing ..................................... 15, 20, 24
draftsman plot ................................... 112
managing with Explorer ......... 14, 69-74
Web site ..................................................... 7
Weighted Spearman ......................... 52, 122 multiple ....................... 17, 31, 72, 74, 90
new ...................................................... 20
ellipse plot ......................................... 172
ordination ............................ 87, 117, 151
Weighting variables ................... 14, 42, 149 return to pointer ............................ 63, 87
Weights ................................. See >Weights no dynamic links to external files ....... 72 scroll bars ................................... 63, 112
Whittaker index of association ................ 47 opening .............................. 15, 20, 31, 74 Zooplankton ................................ 65, 140
