[Fig. 1. Panel labels: polyphyletic; paraphyletic; monophyletic (holophyletic).]
Some basics
Terminology
A phylogenetic tree is composed of branches (edges) and
nodes. Branches connect nodes; a node is the point at
which two (or more) branches diverge. Branches and nodes
can be internal or external (terminal). An internal node
corresponds to the hypothetical last common ancestor
(LCA) of everything arising from it. Terminal nodes
correspond to the sequences from which the tree was
derived (also referred to as operational taxonomic units or
OTUs). Trees can be made up of multigene families (gene
Corresponding author: Sandra L. Baldauf (slb14@york.ac.uk).
TRENDS in Genetics
http://tigs.trends.com 0168-9525/03/$ - see front matter q 2003 Elsevier Science Ltd. All rights reserved. doi:10.1016/S0168-9525(03)00112-4
Review
[Fig. 2, panels (a)-(c): tree diagrams with the tips Coffee, Chocolate, Truffle, Caviar, Oyster, Lobster and Nori.]
Roots
At the base of a phylogenetic tree is its root. This is the
oldest point in the tree, and it, in turn, implies the order
of branching in the rest of the tree; that is, who shares a
more recent common ancestor with whom. The only way to
root a tree is with an outgroup, an external point of
reference. An outgroup is anything that is not a natural
member of the group of interest (i.e. the ingroup). This
might not seem like a difficult concept, but do not be
misled. The excluded member of a monophyletic group
(i.e. the exclusion that makes it paraphyletic, Fig. 1) is not
an outgroup (just an outcast); for example, humans are not
an outgroup to animals. In the absence of an outgroup, the
best guess is to place the root in the middle of the tree
(at its midpoint), or, better yet, not root it at all (Fig. 2f).
Alternatively, you can use extrinsic, more traditional
taxonomic information, such as the fossil record in the
case of species trees. This is obviously more difficult with
gene trees.
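The midpoint fallback mentioned above can be illustrated with a minimal sketch. It assumes the unrooted tree is given as a list of branches with lengths; the function names, the internal node labels (n1, n2) and the branch lengths are all hypothetical, chosen only for illustration. The idea is to find the longest tip-to-tip path and place the root halfway along it.

```python
from collections import defaultdict


def farthest(adj, start):
    """Return (node, distance, path) for the node farthest from `start`."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[1]:
            best = (node, dist, path)
        for nxt, length in adj[node]:
            if nxt != parent:  # never walk back along the branch we came from
                stack.append((nxt, node, dist + length, path + [nxt]))
    return best


def midpoint_root(edges):
    """Locate the midpoint of the longest tip-to-tip path.

    edges: iterable of (node_a, node_b, branch_length) for an unrooted tree
    with positive branch lengths.
    Returns (node_a, node_b, offset): the root sits on branch a-b,
    `offset` away from node_a.
    """
    adj = defaultdict(list)
    lengths = {}
    for a, b, length in edges:
        adj[a].append((b, length))
        adj[b].append((a, length))
        lengths[(a, b)] = lengths[(b, a)] = length
    any_node = next(iter(adj))
    tip1, _, _ = farthest(adj, any_node)     # one end of the longest path
    tip2, total, path = farthest(adj, tip1)  # the other end, plus the path
    half, walked = total / 2.0, 0.0
    for a, b in zip(path, path[1:]):
        if walked + lengths[(a, b)] >= half:
            return a, b, half - walked
        walked += lengths[(a, b)]


# Hypothetical unrooted tree; branch lengths are invented for illustration.
edges = [
    ("Coffee", "n1", 1.0),
    ("Chocolate", "n1", 1.0),
    ("n1", "n2", 2.0),
    ("Oyster", "n2", 1.0),
    ("Nori", "n2", 5.0),
]
root = midpoint_root(edges)
print(root)  # ('Nori', 'n2', 4.0)
```

In this toy tree the longest path runs from Nori to Coffee or Chocolate (total length 8), so the midpoint falls on the Nori branch. A single long branch pulls the root towards itself, which is one reason midpoint rooting is only a best guess.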
[Fig. 2, panels (d)-(g): tree diagrams with the tips Coffee, Chocolate, Truffle, Caviar, Oyster, Lobster and Nori.]
Homology
Evolution is about homology; that is, similarity due to
common ancestry. Homologues can be orthologues or
paralogues (Fig. 3). Orthologues only duplicate when
their host divides; i.e. along with the rest of the genome
(Fig. 3a). They are strictly vertically transmitted (parent
to offspring), so their phylogeny traces that of their host
lineage (Fig. 3b). Paralogues are members of multigene
families; they arise by gene duplication (Fig. 3a). If you try
to infer species relationships with paralogues you can run
into trouble; if some of the copies are missing, you can be
very convincingly misled (Fig. 3c). However, if you have all
copies of two paralogues in your tree, then you are fine.
Better still, you have two mirror phylogenies (Fig. 3b). In
this case, paralogues can serve as each other's natural
Fig. 2. Phylogenetic tree styles. All these trees have identical branching patterns.
The only differences are that (f) is unrooted and (g) is a cladogram, so its branch
lengths are right-justified and not drawn to scale (i.e. they are not proportional to
estimated evolutionary difference).
[Fig. 3, panels (a)-(c): diagrams of gene duplication and speciation for geneX in species A and B, and geneX trees for Human, Chimp and Gorilla.]
Fig. 3. The problem with paralogues. (a) Paralogous genes are created by gene duplication events. Gene X is duplicated in a common ancestor to species A and B, resulting
in two paralogous genes, X and X′. All subsequent species inherit both copies of the gene (unless one or the other is lost somewhere along the way). (b) Phylogenetic analysis of the X/X′ gene family gives two parallel phylogenies. All sequences of gene X are orthologues of each other, and all the sequences of gene X′ are orthologues of each
other. However, X and X′ are paralogues. Both the X and X′ subtrees show the true relationships among the three species. The subtrees are also each other's natural outgroup, and as a result each subtree is rooted with the other (reciprocal rooting). (c) A tree of the X/X′ gene family can be misleading if not all the sequences are included
(because of incomplete sampling or gene loss). If the broken branches are missing, then the true species relationships are misrepresented.
outgroup. This was the method used to infer the root of the
universal tree of life [3–5].
Step 1. Assembling a dataset
The first step in constructing a tree is building the dataset.
For most of us, this means finding and retrieving
sequences from the public domain. The main repository
for these data is the public nucleotide database (Box 1),
stored independently in the USA (GenBank), Europe (EMBL)
and Japan (DDBJ). Primary entries are redundant among
them, and they are updated against each other nightly.
Some of the most exciting molecular evolutionary data are
coming from genome sequencing projects (Box 1). Many of
these data, both in-progress and completed, are deposited in
the public database, with some in-progress data partitioned
off separately. Other genome project data are available
only from their own websites; for example, The Institute
for Genomic Research (TIGR, Box 1) and the DOE Joint
Genome Institute (JGI, Box 1). Comprehensive lists and
Genomes
TIGR: http://www.tigr.org/tdb/mdb/mdbcomplete.html
JGI: http://www.jgi.doe.gov/JGI_microbial/html/index.html
Sanger: http://www.sanger.ac.uk/Projects/Microbes/
NCBI: http://ncbi.nlm.nih.gov/Genomes/index.html
Similarity
BLAST: http://www.ncbi.nlm.nih.gov/BLAST/
Phylogenetic analysis
PAUP*: http://paup.csit.fsu.edu/index.html (tutorial can be found
at http://paup.csit.fsu.edu/Quick_start_v1.pdf)
Mega2: http://www.megasoftware.net/
PHYLIP: http://evolution.genetics.washington.edu/phylip.html
Treeview: http://taxonomy.zoology.gla.ac.uk/rod/treeview.html
List: http://evolution.genetics.washington.edu/phylip/software.html
[Fig. 4 diagram: a guide tree and a five-step merge schedule (Step 1 to Step 5); surviving labels include A+B, E+F, I + J, EF + G, IJ + K, EFG + H, ABC, ABCD, EFGH, ABCDEFGH and IJK.]
Fig. 4. Steps in progressive sequence alignment. (a) The first step is to calculate
the guide tree. (b) This determines the order in which sequences are added to the
growing alignment.
(a)
taxon  sequence name  ....|....10...|....20...|....30...|....40...|....50
Fu     Nosema.40928   QFGLFSPEEIRASSVALIR--YPETLENG--VPKESGLVCAGHFGHIELVK
Fu     Aspergillus.   QFGLFSPEEIKRMSVVHVE--YPETMDEQRQRPRTKGLECPGHFGHIELAT
Ap     Plasmodium.3   ELGVLDPEIIKKISVCEIV--NVDIYKDG--FPREGGLYCPGHFGHIELAK
An     Cricetulus.2   QFGVLSPDELKRMSVTEGGIKYPETTE--GGRPKLGGLECPGHFGHIELAK
An     Homo.7434727   QFGVLSPDELKRMSVTEGGIKYPETTE--GGRPKLGGLECPGHFGHIELAK
An     Drosophila.9   QFGILSPDEIRRMSVTEGGVQFAETME--GGRPKLGGLECPGHFGHIDLAK
An     Celegans.133   QFGILGPEEIKRMSVAH--VEFPEVYE--NGKPKLGGLDCPGHFGHLELAK
Fu     Spombe.54881   QFGILSPEEIRSMSVAK--IEFPETMDESGQRPRVGGLDCPGHFGHIELAK
Pl     Athaliana.40   QFGILSPDEIRQMSVIH----VEHSETTEKGKPKVGGLECPGHFGYLELAK
My     Ddiscoideum.   --------------------------------------ECPGHFGHIELAK
Rh     Porphyra.316   --------------------------------------ECPGHFGFIELAK
Kt     Tbrucei.1021   QFEIFKERQIKSYAVCLVEHAKSYANA----ADQSGEAECPGHFGYIELAE
Kt     Leishmania.7   QFEVFKEAQIKAYAKCIIEHAKSYEHG----QPVRGGIECPGHFGYVELAE
(b)
taxon  sequence name  ....|....10...|....20...|....30...|....40...|....50
Fu     Nosema.40928   QFGLFSPEEIRASSVAL--IRYPETLE--NGVPKESGLVCAGHFGHIELVK
Fu     Aspergillus.   QFGLFSPEEIKRMSVVH--VEYPETMDEQRQRPRTKGLECPGHFGHIELAT
Fu     Spombe.54881   QFGILSPEEIRSMSVAK--IEFPETMDESGQRPRVGGLDCPGHFGHIELAK
Ap     Plasmodium.3   ELGVLDPEIIKKISVCE--IVNVDIYK--DGFPREGGLYCPGHFGHIELAK
An     Cricetulus.2   QFGVLSPDELKRMSVTEGGIKYPETTE--GGRPKLGGLECPGHFGHIELAK
An     Homo.7434727   QFGVLSPDELKRMSVTEGGIKYPETTE--GGRPKLGGLECPGHFGHIELAK
An     Drosophila.9   QFGILSPDEIRRMSVTEGGVQFAETME--GGRPKLGGLECPGHFGHIDLAK
An     Celegans.133   QFGILGPEEIKRMSVAH--VEFPEVYE--NGKPKLGGLDCPGHFGHLELAK
Pl     Athaliana.40   QFGILSPDEIRQMSVIH--VEHSETTE--KGKPKVGGLECPGHFGYLELAK
My     Ddiscoideum.   --------------------------------------ECPGHFGHIELAK
Rh     Porphyra.316   --------------------------------------ECPGHFGFIELAK
Kt     Tbrucei.1021   QFEIFKERQIKSYAVCL--VEHAKSYA--NAADQSGEAECPGHFGYIELAE
Kt     Leishmania.7   QFEVFKEAQIKAYAKCI--IEHAKSY--EHGQPVRGGIECPGHFGYVELAE
Fig. 5. Refining an alignment. (a) The raw output from a ClustalX alignment of
rpb1 sequences, which predicts six insertion/deletion events (boxed), some of
which are blatantly inconsistent with known taxonomy. (b) The refined alignment
makes much better evolutionary sense, because it shows only two insertion events
in well-defined taxonomic groups (animals and higher fungi). Taxon labels are Fu
(fungi), An (animals), Pl (green plant), Ap (apicomplexan), Rh (rhodophyte), My
(mycetozoan), Kt (kinetoplastids). In (b), the sequence from Schizosaccharomyces
pombe has been placed adjacent to the other fungi to make these relationships
more obvious.
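The role of the guide tree described in the Fig. 4 caption can be sketched in a few lines. This is an illustration under stated assumptions, not the ClustalX implementation: the guide tree is written as nested tuples, its topology is hypothetical, and `merge_order` is a made-up helper name. It simply walks the tree from the tips towards the root and records which sequences or groups are joined at each step.

```python
def merge_order(guide_tree):
    """List the joins implied by a guide tree, from the tips to the root.

    guide_tree: a sequence label (str) or a 2-tuple of subtrees.
    Returns the joins as strings such as 'AB + C'.
    """
    steps = []

    def visit(node):
        if isinstance(node, str):  # a single sequence: nothing to join yet
            return node
        left, right = (visit(child) for child in node)
        steps.append(f"{left} + {right}")  # align the two groups to each other
        return left + right                # the merged group is treated as one unit

    visit(guide_tree)
    return steps


# Hypothetical guide tree for five sequences.
guide = ((("A", "B"), "C"), ("D", "E"))
steps = merge_order(guide)
print(steps)  # ['A + B', 'AB + C', 'D + E', 'ABC + DE']
```

A join can only happen once both of its groups exist, so the guide tree fixes the order of the alignment. Errors made in the early joins are carried into every later step, which is why the raw output usually needs refining by eye, as in Fig. 5.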
[Fig. 6 content]
Dataset
        0123456789
seqA    ACCGTTCGGT
seqB    ATGGTTCAGA
seqC    ATCGATCGGA
(a) Step 1. Assemble pseudo-datasets, repeat 1000 times
Replicate 1
        1562314951
seqA    CTCCGCTTTC
seqB    TTCGGTTATT
seqC    TTCCGTAATT
Replicate 2
        5234924418
seqA    TCGTTCTTCG
seqB    TGGTAGTTTG
seqC    TCGAACAATG
Replicate 3
        5607718907
seqA    TCAGGCGTAG
seqB    TCAAATGAAA
seqC    TCAGGTGAAG
etc.
(b) Step 2. Build trees for each pseudo-dataset to give 1000 trees (Tree 1, Tree 2, Tree 3, etc.)
(c) Step 3. Tabulate results (strict consensus tree): 67%
Fig. 6. Bootstrap analysis proceeds in three steps. (a) The dataset is randomly sampled with replacement to create multiple pseudo-datasets of the same size as the original
(three are shown in this example). (b) Individual trees are constructed from each of the pseudo-datasets. (c) Each of the pseudo-dataset trees is scored for which
nodes (groupings) appear and how often. In this case, a node uniting seqA plus seqB is found in two of the three replicate trees. This gives a bootstrap support for this
grouping of 2/3 or 67%.
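The three steps in Fig. 6 can be sketched with the ten-column dataset from the figure. The resampling step follows the figure directly. The tree-building step does not: `closest_pair` is a deliberately crude, hypothetical stand-in that groups the two sequences with the fewest differences (ties go to the alphabetically first pair), where a real analysis would use a proper tree-building method. The function names and the random seed are assumptions made for illustration.

```python
import random
from itertools import combinations

# The ten-column dataset shown in Fig. 6.
dataset = {"seqA": "ACCGTTCGGT", "seqB": "ATGGTTCAGA", "seqC": "ATCGATCGGA"}


def take_columns(alignment, cols):
    """Build a pseudo-dataset from the given column indices (repeats allowed)."""
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def resample(alignment, rng):
    """Step 1: sample columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    return take_columns(alignment, [rng.randrange(length) for _ in range(length)])


def closest_pair(alignment):
    """Step 2 (stand-in only): group the two sequences that differ least."""
    def differences(pair):
        a, b = pair
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    return min(combinations(sorted(alignment), 2), key=differences)


def bootstrap_support(alignment, group, replicates=1000, seed=0):
    """Step 3: fraction of replicates in which `group` is recovered."""
    rng = random.Random(seed)
    hits = sum(closest_pair(resample(alignment, rng)) == group
               for _ in range(replicates))
    return hits / replicates


# Replicate 1 in Fig. 6 uses columns 1,5,6,2,3,1,4,9,5,1.
replicate1 = take_columns(dataset, [1, 5, 6, 2, 3, 1, 4, 9, 5, 1])
print(replicate1["seqA"])  # CTCCGCTTTC

support = bootstrap_support(dataset, ("seqA", "seqB"))
print(f"seqA+seqB recovered in {support:.0%} of replicates")
```

The support value printed here depends on the stand-in grouping rule and the seed, so it should not be compared with the 67% in the figure, which comes from three illustrative replicates.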