markers within and among populations (WAAP analysis), meaning of AE, and tips on software-based analyses.
We have looked at the derivations for a number of population genetic parameters (variance-based and distance measures of population structure) and at their strengths and weaknesses in the face of various complexities of natural populations (e.g., small and fluctuating population size, variation in the breeding sex ratio). We will now focus on the practicalities of assessing genetic structure within and among populations: which measures are essential for a reasonably comprehensive assessment of genetic structure, which programs are available for computing those measures, and how should we organize the data for analysis?
Here are some of the essential components (adapted from a checklist developed by Jim Hamrick at U. GA):
• Mean P, A, AE, H (proportion of polymorphic loci, mean number of alleles per locus, effective number of alleles and expected heterozygosity). The values in Part I are calculated with all the samples considered to constitute a single group.
• The same measures calculated population by population, then averaged over the set of populations.
• Differences among populations in the above. Do one or more populations have unusually high or low values for any of these measures?
• Deviations from Hardy-Weinberg expectations (per locus and population)
• Assessment of linkage disequilibrium
• Estimates of Ne, effective population size (4*Ne*m)
• Relatedness or allele-sharing matrices
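As a rough illustration of the first two items in the checklist, the Python sketch below computes P, A, AE and H once with all samples pooled into a single group and once population by population (then averaged). It is a sketch under stated assumptions, not the output of any particular program: allele frequencies are given as lists aligned by allele at each locus, pooling weights each population by its sample size, a locus counts as polymorphic when its most common allele has a frequency of at most 0.95 (an assumed criterion; 0.99 is also used), and AE is averaged locus by locus, as explained in the next paragraph (Eqn 1). All function names are hypothetical.

```python
from statistics import mean

def gene_diversity(freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def summary_stats(loci, poly_cutoff=0.95):
    """P, A, AE and H for one group; `loci` holds one allele-frequency list per locus."""
    P = sum(max(f) <= poly_cutoff for f in loci) / len(loci)   # proportion polymorphic (assumed 0.95 rule)
    A = mean(sum(p > 0 for p in f) for f in loci)               # mean number of alleles per locus
    AE = mean(1.0 / (1.0 - gene_diversity(f)) for f in loci)    # mean of per-locus AE values
    H = mean(gene_diversity(f) for f in loci)                   # mean expected heterozygosity
    return {"P": P, "A": A, "AE": AE, "H": H}

def pool(pops, sample_sizes):
    """Treat all samples as one group: sample-size-weighted mean allele frequencies.

    Assumes every population lists the same alleles in the same order at each locus.
    """
    total = sum(sample_sizes)
    pooled = []
    for l in range(len(pops[0])):
        n_alleles = len(pops[0][l])
        pooled.append([sum(n * pop[l][i] for pop, n in zip(pops, sample_sizes)) / total
                       for i in range(n_alleles)])
    return pooled

def mean_over_populations(pops):
    """Compute the statistics population by population, then average them."""
    per_pop = [summary_stats(loci) for loci in pops]
    return {k: mean(s[k] for s in per_pop) for k in per_pop[0]}

# Hypothetical data: two populations, two loci (locus 1 fixed for different alleles).
pop1 = [[1.0, 0.0], [0.5, 0.5]]
pop2 = [[0.0, 1.0], [0.5, 0.5]]
print("pooled:         ", summary_stats(pool([pop1, pop2], [30, 30])))
print("population mean:", mean_over_populations([pop1, pop2]))
```

In this toy case pooling doubles H (0.5 versus 0.25) because locus 1 is fixed for different alleles in the two populations, which is one reason the checklist asks for both the pooled and the population-averaged values.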
$A_E = \dfrac{1}{r}\sum_{j=1}^{r} \dfrac{1}{1 - D_j}$ (Eqn 1)
where Dj is the gene diversity of the jth of r loci. Note that we calculate the OTU-level AE by averaging the AE values calculated locus by locus, rather than by calculating a mean gene diversity and then calculating AE from that. The graph below shows why: AE is a nonlinear function of the gene diversity (Hexp), which brings into play Jensen’s inequality [for a nonlinear function, the mean of the function is generally not equal to the function of the mean; see Ruel and Ayres (1999)]. Because the curve is concave up, the average of the locus-by-locus AE values will be greater than (or, if all loci have the same D, equal to) the AE calculated from the overall gene diversity.
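To see Jensen’s inequality at work numerically, the short Python sketch below compares the two ways of getting an OTU-level AE for three loci. The gene diversities are made-up illustrative values, and the per-locus formula AE = 1/(1 - D) is the one given later in these notes.

```python
def effective_alleles(d):
    """Effective number of alleles for a gene diversity D: AE = 1 / (1 - D)."""
    return 1.0 / (1.0 - d)

# Hypothetical gene diversities at r = 3 loci.
d_per_locus = [0.5, 0.8, 0.9]
r = len(d_per_locus)

# Eqn 1: average the AE values calculated locus by locus.
ae_locus_average = sum(effective_alleles(d) for d in d_per_locus) / r

# The alternative we avoid: AE calculated from the mean gene diversity.
ae_from_mean_d = effective_alleles(sum(d_per_locus) / r)

print(f"mean of per-locus AE: {ae_locus_average:.2f}")   # (2 + 5 + 10) / 3
print(f"AE of mean D:         {ae_from_mean_d:.2f}")
```

With these values the locus-by-locus average is about 5.67, while AE computed from the mean D (about 0.733) is 3.75, so the order of operations matters a great deal on the steep part of the curve.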
Fig. 1. Effective number of alleles, AE, as a function of the gene diversity (D or Hexp). The
nonlinear relationship brings into play Jensen’s inequality. Note that most of the "action"
happens for D in the range 0.5 to 0.9 (AE goes from 2 to 10).
The meaning of AE. Say we have two populations (or species, or whatever our OTU
is) with the same number of total alleles, but with very different distributions of allele
frequencies. We would like to be able to assess the effective number of alleles as a corollary
to the expected heterozygosity. Remember that, for any given number of alleles, the expected heterozygosity (gene diversity) is highest when all the allele frequencies are equal (look at Fig. 5.1 in the web notes). Simply reverse the logic: when the heterozygosity is high (the peak of the curve in Fig. 5.1) we will have the highest effective number of alleles. For a heterozygosity of 0.85 we will have, effectively, 6.7 alleles (the formula is AE = 1/(1 - Hexp)). If a locus has 8 total alleles (meaning a maximum possible Hexp of 0.875), but the Hexp is only 0.6, the effective number of alleles will be only 2.5. This tells us that we have a set of alleles with very different frequencies. Alleles much rarer than the even “average” frequency contribute very little to the effective number of alleles. When will the effective number of alleles be the same as the actual number of alleles? At the maximum gene diversity (peak of the curve), when all alleles are equally frequent. When will it be at a minimum (near 1)? When one allele (the only real contributor to the effective allele number) dominates the allele frequencies and all the others are very rare.
Imagine that one OTU has 10 total alleles, another just 4; they could have the same effective number of alleles, if the allele frequencies are very unbalanced in the first case and much more balanced in the second case. Because AE = 1/(1 - Hexp) is a one-to-one function of Hexp, OTUs with the same AE will also have the same Hexp. That is, if AE1 = AE2, then Hexp1 = Hexp2.
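The 10-allele versus 4-allele comparison can be checked with a few lines of Python. This is a sketch with hypothetical allele frequencies, chosen so that the two OTUs end up with nearly the same AE; AE is computed as 1/(1 - Hexp), which equals 1 divided by the sum of the squared allele frequencies.

```python
def hexp(freqs):
    """Expected heterozygosity (gene diversity): 1 - sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def effective_alleles(freqs):
    """AE = 1 / (1 - Hexp) = 1 / sum(p_i^2)."""
    return 1.0 / (1.0 - hexp(freqs))

# Hypothetical OTU 1: 4 alleles, perfectly balanced.
otu1 = [0.25] * 4
# Hypothetical OTU 2: 10 alleles, three common and seven rare.
otu2 = [0.40, 0.25, 0.15] + [0.2 / 7] * 7

for name, freqs in (("OTU 1", otu1), ("OTU 2", otu2)):
    print(f"{name}: {len(freqs)} alleles, Hexp = {hexp(freqs):.3f}, "
          f"AE = {effective_alleles(freqs):.2f}")
```

OTU 1 gives Hexp = 0.750 and AE = 4.00; OTU 2, despite having 10 alleles, gives Hexp of about 0.749 and AE of about 3.99, because its seven rare alleles add almost nothing to the sum of squared frequencies.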
Required readings: Avise text, pp. 248-257, Gillespie book Chapters 1, 2 and 5.
Population genetics is the study of Mendel’s laws, the Hardy-Weinberg principle and other
genetic principles as they apply to entire populations of organisms. Population genetics
describes genetic variation in populations, and determines, by observation, experiment and
theory, how that variation changes over time and space. In other words, how much variation
exists in natural populations, and how can we explain variation in terms of origin,
maintenance, and evolutionary importance?
Hardy-Weinberg Principle
• No non-random mating (i.e., random mating)
• Infinite population size (= no genetic drift)
• No mutation
• No genetic migration (permanent movement of alleles from one population to another, usually by dispersal of individuals)
• No natural selection (or sexual selection)
{Three additional assumptions are that the organisms are diploid, reproduce sexually and have non-overlapping generations.}
Allele frequencies:
$p + q = 1$ (Eqn 3.1)
With the five assumptions given above, one can calculate the genotype frequencies for a gene
with two alleles (A and a). The frequency of homozygous genotype AA is the probability of
one allele A being in combination with another allele A. The expected frequency is simply the
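product of the separate allele frequencies. We will use the term p to refer to the frequency of allele A (and q for allele a):
$AA = p \times p = p^2$ (Eqn 3.3)
$Aa = 2pq$ (Eqn 3.4)
$aa = q^2$ (Eqn 3.5)
The heterozygote term is 2pq because there are two ways to produce an Aa genotype.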
The values we have just calculated are EXPECTED genotype frequencies IF the Hardy-
Weinberg assumptions hold. We now turn to how we could check that from actual
OBSERVED genotypic data (such as the microsatellite data for Wyoming black bears). In
order to calculate allele frequencies all we need are the observed genotype frequencies. [No
assumptions needed about the five forces, but what statistical requirement/assumption do we need to have in place?]
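$p = p^2 + \dfrac{2pq}{2}$ and $q = q^2 + \dfrac{2pq}{2}$ (Eqn 3.7)
where $p^2$, $2pq$ and $q^2$ here stand for the observed frequencies of the AA, Aa and aa genotypes.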
Let's look at an example from the beginning. We will examine a population of trout with a di-
repeat microsatellite marker that has two alleles, 120 and 122. For simplicity, let’s call allele
120 A and allele 122 a. We genotype 100 individuals and find genotype frequencies of AA =
0.25, Aa = 0.5, and aa = 0.25 (check that when summed these genotype frequencies add to
one). We ask the question of whether this population is in Hardy-Weinberg equilibrium. We
first need to calculate the p and q (allele frequencies of A and a; note that the A and a are
names for the alleles themselves, the p and q refer to the frequencies of those alleles). We
calculate the frequencies using Eqn 3.7.
$p = p^2 + \dfrac{2pq}{2} = 0.25 + \dfrac{0.5}{2} = 0.5$
$q = q^2 + \dfrac{2pq}{2} = 0.25 + \dfrac{0.5}{2} = 0.5$
We see that the allele frequencies sum to one, as required by Eqn 3.1. Using the allele
frequencies, we then calculate the expected genotype frequencies using Eqns 3.3, 3.4, and
3.5.
$AA = p^2 = 0.5 \times 0.5 = 0.25$
$Aa = 2pq = 2 \times 0.5 \times 0.5 = 0.5$
$aa = q^2 = 0.5 \times 0.5 = 0.25$
The expected genotype frequencies are the same as the observed genotype frequencies (from the microsatellite data). This tells us that our population is in Hardy-Weinberg equilibrium. If the expected genotype frequencies calculated from the allele frequencies were not the same as the observed genotype frequencies, we would need to ask whether the departure from Hardy-Weinberg equilibrium is statistically significant, using a chi-square test, as we will see shortly. [Note that statistical significance is not a guarantee of biological significance.]
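Here is a minimal Python sketch of that chi-square test for a two-allele locus, written in terms of genotype counts rather than frequencies. The function name is hypothetical. The p-value uses the fact that a chi-square variable with 1 degree of freedom (three genotype classes, minus one, minus one more for the estimated allele frequency) has upper-tail probability erfc(sqrt(chi2/2)). It assumes expected counts are not too small; with rare genotypes an exact test is usually preferred.

```python
import math

def hw_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square test of Hardy-Weinberg proportions at a two-allele locus.

    Returns (p, chi2, p_value), using 1 degree of freedom.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)          # Eqn 3.7 written with counts
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("monomorphic locus: the test is undefined")
    observed = (n_AA, n_Aa, n_aa)
    expected = (p * p * n, 2 * p * q * n, q * q * n)   # p^2, 2pq, q^2 scaled to counts
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))          # upper tail, chi-square with 1 df
    return p, chi2, p_value

print(hw_chi_square(25, 50, 25))   # trout example, 100 fish -> (0.5, 0.0, 1.0)
print(hw_chi_square(40, 20, 40))   # hypothetical heterozygote deficit
```

The trout data give chi-square = 0 and p = 1 (perfect agreement). The hypothetical sample with the same allele frequencies but far fewer heterozygotes gives chi-square = 36, far beyond the 3.84 needed for significance at the 5% level with 1 df.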
The expected frequency distribution of genotypes AA, Aa, and aa in proportions $p^2$, $2pq$ and $q^2$, respectively, is called the Hardy-Weinberg equilibrium. If the population meets the eight assumptions listed above (the five main ones plus the three additional ones), then the population will reach Hardy-Weinberg equilibrium after a single generation of random mating and remain there. Again, the Hardy-Weinberg principle, with its predicted equilibrium, is a simple model that serves as a starting point for examining the genetic structure of populations.
How likely are we to meet the major assumptions of random mating, no drift, no mutation, no
migration, and no natural selection? If we violate the assumptions, how much difference does
it make? Here is a list of processes that violate the Hardy-Weinberg assumptions and some
discussion of each of them. These "big five" forces are the major engines of evolutionary
change. An important point is whether the given force tends to increase or decrease the
genetic variability in populations.
• Non-random mating (tends to reduce genetic variation)
Random mating means that alleles (as carried by the gametes -- eggs or sperm) come together
strictly in proportion to their frequencies in the population as a whole. Example: if p = 0.6
and q = 0.4, then the probability of an Aa heterozygote is 0.48 (the product of the allele
frequencies, plus consideration of the fact that two ways exist to make a heterozygote; see
Fig. 3.1). Situations where the random mating assumption does not hold include:
Often, a moderate amount of non-random mating has a negligible impact on our conclusions about the patterns and causes of genetic variation.
• Genetic drift (tends to reduce genetic variation)
The effect of random genetic drift is inversely proportional to population size. Allele frequencies change because the genes appearing in offspring are not a perfectly representative sample of the parental genes (in a finite population). Since drift is a random process, outcomes of drift must be stated as probabilities. Drift removes genetic variation from the population at a rate inversely proportional to population size: as population size decreases the force of drift increases, and vice versa. Drift also affects the probability of survival of new mutations. The probability that a selectively neutral allele will move to fixation is equal to its frequency in the population: a neutral allele with a frequency of 0.2 (20%) has a 20% chance of fixation. New alleles introduced by mutation almost inevitably begin at low frequencies (a single new copy in a diploid population of N individuals has a frequency of 1/(2N)) and have a low probability of fixation. Drift can lead to the loss of rare alleles and the fixation of common alleles. If the population is large, however, drift has little effect.
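One standard way to put a number on "a rate inversely proportional to population size": in an idealized (Wright-Fisher) population of effective size Ne, expected heterozygosity declines by a factor of (1 - 1/(2Ne)) each generation under drift alone. The Python sketch below applies that formula; the starting heterozygosity of 0.8 and the Ne values are arbitrary illustrative choices, and mutation, migration and selection are ignored.

```python
def heterozygosity_after_drift(h0, n_e, generations):
    """Expected heterozygosity after t generations of drift alone:
    H_t = H_0 * (1 - 1 / (2 * Ne)) ** t  (idealized Wright-Fisher population)."""
    return h0 * (1.0 - 1.0 / (2.0 * n_e)) ** generations

for n_e in (10, 100, 1000):
    h = heterozygosity_after_drift(0.8, n_e, 100)
    print(f"Ne = {n_e:>4}: expected H after 100 generations = {h:.3f}")
```

Starting from H = 0.8, a population with Ne = 10 keeps well under 1% of its heterozygosity after 100 generations, Ne = 100 keeps about 61%, and Ne = 1000 keeps about 95%.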
Marble analogy: Think of a jar containing a million marbles in ten different colors (the colors standing in for alleles). If we draw a random sample of 500,000, it will almost certainly contain all ten colors in proportions very similar to the original proportions. If we pick only five marbles, however, we will definitely have a biased sample (we cannot have picked more than 5 of the 10 colors, and any duplicates leave us with even fewer). Even if we take a sample of 50, we will be unlikely to reproduce the proportions of the original million: the small sample prevents us from drawing a representative array. Similarly, the strength of drift is inversely proportional to population size: large population = minor drift, small population = major drift.
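The marble analogy is easy to try by simulation. The Python sketch below assumes ten equally common colors and draws with replacement, which closely approximates sampling from a jar of a million marbles; the seed and the function name are arbitrary choices for illustration.

```python
import random

COLORS = range(10)   # ten marble colours (stand-ins for alleles), each at 10% in the jar

def draw(sample_size, rng):
    """Draw marbles; report how many colours appear and the worst proportion error."""
    sample = rng.choices(COLORS, k=sample_size)
    counts = [sample.count(c) for c in COLORS]
    n_present = sum(c > 0 for c in counts)
    max_error = max(abs(c / sample_size - 0.1) for c in counts)
    return n_present, max_error

rng = random.Random(2024)
for n in (5, 50, 500_000):
    present, err = draw(n, rng)
    print(f"sample of {n:>7}: {present} colours present, largest proportion error {err:.3f}")
```

A sample of 5 can never show more than 5 colors, while a sample of 500,000 reliably shows all ten, each within a fraction of a percent of its true proportion.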
Drift can have major effects on endangered (small, almost by definition) populations. For
other species it can take a long time (thousands, hundreds of thousands or even millions of
years) for drift to have large effects.
Fig. 3.2. Computer simulation of genetic drift. The fate of the A1 allele (with
frequency p, on the Y-axis) is shown in five replicate populations for a time
course of 100 generations (time on the X-axis). Note that if p drops to 0 or
rises to 1.0, then A1 will be lost (0) or reach fixation (1.0). Those frequencies (0
and 1.0) are therefore called "absorbing boundaries". Notice also the jagged
trajectories that often characterize random processes.
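A simulation like the one in Fig. 3.2 takes only a few lines. This Python sketch assumes an idealized Wright-Fisher population: each generation, 2N gene copies are drawn at random from the previous generation's allele frequency, with no mutation, migration or selection. The population size (N = 20), starting frequency and seed are arbitrary choices, so the trajectories will not match the figure.

```python
import random

def wright_fisher(p0, n_individuals, generations, rng):
    """Frequency trajectory of allele A1 under drift alone (Wright-Fisher model).

    Each generation, 2N gene copies are drawn independently from the previous
    generation's allele frequency (binomial sampling). Frequencies of 0 and 1
    are absorbing: once A1 is lost or fixed, nothing changes.
    """
    two_n = 2 * n_individuals
    p = p0
    trajectory = [p]
    for _ in range(generations):
        if 0.0 < p < 1.0:
            copies = sum(rng.random() < p for _ in range(two_n))
            p = copies / two_n
        trajectory.append(p)
    return trajectory

rng = random.Random(42)
# Five replicate populations followed for 100 generations, as in Fig. 3.2.
# N = 20 individuals and p0 = 0.5 are assumed values.
for rep in range(1, 6):
    traj = wright_fisher(0.5, 20, 100, rng)
    print(f"replicate {rep}: p at generation 100 = {traj[-1]:.3f}")
```

Running many replicates from p = 0.2 and counting how often A1 fixes is a quick way to check the claim that a neutral allele's fixation probability equals its starting frequency.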
• Natural selection (generally reduces genetic variation)
Selection is the differential survival and reproduction of phenotypes that are better suited to the environment or to obtaining mating success. Selection is the evolutionary force responsible for adaptation to the environment. Selection generally removes genetic variation from the population (in special circumstances, such as "frequency-dependent" or "balancing" selection, it can instead serve as a force maintaining variation). Alleles that confer advantages in survival or reproduction will tend to be represented in greater proportion in the next generation. After numerous generations (the time required will depend on the intensity of selection and the heritability of the trait), the advantageous allele will tend to spread to fixation. It is sometimes useful (and almost always interesting) to distinguish, as Darwin did, between natural and sexual selection.
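To see how the intensity of selection sets the pace, the Python sketch below iterates the standard single-locus recursion for viability selection with random mating, p' = (p²w_AA + pq·w_Aa)/w̄. It assumes genotype fitnesses of 1, 1 - hs and 1 - s for AA, Aa and aa (h = 0.5 gives additive effects), an effectively infinite population (no drift), and fitness differences that are entirely genetic, so heritability is not modeled separately. The function names are hypothetical.

```python
def next_generation(p, w_AA, w_Aa, w_aa):
    """One generation of viability selection followed by random mating."""
    q = 1.0 - p
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa     # mean fitness
    return (p * p * w_AA + p * q * w_Aa) / w_bar

def generations_to_reach(p0, target, s, h=0.5, max_generations=1_000_000):
    """Generations for favoured allele A to rise from p0 to `target`.

    Fitnesses: AA = 1, Aa = 1 - h*s, aa = 1 - s.
    """
    p, t = p0, 0
    while p < target and t < max_generations:
        p = next_generation(p, 1.0, 1.0 - h * s, 1.0 - s)
        t += 1
    return t

for s in (0.1, 0.01):
    t = generations_to_reach(0.01, 0.99, s)
    print(f"s = {s}: {t} generations for p to go from 0.01 to 0.99")
```

Under these assumptions the favored allele goes from 1% to 99% in somewhat under 200 generations at s = 0.1, and takes roughly ten times as long at s = 0.01.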
If drift and natural selection tend to reduce genetic variation, what maintains or increases it?
-- Mutation.
• Mutation (increases genetic variation and introduces novel variants)
Mutation is the process that produces a gene or chromosome set differing from the wild-type
(ancestral allele). Mutation restores genetic variation to a population by producing novel
alleles. Mutation is difficult to measure or observe directly, and rates of mutation can vary
between loci. It is usually a weak force and therefore tends not to pull populations very far
from Hardy-Weinberg equilibrium -- over long enough time periods, though, even a weak
force can have major effects (e.g., the erosion of the Grand Canyon). Much of the neutral
theory of genetic variation is based on a calculation of the balance between drift and mutation
as forces of change.
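A quick way to see both halves of that statement (weak in the short term, powerful over long spans) is the textbook case of one-way mutation from allele A to a at rate u per generation, for which p_t = p_0(1 - u)^t. The Python sketch below assumes u = 1e-5 purely for illustration and ignores back mutation, drift and selection.

```python
def allele_freq_after_mutation(p0, u, generations):
    """Frequency of allele A after t generations of one-way mutation A -> a
    at rate u per generation: p_t = p_0 * (1 - u) ** t."""
    return p0 * (1.0 - u) ** generations

u = 1e-5   # assumed per-generation mutation rate, for illustration only
for t in (10, 1_000, 100_000, 1_000_000):
    print(f"t = {t:>9,}: p = {allele_freq_after_mutation(1.0, u, t):.4f}")
```

After 1,000 generations p has barely moved (about 0.990), but after 100,000 generations it has fallen to about 0.37, and after a million generations allele A is essentially gone.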
• Genetic migration (restores variation within populations, reduces differences among them)
Genetic migration is the permanent movement of genes from one population into another. Migration can restore genetic variation to isolated and differentiated populations or, when it occurs frequently, reduce genetic differences among populations. Assessing the patterns and importance of genetic migration (often referred to as "gene flow") is one of the major aims of population genetics. [Note that this definition of migration is very different from that for the seasonal back-and-forth movement of birds, for example, from breeding grounds in the temperate zone to non-breeding grounds in the tropics. Migration in that sense may have little effect on the permanent movement of alleles.]
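The homogenizing effect of gene flow can be sketched with the simple continent-island model, in which a fraction m of an island population's gene copies is replaced each generation by migrants from a large mainland population with a fixed allele frequency. This Python sketch ignores drift, mutation and selection; the parameter values are arbitrary illustrations.

```python
def island_model(p_island, p_mainland, m, generations):
    """Continent-island model: p' = (1 - m) * p + m * p_mainland each generation.

    Returns the island's allele-frequency trajectory (drift, mutation and
    selection are ignored)."""
    trajectory = [p_island]
    for _ in range(generations):
        p_island = (1.0 - m) * p_island + m * p_mainland
        trajectory.append(p_island)
    return trajectory

traj = island_model(p_island=0.0, p_mainland=0.6, m=0.05, generations=100)
print(f"after 10 generations: {traj[10]:.3f}; after 100: {traj[100]:.3f}")
```

An island that started fixed for allele a (p = 0) regains allele A and, with m = 0.05, reaches p of about 0.24 after 10 generations and about 0.596 after 100, converging on the mainland value of 0.6.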
Much of population genetics involves manipulations of equations that have a base in either probability theory or combinatorics. We saw combinatorics in action when we used the formula for the number of distinct unrooted trees as a function of the number of OTUs. The basic Hardy-Weinberg equation, $p^2 + 2pq + q^2$, is a probabilistic one (with the addition that, since order is unimportant, we account for the two ways to get heterozygotes).
Rule 1: If you account for all possible events, the probabilities sum to 1. [e.g., p + q = 1 for a
two-allele system].
Rule 2: The probability that two independent events both occur is the product of their individual probabilities. [e.g., the probability of an aa homozygote is $q \times q = q^2$].
Punch line: Genetic techniques examine individual variation to discern the emergent
properties of populations and higher taxa. We can examine genetic variation at multiple
scales -- from the level of the individual (e.g., forensics applications) to analysis of higher
taxa in systematic and taxonomic studies. Population genetics integrates a broad spectrum of
process and pattern -- geneticists simplify by including only essential forces in their models
and by making simplifying assumptions that, if violated, do not change the qualitative
conclusions. A traditional first step is to build from the Hardy-Weinberg principle -- despite
its admittedly unrealistic assumptions of random mating, no drift, no mutation, no migration,
and no natural selection. In situations where one or more of these assumptions is clearly
violated in a major way, a variety of more complex models can then be brought to bear on the
problem.
Lecture 4. Population Genetics II
Heterozygosity, HExp (or gene diversity, D)
Go to web page describing how to calculate FST from heterozygosities.
Several measures of heterozygosity exist. The value of these measures will range from zero
(no heterozygosity) to nearly 1.0 (for a system with a large number of equally frequent
alleles). We will focus primarily on expected heterozygosity (HE, or gene diversity, D, as
Bruce Weir prefers to call it). The simplest way to calculate it for a single locus is as:
$D = 1 - \sum_{i=1}^{k} p_i^2$ (Eqn 4.1)
where pi is the frequency of the ith of k alleles. [Note that p1, p2, p3 etc. may correspond to
what you would normally think of as p, q, r, s etc.]. If we want the gene diversity over several
loci, we need double summation and subscripting as follows:
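$D = 1 - \dfrac{1}{m}\sum_{l=1}^{m}\sum_{i=1}^{k_l} p_{li}^2$ (Eqn 4.2)
where the first summation is over the lth ("ellth") of m loci, $p_{li}$ is the frequency of the ith allele at locus l, and $k_l$ is the number of alleles at that locus. [Note that we average over the m loci via the 1/m term.] The second summation is as in Eqn 4.1.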
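Eqns 4.1 and 4.2 translate directly into code. A minimal Python sketch, assuming allele frequencies are supplied as one list per locus (the data are hypothetical):

```python
def gene_diversity(freqs):
    """Eqn 4.1: D = 1 - sum of squared allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)

def multilocus_gene_diversity(loci):
    """Eqn 4.2: mean of the single-locus gene diversities over the m loci."""
    return sum(gene_diversity(freqs) for freqs in loci) / len(loci)

# Hypothetical data: three loci with 2, 4 and 6 alleles.
loci = [
    [0.9, 0.1],
    [0.25, 0.25, 0.25, 0.25],
    [0.5, 0.2, 0.1, 0.1, 0.05, 0.05],
]
for freqs in loci:
    print(f"{len(freqs)} alleles: D = {gene_diversity(freqs):.3f}")
print(f"multilocus D = {multilocus_gene_diversity(loci):.3f}")
```

The three loci give D = 0.180, 0.750 and 0.685, for a multilocus D of about 0.538. The equal-frequency 4-allele locus attains the maximum possible for four alleles, 1 - 1/4 = 0.75.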
Why does it work to take the sum of the squared gene frequencies and subtract that from one? Let’s think back to basic Hardy-Weinberg:
$p^2 + 2pq + q^2 = 1$ (Eqn 4.3)
where the heterozygosity is given by $2pq$. The rest of the expression ($p^2 + q^2$) is the homozygosity. If we want the heterozygosity, we just subtract the homozygosity from the total. With just two alleles it isn't as efficient to calculate the heterozygosity by the "one minus the homozygosity" route. Consider the case, though, of a locus with 6 alleles. It has 21 possible genotypes: 6 kinds of homozygotes and 15 kinds of heterozygotes. Writing it out, 6 + 5 + 4 + 3 + 2 + 1 = 21 = [6 × (6 + 1)]/2. This is the formula for combinations of n things taken two at a time, order unimportant and repetition allowed: n(n + 1)/2. (The 15 heterozygotes alone are the combinations without repetition, n(n - 1)/2.) The more alleles there are, the simpler it becomes to square the gene frequencies and sum them, compared to enumerating all possible heterozygotes and calculating the (possibly very many) different heterozygote frequencies. We trade a little inefficiency on two-allele systems for much greater efficiency with multi-allele systems.
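The trade-off can be made concrete by enumerating every genotype at a hypothetical 6-allele locus under Hardy-Weinberg proportions and comparing the summed heterozygote frequencies with the one-minus-homozygosity shortcut. A minimal Python sketch; the allele frequencies are invented for illustration.

```python
from itertools import combinations_with_replacement

def genotype_frequencies(freqs):
    """Hardy-Weinberg genotype frequencies for a k-allele locus.

    Homozygote i/i: p_i^2. Heterozygote i/j (i < j): 2 * p_i * p_j, because the
    genotype can arise two ways (i from the mother and j from the father, or vice versa).
    """
    genotypes = {}
    for i, j in combinations_with_replacement(range(len(freqs)), 2):
        genotypes[(i, j)] = freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
    return genotypes

freqs = [0.3, 0.25, 0.2, 0.1, 0.1, 0.05]      # hypothetical 6-allele locus
g = genotype_frequencies(freqs)
het_by_enumeration = sum(f for (i, j), f in g.items() if i != j)
het_by_shortcut = 1.0 - sum(p * p for p in freqs)

print(len(g), "genotypes")                    # 6 * 7 / 2 = 21
print(round(het_by_enumeration, 6), round(het_by_shortcut, 6))
```

Both routes give a heterozygosity of 0.785, but the shortcut needs only 6 squared terms instead of 15 heterozygote products.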
What does heterozygosity tell us, and what patterns emerge as we go to multi-allelic systems? Let’s take an example. For a two-allele system the heterozygosity, $2pq = 2p(1 - p)$, is described by a concave-down parabola that starts at zero (when p = 0), goes to a maximum of 0.5 at p = q = 0.5, and goes back to zero when p = 1. In fact, for any multi-allelic system, heterozygosity is greatest when all the allele frequencies are equal ($p_i = 1/k$), giving a maximum of $1 - 1/k$.
Here is a way that I like to think of heterozygosity (HE or D). It is the (expected) probability
that an individual will be heterozygous at a given locus (or over the assayed loci for a multi-
locus system). For many human microsatellite loci, for example, HE is often > 0.85, meaning
that you have a > 85% chance of being a heterozygote.
Now that you have a way to calculate gene diversity/expected heterozygosity, you are ready
to calculate F-statistics by the method of:
If you run some data through Eqns 4.5 and an analysis program you may ask:
Mutation process: Microsatellites are useful genetic markers because they tend to be
highly polymorphic. It is not uncommon to have human microsatellites with 20 or
more alleles and heterozygosities (Hexp = gene diversity, D) of > 0.85. Why are they
so variable? The reason seems to be that their mutations occur in a fashion very
different from that of "classical" point mutations (where a substitution of one
nucleotide to another occurs, such as a G substituting for a C). The mutation process
in microsatellites occurs through what is known as replication slippage (slipped-strand mispairing). If we envision the repeat units (e.g., an AC dinucleotide repeat) as beads on a chain, we can imagine that during replication the two strands could slip relative to each other by a bead or so, yet still manage to zip back together along the beads. One strand or the other could then be lengthened or shortened by the addition or excision of nucleotides. The result will be a novel allele that is one bead (one repeat unit) longer or shorter than the original. The idea that adding or subtracting one repeat is likely easier than adding or subtracting two or more is the basis for using the Stepwise Mutation Model (SMM) as opposed to the Infinite Alleles Model (IAM). An advantage of the SMM
(at least in theory) is that the difference in size then conveys additional information
about the phylogeny of alleles. Under the IAM the only two states are "same" and
"different". Under the SMM we have a potential continuum of different similarities
(same size, similar in size, very different in size). If, however, the SMM does not
hold, then we may be worse off using it -- it may actually be highly misleading. Even
if the underlying mutation process is largely stepwise, it is not difficult to see how
drift might affect the distribution of allele sizes in a way that would almost entirely
invalidate the SMM (visualize this by examining Figs. 6.1 and 6.2 in Lecture 6).
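The practical difference between the two mutation models shows up in the equilibrium heterozygosity expected from a balance between mutation and drift. The Python sketch below uses the standard results H = θ/(1 + θ) under the IAM and H = 1 - 1/sqrt(1 + 2θ) under the one-step SMM (Ohta and Kimura's result), with θ = 4Neμ. These formulas are not derived in these notes, and the Ne and μ values are arbitrary illustrations.

```python
import math

def h_iam(theta):
    """Equilibrium expected heterozygosity, infinite-alleles model: theta / (1 + theta)."""
    return theta / (1.0 + theta)

def h_smm(theta):
    """Equilibrium expected heterozygosity, one-step stepwise mutation model:
    1 - 1 / sqrt(1 + 2 * theta)."""
    return 1.0 - 1.0 / math.sqrt(1.0 + 2.0 * theta)

# theta = 4 * Ne * mu; these Ne and mu values are assumptions for illustration.
for n_e, mu in [(1000, 1e-6), (1000, 1e-4), (1000, 1e-3)]:
    theta = 4 * n_e * mu
    print(f"theta = {theta:g}: H_IAM = {h_iam(theta):.3f}, H_SMM = {h_smm(theta):.3f}")
```

For the same θ the SMM always predicts lower heterozygosity than the IAM, because stepwise mutations often recreate allele sizes that already exist; at θ = 4 the values are 0.800 and 0.667.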
1) Extract DNA from tissue (wide variety of possible methods depending upon tissue
type)
2) Fragment the genome. Cut the genomic DNA into suitably sized fragments with restriction enzymes. The goal is generally to use restriction enzymes that produce mean fragment sizes in the range of 300-600 bp.
3) Insert. Insert the fragments into plasmids. This step allows cloning of the fragments, producing many copies of the 300-600 bp pieces we have inserted in the plasmids. To get a slightly more detailed idea of how plasmids act as cloning vectors, look up the boldface terms in the glossary of terms page. pUC19 is a commonly used plasmid for this sort of analysis. Why pUC19? Its restriction sites are known (so that the ligated DNA fragments can later be cut out) and it replicates well in a bacterial culture.
6) Culture the positive clones (the clones whose inserts hybridized with the oligo probes).
7) Cut the insert out of the plasmids with restriction enzymes and run them out on an
agarose gel.
8) Probe. Use Southern transfers to probe the digest again with labeled oligos. This
serves:
10) Select. Analyze the sequence to check for "good" primer sites and useful repeat
length (generally at least 8 repeats and it is often best to have more -- depending upon
our intended application we may want long pure repeats or we may be interested in
shorter interrupted repeats, which may have lower mutation rates). Criteria that enter
into primer selection include:
11) Order the locus-specific primers (generally these will be 20-30 bp sections of the
flanking regions not immediately adjacent to the repeat unit).
From the beginning of the forward primer to the end of the reverse primer, the sequence above is 131 bp long.
The repeat is $(\mathrm{CA})_{11}$, i.e., 11 tandem copies of CA.
The repeat unit is highlighted in red, while the forward and reverse primers are
highlighted in blue and green. We would send out an order for the primer sequences (in our case we add an additional 19 bp M13 tail, which allows a fluorescently labeled M13 primer to be incorporated into our amplified product during the PCR). A laser in our
sequencer/automated genotyper then detects the fluorescence, which is how we
visualize the bands that constitute the allelic data we hope to gather and analyze.
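Two small calculations from this section, sketched in Python under stated assumptions: scanning a sequence for an uninterrupted dinucleotide run of at least 8 repeats (the rule of thumb in step 10), and the size of the fluorescent PCR product once the 19 bp M13 tail is added to the 131 bp primer-to-primer product. The insert sequence is a made-up example, the search only looks for the motif in the phase given (upper case), the size calculation assumes the tail is carried on one primer only, and both function names are hypothetical.

```python
import re

def find_repeats(seq, motif="CA", min_repeats=8):
    """Return (start, n_repeats) for each uninterrupted run of `motif` with at
    least `min_repeats` copies. Only the motif in the phase given is searched."""
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_repeats},}}")
    return [(m.start(), len(m.group()) // len(motif))
            for m in pattern.finditer(seq.upper())]

def labeled_product_length(primer_to_primer_bp, tail_bp=19):
    """Size of the PCR product once the M13 tail on one primer is incorporated."""
    return primer_to_primer_bp + tail_bp

# Hypothetical insert: 40 bp flank, (CA)11, 40 bp flank.
insert = "GATC" * 10 + "CA" * 11 + "TTGA" * 10
print(find_repeats(insert))            # [(40, 11)]
print(labeled_product_length(131))     # 150
```

A real primer-design workflow would also check flanking-sequence criteria such as melting temperature; this sketch covers only the repeat-length screen and the product-size arithmetic.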
Strassmann et al. (1996) has a more detailed run-through of much of this section.
1) Extract the DNA. One often begins by somehow breaking up the tissue (e.g., by
grinding in liquid nitrogen). Alternatives for the extraction process include classic
phenol-chloroform extractions, salt-based extractions, and a variety of commercial
kits. We are getting rid of proteins and other non-DNA tissue components in this step.
A typical analysis might include extracting DNA from each of the individuals in a
local population of 30 individuals.
2) Amplify. We add a very small amount of each of our 30 samples of extracted DNA
to a PCR cocktail for amplification in a thermocycler. This is a "magic" step that has
revolutionized molecular biology. We start with almost no DNA and wind up with
enough that we can see it on a gel! Various "cocktail" recipes exist. They typically contain the thermophilic bacterial enzyme Taq polymerase (essential), the dNTP mix (nucleotides that will allow massive replication of our target DNA), magnesium chloride, and a fluorescently labeled M13 (or T3) primer (this anneals to the specially added M13 or T3 tail, so the amplified alleles carry the fluorescent label that lights up under the laser and makes the bands of DNA alleles show up on the gel).
4) Run the sequencer. We run the amplified product through the sequencer until all
the alleles have had time to run past the laser, which excites the fluorescent labels and makes bands light up on the gel (or sends the signal directly to the computer as digital data).
The sequencer generates both an analog image (for older, gel-based sequencers) and
digitally stored data concerning the size of the fragments.
5) Optimize (variations on Steps 2-4). It often takes considerable fiddling to get the PCR conditions right for a particular combination of primer, DNA, thermocycler and sequencer. Major variables in optimization include:
• annealing temperature (the primer sequence will have a predicted melting temperature, but what actually works may be higher or lower),
• the PCR-programmed times for the denaturing, annealing and extending steps,
• magnesium chloride concentrations.
E. How do we analyze the allelic information? For a slightly more detailed description
go to the Genetic analysis page.
You can also download my Word document on Web Genetic software. Luikart and England
(1999) provides an (older) overview of approaches. For use of alternative markers see papers
(mostly from TREE) by Sunnucks (2000), Mueller and Wolfenbarger (1999; AFLP),
Campbell et al. (2003; AFLP) and Brumfield et al. (2003; SNPs - single nucleotide
polymorphisms).
The table entries are Cavalli-Sforza chord distances (Cavalli-Sforza and Edwards,
1967; described on pp. 163-166 of Weir, 1996) between five jay populations. For
example, the "distance" between population WSp3 and population WOb3 is 0.0332,
which is smaller than the distance of 0.0488 between WSp3 and WCal. The matrix is
symmetrical (A to B = B to A) and has zeros on the diagonal (A to A = 0).
How did we get these Cavalli-Sforza distances? They are simply a geometric view of
the distances between multi-dimensional points on a hypersphere (a sphere with > 3
dimensions). Say we have two subpopulations S1 and S2 assayed at a single locus
with alleles i = 1 to k. The formal definition is:
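$D_{CS} = \dfrac{2}{\pi}\sqrt{2\left(1 - \cos\theta\right)}, \qquad \cos\theta = \sum_{i=1}^{k}\sqrt{p_{i}^{(S_1)}\,p_{i}^{(S_2)}}$ (Eqn 6.1)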
That is, we take the square root of the frequency of allele 1 in S1 times that of allele 1 in S2, and repeat and sum that quantity for all k alleles. That gives us cos θ, which we plug into the square-root term on the RHS (right-hand side) of Eqn 6.1 above.
I don’t expect you to use or memorize this, just to see that it is a purely numerical/geometric approach. In 3 dimensions it would be akin to figuring out the straight-line distance (the chord) between New York and London through the globe, rather than the distance along its surface (the arc). It can be fairly easily incorporated into a number-crunching computer program that will produce output like the table of Cavalli-Sforza distances shown above. Those distances, in matrix form, can then be used as input for phylogenetic tree-building routines such as the UPGMA, Fitch-Margoliash and neighbor-joining approaches we used in the homeworks.
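A minimal Python sketch of that number-crunching step, assuming Eqn 6.1 in the form D = (2/π)·sqrt(2(1 - cos θ)) with cos θ = Σ sqrt(p_i1·p_i2), and averaging the per-locus values over loci (one common convention). Scaling conventions differ among programs, so the numbers need not match the jay table above; the allele frequencies and population names are hypothetical.

```python
import math

def chord_distance(pop1, pop2):
    """Cavalli-Sforza chord distance averaged over loci.

    pop1, pop2: one allele-frequency list per locus, alleles in the same order.
    Per locus: cos(theta) = sum_i sqrt(p_i1 * p_i2), D = (2/pi) * sqrt(2 * (1 - cos(theta))).
    """
    per_locus = []
    for f1, f2 in zip(pop1, pop2):
        cos_theta = sum(math.sqrt(a * b) for a, b in zip(f1, f2))
        cos_theta = min(cos_theta, 1.0)          # rounding can push it just above 1
        per_locus.append((2.0 / math.pi) * math.sqrt(2.0 * (1.0 - cos_theta)))
    return sum(per_locus) / len(per_locus)

def distance_matrix(pops):
    """Symmetric matrix (as a dict) of chord distances between all pairs of populations."""
    return {(a, b): chord_distance(pops[a], pops[b]) for a in pops for b in pops}

# Hypothetical allele frequencies at two loci (3 and 2 alleles) for three populations.
pops = {
    "Pop1": [[0.6, 0.4, 0.0], [0.5, 0.5]],
    "Pop2": [[0.5, 0.4, 0.1], [0.4, 0.6]],
    "Pop3": [[0.1, 0.3, 0.6], [0.9, 0.1]],
}
d = distance_matrix(pops)
for a in pops:
    print(a, "  ".join(f"{d[(a, b)]:.4f}" for b in pops))
```

Like the jay table, the output is symmetric with zeros on the diagonal, and it is in the form a tree-building routine would take as input.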
The Cavalli-Sforza chord distance was an early measure and is still used (in fact I see
it gaining ground for use with microsatellites). Another geometric distance that was
widely used with allozymes (but I have not seen used with microsatellite data) is
Rogers’ distance (Wright, 1978). One reason the Cavalli-Sforza distance may be in
greater current use is that it was specifically evaluated (and performed well) in
simulations of tree-building algorithms by Takezaki and Nei (1996). [For all we know, Rogers’ distance may perform equally well or better under circumstances that would apply to the questions people like me seek to address, but since no one has done such a study, people like me will tend to go with a measure that has a documented good track record.] A very important part of the robustness of a distance measure is its performance under a variety of conditions. It is always best if we can compare several distance measures under conditions in which we know what the answer should be. Paetkau et al. (1997) provide an evaluation of distance measures potentially useful for microsatellite analysis of bear populations.
2) Distance methods with biological assumptions. With a little luck (or a lot of hard
work), we know something about the evolutionary forces (most importantly here
mutation and drift, since we assume we are using markers that are not subject to
natural selection) driving genetic change in the system we're interested in. If so, it
seems reasonable to take advantage of that knowledge by incorporating it into
building a distance model. After all, we expect models with greater realism to perform
better (albeit at the cost of greater complexity, usually). Several distance measures
incorporate assumptions about the importance of drift and mutation as forces of
change:
Reynolds’ distance or the "coancestry" distance (Reynolds et al., 1983; see Weir,
1996, p. 167)
Nei’s distance (Nei 1972, 1978)
Models using a stepwise mutation model (SMM) specifically developed for microsatellites (e.g., $(\delta\mu)^2$ of Goldstein et al., 1995).
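Two of these measures are simple enough to sketch in Python: Nei's (1972) standard distance, D = -ln(J_XY / sqrt(J_X·J_Y)), and the (δμ)² of Goldstein et al. (1995), the squared difference in mean allele size. This is a sketch, not a replacement for a tested program: allele frequencies are aligned lists per locus, allele sizes are given in repeat units, (δμ)² is shown for a single locus (multi-locus versions average over loci), and the example data are hypothetical.

```python
import math

def nei_standard_distance(pop_x, pop_y):
    """Nei's (1972) standard distance, D = -ln(Jxy / sqrt(Jx * Jy)).

    Jx, Jy and Jxy are means over loci of sum(x_i^2), sum(y_i^2) and sum(x_i * y_i);
    plain sums are used here because the 1/m factors cancel in the ratio.
    """
    jx = sum(sum(p * p for p in fx) for fx in pop_x)
    jy = sum(sum(p * p for p in fy) for fy in pop_y)
    jxy = sum(sum(a * b for a, b in zip(fx, fy)) for fx, fy in zip(pop_x, pop_y))
    if jxy == 0.0:
        return math.inf                      # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))

def delta_mu_squared(sizes_x, sizes_y):
    """(delta mu)^2 at one microsatellite locus: squared difference in mean allele
    size. sizes_x, sizes_y map repeat number -> allele frequency."""
    mu_x = sum(n * f for n, f in sizes_x.items())
    mu_y = sum(n * f for n, f in sizes_y.items())
    return (mu_x - mu_y) ** 2

# Hypothetical one-locus examples.
print(nei_standard_distance([[0.5, 0.5]], [[1.0, 0.0]]))    # 0.5 * ln 2, about 0.347
print(delta_mu_squared({10: 0.5, 12: 0.5}, {14: 1.0}))      # (11 - 14)^2 = 9.0
```

The two measures answer different questions: Nei's D only asks whether alleles are shared, while (δμ)² uses allele sizes and so leans on the stepwise assumption discussed next.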
The problem with making assumptions is that violations can cause errors.
Empirically, it appears that many of the stepwise mutation models for microsatellites
do not perform well when analyzing many (most?) data sets, especially those where
small population sizes mean that drift has played at least as large a role as mutation.
Reynolds’ distance, which was derived for allozyme data on small (e.g., vertebrate) populations, assumes a primary role for drift and is an infinite-alleles model (each mutation produces a novel allele, which need not resemble its parent in size, so alleles are simply "same" or "different"). Reynolds’ reliance on "drift only" seemed inappropriate for microsatellites, which have:
a) a mutation rate that appeared to be much higher than that of allozymes (1 mutation per 1,000 or 10,000 replication events for microsatellites vs. 1 mutation per 1,000,000 replication events for allozymes). [But that may be based on very long repeats in highly polymorphic human populations].
b) a mutation process that would seemingly not fit the infinite-alleles model because mutations generally occur in "stepwise" fashion by adding or deleting one of a series of beads ($(\mathrm{AC})_{10}$ goes to $(\mathrm{AC})_{9}$ or $(\mathrm{AC})_{11}$, where the subscript refers to the number of AC repeat units).
[See my web page http://www.uwyo.edu/zoology/mcdonald/dna.htm for a quick
overview of microsatellites].
Nevertheless, Reynolds' distance, despite its neglect of mutation, may work better than we would have expected (at least in some species/populations) for two reasons:
a) small population sizes (= high potential for drift)
b) "missing steps" because drift creates a "chunky" distribution of alleles instead of
the smooth bell curve we would expect under a strict stepwise process.
Fig. 6.2. An allele frequency distribution that has been greatly affected by drift
and may better fit an infinite-alleles model (IAM). Even if the mutations that
generated the original variation did occur in stepwise fashion, drift has
removed some allele sizes (e.g., the 10-repeat category) while randomly
selecting others to be greatly over-represented (e.g., 12, 15 and 17). This sort
of "chunky" distribution may be quite common in many natural populations of vertebrates (where effective population sizes, Ne, are often small, or at least periodically fluctuate to low numbers).
Genetic diversity
From Wikipedia, the free encyclopedia
Genetic diversity, the level of biodiversity, refers to the total number of genetic
characteristics in the genetic makeup of a species. It is distinguished from genetic variability,
which describes the tendency of genetic characteristics to vary.
Genetic diversity serves as a way for populations to adapt to changing environments. With
more variation, it is more likely that some individuals in a population will possess variations
of alleles that are suited for the environment. Those individuals are more likely to survive to
produce offspring bearing that allele. The population will continue for more generations
because of the success of these individuals.[1]
The academic field of population genetics includes several hypotheses and theories regarding
genetic diversity. The neutral theory of evolution proposes that diversity is the result of the
accumulation of neutral substitutions. Diversifying selection is the hypothesis that two
subpopulations of a species live in different environments that select for different alleles at a
particular locus. This may occur, for instance, if a species has a large range relative to the
mobility of individuals within it. Frequency-dependent selection is the hypothesis that as
alleles become more common, they become more vulnerable. This occurs in host-pathogen
interactions, where a high frequency of a defensive allele among the host means that it is
more likely that a pathogen will spread if it is able to overcome that allele.
The interdependence between genetic and species diversity is delicate. Changes in species
diversity lead to changes in the environment, leading to adaptation of the remaining species.
Changes in genetic diversity, such as the loss of species, lead to a loss of biological
diversity.[1] Loss of genetic diversity in domestic animal populations has also been studied
and attributed to the extension of markets and economic globalization.[4][5]
Genetic diversity is essential for a species to evolve. With very little gene variation within the
species, healthy reproduction becomes increasingly difficult, and offspring are more likely to
have problems resulting from inbreeding.[8] The vulnerability of a population to certain types
of diseases can also increase with reduction in genetic diversity.
Agricultural relevance
When humans initially started farming, they used selective breeding to pass on desirable traits
of the crops while omitting the undesirable ones. Selective breeding leads to monocultures:
entire farms of nearly genetically identical plants. Little to no genetic diversity makes crops
extremely susceptible to widespread disease. Bacteria morph and change constantly. When a
disease-causing bacterium changes to attack a specific genetic variation, it can easily wipe out
vast quantities of the species. If the genetic variation that the bacterium is best at attacking
happens to be that which humans have selectively bred to use for harvest, the entire crop will
be wiped out.[9]
A very similar occurrence was the cause of the infamous Potato Famine in Ireland. Because new potato plants come not from sexual reproduction but from pieces of the parent plant, no genetic diversity is generated; the entire crop is essentially a clone of one potato and is therefore especially susceptible to an epidemic. In the 1840s, much of Ireland’s population depended on potatoes for food. They planted mainly the “lumper” variety of potato, which was susceptible to a rot-causing oomycete called Phytophthora infestans.[10] This oomycete destroyed the vast majority of the potato crop, and about one million people died in the famine that followed.
Cheetahs are a threatened species. Low genetic diversity and the resulting poor sperm quality have made breeding and survivorship difficult for cheetahs. Moreover, only about 5% of cheetahs survive to adulthood.[13] However, it has recently been discovered that female cheetahs can
mate with more than one male per litter of cubs. They undergo induced ovulation, which
means that a new egg is produced every time a female mates. By mating with multiple males,
the mother increases the genetic diversity within a single litter of cubs.[14]
species diversity
ecological diversity
morphological diversity
degeneracy
There are broad correlations between different types of diversity. For example, there is a
close link between vertebrate taxonomic and ecological diversity.[