
Cloning genomic DNA

If you want to understand how genes are regulated and how they have evolved, you need to be able
to examine their structure. cDNA can only tell you so much about a gene: an individual cDNA
clone is derived from a processed RNA transcript. Compared with the whole gene, it will have lost
any upstream regulatory regions and it will have lost the introns. So to study genes in a complete
way you need to be able to make clones containing genomic DNA and, from your library of clones
representing the whole genome, you must be able to identify those clones which are derived from
the gene of interest.
As with all molecular biology techniques, the techniques of library construction have evolved
during the past twenty years.

Principles
The key principles are:
1. Ensure that the library is representative of the entire genome, i.e. any point in the genome
should have an equal chance of appearing in the library compared with any other point.
This is not always as easy to accomplish as you might think.
2. Make sure that the library contains enough cloned DNA so that the chance of any point in
the genome being contained within at least one of the clones is high.
Subsidiary to this, but still important:
3. Individual clones should contain sufficiently large inserts so that the number of clones
needed to ensure that point 2 is satisfied is not so great that the library size becomes too big
to handle conveniently.
It goes without saying (I hope) that the vector chosen, as well as helping to satisfy the conditions
above, should be able to propagate clones stably in the chosen host so that each cloned fragment of
genomic DNA is a true copy of the piece of genome from which it originated and is not rearranged
(duplicated, deleted, reshuffled or otherwise mutated) in any way.

How many clones are needed to make a representative library?


Even if every piece of the genome is equally clonable
(which is far from true and will be discussed below), no
library is guaranteed to contain every point in the genome.
This is a matter of statistics. The formula which relates the
number of clones needed in the library to the size of the clone
insert and the size of the target genome, for any given chance
that any point in the genome will appear in the library at least
once is:
$$N = \frac{\ln(1 - p)}{\ln\left(1 - \frac{i}{g}\right)}$$

where N = number of clones in the library, p = probability that any
point in the genome will occur at least once in the library, i =
insert size, g = genome size.
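The formula above can be evaluated directly. The following is a minimal sketch; the function name `clones_needed` is chosen here for illustration, and the example values (20 kb inserts, a 3 × 10^9 bp genome, p = 0.99) are just one case.

```python
import math

def clones_needed(p, insert_size, genome_size):
    """Number of clones N so that any point in the genome has
    probability p of appearing in the library at least once."""
    # log1p(-x) is ln(1 - x), computed accurately when x is very small
    return math.log1p(-p) / math.log1p(-insert_size / genome_size)

# Example: 20 kb inserts, 3 x 10^9 bp genome, 99% chance of finding any point
n = clones_needed(0.99, 20_000, 3_000_000_000)
print(round(n))  # 690773
```

Raising p from 0.99 to 0.999 adds half as many clones again, which is why a library can never be guaranteed complete.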

If this is not easy to envisage, imagine a
very large number of necklaces, each
containing 100 beads numbered from 1
to 100. If we snap all the strings and
pour the beads into a box how many
individual beads will we have to pull
blindfold from the box to be able to
completely reassemble at least one
necklace? If the number of necklaces
really is very large (as it would be if
each necklace represents one complete
genome isolated from one cell) then we
can never be certain that we will pull
one copy of each bead. However, the
more beads we take the greater the
chance that we will in the end have one
copy of each number.
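The necklace question can be explored numerically. This sketch assumes that the box is so large that every draw is equally likely to be any of the 100 numbers (the function names are illustrative). It gives the average number of draws needed from the standard "coupon collector" result and simulates some individual attempts.

```python
import random

def expected_draws(n_beads):
    """Average number of draws needed to see every bead number at least once
    (coupon collector result: n * (1 + 1/2 + ... + 1/n))."""
    return n_beads * sum(1 / k for k in range(1, n_beads + 1))

def draws_until_complete(n_beads, rng):
    """Simulate pulling beads one at a time until all numbers have been seen."""
    seen = set()
    draws = 0
    while len(seen) < n_beads:
        seen.add(rng.randrange(n_beads))
        draws += 1
    return draws

rng = random.Random(1)  # fixed seed so the run is repeatable
trials = [draws_until_complete(100, rng) for _ in range(200)]
print(round(expected_draws(100)))  # about 519 draws on average
print(min(trials), max(trials))    # individual attempts vary widely
```

On average about five times as many beads as there are numbers must be drawn, and any single attempt may need many more.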

How does this work in practice?


Several different vectors have been used as methods have evolved. This little spreadsheet shows N
calculated for different vectors given various desired values of p and a genome size of 3 × 10^9 bp,
which is the approximate size of the human or mouse genome.

genome size = 3000000000 base pairs

vector   insert size   p = 0.5   p = 0.9   p = 0.99   p = 0.999
lambda         20000    103972    345387     690773     1036160
cosmid         45000     46209    153505     307009      460514
YAC          1000000      2079      6907      13813       20720
BAC           500000      4159     13814      27629       41443

In practice, these are typical values:

Vector   Maximum       Approx. no. of     Advantages                      Disadvantages
         insert size   clones required
                       in library
lambda   20 kb         5 × 10^5           easy to construct libraries,    many clones required,
                                          relatively stable inserts       hard to prepare DNA from clones
cosmid   45 kb         2 × 10^5           easy to construct libraries     not always stable
                                          and to prepare DNA from clones
YAC      1 Mb          10^4               few clones required             very prone to rearrangement,
                                                                          difficult to construct
PAC      ~120 kb       10^5               fewer clones required than      single copy origin of replication
                                          for cosmids, stable             therefore harder to prepare DNA
BAC      > 500 kb      2 × 10^4           few clones required,            single copy origin of replication
                                          very stable                     therefore harder to prepare DNA
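As a rough check on the typical values above, the library-size formula can be turned round to ask what coverage and what value of p each typical library corresponds to. This is only a sketch: the function names are illustrative, the genome is taken as 3 × 10^9 bp, and the BAC insert is taken as 500 kb even though the table gives it as a lower bound.

```python
GENOME = 3_000_000_000  # bp, the genome size used in the text

# vector: (insert size in bp, approximate number of clones), from the table;
# the BAC insert size is an assumption (the table says "> 500 kb")
typical = {
    "lambda": (20_000, 5e5),
    "cosmid": (45_000, 2e5),
    "YAC": (1_000_000, 1e4),
    "PAC": (120_000, 1e5),
    "BAC": (500_000, 2e4),
}

def depth(insert, n_clones, genome=GENOME):
    """Average number of times each point of the genome is cloned."""
    return insert * n_clones / genome

def p_present(insert, n_clones, genome=GENOME):
    """Probability that a given point is in at least one clone
    (the library-size formula solved for p)."""
    return 1 - (1 - insert / genome) ** n_clones

for vector, (insert, n) in typical.items():
    print(f"{vector:7s} depth {depth(insert, n):.1f}  p {p_present(insert, n):.3f}")
```

With these assumptions, every typical library size works out at roughly three- to four-fold coverage of the genome, with p between about 0.95 and 0.98.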

Bacteriophage lambda was originally used as a genomic cloning vector because 20kb of its
genome, containing the genes required for lysogeny, could be replaced by insert DNA from
another species. It is now only used for genomic library building in exceptional circumstances.
Cosmids were popular in the 1980s and early 1990s for genomic cloning. They are a hybrid
vector, mostly plasmid with the lambda cohesive end incorporated. This is used to give a very
high efficiency of bacterial transformation because the recombinant DNA molecules can be
packaged into lambda protein coats which then will infect bacteria. YACs, yeast artificial
chromosomes, were designed to propagate in yeast but, because of problems with genomic instability,
they are no longer much used. PACs are another hybrid vector, part plasmid and part phage P1,
and BACs are based around the F origin of replication. The book Analysis of Genes and
Genomes by Richard Reece (Wiley, 2004) has an excellent chapter outlining all these vectors,
which I recommend that you read.

Why are some pieces of genomic DNA more difficult to clone than others?
Some DNA sequences which form normal parts of eukaryotic genomes seem to be effectively
unclonable in bacteria. Some sequences are simply very AT rich, others may contain long inverted
repeat sequences which can form stable cruciform structures and which may interfere with the
supercoiling of plasmids containing them. Thus the plasmids cannot be maintained, the antibiotic
resistance is lost and the host dies. Some sequences are simply poison. Possibly they can
accidentally code for a polypeptide with detrimental effects, sometimes they accidentally act as
promoters driving unwanted transcription into the vector. Modern vectors contain powerful
terminator sequences flanking the cloning site which prevent insert driven transcription escaping
from the cloning site. A DNA sequence does not have to be absolutely unclonable to be effectively
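unclonable. If a sequence is present at just 10% of the average frequency in a library and that
library is less than 10-fold deep (i.e. an average sequence is represented fewer than 10 times), then
on average fewer than one clone containing it is expected and it is likely to be absent.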
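The effect of under-representation can be put into numbers. This sketch assumes that the number of clones containing a given sequence follows a Poisson distribution, which is an approximation introduced here and not a statement from the text above; the function name is illustrative.

```python
import math

def p_absent(relative_frequency, depth):
    """Chance that a sequence is missing from the library.

    relative_frequency: how often the sequence is cloned relative to an
    average sequence (0.1 means 10%).
    depth: average coverage of the library (10 means 10-fold).
    Poisson approximation: expected copies = relative_frequency * depth.
    """
    return math.exp(-relative_frequency * depth)

for d in (3, 5, 10, 30):
    print(d, round(p_absent(0.1, d), 2))
```

Under this approximation a sequence cloned at 10% of the average frequency is more likely absent than present once the library is less than about 7-fold deep.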

How do we create the fragments for cloning?


If we want to clone fragments of genomic DNA which are even as small as 20kb we cannot just cut
the genomic DNA to completion with a restriction enzyme.
For an average enzyme, the average distance between sites is given by the formula:
$$D = 4^n$$
where D = distance in base pairs,
n = number of bases in the recognition site
and 4 because there are 4 different bases in DNA.
This is shown in this table for the two commonest classes of enzyme.
Number of bases in recognition site    Average distance between sites (bp)
4                                      256
6                                      4096

Neither of these distances is long enough to be useful in a complete digestion.
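The spacing formula is easy to check, and it also shows how many fragments a complete digest of a whole genome would produce. A minimal sketch, assuming all four bases are equally frequent and a genome of 3 × 10^9 bp (function names are illustrative):

```python
def average_spacing(site_length):
    """Average distance (bp) between sites for a recognition site of the
    given length, assuming all four bases are equally frequent."""
    return 4 ** site_length

def fragments_from_complete_digest(genome_size, site_length):
    """Approximate number of fragments after digestion to completion."""
    return genome_size // average_spacing(site_length)

for n in (4, 6):
    print(n, average_spacing(n), fragments_from_complete_digest(3_000_000_000, n))
```

A complete digest with a four base pair enzyme would thus give more than ten million fragments averaging 256 bp, far too small for a 20 kb insert.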
Two methods have been used to generate suitable large
fragments of DNA.
1. carry out a partial restriction digest where a small
amount of the restriction enzyme is added to the
DNA for a short time so that only some sites are
cleaved. This can be difficult to control because high
molecular weight DNA solutions are very viscous
and it is difficult to mix the enzyme with the DNA
adequately in the short time available (maybe only 2
min. or so) without shearing the DNA by too
vigorous stirring.
2. A better method is to mix the restriction enzyme with
its corresponding methylase. Then the cleavage and
the protective methylation reactions are in a race (see
Figure 2). Both enzymes will diffuse into the viscous
DNA solution at the same rate so the proportion of cleaved sites compared to the protected
sites remains constant throughout the reaction. The proportion of sites which are cleaved is
determined by the relative amounts of the two enzymes, which can be adjusted in trial
reactions beforehand.

Figure 1: a partial digestion time course. Genomic DNA has been digested with a
small amount of restriction enzyme for between 0 and 25 min. The size markers are digested
with HindIII.
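The race between restriction enzyme and methylase can be sketched with a very simple model. It assumes that each site is either cut or protected, in proportion to the relative activities of the two enzymes. This is an idealisation made here for illustration; in practice the ratio is found by trial reactions, as described above. The function names are hypothetical.

```python
def fraction_cleaved(restriction_activity, methylase_activity):
    """Fraction of sites cut when each site is either cut or methylated
    (and so protected), in proportion to the two enzyme activities."""
    return restriction_activity / (restriction_activity + methylase_activity)

def methylase_ratio_for(target_fragment, site_spacing):
    """Methylase : restriction enzyme activity ratio that gives the target
    average fragment length under this simple model."""
    fraction = site_spacing / target_fragment  # fraction of sites to cut
    return (1 - fraction) / fraction

# 20 kb fragments from a four base pair enzyme (sites every 256 bp on average)
ratio = methylase_ratio_for(target_fragment=20_000, site_spacing=256)
print(round(ratio, 1))                          # methylase excess needed
print(round(256 / fraction_cleaved(1, ratio)))  # resulting average fragment, bp
```

Under this model only about one site in 78 should be cut to give fragments averaging 20 kb, so the methylase would need to be in large excess.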
Figure 2: competition between a restriction enzyme and its methylase

Restriction sites occur randomly in the genome.
But that is not what you want: you would prefer
them to occur evenly spaced in the genome. True
randomness gives some areas of the genome
where the sites for one enzyme are widely spaced
and other areas where the sites are tightly
clustered. This has the consequence that it
requires different digestion conditions to prepare
the same length DNA fragments from different
regions of the genome. Consequently, a range of
conditions are sometimes employed and the
resulting digests are mixed before cloning is
attempted. This effect has more marked
consequences for enzymes with a six base pair
site than for those with a four base pair site.
So, on the whole, four base pair enzymes such as
MboI have been preferred for library construction. (MboI has the added advantage that its sticky
end is compatible with the sticky end generated by the six base pair enzyme BamHI which can be
used to prepare the vector.) No further treatment of the DNA is required after the digestion for
lambda and cosmid cloning because the lambda head packaging machinery takes care of the size
selection. But for YACs or BACs the partial restriction digest is usually size fractionated by
pulsed field gel electrophoresis, the region containing the required size of
DNA fragments is cut from the gel and the agarose is digested away using the enzyme β-agarase.
Sometimes, to protect the high molecular weight DNA from breakage by random shearing, the
partial restriction digest and the ligation to vector are both carried out before the agarose is
removed.
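The uneven spacing of randomly placed sites can be seen in a small simulation. This sketch generates a random 200 kb sequence with equal base frequencies (an assumption; real genomes are not random in this sense) and measures the gaps between successive GATC sites, the sequence recognised by MboI.

```python
import random

def site_gaps(sequence, site):
    """Distances between successive occurrences of site in sequence."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return [b - a for a, b in zip(positions, positions[1:])]

rng = random.Random(0)  # fixed seed so the run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(200_000))
gaps = site_gaps(genome, "GATC")  # GATC is the MboI recognition site

mean_gap = sum(gaps) / len(gaps)
print(round(mean_gap), min(gaps), max(gaps))
```

The mean gap comes out close to 256 bp, but individual gaps range from a few base pairs to several times the mean, which is the clustering and wide spacing described above.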

What do we do once the library is made?


A library is formed when a population of recombinant DNA molecules is inserted (by
transformation or transfection) into the host E. coli. At this point the experiment will still comprise
just a suspension of bacteria in some nutrient broth. Any clone will still be represented by just a
single bacterium. The suspension will then be plated onto nutrient agar (which in the case of
plasmid libraries will contain a selective antibiotic). Colonies (or plaques, but from here on I'll
just refer to colonies) form overnight; each will consist of about a million descendants of the
original transformant. At this point the library is unamplified. If the library could be screened at
this point (for instance by making a replica plate and carrying out a colony hybridisation using a
gene specific probe) then all the signals obtained would result from independent cloning events.
(The number of signals would give you some indication of the depth of the library.) Usually the
library will be plated at a high density onto large square filters (20 cm × 20 cm) placed on nutrient
agar plus antibiotic. It is not easy to store libraries in this unamplified state. They must first be
replicated onto fresh filters so that replicas retain the positions of all the colonies. Replica filters
can then be stored at -80 °C. Often all the colonies of one replica will be resuspended in a small
volume of buffer and 1 ml aliquots frozen at -80 °C. In this state the library is available for replating
if it is desired to screen it. However, there is no longer any guarantee that signals obtained from
the replated library will be independent: each colony will have contributed approximately one
million genetically identical cells to the suspension, each of which can found a new colony. Also,
if two different laboratories each screen the same amplified library with the same probe and each
obtain some positive colonies, it will be necessary to test whether they have each obtained exactly
the same clone or whether they have obtained overlapping clones which were both originally
present in the unamplified library.

Figure 3: gridding a library

To circumvent these problems gridded libraries have been introduced. The first stage of library
construction is carried out as normal except that the library is plated out at a low colony density so
that clones are well separated. (In the old fashioned libraries above, colonies were plated at a very
high density to minimise the number of plates which had to be replica plated and to minimise the
number of filters which had to be screened.) Then a colony picking robot takes 96 sterile needles
and pricks 96 colonies. The 8 × 12 array of needles is used to inoculate small cultures in the wells
of a 96 well tray; the needles are sterilised and the robot is then ready to pick another 96 colonies.
In this way thousands of individual clones can be picked. (The Sanger Institute has a good
information page at http://www.sanger.ac.uk/Teams/Team54/faq.shtml which is worth a visit.)
Each clone thus gains an address, its plate, row and column number, e.g. 255A6 would mean
tray 255 row A column 6. The advantages of gridding are several. The trays are easily handled
by robots and so replication is straightforward. Trays are easy to store in a deep freeze. Copies
of the same library can be made available to any interested laboratory and information about any
individual clone can be placed in a central database. In this way, if two laboratories discover
independent pieces of information they will realise that both facts are true of the same clone, for
instance that two different genes are in the same clone, which might not have been easy to
detect if the two laboratories had probed the same amplified library each with their own
unrelated probes.

Figure 3: Colony picking robot at the Sanger Institute. On the right, illuminated by a red lamp,
are the nutrient agar plates, in the middle are trays of 70% ethanol for sterilising the pins, on the
left are trays containing wells of nutrient broth into which individual colonies will be placed.
The pins are in a rack behind the lamp and they can be brought forward individually and
positioned very exactly over a single colony.
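A clone address of this kind is easy to handle in software. The sketch below splits an address such as 255A6 into its parts. It assumes a 96 well tray with rows lettered A to H and columns numbered 1 to 12, and the function name is illustrative.

```python
import re

def parse_address(address):
    """Split a clone address such as '255A6' into (tray, row, column)."""
    match = re.fullmatch(r"(\d+)([A-H])(\d+)", address)
    if match is None:
        raise ValueError(f"not a 96 well clone address: {address!r}")
    tray, row, column = int(match.group(1)), match.group(2), int(match.group(3))
    if not 1 <= column <= 12:
        raise ValueError(f"column out of range: {column}")
    return tray, row, column

print(parse_address("255A6"))  # (255, 'A', 6)
```

Because the address is unambiguous, results from different laboratories can be keyed to the same clone in a shared database.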

As time has gone by most labs have upgraded to 384 well (double density in each dimension) and
even to 1536 well (quadruple density in each dimension) trays. The same 96 pin hedgehogs can
still be used to transfer between trays; it just takes extra cycles by the robot.
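The number of extra cycles follows from the tray sizes. In this sketch the interleaved stepping pattern of the pins is an assumption made for illustration, not a description of any particular robot, and the function names are hypothetical.

```python
import math

PINS = 96  # pins on the hedgehog (an 8 x 12 array)

def transfer_cycles(wells_per_tray):
    """Number of 96 pin transfers needed to fill one tray."""
    return wells_per_tray // PINS

def pin_offsets(wells_per_tray):
    """(row, column) offsets for each cycle, assuming the pins are stepped
    across the tray in an interleaved pattern."""
    step = math.isqrt(wells_per_tray // PINS)  # 1 for 96, 2 for 384, 4 for 1536
    return [(r, c) for r in range(step) for c in range(step)]

for wells in (96, 384, 1536):
    print(wells, transfer_cycles(wells), pin_offsets(wells)[:4])
```

A 384 well tray therefore needs 4 cycles of the 96 pin tool and a 1536 well tray needs 16.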

Library screening
Libraries may be screened by one of two methods: colony hybridisation or PCR screening.
1. Colony hybridisation: this is essentially the same as is used to screen a cDNA library. The
probe has frequently been a cDNA: a genomic clone for a known gene is identified by screening
colonies with the appropriate cDNA clone. The probe will only hybridise to the exons. If the gene
contains introns which are bigger than the insert size in the library then you will not be able to
identify clones containing them directly. You may need to walk. Gridded libraries can be replica
plated by robots onto nylon filters so that colonies grow in a very dense array. When the colonies
have been lysed and their DNA has stuck to the filter at the site of the colony, they can be probed
with a radioactive cDNA probe. Problems may arise if your probe contains a repetitive sequence
or if the gene is a member of a gene family, in which case genomic clones for related genes may
be found. This is especially a problem if there are retroposons of your gene in the genome, which
will probably contain most of the cDNA sequence and thus give a much stronger signal than
clones which are derived from the real gene but contain only a few small exons.
Figure 4 shows a typical array pattern. Each dot represents one colony. Thirty two 96 well
trays of colonies have been arrayed here (or more likely, 16 96 well trays twice, so that each
colony is present in duplicate).

Chromosome walking
From an initial clone, subclone the end fragments so that each may be used to reprobe the
library. The ends may hybridise to genomic clones which overlap and extend the first clone.
Identify how the clones overlap, subclone the end fragment of the new clone and continue for as
many cycles as necessary.
(Diagram labels: end probe; original clone; new clone; original site of hybridisation to cDNA probe.)

Figure 4: A typical colony screening filter.

Figure 5: some duplicate signals overlaid with an alignment grid

2. PCR screening: When you already know some DNA sequence in the target gene
you can design a PCR assay to ask which genomic clone contains it. It would be time
consuming, expensive and foolish to PCR every clone in the library: this would be
thousands of PCR reactions. So it is necessary to subdivide the library into pools which
can be used to reveal the coordinates of the positive clones. A library of 20,000 clones will
consist of about 200 96 well plates. The clones in these can be pooled into groups of 10
plates (see the diagram in Figure 6 of one such super-pool). Then, from each super-pool
of 10 plates, 10 individual plate pools, 12 column pools and 8 row pools can be made. In
the figure this corresponds to 10 yellow pools, 12 red pools and 8 blue pools. Every clone
in the library will thus be found within one plate pool, one row pool and one column pool.
The library is screened by first just testing the super-pools (20 PCR reactions plus
controls). When a positive signal is obtained in one such pool then its plate / row / column
sub-pools are tested (10 + 8 + 12 = 30 PCR reactions). There should be one positive signal
in each of these which will reveal the coordinates of the positive clone. Finally this can be
confirmed with a third round of testing (1 PCR reaction), giving a grand total of about 60
reactions (including controls). This method works well so long as the super-pools are
unlikely to contain more than 2 or 3 positive clones each (preferably no more than one). It
is also very amenable to robotization and so robots can carry out many such screens
simultaneously.

Figure 6: part of a library pooling scheme
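The three rounds of PCR screening described above can be followed in a short simulation. This is a sketch under stated assumptions: a library of 200 plates holding exactly one positive clone, rows lettered A to H, and a stand-in `pcr` function that simply reports whether a pool contains a positive clone. All names are illustrative.

```python
from itertools import product

ROWS = "ABCDEFGH"
COLUMNS = range(1, 13)
PLATES_PER_SUPERPOOL = 10

def pcr(pool, positives):
    """Stand-in for one PCR reaction: positive if the pool holds a positive clone."""
    return any(clone in positives for clone in pool)

def screen(positives, n_plates=200):
    """Return (address, reactions used) for a library with one positive clone."""
    reactions = 0
    hit_plates = None
    # Round 1: one reaction per super-pool of 10 plates
    for first in range(1, n_plates + 1, PLATES_PER_SUPERPOOL):
        plates = range(first, min(first + PLATES_PER_SUPERPOOL, n_plates + 1))
        reactions += 1
        if pcr(product(plates, ROWS, COLUMNS), positives):
            hit_plates = plates
    if hit_plates is None:
        raise ValueError("no positive super-pool")
    # Round 2: plate, row and column pools of the positive super-pool
    plate = next(p for p in hit_plates if pcr(product([p], ROWS, COLUMNS), positives))
    row = next(r for r in ROWS if pcr(product(hit_plates, [r], COLUMNS), positives))
    column = next(c for c in COLUMNS if pcr(product(hit_plates, ROWS, [c]), positives))
    reactions += len(hit_plates) + len(ROWS) + len(COLUMNS)
    # Round 3: confirm the single clone at the decoded address
    address = (plate, row, column)
    reactions += 1
    if not pcr([address], positives):
        raise ValueError("ambiguous: more than one positive clone in the super-pool")
    return address, reactions

print(screen({(137, "C", 9)}))  # ((137, 'C', 9), 51)
```

The 51 reactions counted here exclude controls, which is consistent with the total of about 60 given above. The final check also shows why super-pools holding several positive clones are a problem: the plate, row and column signals no longer point to a single address.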
