If you want to understand how genes are regulated and how they have evolved, you need to be able
to examine their structure. cDNA can only tell you so much about a gene: an individual cDNA
clone is derived from a processed RNA transcript. Compared with the whole gene, it will have lost
any upstream regulatory regions and it will have lost the introns. So to study genes in a complete
way you need to be able to make clones containing genomic DNA and, from your library of clones
representing the whole genome, you must be able to identify those clones which are derived from
the gene of interest.
As with all molecular biology techniques, the techniques of library construction have evolved
during the past twenty years.
Principles
The key principles are:
1. Ensure that the library is representative of the entire genome, i.e. any point in the genome
should have an equal chance of appearing in the library compared with any other point.
This is not always as easy to accomplish as you might think.
2. Make sure that the library contains enough cloned DNA so that the chance of any point in
the genome being contained within at least one of the clones is high.
Subsidiary to this, but still important:
3. Individual clones should contain sufficiently large inserts so that the number of clones
needed to ensure that point 2 is satisfied is not so great that the library size becomes too big
to handle conveniently.
It goes without saying (I hope) that the vector chosen, as well as helping to satisfy the conditions
above, should be able to propagate clones stably in the chosen host so that each cloned fragment of
genomic DNA is a true copy of the piece of genome from which it originated and is not rearranged
(duplicated, deleted, reshuffled or otherwise mutated) in any way.
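$$N = \frac{\ln(1 - P)}{\ln\left(1 - \frac{i}{g}\right)}$$

where N = number of clones required, P = probability that a given point in the genome is contained in at least one clone, i = insert size, g = genome size.

Number of clones required for different vectors and values of P (genome size = 3 x 10^9 bp, the value implied by the figures in the table):

vector    insert size (bp)    P = 0.5    P = 0.99    P = 0.999
lambda    20000               103972     690773      1036160
cosmid    45000               46209      307009      460514
YAC       1000000             2079       13813       20720
BAC       500000              4159       27629       41443

Comparison of vectors:

vector | insert size | clones required (approx.) | Advantages | Disadvantages
lambda | 20 kb | 5 x 10^5 | easy to construct libraries, relatively stable inserts |
cosmid | 45 kb | 2 x 10^5 | easy to construct libraries and to prepare DNA from clones |
YAC | 1 Mb | 10^4 | few clones required | very prone to rearrangement, difficult to construct
PAC | ~120 kb | 10^5 | fewer clones required than for cosmids, stable | single copy origin of replication therefore harder to prepare DNA
BAC | > 500 kb | 2 x 10^4 | few clones required, very stable | single copy origin of replication therefore harder to prepare DNA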
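As a check on the library-size calculation, here is a minimal Python sketch of the formula N = ln(1 - P) / ln(1 - i/g), where P is the required probability, i the insert size and g the genome size. The function name `clones_needed` is made up for this illustration, and the genome size of 3 x 10^9 bp is an assumption (it is the value that reproduces the tabulated numbers). The results are rounded to the nearest whole clone to match the table; in practice you would round up.

```python
import math

def clones_needed(p, insert_bp, genome_bp):
    """Clones needed so that any given point is in at least one clone with probability p."""
    # math.log1p(-x) evaluates ln(1 - x) accurately when x is very small
    return math.log(1.0 - p) / math.log1p(-insert_bp / genome_bp)

GENOME_BP = 3e9  # assumption: genome size implied by the table
INSERT_BP = {"lambda": 20_000, "cosmid": 45_000, "YAC": 1_000_000, "BAC": 500_000}

for p in (0.5, 0.99, 0.999):
    for vector, insert_bp in INSERT_BP.items():
        n = round(clones_needed(p, insert_bp, GENOME_BP))
        print(f"P = {p:<5}  {vector:<6}  {n}")
```

Note how slowly the probability improves: going from P = 0.5 to P = 0.99 costs between six and seven times as many clones.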
Bacteriophage lambda was originally used as a genomic cloning vector because 20 kb of its
genome, containing the genes required for lysogeny, could be replaced by insert DNA from
another species. It is now only used for genomic library building in exceptional circumstances.
Cosmids were popular in the 1980s and early 1990s for genomic cloning. They are a hybrid
vector, mostly plasmid with the lambda cohesive end incorporated. This is used to give a very
high efficiency of bacterial transformation because the recombinant DNA molecules can be
packaged into lambda protein coats which will then infect bacteria. YACs, yeast artificial
chromosomes, were designed to propagate in yeast; because of problems with genomic instability
they are no longer much used. PACs are another hybrid vector, part plasmid and part phage P1,
and BACs are based around the F origin of replication. The book Analysis of Genes and
Genomes by Richard Reece (Wiley, 2004) has an excellent chapter outlining all these vectors,
which I recommend that you read.
Why are some pieces of genomic DNA more difficult to clone than others?
Some DNA sequences which form normal parts of eukaryotic genomes seem to be effectively
unclonable in bacteria. Some sequences are simply very AT rich, others may contain long inverted
repeat sequences which can form stable cruciform structures and which may interfere with the
supercoiling of plasmids containing them. Thus the plasmids cannot be maintained, the antibiotic
resistance is lost and the host dies. Some sequences are simply poison: possibly they
accidentally code for a polypeptide with detrimental effects, or they accidentally act as
promoters driving unwanted transcription into the vector. Modern vectors contain powerful
terminator sequences flanking the cloning site which prevent insert-driven transcription escaping
from the cloning site. A DNA sequence does not have to be absolutely unclonable to be effectively
unclonable. If a sequence is present at just 10% of the average frequency in a library and that
library is less than 10-fold deep (fewer than 10 genome equivalents), then fewer than one clone
containing it is expected and it is likely to be absent.
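The arithmetic behind that last point can be sketched as follows. This is an illustration only: it assumes that the number of clones covering a sequence follows a Poisson distribution whose mean is the library depth multiplied by the relative cloning frequency, which is a modelling assumption and not something stated above. The function name `prob_absent` is made up for the example.

```python
import math

def prob_absent(depth, relative_frequency):
    """Chance that a sequence is missing from a library of the given depth.

    depth: library depth in genome equivalents.
    relative_frequency: how often the sequence is cloned compared with an
    average sequence (0.1 means 10% of the average frequency).
    """
    expected_clones = depth * relative_frequency
    # Poisson probability of seeing zero clones (assumed model)
    return math.exp(-expected_clones)

for depth in (3, 5, 10, 30):
    print(f"depth {depth:>2}: P(absent) = {prob_absent(depth, 0.1):.2f}")
```

At exactly 10-fold depth the expected number of clones is one and the chance of absence is e^-1, roughly 37%. At 5-fold depth absence is more likely than presence.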
DNA solution at the same rate so the proportion of cleaved sites compared to the protected
sites remains constant throughout the reaction. The proportion of sites which are cleaved is
determined by the relative amounts of the two enzymes which can be adjusted in trial
reactions beforehand.
Figure 2: competition between a restriction enzyme and its methylase

Restriction sites occur randomly in the genome. But that is not what you want; you would prefer
them to occur evenly spaced in the genome. True randomness gives some areas of the genome
where the sites for one enzyme are widely spaced and other areas where the sites are tightly
clustered. This has the consequence that it requires different digestion conditions to prepare
the same length DNA fragments from different regions of the genome. Consequently, a range of
conditions is sometimes employed and the resulting digests are mixed before cloning is
attempted. This effect has more marked consequences for enzymes with a six base pair
site than for those with a four base pair site. So, on the whole, four base pair enzymes such as
MboI have been preferred for library construction. (MboI has the added advantage that its sticky
end is compatible with the sticky end generated by the six base pair enzyme BamHI, which can be
used to prepare the vector.) No further treatment of the DNA is required after the digestion for
lambda and cosmid cloning because the lambda head packaging machinery takes care of the size
selection. But for YACs or BACs the partial restriction digest is usually size fractionated by
pulsed field gel electrophoresis, the region containing the required size of
DNA fragments is cut from the gel and the agarose is digested away using the enzyme β-agarase.
Sometimes, to protect the high molecular weight DNA from breakage by random shearing, the
partial restriction digest and the ligation to vector are both carried out before the agarose is
removed.
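To see why a six base pair site is affected more by randomness than a four base pair site, here is a rough sketch. It assumes that all four bases are equally frequent and that sites occur independently, so that the number of sites in a window is roughly Poisson distributed; real genomes depart from both assumptions. The function names and the 20 kb window are chosen only for illustration.

```python
import math

def expected_spacing(site_length):
    """Average distance in bp between sites of the given length (equal base frequencies assumed)."""
    return 4 ** site_length

def relative_spread(window_bp, site_length):
    """Standard deviation of the site count in a window, divided by its mean (Poisson assumption)."""
    mean_sites = window_bp / expected_spacing(site_length)
    return 1 / math.sqrt(mean_sites)

for site_length in (4, 6):
    spacing = expected_spacing(site_length)
    spread = relative_spread(20_000, site_length)
    print(f"{site_length} bp site: one per {spacing} bp on average, "
          f"relative spread in a 20 kb window = {spread:.2f}")
```

Under these assumptions a 20 kb window holds far more four base pair sites than six base pair sites, so the number of sites varies much less from region to region in relative terms.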
million genetically identical cells to the suspension each of which can found a new
colony. Also, if two different laboratories each screen the same amplified library with
the same probe and each obtain some positive colonies it will be necessary to test if they
have each obtained exactly the same clone or whether they have obtained overlapping
clones which were both originally present in the unamplified library.

Figure 3: gridding a library
To circumvent these problems gridded libraries have been introduced. The first stage of library
construction is carried out as normal except that the library is plated out at a low colony density so
that clones are well separated. (In the old-fashioned libraries above, colonies were plated at a very
high density to minimise the number of plates which had to be replica plated and to minimise the
number of filters which had to be screened.) Then a colony picking robot takes 96 sterile needles
and pricks 96 colonies. The 8 × 12 array of needles is used to inoculate small cultures in the wells
of a 96 well tray, the needles are sterilised and the robot is then ready to pick another 96 colonies.
In this way thousands of individual clones can be picked. (The Sanger Institute has a good
information page at http://www.sanger.ac.uk/Teams/Team54/faq.shtml which is worth a visit.)
Each clone thus gains an address: its plate, row and column number, e.g. 255A6 would mean
tray 255, row A, column 6. The advantages of gridding are several. The trays are easily handled
by robots and so replication is straightforward. Trays are easy to store in a deep freeze. Copies
of the same library can be made available to any interested laboratory and information about any
individual clone can be placed in a central database. In this way, if two laboratories discover
independent pieces of information they will realise that both facts are true of the same clone,
for instance that two different genes are in the same clone, which might not have been easy to
detect if the two laboratories had probed the same amplified library each with
their own unrelated probes.

Figure 3: Colony picking robot at the Sanger Institute. On the right, illuminated by a red lamp,
are the nutrient agar plates, in the middle are trays of 70% ethanol for sterilising the pins, on
the left are trays containing wells of nutrient broth into which individual colonies will be placed.
The pins are in a rack behind the lamp and they can be brought forward individually and positioned
very exactly over a single colony.
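A clone address such as 255A6 is easy to handle in software. The sketch below parses and formats addresses for 96 well trays (rows A to H, columns 1 to 12). The function names are made up for this example and do not correspond to any particular laboratory database.

```python
import re

# tray number, then a row letter A-H, then a column number 1-12
ADDRESS_PATTERN = re.compile(r"^(\d+)([A-H])(\d{1,2})$")

def parse_address(address):
    """Split an address like '255A6' into (tray, row, column)."""
    match = ADDRESS_PATTERN.match(address)
    if match is None:
        raise ValueError(f"not a valid clone address: {address!r}")
    tray, row, column = int(match.group(1)), match.group(2), int(match.group(3))
    if not 1 <= column <= 12:
        raise ValueError(f"column out of range in {address!r}")
    return tray, row, column

def format_address(tray, row, column):
    """Build an address string from tray number, row letter and column number."""
    return f"{tray}{row}{column}"

print(parse_address("255A6"))
print(format_address(255, "A", 6))
```

Because the address identifies one clone unambiguously, results from different laboratories can be keyed on it in a shared database.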
As time has gone by most labs have upgraded to 384 well (double density along each axis, so four
times as many wells) and even to 1536 well (quadruple density along each axis, so sixteen times as
many wells) trays. The same 96 pin hedgehogs can still be used to transfer between trays;
it just takes extra cycles by the robot.
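The extra cycles can be counted with a short sketch. It assumes the usual interleaved layout, in which the 8 × 12 pin head keeps its 96 well spacing and is shifted by one well of the denser tray between cycles. That layout and the function names are assumptions made for the example.

```python
def wells_hit(stride, row_offset, col_offset):
    """Zero-based (row, column) wells touched by an 8 x 12 pin head in one cycle.

    stride is the linear density factor: 2 for 384 well trays, 4 for 1536 well trays.
    """
    return [(r * stride + row_offset, c * stride + col_offset)
            for r in range(8) for c in range(12)]

def all_wells(stride):
    """Wells touched after running one cycle at every possible offset."""
    hit = []
    for row_offset in range(stride):
        for col_offset in range(stride):
            hit.extend(wells_hit(stride, row_offset, col_offset))
    return hit

for stride in (2, 4):
    wells = all_wells(stride)
    print(f"stride {stride}: {stride * stride} cycles cover {len(set(wells))} wells")
```

Each well is visited exactly once, so a 384 well tray takes 4 cycles and a 1536 well tray takes 16.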
Library screening
Libraries may be screened by one of two methods: colony hybridisation or PCR screening.
1. colony hybridisation: this is essentially the same as is used to screen a cDNA library. The
probe has frequently been a cDNA: a genomic clone for a known gene is identified by screening
colonies with the appropriate cDNA clone. The probe will only hybridise to the exons. If the gene
contains introns which are bigger than the insert size in the library then you will not be able to
identify clones containing them directly. You may need to walk. Gridded libraries can be replica
plated by robots onto nylon filters so that colonies grow in a very dense array. When the colonies
have been lysed and their DNA has stuck to the filter at the site of the colony, they can be probed
with a radioactive cDNA probe. Problems may arise if your probe contains a repetitive sequence
or if the gene is a member of a gene family, in which case genomic clones for related genes may
be found. This is especially a problem if there are retroposons of your gene in the genome, which
will probably contain most of the cDNA sequence and thus give a much stronger signal than
clones derived from the real gene but which contain only a few small exons.

Chromosome walking
From an initial clone, subclone the end fragments so that each may be used to reprobe the
library. The ends may hybridise to genomic clones which overlap and extend the first clone.
Identify how the clones overlap, subclone the end fragment of the new clone and continue for as
many cycles as necessary.
(Diagram labels: end probe; original clone; new clone; original site of hybridisation to cDNA probe.)
Figure 4 shows a typical array pattern. Each dot represents one colony. Thirty-two 96 well
trays of colonies have been arrayed here (or more likely, sixteen 96 well trays twice, so that
each colony is present in duplicate).
Figure 4:
Figure 5:
2. PCR screening: ... you can design a PCR assay to ask which genomic clone contains it. It
would be time consuming, expensive and foolish to PCR every clone in the library; this would be
thousands of PCR reactions. So it is necessary to subdivide the library into pools which
can be used to reveal the coordinates of the positive clones. A library of 20,000 clones will
consist of about 200 96 well plates. The clones in these can be pooled into groups of 10
plates (see the diagram in figure 6 of one such super-pool). Then, from each super-pool
of 10 plates, can be made 10 individual plate pools, 12 column pools and 8 row pools. In
the figure this corresponds to 10 yellow pools, 12 red pools and 8 blue pools. Every clone
in the library will thus be found within one plate pool, one row pool and one column pool.
The library is screened by first just testing the super-pools (20 PCR reactions plus
controls). When a positive signal is obtained in one such pool then its plate / row / column
sub-pools are tested (10 + 8 + 12 = 30 PCR reactions). There should be one positive signal
in each of these which will reveal the coordinates of the positive clone. Finally this can be
confirmed with a third round of testing (1 PCR reaction). This gives 20 + 30 + 1 = 51
reactions, or a grand total of about 60 including controls. This method works well so long as
the super-pools are unlikely to contain more than 2 or 3 positive clones each (preferably no
more than one). It is also very amenable to robotization and so robots can carry out many such
screens simultaneously.
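The pooling logic can be sketched in Python as follows. This is a simulation under simplifying assumptions, not a laboratory protocol: it assumes exactly one positive clone in the library, ignores control reactions, and stands in for each PCR with a function `is_positive(pool)` that reports whether the pool contains the target. All names are made up for the example.

```python
ROWS = "ABCDEFGH"
COLUMNS = range(1, 13)
PLATES_PER_SUPER_POOL = 10

def clones(plates, rows=ROWS, columns=COLUMNS):
    """All (plate, row, column) clones in the given plates, rows and columns."""
    return {(p, r, c) for p in plates for r in rows for c in columns}

def screen(n_plates, is_positive):
    """Find the single positive clone; return (address, number of PCR reactions)."""
    reactions = 0

    def pcr(pool):
        nonlocal reactions
        reactions += 1
        return is_positive(pool)

    # Round 1: one reaction per super-pool of 10 plates
    super_pools = [range(first, min(first + PLATES_PER_SUPER_POOL, n_plates + 1))
                   for first in range(1, n_plates + 1, PLATES_PER_SUPER_POOL)]
    plates = [sp for sp in super_pools if pcr(clones(sp))][0]

    # Round 2: plate, row and column pools made from the positive super-pool
    plate = [p for p in plates if pcr(clones([p]))][0]
    row = [r for r in ROWS if pcr(clones(plates, rows=r))][0]
    column = [c for c in COLUMNS if pcr(clones(plates, columns=[c]))][0]

    # Round 3: confirm the single candidate clone
    if not pcr({(plate, row, column)}):
        raise RuntimeError("candidate clone not confirmed")
    return (plate, row, column), reactions

target = (137, "C", 9)  # hypothetical positive clone
address, n_reactions = screen(200, lambda pool: target in pool)
print(address, n_reactions)
```

For 200 plates this uses 20 + 30 + 1 = 51 reactions before controls. If a super-pool held two positive clones, the row and column pools would each show two signals and the coordinates would become ambiguous, which is why super-pools with at most one positive are preferred.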