2) Genomes vary from 600 Kb to > 3,000 Mb but can all be sequenced
3) Genes are units of transcription. Almost all genes code for proteins.
4) Since there is a universal 3-letter translation code, the amino acid
sequence of a protein can be determined from the nucleotide sequence
of the gene. Comparison to other proteins gives useful hints to function.
This is where you need to understand protein evolution.
5) Knowing which proteins are encoded in the genome of an organism
helps us understand what it can and cannot do.
6) But it does NOT tell us what it does.
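Point 4 can be illustrated with a minimal sketch that reads a coding sequence three bases at a time and looks each codon up in the standard genetic code. The function name translate and the example sequence are illustrative assumptions, not part of the lecture; the table is built from the standard code listed in T, C, A, G order.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))  # Met-Ala, then a stop codon
```

The sketch assumes the sequence is already in frame and contains no introns, which is why gene structure has to be worked out before translation.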
DNA is just a string of 4 bases - A,T,G,C - but a very long string!
The human genome has about 3,000 Mb carried on
22 chromosomes plus an X and a Y.
It has been completely sequenced and annotated.
How is it done?
Bob Waterston John Sulston Craig Venter
Genomic sequencing is an industrial, high-throughput process
(not to be carried out in an academic laboratory - Craig Venter)
Shot-gun sequencing is the way to go.
Paired reads and BAC end sequencing establish overlaps and gaps.
reads (500 bp)
contigs (5 Kb)
metacontigs (50 Kb)
markers
chromosomes
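The step from reads to contigs can be sketched as a greedy merge of the best-overlapping pair of sequences. This is a toy illustration under simplifying assumptions (error-free reads, one strand, no repeats), not the assembler used by the genome projects; the function names and the min_len threshold are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            # No remaining overlaps: what is left are separate contigs.
            return contigs
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)]
        contigs.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Reads that share no overlap stay as separate contigs, which is where paired reads and markers are needed to order them and size the gaps.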
Shear DNA into fragments of ~2 Kb.
Ligate into a plasmid.
Transform E. coli with plasmid.
Pick thousands of individual clones
ROBOTICALLY.
Store in 96-well plates.
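Keeping track of thousands of clones comes down to bookkeeping. A minimal sketch of one way to map a running clone number to a plate and well, assuming a standard 8 x 12 layout filled row by row (the fill order and the function name are assumptions):

```python
ROWS = "ABCDEFGH"  # 8 rows
COLUMNS = 12       # 12 columns, 8 x 12 = 96 wells


def well_for_clone(clone_index: int) -> tuple[int, str]:
    """Map a 0-based clone index to (1-based plate number, well label)."""
    plate, position = divmod(clone_index, len(ROWS) * COLUMNS)
    row, column = divmod(position, COLUMNS)
    return plate + 1, f"{ROWS[row]}{column + 1}"


if __name__ == "__main__":
    print(well_for_clone(0))
    print(well_for_clone(96))
```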
Process
Function
Summary of methodology for recognizing genes
Experimental
Sequence cDNAs from a large number of mRNAs and compare to genomic sequence.
Computational
Train an HMM program to recognize start sites, exons, splice sites,
introns, and termination sites of ORFs. Predict genes.
Experimental
In situ hybridization to determine cell type expression.
Molecular genetics (knockouts etc.) to determine function.
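As a much simpler stand-in for the computational step, the sketch below scans the forward strand for open reading frames that run from ATG to a stop codon. It is not an HMM and ignores introns and splice sites, so it only suits intron-free sequence; the function name and the min_codons cutoff are illustrative assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) of ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and end is exclusive; the stop codon is
    included in the span and counted towards min_codons.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    seq = "CCATGGCCTGATT"
    for start, end in find_orfs(seq):
        print(start, end, seq[start:end])
```

A trained model improves on this by scoring start sites, splice sites, and exon and intron content together instead of relying on ATG and stop codons alone.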
Lecture 8
When organisms evolved a closed circulatory system about 400 Myrs ago,
there was a strong selection for clotting proteins to fill any accidental leaks.
Factor XII, a protein of 600 amino acids, is one of the clotting factors.
It "borrowed" several previously established domains.
Introns must begin and end in the same phase class [AG/yz].
Therefore, an inserted exon must have the same phase group
as the flanking exons. Many inserted exons are class 1.
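The phase rule can be checked with a little arithmetic. In the sketch below an intron's phase is the number of bases of a codon that lie upstream of it (0, 1 or 2), computed from coding exon lengths counted from the first base of the start codon. The function names are hypothetical, and the exon lengths are made-up illustrative values.

```python
def intron_phases(exon_lengths: list[int]) -> list[int]:
    """Phase (0, 1 or 2) of each intron between consecutive coding exons."""
    phases, total = [], 0
    for length in exon_lengths[:-1]:
        total += length
        phases.append(total % 3)
    return phases


def can_insert(exon_phases: tuple[int, int], intron_phase: int) -> bool:
    """True if an exon with the given (5', 3') boundary phases can be
    inserted into an intron of intron_phase without a frameshift."""
    return exon_phases == (intron_phase, intron_phase)


if __name__ == "__main__":
    # Hypothetical gene with coding exons of 100, 150 and 90 bases.
    print(intron_phases([100, 150, 90]))
    print(can_insert((1, 1), 1))
```

An exon whose length is a multiple of three has the same phase on both sides, which is why such symmetric exons (for example class 1-1) can be shuffled into a matching intron without disturbing the downstream reading frame.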
Protein modules
Exon shuffling can not only add a new exon, but can also
duplicate existing exons or delete an exon.
1002 genes present in tomato were not found in Arabidopsis. 154 were
clearly present in either soy or Medicago. These are cases of gene loss in
the Arabidopsis lineage.
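The inference above is a set comparison: a gene missing from Arabidopsis but present in tomato and in soy or Medicago is more simply explained by loss in the Arabidopsis lineage than by independent gain elsewhere. A minimal sketch with made-up gene identifiers (the function name and the toy sets are assumptions, not data from the study):

```python
def inferred_losses(reference: set[str], target: set[str],
                    relatives: list[set[str]]) -> set[str]:
    """Genes in the reference species that are absent from the target
    but present in at least one relative of the target."""
    absent = reference - target
    seen_in_relatives = set().union(*relatives)
    return absent & seen_in_relatives


if __name__ == "__main__":
    # Toy gene sets, for illustration only.
    tomato = {"g1", "g2", "g3", "g4"}
    arabidopsis = {"g1"}
    soy = {"g2"}
    medicago = {"g3"}
    print(sorted(inferred_losses(tomato, arabidopsis, [soy, medicago])))
```

In the toy data g4 is absent from Arabidopsis but is not seen in either relative, so it stays unresolved, like the tomato genes not clearly found in soy or Medicago.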
Some highly conserved genes that are present in both monocots and
dicots have been lost in Arabidopsis.
One of them, slr2032, appears to have come from the Synechocystis-like
genome that gave rise to the chloroplast.
Figure: History of gene slr2032. Labels: a blue-green bacterium; a "primitive" alga; a diatom; algae; slr2032 found in chloroplast genome; number of genes in chloroplast genome.
Figure: tree of sequenced genomes (Homo, Ciona, Fugu, Dictyostelium, Drosophila, Oryza, Anopheles, Arabidopsis, Caenorhabditis, Plasmodium, Leishmania).
Figure: Dictyostelium compared with Arabidopsis, Saccharomyces, and Drosophila. Panels: Half-transporters; Full transporters.
There are 11 ABCA genes in Dictyostelium.
Fungi have no genes of this family.
In animals, ABCA proteins all have two transmembrane domains
(humans have 12 such genes).
In plants there is one gene with two domains and 16 with a single domain.
There appear to have been several cases of gene loss affecting whole
kingdoms.
The ABCG family is the only one in which the ABC cassette
precedes the transmembrane domain. The progenitor may have
arisen by fusion of domains or domain loss.
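The distinctions drawn here (one or two transmembrane domains, and which domain comes first) can be written down as a small classifier. This is an illustrative sketch: the labels "TM" and "ABC", the function name, and the example architectures are assumptions, and a half-transporter is taken to be a protein with a single transmembrane domain.

```python
def classify(domains: list[str]) -> tuple[str, str]:
    """Classify a domain list, read from N- to C-terminus.

    Returns (size, order): size is "half", "full" or "other" according
    to the number of transmembrane domains; order reports which kind of
    domain comes first.
    """
    if not domains:
        raise ValueError("domain list must not be empty")
    n_tm = domains.count("TM")
    size = {1: "half", 2: "full"}.get(n_tm, "other")
    order = "ABC-first" if domains[0] == "ABC" else "TM-first"
    return size, order


if __name__ == "__main__":
    print(classify(["TM", "ABC", "TM", "ABC"]))  # two-domain architecture
    print(classify(["ABC", "TM"]))               # cassette precedes the TM domain
```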
Summary