
GENOMICS AND BIOINFORMATICS

The genome is often likened to "the book of life". The analogy holds
because both can be read sequentially, from beginning to end, one
letter after another, and because the genome contains the information
needed to make each organism a living being.
Genomics
The process that leads to the complete characterization of an
organism's genetic material.
Genomics
Genomic data: sequencing, transcriptome, proteome, metabolome.
Bioinformatics: databases, sequence analysis, analysis of -omics data, process automation.
GENOMICS
-Comparative genomics
-Structural genomics
-Functional genomics
STRUCTURAL GENOMICS

Studies the three-dimensional structure of macromolecules, especially
proteins and nucleic acids, and the functions associated with that
structure.
These studies also include:
-Organization of sequences within genomes.
-Assignment of loci.
-Resolution of chromosome maps.
-Physical mapping of genes.
-Sequencing.
-Use of genomic maps for gene analysis.
COMPARATIVE GENOMICS
Uses the information known about one organism to obtain information
about another.

Examples:
-Alignments
-Motif searching
-Phylogenetic analysis
-Protein structure prediction
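
The first item in the list, alignment, can be illustrated with a minimal sketch of global pairwise alignment scoring by dynamic programming (Needleman-Wunsch). The scoring values (match +1, mismatch -1, gap -1) and the example sequences are illustrative assumptions, not taken from the slides.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of sequences a and b."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Illustrative sequences (assumed, not from the slides).
print(global_alignment_score("ACGT", "ACGT"))  # 4: four matches
print(global_alignment_score("ACGT", "AGT"))   # 2: three matches, one gap
```

Only the score is returned here; recovering the alignment itself would require keeping the full matrix and tracing back through it.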
FUNCTIONAL GENOMICS
A field of molecular biology that aims to use the vast accumulation of
data produced by genomics projects (such as the "genome projects" of
different organisms) to describe the functions of, and interactions
between, genes (and proteins).
Unlike genomics and proteomics, functional genomics focuses on the
dynamic aspects of genes, such as their transcription, translation,
and protein-protein interactions, as opposed to the static aspects of
genomic information such as the DNA sequence or its structure.
Bioinformatics
Tools that make it possible to manipulate the information contained in
these databases have become indispensable:
-Characterization and classification of new genes and proteins.
-Methods that facilitate identifying the elements responsible for
regulating the different genomic elements.
-Methods that determine evolutionary history, function, and structural
determinants.
Examples
Comparative genomics: minimal gene set and metagenomic communities

Mycoplasma genitalium: "the smallest among known cellular life forms"; 468 coding genes; Gram positive.
Haemophilus influenzae: 1703 coding genes; Gram negative.

Minimal set of genes required for life.

"Extrapolation between genomes will then most likely accelerate the
definition of what amounts to a "parts catalog" of cellular components
in a large number of organisms". Bernhard Palsson,
NATURE BIOTECHNOLOGY VOL 18 NOVEMBER 2000
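
One simple way to approximate a minimal gene set from two annotated genomes is to keep only the orthologous groups present in both. The sketch below assumes each genome has already been reduced to a set of orthologous-group identifiers; the identifiers (`OG1`, `OG2`, ...) are hypothetical placeholders, not real annotations of either organism.

```python
def shared_orthologous_groups(genomes):
    """Return the groups present in every genome (a first approximation
    of a minimal gene set)."""
    group_sets = [set(groups) for groups in genomes.values()]
    return set.intersection(*group_sets) if group_sets else set()


# Hypothetical toy annotation: genome name -> orthologous-group identifiers.
toy_annotation = {
    "Mycoplasma genitalium": {"OG1", "OG2", "OG3"},
    "Haemophilus influenzae": {"OG1", "OG2", "OG4", "OG5"},
}
core = shared_orthologous_groups(toy_annotation)
print(sorted(core))  # ['OG1', 'OG2']
```

A plain intersection will miss functions that the two organisms carry out with unrelated genes, so it should be read as a lower bound rather than a final answer.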
…ow and dependencies in metagenome an…

It's really complex and full of pitfalls
Data
Microbial genomes analysis: the signs before the f…

[Figure: completely sequenced and published microbial genomes, published per year, 2003 to 2006: early linear growth followed by an exponential increase. Second panel: ORFs from complete genomes vs. metagenomics ORFs; axis: number of ORFs in all genomes (incl. ours). Values shown on the chart: 350k, 500k, 750k, 1.1 Mio, 1.5 Mio, 14 Mio.]
Why 16S?
-It is present in almost all bacteria, often existing as a multigene
family or in operons.
-The function of the 16S rRNA gene has not changed over time,
suggesting that random sequence changes are a more accurate measure of
time (evolution).
-The 16S rRNA gene (1,500 bp) is large enough for informatics purposes.
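
If sequence changes accumulate roughly at random, the fraction of differing sites between two aligned 16S sequences gives a rough measure of divergence. The sketch below assumes the sequences are already aligned to equal length, and applies the Jukes-Cantor correction for multiple substitutions; that model is a choice made for this illustration and is not mentioned on the slide. The sequences are invented.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance (substitutions per site)."""
    if p >= 0.75:
        raise ValueError("correction undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Invented aligned fragments, for illustration only.
p = p_distance("ACGTACGT", "ACGTACGA")
print(p)  # 0.125
print(round(jukes_cantor(p), 4))
```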
Comparative metagenomics
Increase of functional assignments (via orthologous groups) with coverage
Orthologous groups (COGs + NOGs)

Reasons for differences

Biological issues
-GC content
-Genome sizes
-Phylogeny
-Evolutionary speed
-Evenness/Richness
-Functionality

Technical issues
-Sampling + preparation
-Sequencing method
-Assembly + annotation
-Coverage
-…..

von Mering* … Bork, Hugenholtz, Rubin Science 308(05)554
Whale fall samples
-Between 25 and 150 distinct ribotypes
-The most abundant accounts for 15 to 25%
-Between 100 and 700 Mbp would be needed to generate a draft assembly
for the most prevalent genome

Soil DNA
-At least 847 distinct ribotypes from more than a dozen phyla
-More than 3000 predicted bacterial ribotypes
-Less than 1% of the nearly 150,000 reads exhibited overlap
-Between two and five billion bp would be necessary to obtain the
eightfold coverage
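
Estimates of this kind follow from simple coverage arithmetic: the coverage of one genome is roughly the sequence obtained from that genome divided by its size. The sketch below assumes that the share of sequence coming from a genome equals its relative abundance in the community, which is a simplification. The genome size (5 Mbp) and abundance (1%) in the example are illustrative assumptions, not values from the slide.

```python
def bases_needed(coverage, genome_size_bp, relative_abundance):
    """Total community sequence (bp) needed for one genome to reach the
    target coverage, assuming sequence is sampled in proportion to
    relative abundance."""
    if not 0 < relative_abundance <= 1:
        raise ValueError("relative_abundance must be in (0, 1]")
    return coverage * genome_size_bp / relative_abundance


# Assumed example: 8x coverage of a 5 Mbp genome making up 1% of the sample.
total = bases_needed(8, 5_000_000, 0.01)
print(f"{total:.2e} bp")  # about 4e9 bp
```

With these assumed inputs the result is on the order of billions of base pairs, the same order of magnitude quoted for the soil sample; a more abundant genome, as in the whale fall samples, needs proportionally less.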
Acid mine drainage biofilm community
-A wide diversity of bacteria, few archaeal species, and some fungi and
unicellular eukaryotes were found.
-90% of the orthologous groups were detected with just 25 Mbp of raw
sequence.
Functional profiling of microbial communities.

-These profiles clearly suggest that the predicted protein complement of a
community is similar to that of other communities whose environments of origin
pose similar metabolic demands.

-Our results further support the hypothesis that the "functional" profile of a
community is influenced by its environment and that EGT data can be used to
develop fingerprints for particular environments.
Tringe*, von Mering* et al. Science 308(0
Specific enrichments.
Protein function prediction in metagenomics samples

[Figure legend: Blast; Neighborhood (next to known, next to unknown); No prediction; No hit]

-Neighbor-based predictions were possible for almost 30% of the 1.25 million ORFs studied.
-More than 75,000 of them would not have been possible by homology.
-Overall, function predictions for >75% of environmental data!
Our functional knowledge: glass half full or half empty?
Function prediction in gene families of 1.5 million proteins from 4
environments

Our knowledge concentrates in large, well-established families
contributing 65% of the ORFs; however, many specialized functions in
small gene families are yet to be discovered.

Method: all against all, MCL clustering (60 bits, inflation factor 1.1).

Harrington et al, PNAS Aug.200
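
The clustering step named above can be sketched in a few lines. This is a toy version of Markov clustering (MCL) in pure Python, assuming all-against-all hits are supplied as `(protein, protein, bit score)` tuples and that hits below 60 bits are discarded. Edge weights are dropped after the cutoff to keep the example short, which is a simplification; the protein names and scores are invented.

```python
def similarity_graph(hits, proteins, min_bits=60):
    """Unweighted adjacency matrix keeping only hits at or above min_bits."""
    index = {name: i for i, name in enumerate(proteins)}
    adj = [[0] * len(proteins) for _ in proteins]
    for a, b, bits in hits:
        if bits >= min_bits and a != b:
            adj[index[a]][index[b]] = adj[index[b]][index[a]] = 1
    return adj


def column_normalise(m):
    """Scale every column so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=1.1, max_iter=100, tol=1e-9):
    """Toy Markov clustering: alternate expansion and inflation."""
    n = len(adjacency)
    # Self-loops are added before normalising, as is usual for MCL.
    m = column_normalise(
        [[adjacency[i][j] + (i == j) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        # Expansion: square the matrix (two-step random walks).
        expanded = [
            [sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        # Inflation: raise entries to a power, then renormalise columns.
        new = column_normalise([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Each row with non-negligible entries defines one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Invented proteins and bit scores; the 45-bit hit falls below the cutoff.
proteins = ["a1", "a2", "a3", "b1", "b2", "b3"]
hits = [("a1", "a2", 210), ("a1", "a3", 180), ("a2", "a3", 95),
        ("b1", "b2", 150), ("b1", "b3", 120), ("b2", "b3", 88),
        ("a3", "b1", 45)]
families = [[proteins[i] for i in c] for c in mcl(similarity_graph(hits, proteins))]
print(families)  # [['a1', 'a2', 'a3'], ['b1', 'b2', 'b3']]
```

A low inflation factor such as 1.1 tends to give coarse clusters, that is, fewer and larger families. Real analyses would use a dedicated MCL implementation with weighted edges and sparse matrices rather than this dense toy.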


RNA interference:
listening to the sound of silence

776 genes are enriched in the ovary
98%

-A large proportion was required for either egg production or embryogenesis.
-Over half the genes tested showed at least one detectable phenotype.
-Most commonly embryonic lethality.

Tallies for the 751 genes tested:
429 non-lethal + 322 lethal = 751 total
362 without phenotype + 389 with phenotype = 751 total
209 post-embryonic phenotype + 180 pre-embryonic phenotype = 389 with phenotype
67 post-embryonic non-lethal + 322 post-embryonic sterile and lethal = 389 with phenotype
Finding relationships between embryonic lethality, degree
of conservation, and extent of enriched expression in the
ovary

Are the genes that take part in embryogenesis highly conserved?


[Figure: Correlation between RNAi Phenotypic Classes and Expression Levels in the Ovary for the Set of 751 Genes Tested. Classes: no phenotype, post-embryonic, partial, strong.]
Genes giving rise to embryonic lethal phenotype
were under-represented in the X chromosome

[Figure legend: expected based on # of genes; expected based on distribution; observed]
1, unknown;
2, RNA transcription/modification;
3, chromatin/chromosome structure;
4, cell cycle control;
5, protein synthesis/folding/translocation/degradation;
6, energy metabolism;
7, signal transduction/differentiation;
8, cytoskeleton associated or component;
9, nuclear-cytoplasmic transport; and
10, chitin biosynthesis.

Phenoclusters
[Figure: phenocluster panels. Recoverable labels: No visible; Exaggerated asynchrony; Few eggs; Defect in the first 50 min; Granular cytoplasm; Chromosome biology or DNA replication.]

-Genes predicted to function in chromosome biology or DNA replication
-Genes involved in mRNA processing
The resulting ‘phenoclusters’ can provide information about both the
involvement of genes in particular modules and the functional relationships
that might exist between them
Clustering algorithms were used to group genes with similar expression
profiles, and these groups were visualized as ‘mountains’ in a ‘topomap’.
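
Grouping genes by similar expression profiles needs a similarity measure and a grouping rule. The slide names neither, so the sketch below makes two assumptions: it uses Pearson correlation as the measure, and it links any two genes whose correlation reaches a threshold, so that groups are the connected sets. The gene names and expression values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def group_by_correlation(profiles, threshold=0.9):
    """Link genes whose profiles correlate at or above threshold and
    return the connected groups."""
    unassigned = set(profiles)
    groups = []
    for gene in sorted(profiles):
        if gene not in unassigned:
            continue
        unassigned.discard(gene)
        group, stack = [gene], [gene]
        while stack:
            current = stack.pop()
            linked = [g for g in sorted(unassigned)
                      if pearson(profiles[current], profiles[g]) >= threshold]
            for g in linked:
                unassigned.discard(g)
                group.append(g)
                stack.append(g)
        groups.append(sorted(group))
    return groups


# Invented expression profiles across four conditions.
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8.5],
    "geneC": [4, 3, 2, 1],
    "geneD": [8, 6, 4, 2.2],
}
groups = group_by_correlation(profiles)
print(groups)  # [['geneA', 'geneB'], ['geneC', 'geneD']]
```

The same grouping idea applies to phenotype vectors, which is one way to think about how phenoclusters could be formed, although the method actually used is not stated here.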
Widely distributed proteins with varied functions
related to the stress response
DNA damage response (DDR) protein interaction map for C. elegans. Arrows represent yeast two-
hybrid (Y2H) interactions and nodes (circles) represent proteins. Blue and orange nodes indicate products of genes from the
DNA repair and checkpoint phenoclusters, respectively. mrt-2 and C04F12.3 belong to a common phenocluster,
and their products physically interact in the Y2H system
Damage recovery module.
The nodes represent proteins and the lines represent yeast two-hybrid (Y2H) interactions.
All of the six proteins were identified as required for methyl methanesulfonate (MMS)
resistance in phenotypic analyses
Core germline interactome map for C. elegans.

The nodes represent proteins and the lines represent yeast two-hybrid
(Y2H) interactions. The interactome map was integrated with
transcriptome and phenome data.
Red lines indicate interactions between proteins whose corresponding
genes have both similar expression profiles and overlapping RNA
interference (RNAi) phenotypes.
A systems biology strategy

Starting from one known component, more components involved in a module of interest can be
identified, for example, by interactome mapping. A network can be constructed to describe
these interactions. Perturbation experiments are then systematically performed and responses
from the rest of the network are recorded, for example, by transcriptome profiling.
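
The first step of this strategy, expanding from one known component through an interaction map, can be sketched as a breadth-first search over a graph of pairwise interactions. The only interaction taken from the material above is mrt-2 with C04F12.3; the other names (`hypo-1` to `hypo-4`) are hypothetical placeholders, and the two-step limit is an arbitrary choice for the example.

```python
from collections import deque


def module_candidates(interactions, seed, max_steps=2):
    """Return {protein: number of interaction steps from seed} for every
    protein reachable within max_steps."""
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue  # do not expand beyond the step limit
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


# mrt-2 / C04F12.3 is from the caption above; hypo-* names are placeholders.
interactions = [
    ("mrt-2", "C04F12.3"),
    ("C04F12.3", "hypo-1"),
    ("hypo-1", "hypo-2"),
    ("hypo-3", "hypo-4"),
]
candidates = module_candidates(interactions, "mrt-2")
print(candidates)  # {'mrt-2': 0, 'C04F12.3': 1, 'hypo-1': 2}
```

The proteins returned would then be the candidates for perturbation experiments, with the network responses recorded as described above.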