
Overview: Assembly of large genomes using NGS

Juliana Assis

Giant Panda
Ailuropoda melanoleuca
~2.25-Gb genome sequence
2n = 42 Chromosomes

The genome was assembled using only next-generation sequencing.

Strategy for genome assembly


NGS systems:
Illumina Genome Analyzer
Libraries: 37 (paired-end, fosmid-end and BAC-end)
Coverage: 94x
Software:
SOAPdenovo (DBG) K=17, 27-mer
Server:
SGI: 512 GB RAM, 32 cores
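SOAPdenovo is listed above as a de Bruijn graph (DBG) assembler. As a rough illustration of what that means, the sketch below builds a toy de Bruijn graph from reads. The function name `de_bruijn_edges` and the example reads are hypothetical, and a real assembler adds error correction, graph simplification and scaffolding on top of this step.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a toy de Bruijn graph (hypothetical helper, not SOAPdenovo code).

    Each k-mer in a read contributes one edge from its (k-1)-mer prefix
    to its (k-1)-mer suffix. Repeated k-mers produce repeated edges.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy reads for illustration only, not panda data.
reads = ["ACGTAC", "CGTACG"]
graph = de_bruijn_edges(reads, k=4)
for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

A small k such as 4 is used here only to keep the output readable; the slide lists K=17 and 27-mers for the panda assembly.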

Flowchart of the panda genome de novo assembly.

The assembly covers approximately 94% of the whole genome.

September 2010 | Volume 8 | Issue 9 | e1000475

Turkey
Meleagris gallopavo
~1.1-Gb genome sequence
2n = 80 Chromosomes

A synergistic combination of two next-generation sequencing platforms.

Strategy for genome assembly


NGS systems:
454/Roche, Illumina GAII, BAC (Sanger)
Libraries: 2 (454), 1 (Illumina)
Coverage: 5x (454), 25x (Illumina), 6x (Sanger)
Software:
Modified Celera Assembler (OLC with BOG, the best overlap graph)
454 + illumina
Server:
Sunfire Enterprise 15000 with 72 processors and 288 GB
of shared memory

The assembled scaffolds were then ordered and oriented on turkey chromosomes using a combination of two linkage maps and a comparative BAC contig physical map.

Celera Assembler release 5.3 was used to produce the assembly.
The assembly process can be summarized in the following major stages:
Stage 1 (gatekeeper): input of reads and quality control
Stage 2 (overlapper): computation of read overlaps and
trimming of poor quality sequence based on the overlaps
Stage 3 (unitigger): initial assembly of uniquely-assemblable
contiguous chunks of sequence based on the overlaps
Stage 4 (cgw): scaffolding of unitigs based on mate pair data,
followed by merging overlapping unitigs into contigs
Stage 5 (consensus): computation of consensus sequences for

The assembly encompasses an estimated 89% of the total sequence of the genome.
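The overlapper stage listed for the turkey assembly computes overlaps between reads, and the best overlap graph keeps only the strongest overlap for each read. The sketch below illustrates that idea with exact suffix-prefix matching. The function names `suffix_prefix_overlap` and `best_overlaps` are hypothetical, and this is not how Celera Assembler is implemented: a production overlapper tolerates sequencing errors and handles both strands.

```python
def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no exact overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    for length in range(max_len, min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def best_overlaps(reads, min_len=3):
    """For each read, keep only the partner with the longest overlap."""
    best = {}
    for a in reads:
        candidates = [
            (suffix_prefix_overlap(a, b, min_len), b) for b in reads if b != a
        ]
        length, partner = max(candidates, default=(0, None))
        if length > 0:
            best[a] = (partner, length)
    return best


# Toy reads for illustration only, not turkey data.
reads = ["ACGTTG", "TTGCAA", "CAAGGT"]
for read, (partner, length) in best_overlaps(reads).items():
    print(f"{read} -> {partner} (overlap {length})")
```

Following the best overlaps from read to read yields a chain of reads, which is the toy analogue of the unitigs built in Stage 3.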

Strawberry
Fragaria vesca
~ 240 Mb genome sequence
2n = 14 Chromosomes

The draft F. vesca genome, which was sequenced to 39x coverage using second-generation technology, was assembled de novo and then anchored to the genetic linkage map into seven pseudochromosomes.

Strategy for genome assembly


NGS systems:
454, Illumina, and SOLiD
Libraries:
Coverage: 39x
Software:
Celera Assembler (OLC)
Velvet (DBG)
NUCmer
Bowtie
Server:
128 GB memory and 32 processors

94%?

VOLUME 31 NUMBER 2 FEBRUARY 2013 nature biotechnology

Goat
Capra hircus
~2.66-Gb genome sequence
2n = 60 Chromosomes

The sequence was obtained by combining short-read sequencing data and optical mapping data from a high-throughput whole-genome mapping instrument.

Strategy for genome assembly


NGS systems:
Illumina Genome Analyzer, fosmid library, whole-genome mapping data
Libraries: 14 paired-end
Coverage: 65.6x
Software:
SOAPdenovo (DBG) and in-house software
Server:
SGI: 512 GB RAM, 32 cores

Hybrid Assembly
The whole-genome mapping data facilitated the assembly of
super-scaffolds >5x longer by the N50 metric than scaffolds
augmented by fosmid end sequencing (scaffold N50 = 3.06 Mb,
super-scaffold N50 = 16.3 Mb).
Super-scaffolds are anchored on chromosomes based on
conserved synteny with cattle, and the assembly is well
supported by two radiation hybrid maps of chromosome 1.
These reads were used to extend scaffolds with in-house software.
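The N50 metric quoted for the goat scaffolds and super-scaffolds can be computed as in the sketch below: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions; only the two N50 values in the last lines come from the slide.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy lengths for illustration only, not goat data.
toy_scaffolds = [2, 3, 4, 5, 6]
print(n50(toy_scaffolds))

# Ratio of the two N50 values reported on the slide (in Mb).
improvement = 16.3 / 3.06
print(round(improvement, 1))
```

The ratio of the reported values is consistent with the ">5x longer" statement on the slide.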

To validate the quality of this assembly, they mapped onto it the raw reads generated from the small insertion libraries, which had been used for contig assembly and gap filling.
Over 89% of the raw paired-end reads could be mapped to the assembled goat genome, of which 95% had the correct orientation and correct distance between the ends, indicating that the assembly is largely correct at the local level.

92% of assembled

Turtle
Chrysemys picta bellii
~2.59-Gb genome sequence
2n = 32 Chromosomes

The nuclear genome of a single female Western Painted Turtle, Chrysemys picta bellii, was sequenced using a combination of next-generation whole genome shotgun and Sanger-based BAC end reads.

Strategy for genome assembly


NGS systems:
454, Illumina and Sanger, cDNA
Libraries: 64 (Sanger)
Coverage: 7x (454), 15x (Illumina), x (Sanger)
Software:
Newbler (OLC)
BWA (Illumina and Sanger reads aligned against the 454 contigs) and SAMtools
Server:
SGI: 512 GB RAM

93%

Norway Spruce
Picea abies
~20-Gb genome sequence
2n=24 Chromosomes

To assemble the P. abies genome, we developed a hierarchical strategy combining fosmid pools with both haploid and diploid whole genome shotgun (WGS) data, and RNA sequencing (RNA-Seq) data.

Strategy for genome assembly


NGS systems:
Illumina (HiSeq 2000), fosmid library
Libraries: 5 Illumina, 450 fosmid pools
Coverage: 38x
Software: CLCbio
Server: 2 TB RAM (5 days)

Cacao
Theobroma cacao
~445 Mbp genome sequence
2n=24 Chromosomes

Combination of Sanger and Roche 454 pyrosequencing

Strategy for genome assembly


NGS systems:
454/Roche, BAC-ends (Sanger), Illumina (mate-pair)
Libraries: 4 (454), 1 (Illumina)
Coverage: 21x (454), 0.4x (Sanger)
Software:
Arachne (modified)
Server:

Library   Average insert size   (unlabeled)   Read number   Assembled sequence coverage (x)
LINE                      426           79*     8,225,232    8.280
LINC                      474          194*     6,528,396    7.300
TC3A                    2,461           695     1,467,064    0.560
TC3F                    2,635           785     1,398,890    0.510
TC3B                    3,988           453     2,930,101    0.920
TC3D                    3,988           454     1,804,034    0.560
TC6C                    6,320           655     2,630,038    0.920
TC8E                    7,101           990     3,197,803    1.040
TC8F                    7,288         1,256       918,807    0.300
TC8A                    8,206           988     3,121,150    0.970
TCFB                   35,483         4,209         6,574    0.010
TCFA                   35,732         4,271         8,426    0.010
TCFC                   36,132         4,316       168,566    0.260
TCCB                   93,118        14,985        17,364    0.030
TCCC                  112,036        24,401        17,371    0.030
TCCA                  127,065        21,363        20,491    0.040
Total                       -             -    32,460,307   21.740
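The coverage figures quoted throughout these slides relate read count, read length and genome size. A minimal sketch of that relationship follows, assuming the usual definition of coverage as total sequenced bases divided by genome size. The function names and the example numbers are hypothetical, and the "assembled sequence coverage" in the cacao table may have been computed differently, for example from assembled bases only.

```python
import math


def expected_coverage(read_count, mean_read_length, genome_size):
    """Average depth: total sequenced bases divided by genome size."""
    return read_count * mean_read_length / genome_size


def reads_needed(target_coverage, mean_read_length, genome_size):
    """Smallest number of reads that reaches the target average depth."""
    return math.ceil(target_coverage * genome_size / mean_read_length)


# Hypothetical example: 1 million reads of 100 bp on a 10 Mb genome.
print(expected_coverage(1_000_000, 100, 10_000_000))
print(reads_needed(10, 100, 10_000_000))
```

The same arithmetic explains why a ~20-Gb genome such as Norway spruce needs far more sequence data than a ~240-Mb genome such as strawberry to reach a similar depth.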

Nelore
Bos indicus
~3.6 Gb genome sequence
2n = 60 Chromosomes

Analysis of mitochondrial DNA sequences has shown a divergence of 250 thousand years between these 2 types, indicating at least 2 distinct centers of domestication (Bradley et al. 1996).

Strategy for genome assembly


NGS systems: SOLiD
Libraries: 6
Coverage: 52x
92.6% (BCM4), 98.8% (UMD2)
Software: BWA and Samtools
Server:

Bos indicus: Gir and Guzerá

Strategy for genome assembly


NGS systems: SOLiD, PacBio, Illumina
Libraries: 4 (SOLiD)
Coverage: 36x
Software:
SOAPdenovo (DBG), SMRTanalysis
Mira (OLC)
Server:
SGI: 512 GB RAM, 32 cores

Data generated for the two breeds
