
# High-throughput sequencing metadata template (version 2.1).
# All fields in this template must be completed.
# Templates containing example data are found in the METADATA EXAMPLES spreadsheet tabs at the foot of this page.
# Field names (in blue on this page) should not be edited. Hover over cells containing field names to view field content guidelines.
# Human data. If there are patient privacy concerns regarding making data fully public through GEO, please submit to NCBI's dbGaP (http://www.ncbi.nlm.nih.gov/gap/) database. dbGaP has controlled access mechanisms and is an appropriate resource for hosting sensitive patient data.

SERIES
# This section describes the overall experiment.
title
summary
overall design
contributor
contributor
supplementary file
SRA_center_name_code [optional]

SAMPLES
# This section lists and describes each of the biological Samples under investigation, as well as any protocols that are specific to individual Samples.
# Additional "processed data file" or "raw file" columns may be included.
Sample name	title	source name	organism	characteristics: tag	characteristics: tag	characteristics: tag	molecule	description	processed data file	raw file
Sample 1
Sample 2
Sample 3

PROTOCOLS
# Any of the protocols below which are applicable to only a subset of Samples should be included as additional columns of the SAMPLES section instead.
growth protocol
treatment protocol
extract protocol
library construction protocol
library strategy

DATA PROCESSING PIPELINE
# Data processing steps include base-calling, alignment, filtering, peak-calling, generation of normalized abundance measurements etc…
# For each step provide a description, as well as software name, version, parameters, if applicable.
# Include additional steps, as necessary.
data processing step
data processing step
data processing step
data processing step
data processing step
genome build
processed data files format and content

# For each file listed in the "processed data file" columns of the SAMPLES section, provide additional information below.
PROCESSED DATA FILES
file name	file type	file checksum

# For each file listed in the "raw file" columns of the SAMPLES section, provide additional information below.
RAW FILES
file name	file type	file checksum	instrument model	read length	single or paired-end

# For paired-end experiments, list the 2 associated raw files, and provide average insert size and standard deviation, if known. For SOLiD experiments, list the 4 file names (include "file name 3" and "file name 4" columns).
PAIRED-END EXPERIMENTS
file name 1	file name 2	average insert size	standard deviation

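The template asks for a "file checksum" for every processed and raw file but does not name the algorithm here. The complete checksums in the example tabs are 32 hexadecimal characters, which is consistent with MD5, so the sketch below assumes MD5. The function names `md5_of_file` and `looks_like_md5` are illustrative and not part of the template.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks.

    Assumption: the "file checksum" column expects an MD5 digest.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large sequencing files are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_like_md5(value):
    """Return True if value is exactly 32 hexadecimal characters."""
    hex_chars = "0123456789abcdefABCDEF"
    return len(value) == 32 and all(c in hex_chars for c in value)
```

A format check like `looks_like_md5` can catch a checksum that was cut short or mistyped before the spreadsheet is submitted; it cannot confirm that the digest matches the file.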

# High-throughput sequencing metadata template (version 2.1).
# All fields in this template must be completed.
# Templates containing example data are found in the METADATA EXAMPLES spreadsheet tabs at the foot of this page.
# Field names (in blue on this page) should not be edited. Hover over cells containing field names to view field content guidelines.
# Human data. If there are patient privacy concerns regarding making data fully public through GEO, please submit to NCBI's dbGaP (http://www.ncbi.nlm.nih.gov/gap/) database. dbGaP has controlled access mechanisms and is an appropriate resource for hosting sensitive patient data.

SERIES
# This section describes the overall experiment.
title	Genome-wide maps of chromatin state in pluripotent and lineage-committed cells.
summary	We report the application of single-molecule-based sequencing technology for high-throughput profiling of histone modifications in mammalian cells. By obtaining over four billion bases of sequence from chromatin immunoprecipitated DNA, we generated genome-wide chromatin-state maps of mouse embryonic stem cells, neural progenitor cells and embryonic fibroblasts. We find that lysine 4 and lysine 27 trimethylation effectively discriminates genes that are expressed, poised for expression, or stably repressed, and therefore reflect cell state and lineage potential. Lysine 36 trimethylation marks primary coding and non-coding transcripts, facilitating gene annotation. Trimethylation of lysine 9 and lysine 20 is detected at satellite, telomeric and active long-terminal repeats, and can spread into proximal unique sequences. Lysine 4 and lysine 9 trimethylation marks imprinting control regions. Finally, we show that chromatin state can be read in an allele-specific manner by using single nucleotide polymorphisms. This study provides a framework for the application of comprehensive chromatin profiling towards characterization of diverse mammalian cell populations.
overall design	Examination of 2 different histone modifications in 2 cell types.
contributor	John,B,Goode
contributor	Bradley,Smith
supplementary file
SRA_center_name_code [optional]

SAMPLES
# This section lists and describes each of the biological Samples under investigation, as well as any protocols that are specific to individual Samples.
# Additional "processed data file" or "raw file" columns may be included.
Sample name	title	source name	organism	characteristics: cell type	characteristics: passages	characteristics: strain	characteristics: ChIP antibody	molecule	description	processed data file	processed data file	raw file	raw file	raw file
Sample 1	H3K4me2_ChIPSeq	Neural progenitor cells	Mus musculus	ES-derived neural progenitor cells	15-18	C57BL/6	H3K4me2 (Millipore, 07-030, lot 12	genomic DNA		H3K4me2.aligned.txt	H3K4me2.peaks.txt	080716_BI-EAS46_0001_209DH_L1.fastq	080716_BI-EAS46_0001_209DH_L2.fastq	080716_BI-EAS46_0001_209DH_L3.fastq
Sample 2	H3K4me1_ChIPSeq	Neural progenitor cells	Mus musculus	ES-derived neural progenitor cells	15-18	C57BL/6	H3K4me1 (Millipore, 08-034, lot 11	genomic DNA		H3K4me1.aligned.txt	H3K4me1.peaks.txt	080716_BI-EAS46_0001_209DH_L4.fastq	080716_BI-EAS46_0001_209DH_L5.fastq	080716_BI-EAS46_0001_209DH_L6.fastq
Sample 3	input DNA	Neural progenitor cells	Mus musculus	ES-derived neural progenitor cells	15-18	C57BL/6	none	genomic DNA		H3K4me2.b.aligned.txt	H3K4me2.b.peaks.txt	080717_BI-EAS46_0001_20DH_L5.fastq	080717_BI-EAS46_0001_20DH_L6.fastq

PROTOCOLS
# Any of the protocols below which are applicable to only a subset of Samples should be included as additional columns of the SAMPLES section instead.
growth protocol	ES cell–derived NS cells were routinely generated by re-plating day 7 adherent neural differentiation cultures (typically 2–3 × 10^6 cells into a T75 flask) on uncoated plastic in NS-A medium (Euroclone, Milan, Italy) supplemented with modified N2 and 10 ng/ml of both EGF and FGF-2 (NS expansion medium).
treatment protocol
extract protocol	Lysates were clarified from sonicated nuclei and histone-DNA complexes were isolated with antibody.
library construction protocol	Libraries were prepared according to Illumina's instructions accompanying the DNA Sample Kit (Part# 0801-0303). Briefly, DNA was end-repaired using a combination of T4 DNA polymerase, E. coli DNA Pol I large fragment (Klenow polymerase) and T4 polynucleotide kinase. The blunt, phosphorylated ends were treated with Klenow fragment (3' to 5' exo minus) and dATP to yield a protruding 3' 'A' base for ligation of Illumina's adapters, which have a single 'T' base overhang at the 3' end. After adapter ligation DNA was PCR amplified with Illumina primers for 15 cycles and library fragments of ~250 bp (insert plus adaptor and PCR primer sequences) were band isolated from an agarose gel. The purified DNA was captured on an Illumina flow cell for cluster generation. Libraries were sequenced on the Genome Analyzer following the manufacturer's protocols.
library strategy	ChIP-Seq

DATA PROCESSING PIPELINE
# Data processing steps include base-calling, alignment, filtering, peak-calling, generation of normalized abundance measurements etc…
# For each step provide a description, as well as software name, version, parameters, if applicable.
# Include additional steps, as necessary.
data processing step	Basecalls performed using CASAVA version 1.4
data processing step	ChIP-seq reads were aligned to the mm9 genome assembly using EasyAlign version 3.2 with the following configurations…
data processing step	Data were filtered using the following specifications…
data processing step	peaks were called using PeaksFind version 2.2 with the following setting: ChIP threshold (0.2), Enrichment Fold (2.5), Rescue Fold (3).
data processing step
genome build	mm9
processed data files format and content	wig files were generated using …; Scores represent …

# For each file listed in the "processed data file" columns of the SAMPLES section, provide additional information below.
PROCESSED DATA FILES
file name	file type	file checksum
H3K4me2.peaks.wig	wig	95cf1d1fa509d871b2ef0bb9fd734c3d
H3K4me1.peaks.wig	wig	8ec6ee3cce10b970e5bfea4e35cdb231
H3K4me2.b.peaks.wig	wig	f8fcd650914ff1a733956d6d06e8b543

# For each file listed in the "raw file" columns of the SAMPLES section, provide additional information below.
RAW FILES
file name	file type	file checksum	instrument model	read length	single or paired-end
080716_BI-EAS46_0001_209DH_L1.fastq	fastq	6cc6ee3cce10b970e5bfea4e35cdb	Illumina Genome Analyzer	36	single
080716_BI-EAS46_0001_209DH_L2.fastq	fastq	88ceb0e0d056dda9208a03acf9073	Illumina Genome Analyzer	36	single
080716_BI-EAS46_0001_209DH_L3.fastq	fastq	f2786fedc5106789a2af4014a0e74f	Illumina Genome Analyzer	36	single
080716_BI-EAS46_0001_209DH_L4.fastq	fastq	d8fcd650914ff1a733956d6d06e8b0	Illumina Genome Analyzer	36	single
080716_BI-EAS46_0001_209DH_L5.fastq	fastq	03839cca2e797b28b9f9371f7b9ca	Illumina Genome Analyzer	36	single
080716_BI-EAS46_0001_209DH_L6.fastq	fastq	604fbb658413c559511eb6ad2bb14	Illumina Genome Analyzer	36	single
080717_BI-EAS46_0001_20DH_L5.fastq	fastq	57cf1d1fa509d871b2ef0bb9fd734c	Illumina Genome Analyzer IIx	42	single
080717_BI-EAS46_0001_20DH_L6.fastq	fastq	e5718e1a97690d410464f24f37aae	Illumina Genome Analyzer IIx	42	single

# For paired-end experiments, list the 2 associated raw files, and provide average insert size and standard deviation, if known. For SOLiD experiments, list the 4 file names (include "file name 3" and "file name 4" columns).
PAIRED-END EXPERIMENTS
file name 1	file name 2	average insert size	standard deviation

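Because the template asks that each file named in the SAMPLES section also be described under PROCESSED DATA FILES or RAW FILES, a simple cross-check can be run before submission. The following is a minimal sketch; `unlisted_files` is a hypothetical helper name, and the file names are copied from the example tab above.

```python
def unlisted_files(sample_files, listed_files):
    """Return file names referenced in SAMPLES but absent from a file section."""
    return sorted(set(sample_files) - set(listed_files))


# Raw files referenced by Sample 3 in the example tab.
sample_raw = [
    "080717_BI-EAS46_0001_20DH_L5.fastq",
    "080717_BI-EAS46_0001_20DH_L6.fastq",
]
# File names from the RAW FILES section of the example tab.
raw_files_section = [
    "080716_BI-EAS46_0001_209DH_L1.fastq",
    "080716_BI-EAS46_0001_209DH_L2.fastq",
    "080716_BI-EAS46_0001_209DH_L3.fastq",
    "080716_BI-EAS46_0001_209DH_L4.fastq",
    "080716_BI-EAS46_0001_209DH_L5.fastq",
    "080716_BI-EAS46_0001_209DH_L6.fastq",
    "080717_BI-EAS46_0001_20DH_L5.fastq",
    "080717_BI-EAS46_0001_20DH_L6.fastq",
]
missing_raw = unlisted_files(sample_raw, raw_files_section)

# Processed files: SAMPLES names a .txt file, the file section lists .wig names.
missing_processed = unlisted_files(
    ["H3K4me2.peaks.txt"],
    ["H3K4me2.peaks.wig", "H3K4me1.peaks.wig", "H3K4me2.b.peaks.wig"],
)
```

With the example data, `missing_raw` is empty, while `missing_processed` contains `H3K4me2.peaks.txt`, because the SAMPLES rows of this example name `.txt` processed files and the PROCESSED DATA FILES section lists `.wig` names.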
# High-throughput sequencing metadata template (version 2.1).
# All fields in this template must be completed.
# Templates containing example data are found in the METADATA EXAMPLES spreadsheet tabs at the foot of this page.
# Field names (in blue on this page) should not be edited. Hover over cells containing field names to view field content guidelines.
# Human data. If there are patient privacy concerns regarding making data fully public through GEO, please submit to NCBI's dbGaP (http://www.ncbi.nlm.nih.gov/gap/) database. dbGaP has controlled access mechanisms and is an appropriate resource for hosting sensitive patient data.

SERIES
# This section describes the overall experiment.
title	Next Generation Sequencing Facilitates Quantitative Analysis of Wild Type and Nrl-/- Retinal Transcriptomes
summary	Purpose: Next-generation sequencing (NGS) has revolutionized systems-based analysis of cellular pathways. The goals of this study are to compare NGS-derived retinal transcriptome profiling (RNA-seq) to microarray and quantitative reverse transcription polymerase chain reaction (qRT–PCR) methods and to evaluate protocols for optimal high-throughput data analysis.
summary	Methods: Retinal mRNA profiles of 21-day-old wild-type (WT) and neural retina leucine zipper knockout (Nrl−/−) mice were generated by deep sequencing, in triplicate, using Illumina GAIIx. The sequence reads that passed quality filters were analyzed at the transcript isoform level with two methods: Burrows–Wheeler Aligner (BWA) followed by ANOVA (ANOVA) and TopHat followed by Cufflinks. qRT–PCR validation was performed using TaqMan and SYBR Green assays.
summary	Results: Using an optimized data analysis workflow, we mapped about 30 million sequence reads per sample to the mouse genome (build mm9) and identified 16,014 transcripts in the retinas of WT and Nrl−/− mice with BWA workflow and 34,115 transcripts with TopHat workflow. RNA-seq data confirmed stable expression of 25 known housekeeping genes, and 12 of these were validated with qRT–PCR. RNA-seq data had a linear relationship with qRT–PCR for more than four orders of magnitude and a goodness of fit (R^2) of 0.8798. Approximately 10% of the transcripts showed differential expression between the WT and Nrl−/− retina, with a fold change ≥1.5 and p value <0.05. Altered expression of 25 genes was confirmed with qRT–PCR, demonstrating the high degree of sensitivity of the RNA-seq method. Hierarchical clustering of differentially expressed genes uncovered several as yet uncharacterized genes that may contribute to retinal function. Data analysis with BWA and TopHat workflows revealed a significant overlap yet provided complementary insights in transcriptome profiling.
summary	Conclusions: Our study represents the first detailed analysis of retinal transcriptomes, with biologic replicates, generated by RNA-seq technology. The optimized data analysis workflows reported here should provide a framework for comparative investigations of expression profiles. Our results show that NGS offers a comprehensive and more accurate quantitative and qualitative evaluation of mRNA content within a cell or tissue. We conclude that RNA-seq based transcriptome characterization would expedite genetic network analyses and permit the dissection of complex biologic functions.
overall design	Retinal mRNA profiles of 21-day old wild type (WT) and Nrl-/- mice were generated by deep sequencing, in triplicate, using Illumina GAIIx.
contributor	Rebecca,A,Smith
contributor	David,Doe
supplementary file
SRA_center_name_code

SAMPLES
# This section lists and describes each of the biological Samples under investigation, as well as any protocols that are specific to individual Samples.
# Additional "processed data file" or "raw file" columns may be included.
Sample name	title	source name	organism	characteristics: strain	characteristics: tissue	characteristics: age	characteristics: genotype	molecule	description	processed data file	raw file	raw file	raw file	raw file	raw file
Sample 1	WT rep1	Retina	Mus musculus	C57BL/6	retina	post natal day 21	wild type	total RNA		WT1.txt	Run123abc.csfasta	Run123abc.qual	2011_01_gfh_qseq.txt	DS18389-7_1.fastq	DS18389-7_2.fastq
Sample 2	WT rep2	Retina	Mus musculus	C57BL/6	retina	post natal day 21	wild type	total RNA		WT2.txt	run454.seq	run454.qual	2011_05_rst_qseq.tar
Sample 3	Nrl-KO rep1	Retina	Mus musculus	C57BL/6	retina	post natal day 21	Nrl-/-	total RNA		mutant1.txt	GAXHYMS02.sff
Sample 4	Nrl-KO rep2	Retina	Mus musculus	C57BL/6	retina	post natal day 21	Nrl-/-	total RNA		mutant2.txt	080717_BI-EAS46_1.fastq	080717_BI-EAS46_2.fastq

PROTOCOLS
# Any of the protocols below which are applicable to only a subset of Samples should be included as additional columns of the SAMPLES section instead.
growth protocol
treatment protocol
extract protocol	Retinas were removed, flash frozen on dry ice, and RNA was harvested using Trizol reagent. Illumina TruSeq RNA Sample Prep Kit (Cat#FC-122-1001) was used with 1 ug of total RNA for the construction of sequencing libraries.
library construction protocol	RNA libraries were prepared for sequencing using standard Illumina protocols.
library strategy	RNA-Seq

DATA PROCESSING PIPELINE
# Data processing steps include base-calling, alignment, filtering, peak-calling, generation of normalized abundance measurements etc…
# For each step provide a description, as well as software name, version, parameters, if applicable.
# Include additional steps, as necessary.
data processing step	Illumina Casava1.7 software used for basecalling.
data processing step	Sequenced reads were trimmed for adaptor sequence, and masked for low-complexity or low-quality sequence, then mapped to mm8 whole genome using bowtie v0.12.2 with parameters -q -p 4 -e 100 -y -a -m 10 --best --strata
data processing step	Reads Per Kilobase of exon per Megabase of library size (RPKM) were calculated using a protocol from Chepelev et al., Nucleic Acids Research, 2009. In short, exons from all isoforms of a gene were merged to create one meta-transcript. The number of reads falling in the exons of this meta-transcript were counted and normalized by the size of the meta-transcript and by the size of the library.
data processing step
data processing step
genome build	mm8
processed data files format and content	tab-delimited text files include RPKM values for each Sample ...

# For each file listed in the "processed data file" columns of the SAMPLES section, provide additional information below.
PROCESSED DATA FILES
file name	file type	file checksum
WT.txt	abundance measurements	d8fcd650914ff1a733956d6d06e8b091
WT2.txt	abundance measurements	abcdef123456789abc123456789abc
mutant1.txt	abundance measurements	95cf1d1fa509d871b2ef0bb9fd734c3d
mutant2.txt	abundance measurements	0wd6ee3cce10b970e5bfea4e35cdb987

# For each file listed in the "raw file" columns of the SAMPLES section, provide additional information below.
RAW FILES
file name	file type	file checksum	instrument model	read length	single or paired-end
Run123abc.csfasta	solid_native_csfasta	6cc6ee3cce10b970e5bfea4e35cdb	AB SOLiD System 3.0	50	single
Run123abc_QV.qual	solid_native_qual	88ceb0e0d056dda9208a03acf9073	AB SOLiD System 3.0	50	single
2011_01_gfh_qseq.txt	Illumina_native_qseq	95cf1d1fa509d871b2ef0bb9fd734c	Illumina HiSeq 2000	72	single
DS18389-7_1.fastq	fastq	95cf1d1fa509d871b2ef0bb9fd734c	Illumina HiSeq 2000	50	paired-end
DS18389-7_2.fastq	fastq	0wd6ee3cce10b970e5bfea4e35cdb	Illumina HiSeq 2000	50	paired-end
run454.seq	454_native_seq	f2786fedc5106789a2af4014a0e74f	454 GS FLX Titanium	400	single
run454.qual	454_native_qual	d8fcd650914ff1a733956d6d06e8b0	454 GS FLX Titanium	400	single
2011_05_rst_qseq.tar	Illumina_native_qseq	03839cca2e797b28b9f9371f7b9ca	Illumina Genome Analyzer II	36	single
GAXHYMS02.sff	sff	604fbb658413c559511eb6ad2bb14	454 GS 20	36	single
080717_BI-EAS46_1.fastq	fastq	57cf1d1fa509d871b2ef0bb9fd734c	Illumina Genome Analyzer IIx	42	paired-end
080717_BI-EAS46_2.fastq	fastq	e5718e1a97690d410464f24f37aae	Illumina Genome Analyzer IIx	42	paired-end

# For paired-end experiments, list the 2 associated raw files, and provide average insert size and standard deviation, if known. For SOLiD experiments, list the 4 file names (include "file name 3" and "file name 4" columns).
PAIRED-END EXPERIMENTS
file name 1	file name 2	average insert size	standard deviation
DS18389-7_1.fastq	DS18389-7_2.fastq	222	25
080717_BI-EAS46_1.fastq	080717_BI-EAS46_2.fastq	300	32

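The RPKM data processing step in this example merges the exons of all isoforms into one meta-transcript, counts the reads falling in it, and normalizes by meta-transcript size and library size. The sketch below illustrates that arithmetic only; it is not the Chepelev et al. protocol itself. It assumes half-open `(start, end)` exon coordinates in base pairs and takes "library size" to mean the total number of reads in the library. The names `merge_exons` and `rpkm` are hypothetical.

```python
def merge_exons(exons):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def rpkm(read_count, exons, library_size):
    """Reads per kilobase of meta-transcript per million reads in the library.

    Assumption: library_size is the total read count used for normalization.
    """
    length_bp = sum(end - start for start, end in merge_exons(exons))
    return read_count * 1e9 / (length_bp * library_size)


# Exons from two hypothetical isoforms of one gene; the first two overlap.
gene_exons = [(100, 200), (150, 300), (400, 500)]
meta_transcript = merge_exons(gene_exons)
value = rpkm(read_count=300, exons=gene_exons, library_size=10_000_000)
```

Here the meta-transcript spans 300 bp, so 300 reads in a library of 10 million reads give an RPKM of 100.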