
Significance of MSA

An MSA can reveal conserved residues that enable the identification of possibly important sites.

For example, conserved amino acid residues are usually involved in protein function or are responsible for protein structural stability. In DNA sequences, conserved regions can represent a regulatory element. Besides identifying conserved residues, a more sophisticated approach is to use the information from an MSA, taking regions of residues with conserved properties to construct a statistical model such as a Position-Specific Scoring Matrix (PSSM) or a Hidden Markov Model (HMM). These models are used to identify conserved regions in newly sequenced genomes, or to construct databases such as PROSITE (Hulo et al., 2004) or PFAM (Bateman et al., 2004).
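As an illustration of how a Position-Specific Scoring Matrix can be derived from an alignment, here is a minimal Python sketch. It assumes a gapless DNA alignment, a uniform background frequency, and a simple additive pseudocount. The function names `build_pssm`, `score_window`, and `best_hit` are hypothetical and are not taken from PROSITE, PFAM, or any other tool.

```python
import math
from collections import Counter


def build_pssm(alignment, alphabet="ACGT", pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column."""
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(alphabet)  # assumption: uniform background
    pssm = []
    for col in range(n_cols):
        counts = Counter(row[col] for row in alignment if row[col] in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        pssm.append({
            res: math.log2(((counts[res] + pseudocount) / total) / background)
            for res in alphabet
        })
    return pssm


def score_window(pssm, window):
    """Sum the column scores of the residues in a window of PSSM width."""
    return sum(column[res] for column, res in zip(pssm, window))


def best_hit(pssm, sequence):
    """Return (score, start) of the highest-scoring window in a sequence."""
    width = len(pssm)
    return max(
        (score_window(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Toy alignment of a conserved DNA motif (made-up data).
motif_alignment = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
motif_pssm = build_pssm(motif_alignment)
score, start = best_hit(motif_pssm, "GGCGTATAATGCC")
print(start, round(score, 2))
```

Scanning a new sequence with the matrix is the same operation, in miniature, as searching a newly sequenced genome for a conserved region.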

The construction of multiple sequence alignments is closely related to phylogenetic analysis. A phylogenetic tree can be inferred from a multiple sequence alignment, as shown in Figure 2. The study of molecular evolution is an area where MSA is extensively used.
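Distance-based tree methods start from pairwise distances computed on the aligned sequences. The sketch below computes a simple p-distance (the fraction of mismatches over ungapped columns) for every pair in a toy alignment. The choice of p-distance and the names `p_distance` and `distance_matrix` are assumptions for illustration. Real phylogenetic packages usually apply a substitution model on top of this.

```python
def p_distance(a, b, gap="-"):
    """Fraction of mismatching positions among columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in compared) / len(compared)


def distance_matrix(alignment):
    """Build a symmetric nested-dict distance matrix from {name: aligned sequence}."""
    names = list(alignment)
    return {
        m: {n: p_distance(alignment[m], alignment[n]) for n in names}
        for m in names
    }


# Made-up alignment of three short sequences.
toy_alignment = {"a": "ACGT-A", "b": "ACGTTA", "c": "AGGT-T"}
toy_matrix = distance_matrix(toy_alignment)
print(toy_matrix["a"]["c"])
```

A matrix of this shape is the usual input to a tree-building method such as Neighbor Joining.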

MSA is also used for the prediction of protein secondary structure and for gene prediction from the comparison of sequenced genomes.

MSA: Definition: Unique property: Website hosting the tool: Steps of Algorithm: Advantages: Disadvantages:

Improved Version:

T-Coffee: Tree-based Consistency Objective Function for alignment Evaluation

Definition: T-Coffee version 2.03 uses a consistency-based objective function optimized using progressive alignment. It tries to maximize the score between the final multiple alignment and a library of pairwise residue-by-residue scores derived from a mixture of local and global pairwise alignments. T-Coffee has two main features. First, it uses heterogeneous data sources to generate multiple alignments. Second, it carries out progressive alignment in a way that allows it to consider the alignment between all of the pairs during the generation of the MSA. This gives it the speed of a traditional progressive alignment but with far less tendency to misalign.

Unique property:

Website hosting the tool:

Steps of Algorithm:

1. Generate a primary library of alignments. The library contains a set of pairwise alignments between all of the sequences to be aligned. Two alignment sources, one local and one global, are used for each pair of sequences, yielding two libraries: one local, one global.
2. Derive the weights for the primary library. A weight is assigned to each pair of aligned residues in the library, according to sequence identity.
3. Combine the libraries. The libraries are pooled in a simple addition process. Duplicated pairs are merged into a single entry. Pairs that did not occur are not represented (they are given a weight of zero).
4. Extend the library. Information is combined so that the final weight for any pair of residues reflects some of the information contained in the whole library. This is done by taking each aligned pair from the library and checking the alignment of the two residues with residues from the remaining sequences (consistency-based alignment).
5. Progressive alignment. The score for aligning xi to yj is the sum of the weights of the alignments in the library that contain the alignment of xi to yj. The distance matrix and neighbor-joining tree are determined. The initial pair is fixed at this point; any existing gaps cannot be shifted later.

Advantages:

It produces more accurate alignments than the other methods.
It is equipped with many different tools and modules, such as CORE, M-Coffee and EXPRESSO, for structure alignment, evaluation and combining alignments.
T-Coffee can deal with many input formats, including FASTA, Swiss-Prot and PIR (Protein Information Resource).
T-Coffee produces sequence alignments in various formats so that they can be used as input for another program.
It also produces a colorized alignment, in .html and .pdf format, where every residue appears on a background that indicates the quality of the alignment.
It can produce a true phylogenetic tree in Newick format by using the Neighbor Joining method.
It can work with lists of DNA, RNA or protein sequences.
T-Coffee can evaluate the quality of any multiple sequence alignment using the CORE server.

Disadvantages:

It takes longer to align multiple sequences than other programs.

Improved Version: M-Coffee is part of the T-Coffee package. M-Coffee constitutes a simple and efficient platform for the combination of various MSAs into one unique accurate model.
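The library extension step of T-Coffee (step 4 above) can be sketched in a few lines of Python. This is a simplified reading, not the T-Coffee implementation. It assumes a residue is identified by a hypothetical `(sequence_name, position)` tuple, that the primary library maps residue pairs to weights, and that each intermediate residue contributes the minimum of the two weights linking it to the pair.

```python
from collections import defaultdict


def pair_key(r1, r2):
    """Order a residue pair so (x, y) and (y, x) share one library entry."""
    return (r1, r2) if r1 <= r2 else (r2, r1)


def extend_library(primary):
    """Return extended weights: the direct weight plus support via third sequences."""
    # partners[r] maps every residue aligned to r in the primary library to its weight.
    partners = defaultdict(dict)
    for (r1, r2), weight in primary.items():
        partners[r1][r2] = weight
        partners[r2][r1] = weight

    extended = defaultdict(float)
    for (r1, r2), weight in primary.items():
        extended[pair_key(r1, r2)] += weight

    # Each residue z aligned to both x and y supports the pair (x, y).
    for z, linked in partners.items():
        items = list(linked.items())
        for i in range(len(items)):
            x, wx = items[i]
            for j in range(i + 1, len(items)):
                y, wy = items[j]
                if len({x[0], y[0], z[0]}) < 3:
                    continue  # the three residues must come from distinct sequences
                extended[pair_key(x, y)] += min(wx, wy)
    return dict(extended)


# Made-up primary library over three sequences A, B and C.
primary_library = {
    (("A", 0), ("B", 0)): 88,
    (("A", 0), ("C", 0)): 77,
    (("B", 0), ("C", 0)): 100,
    (("A", 1), ("C", 1)): 77,
    (("B", 2), ("C", 1)): 100,
}
extended_library = extend_library(primary_library)
print(extended_library[(("A", 0), ("B", 0))])
```

Note that the pair `(("A", 1), ("B", 2))` never occurs in the primary library yet receives a weight after extension, because both residues are aligned to the same residue of sequence C. This is how the final weight comes to reflect information from the whole library.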

CLUSTAL W:

Definition: ClustalW version 1.83 is the most widely used multiple alignment program. It uses a progressive alignment scheme where an initial guide tree (calculated from pairwise alignments) is used to guide a full multiple alignment by progressively incorporating all the sequences into the MSA.

Unique property:

Website hosting the tool:

Steps of Algorithm:

1. All sequences are aligned pairwise, all against all, using a heuristic approach (FastA algorithm).
2. The score of each alignment, which indicates similarity, is transformed into a distance measure, and a distance matrix is constructed.
3. A tree is constructed from the distance matrix with the Neighbour-Joining method. This method constructs unrooted trees, so a root is placed in the middle of the longest branch.
4. For each sequence a weight is computed in order to avoid bias towards a group of very similar sequences. The weights depend on the distance of the sequence from the root, but sequences that have a common branch with other sequences share the weight derived from the shared branch.
5. Using the tree as a guide, the sequences are progressively aligned with dynamic programming. Similar sequences are aligned first, and less similar sequences are aligned later.
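The sequence weighting described above can be illustrated with a small Python sketch. It is one reading of the scheme under stated assumptions: the rooted tree is given as nested lists of `(subtree, branch_length)` pairs with leaf names as strings, each branch length is divided equally among the leaves below it, and no final normalization is applied. The representation and the names `leaf_names` and `sequence_weights` are hypothetical and are not ClustalW's own.

```python
def leaf_names(node):
    """Return the names of all leaves below a node."""
    if isinstance(node, str):
        return [node]
    return [name for child, _ in node for name in leaf_names(child)]


def sequence_weights(root):
    """Weight of a leaf = sum over its root path of branch length / leaves sharing the branch."""
    weights = {}

    def walk(node, inherited):
        if isinstance(node, str):
            weights[node] = inherited
            return
        for child, length in node:
            share = length / len(leaf_names(child))
            walk(child, inherited + share)

    walk(root, 0.0)
    return weights


# Made-up rooted tree: A and B are close relatives, C is more distant.
toy_tree = [
    ([("A", 0.1), ("B", 0.1)], 0.2),
    ("C", 0.4),
]
print(sequence_weights(toy_tree))
```

In the toy tree, A and B split the 0.2 branch they share, so each ends up with a lower weight than the more isolated sequence C. This is the intended correction against groups of very similar sequences.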

Advantages:

Often implemented together with a Neighbor Joining phylogenetic analysis package, including Bootstrap analysis. This provides an easy way to make a statistically defensible statement about the tree relating the sequences.
Implemented within the Seaview multiple alignment editor, so that local regions of a larger alignment can be revised by ClustalW.
Its default parameters are widely accepted to produce an acceptable "objective" alignment on sequences with >= 25% identity.
May do better than HMMER or SAM at relating divergent but well-populated families.
ClustalW allows prior alignments of two or more subsets to be preserved as the global alignment is formed. The other methods will at most preserve the prior alignment of only one group.

Disadvantages:

It is easily confused by long stretches of unalignable sequence within otherwise well-related sequences.
It often produces a kind of false objectivity, where the user has fiddled with the program parameters to achieve a subjectively pleasing result rather than just manually editing the alignment.
It has no statistical evaluation properties. It will produce an alignment whether the provided sequences are related or not.

Improved Version: CLUSTAL X was introduced in 1997 (Thompson et al., 1997). This version uses the same algorithm as CLUSTAL W, but it has features that make it more user-friendly (e.g., scrollable windows and pull-down menus).

MUSCLE:

Definition: Muscle (5) version 3.52 and version 6.0. Muscle v3.52 uses a progressive alignment algorithm with a Log Expectation score to align sets of sequences along a guide tree. Muscle v6.0 uses the same objective function as ProbCons to further refine the alignment from Muscle v3.52.

There are two distinguishing features of MUSCLE. First, in determining distance measures for pairs of sequences, it uses both the kmer distance (for an unaligned pair) and the Kimura distance (for an aligned pair). A kmer is a contiguous sequence of letters of length k, also known as a word or k-tuple. Sequences that are related will have more kmers in common than expected by chance. The kmer distance is derived from the fraction of kmers in common in a compressed alphabet, which is related to fractional identity. Because this measure does not require an alignment, MUSCLE has a significant speed advantage over other MSA algorithms. The second distinguishing feature is that at the completion of any stage of the algorithm, an MSA is available and the algorithm can be terminated.
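The kmer distance can be sketched as follows. This is an illustration rather than MUSCLE's code. It assumes a six-group compressed amino acid alphabet chosen here for the example, k = 3, and a distance defined as one minus the fraction of shared kmers relative to the shorter sequence. The names `compress`, `kmer_counts`, and `kmer_distance` are hypothetical.

```python
from collections import Counter

# Assumption: an illustrative six-group compression of the amino acid alphabet.
GROUPS = ["AGPST", "C", "DENQ", "FWY", "HKR", "ILMV"]
COMPRESS = {res: str(i) for i, group in enumerate(GROUPS) for res in group}


def compress(seq):
    """Map each residue to its group label; unknown residues become 'X'."""
    return "".join(COMPRESS.get(res, "X") for res in seq.upper())


def kmer_counts(seq, k):
    """Count every contiguous word of length k in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(x, y, k=3):
    """1 - (shared kmers / kmers in the shorter sequence), on compressed sequences."""
    cx, cy = compress(x), compress(y)
    shortest = min(len(cx), len(cy))
    if shortest < k:
        raise ValueError("sequences must be at least k residues long")
    # Counter intersection keeps the minimum count of each kmer.
    shared = sum((kmer_counts(cx, k) & kmer_counts(cy, k)).values())
    return 1.0 - shared / (shortest - k + 1)


print(kmer_distance("MKVLAAGT", "MKILSAGT"))
print(kmer_distance("MKVLAAGT", "WWWCCCWW"))
```

No alignment is computed at any point, which is the source of the speed advantage described above.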

Unique property:

Website hosting the tool:

Steps of Algorithm:

Stage 1 introduces error in the form of the approximate kmer distance, which produces a suboptimal tree.
Stage 2, the improved progression stage, improves on the alignment generated in stage 1.
Stage 3 is the refinement stage. Here, an edge of the tree from stage 2 is deleted. This divides the tree into two sub-trees; the profile of the multiple alignment for each sub-tree is calculated. The profiles from the two sub-trees are re-aligned, producing a new MSA. If the new sum-of-pairs score is improved, the new alignment is kept; otherwise it is discarded.

Advantages:

Disadvantages: Improved Version:
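The accept-or-discard rule of the MUSCLE refinement stage can be illustrated with a toy sum-of-pairs score. The sketch below does not re-align profiles. It only scores two candidate alignments of the same sequences and keeps the better one. The scoring values (match +1, mismatch -1, residue against gap -2, gap against gap 0) and the function names are assumptions for illustration; real implementations use a substitution matrix and affine gap penalties.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of symbols from the same alignment column."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment):
    """Sum pair scores over every pair of rows in every column."""
    return sum(
        pair_score(a, b)
        for column in zip(*alignment)
        for a, b in combinations(column, 2)
    )


def keep_if_improved(current, candidate):
    """Return the candidate only if its sum-of-pairs score is strictly higher."""
    if sum_of_pairs(candidate) > sum_of_pairs(current):
        return candidate
    return current


# Two made-up alignments of the sequences ACGT, ACT and AGT.
before = ["ACGT", "ACT-", "AGT-"]
after = ["ACGT", "AC-T", "A-GT"]
print(sum_of_pairs(before), sum_of_pairs(after))
print(keep_if_improved(before, after))
```

Because a candidate is only kept when the score strictly improves, the refinement can be stopped at any time without ending up worse than the alignment it started from.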

COBALT: constraint-based alignment tool

Definition:

COBALT has a general framework that uses progressive multiple alignment to combine pairwise constraints from different sources into a multiple alignment. COBALT does not attempt to use all available constraints but uses only a high-scoring consistent subset that can change as the alignment progresses, where a set of constraints is called consistent if all of the constraints in the set can be simultaneously satisfied by a multiple alignment. Furthermore, the Conserved Domain Database (CDD) also contains auxiliary information that allows COBALT to create partial profiles for input sequences before progressive alignment begins, which avoids computationally expensive procedures for building profiles.

Unique property:

Website hosting the tool:

Steps of Algorithm:

Step 1: Find alignments for generating constraints.
Step 2: Find partial profiles and a pairwise consistent set of constraints.
Step 3: Generate a guide tree.
Step 4: Create a multiple alignment using the current set of constraints and guide tree.
Step 5: Create bipartitions and realign.

Advantages:

Disadvantages: Improved Version:
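To make the notion of a consistent set of constraints concrete, the following sketch handles the simplest case only: point constraints between two sequences. It is not COBALT's procedure. It assumes each constraint is a hypothetical `(i, j, score)` triple meaning that position i of the first sequence should align with position j of the second, that scores are positive, and that a set is consistent when the constraints do not cross, so both positions increase together. A small dynamic program then picks the highest-scoring consistent subset.

```python
def best_consistent_subset(constraints):
    """Return the max-score subset in which i and j both strictly increase."""
    ordered = sorted(constraints)
    if not ordered:
        return []
    n = len(ordered)
    best = [score for _, _, score in ordered]  # best chain score ending at each constraint
    prev = [None] * n                          # predecessor index in that chain
    for b in range(n):
        ib, jb, sb = ordered[b]
        for a in range(b):
            ia, ja, _ = ordered[a]
            # a may precede b only if the two constraints do not cross.
            if ia < ib and ja < jb and best[a] + sb > best[b]:
                best[b] = best[a] + sb
                prev[b] = a
    end = max(range(n), key=best.__getitem__)
    chain = []
    while end is not None:
        chain.append(ordered[end])
        end = prev[end]
    return chain[::-1]


# Made-up constraints; (2, 5) and (5, 2) cross several of the others.
toy_constraints = [(1, 1, 5.0), (2, 5, 4.0), (3, 3, 6.0), (4, 4, 2.0), (5, 2, 9.0)]
print(best_consistent_subset(toy_constraints))
```

In the multiple-alignment setting the consistency test involves all sequences at once, and the chosen subset can change as the alignment progresses, so this pairwise case should be read only as an intuition aid.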

Dialign2 (28) version 2.2.1 is a local multiple alignment method and is an improvement on the original segment-to-segment based approach of Dialign (11,29). Dialign-T (30) v0.1.3 is a new version of Dialign, which incorporates the Dialign objective function in a progressive alignment algorithm.

ProbCons (15) version 1.09 is, like T-Coffee, a consistency-based method. Alignments are generated using a library of paired hidden Markov models. It is currently the most accurate method as benchmarked on the HOMSTRAD dataset.
