
ProteinShop: A Tool for Protein Structure Prediction and Modeling
Silvia Crivelli
Computational Research Division
Lawrence Berkeley National Laboratory
The Protein Structure Prediction Problem
To determine how proteins, the building blocks of living cells, fold themselves into three-dimensional shapes that define the role they play in life.
Importance of Protein Structure Prediction
• The shape of a protein determines its function.
• Knowledge of structure is used in many ways:
– Drug design
– Design of synthetic proteins
– Re-engineering defective proteins
• Genome projects are providing sequences for many proteins whose structure will need to be determined.
Protein Structures
Proteins consist of a long chain of amino acids, the primary structure (e.g., Gly-Leu-Ser-Pro)
[Figure: polypeptide chains with labels for amino acid, side chain (R), backbone, and H-bond.]
The constituent amino acids may encourage hydrogen bonding that forms regular structures, called secondary structures (α-helix, β-sheet)
The secondary structures fold together to form a compact 3-dimensional shape, called the tertiary structure
Ab Initio Approach

Our Goal: To provide an approach that relies more on physical principles than on information from known proteins
The problem can be formulated as a global minimization problem, as it is assumed that the tertiary structure occurs at the global minimum of the free energy function of the primary sequence
Ab Initio Method
Tertiary structure is believed to minimize potential energy:
$\min_x V_{MM}(x)$, where $x$ = atom coordinates
Difficulties:
• Proposed energy function may not match nature
• $O(e^{n^2})$ local minima
• Very large parameter space, e.g., a modestly sized protein: 100 amino acids, ~1,600 atoms, ~4,800 variables
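The variable count quoted above follows from three Cartesian coordinates per atom. A minimal sketch of that arithmetic, assuming about 16 atoms per amino acid (the ratio implied by the slide's own figures, not a fixed constant; the function name is illustrative):

```python
ATOMS_PER_RESIDUE = 16  # assumption: implied by ~1,600 atoms for 100 amino acids
COORDS_PER_ATOM = 3     # x, y, z for each atom


def count_variables(n_residues: int) -> tuple:
    """Return (approximate atom count, number of coordinate variables)."""
    n_atoms = n_residues * ATOMS_PER_RESIDUE
    return n_atoms, n_atoms * COORDS_PER_ATOM


atoms, variables = count_variables(100)
print(f"{atoms} atoms, {variables} variables")
```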
The Search Algorithm
Given the amino acid sequence of a protein, find the global minimum of the free energy function.
Phase 1: Generate Starting Configurations
Phase 2: Global Optimization
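The two-phase idea can be illustrated on a toy problem. The sketch below is not the group's algorithm: it assumes a made-up one-dimensional "energy" with several local minima, evenly spaced starting points for Phase 1, and plain gradient descent from each start for Phase 2.

```python
import math


def energy(x: float) -> float:
    """Toy 1-D energy with several local minima (not a real force field)."""
    return 0.1 * x * x + math.sin(3.0 * x)


def gradient(x: float) -> float:
    """Analytic derivative of the toy energy."""
    return 0.2 * x + 3.0 * math.cos(3.0 * x)


def local_minimize(x: float, step: float = 0.01, iters: int = 5000) -> float:
    """Plain gradient descent; settles into the local minimum of the start's basin."""
    for _ in range(iters):
        x -= step * gradient(x)
    return x


def two_phase_search(n_starts: int = 25, lo: float = -6.0, hi: float = 6.0) -> float:
    # Phase 1: generate starting configurations (here, an evenly spaced grid).
    starts = [lo + (hi - lo) * i / (n_starts - 1) for i in range(n_starts)]
    # Phase 2: refine every start and keep the lowest-energy result.
    minima = [local_minimize(x) for x in starts]
    return min(minima, key=energy)


best = two_phase_search()
print(f"best x = {best:.3f}, energy = {energy(best):.3f}")
```

Starting from a single point would usually end in whichever local minimum is nearest, which is why the quality of the Phase 1 configurations matters.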
Secondary Structure Predictions in Phase 1
Servers predict the secondary structure likely to be in a target protein, based on a large database of known proteins.
Sequence: SKIGIDGFGRIGRLVLRAALSCGAQ
Type:     CBBBBBCCCAAAAAAACCCBBBBBC
Weight:   1135522356789992888566733
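A prediction in this per-residue form can be grouped into segments before any strand matching. The sketch below is one way to do that, assuming that each weight digit is a per-residue confidence and that the space inside the type string on the slide is a layout artifact; the helper name `segments` is illustrative.

```python
from itertools import groupby

sequence = "SKIGIDGFGRIGRLVLRAALSCGAQ"
# Assumption: the space in the slide's type string is a layout artifact.
types = "CBBBB BCCCAAAAAAACCCBBBBBC".replace(" ", "")
weights = "1135522356789992888566733"


def segments(types: str, weights: str) -> list:
    """Group consecutive residues that share a predicted type.

    Returns (type, start, end, mean_weight) tuples with 0-based inclusive indices.
    """
    out = []
    pos = 0
    for label, run in groupby(types):
        length = len(list(run))
        digits = [int(w) for w in weights[pos:pos + length]]
        out.append((label, pos, pos + length - 1, sum(digits) / length))
        pos += length
    return out


# All three strings must describe the same residues.
assert len(sequence) == len(types) == len(weights)

for label, start, end, mean_w in segments(types, weights):
    print(label, sequence[start:end + 1], f"mean weight {mean_w:.1f}")
```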
Matching the predicted strands is a combinatorial problem
• Which strands are paired?
• Which orientation? Parallel or anti-parallel
• Which residues are paired? Odd or even
There are $n!\,2^{n-2}$ possible $n$-stranded motifs
• 96 motifs for n = 4
• 960 motifs for n = 5
It takes weeks to create some of these configurations using constrained local minimizations!
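The motif counts quoted above can be checked directly from the formula n! · 2^(n-2). A short sketch (the function name is illustrative, and the formula is only evaluated for n ≥ 2):

```python
from math import factorial


def motif_count(n: int) -> int:
    """Number of possible n-stranded motifs: n! * 2**(n - 2)."""
    if n < 2:
        raise ValueError("the formula is only evaluated here for n >= 2")
    return factorial(n) * 2 ** (n - 2)


for n in range(2, 8):
    print(n, motif_count(n))
```

The count grows faster than factorially, so building every motif by constrained minimization quickly becomes impractical.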

Distribution of Beta Sheets in Proteins with Applications to Structure Prediction. Ruczinski, Kooperberg, Bonneau, and Baker, Proteins 48, 2002
CASP4 Competition
• Fourth community-wide experiment on the Critical Assessment of Techniques for Protein Structure Prediction (2000)
• Our group predicted 8 proteins
– Largest protein had 240 aa
– Most complex fold had 2 β-strands
ProteinShop
• Interactive tool for protein manipulation
• Designed to quickly create initial configurations
• It takes weeks to create a number of configurations using constrained minimizations
• It takes a few hours to create the same configurations with ProteinShop
Phase 1 with ProteinShop
[Diagram: Phase 1 takes the amino acid sequence and its secondary structure prediction and produces the initial configurations. Within ProteinShop, geometry generation builds a pre-configuration from the structure sequence (takes minutes), and direct manipulation turns it into the initial configurations. Phase 2 then leads to the final configuration.]
CASP4 Competition (before ProteinShop)
• Our group predicted 8 proteins
• Largest protein had 240 aa
• Most complex fold had 2 β-strands
CASP5 Competition (with ProteinShop)
• Our group predicted 20 proteins
• Largest protein had 417 aa
• Most complex fold had 13 β-strands
Phase 2
[Diagram: Phase 1 turns the amino acid sequence into initial configurations. Phase 2 (global optimization) passes them through subspace selection, subspace optimization, and candidate selection to reach the final configuration.]
Takes months to converge using hundreds of processors on Seaborg!
Phase 2 with ProteinShop
[Diagram: the Phase 2 global optimization steps (subspace selection, subspace optimization, candidate selection), now with a monitoring system, a steering system, and direct manipulation attached.]
Will reduce computation time
Monitoring System
• Monitor progress of the overall optimization and of each optimization process
• Alert user to important events during optimization
– A sudden drop in internal energy
– A group of processes getting stuck
• Test new heuristics for expanding nodes of the tree
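The two alert conditions named above can be sketched as simple checks on per-process energy histories. This is only an illustration under stated assumptions, not ProteinShop's monitoring code: the function names, thresholds, and data layout (a dict mapping a process id to its list of energies) are all hypothetical.

```python
def sudden_drop(energies: list, threshold: float) -> bool:
    """True if the latest step lowered the energy by more than `threshold`."""
    return len(energies) >= 2 and energies[-2] - energies[-1] > threshold


def is_stuck(energies: list, window: int, tolerance: float) -> bool:
    """True if the energy varied by less than `tolerance` over the last `window` steps."""
    if len(energies) < window + 1:
        return False
    recent = energies[-(window + 1):]
    return max(recent) - min(recent) < tolerance


def scan(traces: dict, threshold: float = 5.0, window: int = 3,
         tolerance: float = 0.01) -> dict:
    """Map each process id to the alerts raised by its energy history."""
    alerts = {}
    for pid, energies in traces.items():
        found = []
        if sudden_drop(energies, threshold):
            found.append("sudden drop")
        if is_stuck(energies, window, tolerance):
            found.append("stuck")
        if found:
            alerts[pid] = found
    return alerts


# Hypothetical energy histories for three optimization processes.
traces = {
    0: [-100.0, -100.5, -101.0, -110.0],   # large drop on the last step
    1: [-90.0, -90.001, -90.002, -90.002], # barely moving
    2: [-80.0, -81.0, -82.0, -83.0],       # steady progress
}
print(scan(traces))
```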
Steering System
• Change configurations during optimization to account for developments not anticipated during Phase 1
• Manipulate proteins that don’t seem to be realistic or that are stuck in a local minimum
• Allow pruning of the optimization tree
– Assign multiple processes to a configuration that just had a drop in internal energy
– Assign stuck processes to other configurations
Plans for the Future
Use the monitoring and steering features to develop and test a new method for protein structure prediction
Compete in CASP6 (Critical Assessment of Techniques for Protein Structure Prediction)
Expand and enhance ProteinShop
ProteinShop
O. Kreylos, N. Max, B. Hamann, S. Crivelli, and W. Bethel. Interactive Protein Manipulation. Winner of the Best Application Award, IEEE Visualization 2003, Seattle.
Available to academic and non-profit organizations: proteinshop.lbl.gov
