Written in March, 1998 by David Yee+
One of the greatest challenges in
science is predicting the three-dimensional structure of proteins from
their linear sequence alone. Currently, successful prediction of protein tertiary structure
mainly relies on a knowledge-based modeling of side chains in proteins with
sequence homologous to one with known structure (Mehta, et al., 1995).
But we still are waiting for the development of a highly accurate
algorithm that only requires the knowledge of the amino acid sequence and its
environment. Countless applications
await the accomplishment of such a feat, since the function of a protein is
conferred by its tertiary structure. Once
the rules for folding are elucidated and refined, it will be possible, for example,
to synthesize artificial proteins that could accomplish a given task efficiently and
accurately. Designer proteins that bind to specific DNA sequences to affect the
transcription of target genes would be particularly useful.
Section I. Factors Involved in Protein Folding
Proteins fold because folding
minimizes their free energy.
Numerous factors must work in concert for
proteins to fold and remain in a minimized energy state.
The possible influences in the protein folding process include the
hydrophobic effect, hydrogen-bonding, steric effects, and electrostatic effects
produced by polar/charged residues. These
effects can partly be attributed to the properties of the unique side chains of
the amino acids. Table 1 summarizes some
of the properties of the twenty amino acids.
Through the analysis of mutation rates in protein sequences, amino acids
have been described as belonging in four exchangeable groups (Srinivasan, 1996).
This information is displayed in Table 2. Note that the amino acids containing large aliphatic chains
form a group along with methionine and cysteine.
Preservation of the same secondary structure may be a key reason why
exchanges between amino acids belonging to the same group within a protein are
less likely to significantly alter the function of the protein.
Table 2. Amino acids can be classified into four groups.
In a particular protein, exchanges between the amino acids of a
particular group are less likely to drastically alter protein function.
Group 1: methionine, leucine, isoleucine, valine
Group 2: phenylalanine, tyrosine, tryptophan
Group 3: aspartic acid, asparagine, alanine, glutamic acid, glutamine, proline, serine, threonine
Group 4: arginine, histidine, lysine
Most peptides first fold into secondary structures
before the amino acid residues interact further to form tertiary configurations.
Therefore it is worthwhile to briefly examine the factors involved in the
formation of the secondary structures. There
exist several types of secondary structures: the alpha helix, the beta sheet,
and the turns (Lodish et al., 1995).
The important factors in predicting secondary structures include the
propensity of certain amino acids to form particular secondary structures, the
inclination for specific residues to be found at certain positions within
secondary structures, and the correlation of certain patterns of hydrophobicity with
particular secondary structures (King, 1996).
The limited range of backbone
conformations that each different amino acid is able to adopt is an important
reason why certain amino acids have propensities for particular secondary
structures. Usually, each amino
acid in a protein has three angles of rotation: phi, psi, and omega.
Phi is the degree of rotation for the bond between the nitrogen and the
alpha carbon. The degree of
rotation termed psi is designated for the bond between the alpha carbon and the
carbonyl carbon, and omega is the degree of rotation of the peptide bond.
The peptide bond has considerable double-bond characteristics and is,
under normal circumstances, fixed. Figure
1 demonstrates the three angles of rotation.
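As a minimal sketch of how such a rotation angle can be measured, the signed dihedral angle defined by four consecutive backbone atoms can be computed from their coordinates. The function name `dihedral` and the toy coordinates in the usage lines are illustrative assumptions, not taken from any of the cited sources. Under the definitions above, phi would use the atoms C(i-1), N, C-alpha, C; psi would use N, C-alpha, C, N(i+1); and omega would use C-alpha, C, N(i+1), C-alpha(i+1).

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for rotation about the p1-p2 bond."""
    b0 = _sub(p0, p1)                 # bond pointing back from p1 to p0
    b1 = _sub(p2, p1)                 # central bond (the rotation axis)
    b2 = _sub(p3, p2)                 # bond pointing forward from p2 to p3
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Remove the components along the axis, leaving the two projections
    # whose mutual angle is the dihedral.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

# Toy coordinates (hypothetical, not from a real structure).
cis = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
print(cis, abs(trans))
```

A planar trans arrangement gives an angle of magnitude 180 degrees, which is the value the nearly fixed omega angle of a typical peptide bond stays close to.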
Different amino acids allow different ranges of psi and phi values, however, and
these restrict the possible conformations that can be adopted.
For instance, the phi value for proline is fixed at about -65°,
and thus a proline residue has a much-reduced range of allowed conformations (Streyer,
1995). At the other end of the spectrum, glycine allows the
polypeptide backbone to make turns that would not be possible with another
residue.
The ability of certain amino acids
to form hydrogen bonds is also important for the formation of secondary
structures. There is extensive
hydrogen bonding in the alpha helix. Alanine,
glutamine, and leucine residues favor alpha helix (Streyer, 1995).
Serine, aspartate, and asparagine disrupt alpha helices because their
side chains contain a hydrogen-bond donor and acceptor in close proximity to the
backbone, where they compete for main-chain NH and CO groups (Streyer, 1995).
Valine and isoleucine also disfavor the formation of the alpha-helix, but
for a different reason: the branching at the beta carbon of their side chains
sterically hinders alpha-helix formation. Proline
disfavors alpha helix formation because of steric hindrance and because it lacks an
amide H atom for hydrogen bonding (Streyer, 1995).
The hydrogen side group in glycine allows too much flexibility around the
alpha carbon, and thus glycine is also not frequently found in alpha helices (Lodish,
et al., 1995).
Beta sheets are composed of beta
strands. Each beta strand is about 5-8 residues long and the backbone atoms of
each residue are able to hydrogen bond (Lodish, et al., 1995).
Through association via hydrogen bonding, individual strands are able to
form beta sheets, with the side chains of residues protruding perpendicular to
the plane of the sheet.
Turns are compact, U-shaped
structures composed of three or four residues stabilized by a hydrogen bond
between their end residues (Lodish, et al., 1995).
The turns are located on protein surfaces and form a sharp bend that
redirects the polypeptide backbone back toward the hydrophobic interior (Lodish,
et al., 1995). Glycine and proline
tend to favor the formation of turns. Table
1 shows the relative frequencies of occurrence of particular residues in alpha
helices, beta sheets, and beta turns.
It is important to note that
particular short peptide sequences can form an alpha helix in one protein but
adopt a beta strand conformation in another.
These so-called "chameleon" sequences have been discovered in the
immunoglobulin-binding domain of protein G, ribonuclease, and erythrocruorin (Perutz,
1997; Streyer, 1995). These
findings imply that the context is often critical in determining secondary
structure. Tertiary interactions, the interactions between residues that
are far apart in the linear sequence, may be decisive in specifying the
secondary structure of some (but not all) segments (Streyer, 1995).
This is a problem that needs to be addressed in predicting folding.
The structure of most proteins
varies significantly when exposed to different pH.
The change in structure can drastically alter the behavior of proteins.
It is therefore worthwhile to examine the
isoelectric point of the twenty common amino acids.
An isoelectric point (pI) of an amino acid is the hydrogen ion
concentration of the solution in which the amino acid does not migrate under the
influence of an electric field (Morrison & Boyd, 1992).
Tables 1 and 1b contain the pI, calculated using the web-based ExPASy
Compute pI/Mw tool, for the twenty amino acids and some short peptides, including eleven
tumor antigens, activation segment of pepsinogen (which exists in the acidic
condition of the stomach), and chymotrypsin 1 precursor (which exists in the
alkaline environment of the small intestines).
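To make the idea of an isoelectric point concrete, the following is a minimal sketch that finds the pH at which the net charge of a molecule is zero, using the Henderson-Hasselbalch relation for each ionizable group and a simple bisection search. It is not the method used by the ExPASy tool, which relies on its own set of pK values. The function names and the glycine pKa values (2.34 for the carboxyl group and 9.60 for the amino group, common textbook figures) are assumptions introduced for illustration.

```python
def net_charge(ph, basic_pkas, acidic_pkas):
    """Average net charge at a given pH from Henderson-Hasselbalch fractions."""
    # Basic groups (e.g. the amino group) carry +1 when protonated.
    positive = sum(1.0 / (1.0 + 10.0 ** (ph - pka)) for pka in basic_pkas)
    # Acidic groups (e.g. the carboxyl group) carry -1 when deprotonated.
    negative = sum(1.0 / (1.0 + 10.0 ** (pka - ph)) for pka in acidic_pkas)
    return positive - negative

def isoelectric_point(basic_pkas, acidic_pkas, tol=1e-6):
    """Bisect on pH in [0, 14]; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(mid, basic_pkas, acidic_pkas) > 0:
            low = mid      # still positively charged: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0

# Glycine with assumed textbook pKa values (no ionizable side chain).
glycine_pi = isoelectric_point(basic_pkas=[9.60], acidic_pkas=[2.34])
print(round(glycine_pi, 2))
```

For a molecule with only these two groups the result coincides with the average of the two pKa values; a peptide would simply pass longer lists that include its ionizable side chains.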
It is noteworthy that an amino acid
usually shows its lowest solubility in a solution at the isoelectric point
(Morrison & Boyd, 1992). This
can be significant because insoluble proteins are implicated in a variety of
diseases, including Alzheimer's (Perutz, 1997).
A nonsynonymous point mutation of a single amino acid may
alter the isoelectric point of a protein and promote its precipitation,
although other consequences of an amino acid substitution
may be more significant in protein aggregation.
It is not known in detail how pH
affects the structure of proteins, however.
Electrostatic interactions do not generally contribute to stability of
the tertiary protein structure, but they are vital in proper ligand-recognition
for many proteins that bind to other substrates. Calculations for a number of
protein complexes show that the net effect of electrostatic interactions is
generally to destabilize the docking of two pre-conformed molecules (Chong, et
al., 1998). Although the folded
state of a protein results in numerous favorable interactions within the
protein, the large electrostatic desolvation penalty due to polar and charged
groups often still lingers (Chong, et
al., 1998). In addition, it has
been observed that identically charged residues that are in close proximity to
one another serve to destabilize the backbone.
It has long
been known that the protein folding process is highly sensitive to temperature.
This relationship between folding and temperature appears to stem mainly
from the effects of free energy upon hydrophobic interactions (Scalley and
Baker, 1997). It has recently been found that the burial of hydrophobic residues
involves a large change in their heat capacity and, thus, the associated
free energy changes are strongly dependent on temperature.
This phenomenon reflects a large decrease in solvent-accessible surface
area in the transition state of proteins relative to their unfolded
state (Scalley and Baker, 1997). In
general, thermal energy is required for most proteins to fold, but excess heat
often disrupts the native protein structure.
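The link between a heat capacity change and a temperature-dependent free energy can be illustrated with the standard Gibbs-Helmholtz expression for the stability of a two-state protein. This is a textbook relation offered as a sketch, not a formula taken from Scalley and Baker; the symbols (melting temperature T_m, enthalpy of unfolding at T_m, and a temperature-independent heat capacity change) are assumptions introduced here.

```latex
% Assumed symbols (not from the source):
%   T_m        melting temperature, where \Delta G_{unfold} = 0
%   \Delta H_m enthalpy of unfolding at T_m
%   \Delta C_p heat capacity change on unfolding, taken as constant
\Delta G_{\mathrm{unfold}}(T)
  = \Delta H_m \left( 1 - \frac{T}{T_m} \right)
  + \Delta C_p \left[ (T - T_m) - T \ln \frac{T}{T_m} \right]
```

The bracketed term is zero at T_m and negative everywhere else, so a large positive heat capacity change bends the stability curve downward on both sides of its maximum. This is consistent with the observation above that a protein is stable only within a limited temperature range.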
Surprisingly, the turn and loop connections between elements of secondary
structure often do not play a major role in specifying the structure and
properties of a protein (Munson, et al., 1996).
Instead, the packing of residues in the hydrophobic core, or the
desolvation of the nonpolar groups, is vital for the structure, stability, and
properties of the final, folded protein (Munson, et al., 1996).
The investigation of the protein Rop by Mary Munson et al. demonstrates
the importance of hydrophobic packing. Rop
is a homodimeric (two 63-residue helix-turn-helix monomers), RNA-binding protein
that controls the replication of ColE1 plasmids (Munson, et al. 1996).
The hydrophobic core of Rop consists of eight layers of heptad repeats of
mainly alanine and leucine. In the
study, amino acids were substituted in the hydrophobic core to generate Rop
mutants, whose properties were subsequently analyzed. Better repacking of the core via mutants that contained only
alternating alanine and leucine residues created more stable proteins that also
retained RNA-binding affinities similar to that of the wild type.
"Underpacked" mutants (e.g. only alanines in the core) were
predominantly in the unfolded state, and lacked the ability to bind RNA.
Interestingly, although overpacking with leucine residues created a
protein that lacked the ability to bind RNA, the mutant was extremely stable.
This phenomenon was attributed to an increase in the burial of
hydrophobic surface area, which compensated for the energetic cost of poor
hydrophobic packing (Munson, et al, 1996).
Further support for hydrophobic packing as the main driving force for
protein folding comes from the studies of other polymers that also fold in
solution. The polymer phenylacetylene, studied by a team led by organic chemist
Jeffrey Moore of the University of Illinois, readily shapes into a helix, and
forms an easily modifiable cavity (Pennisi, 1997).
Phenylacetylene has no hydrogen-bonding capability because of the lack of
oxygen or nitrogen in the compound, but polymers of the molecule are still able
to fold. The phenyl rings in a
phenylacetylene polymer must, however, do their best to avoid contact with the
polar solvent, and they accomplish this by twisting the polymer so that it
resembles a helix (Pennisi, 1997). But
although the hydrophobic effect is the driving force of folding for
phenylacetylene polymers, it should be kept in mind that some amino acids are
charged, and the interactions they have with one another and with the uncharged
amino acids can play a significant role in the protein folding pathway, such as
destabilizing the backbone as previously mentioned.
Figure 2. Phenylacetylene
Consequences of poor hydrophobic
packing include reduced stability, little or no thermal denaturation
transition (little or no enthalpic component to stability), low cooperativity
(evidenced by chemical denaturation transitions), and lack of well-defined tertiary
structure (which results in poor chemical shift dispersion of NMR spectra and rapid
exchange rates of backbone amides with solvent).
These are, as presented in Table 3, characteristics of molten globules.
It can thus be inferred that the immediate events that occur after molten
globule formation likely involve extensive hydrophobic interactions between
the non-polar residues.
Table 3. Similarities and differences between molten globules and
the native structure of proteins.
Property shared by folded proteins and molten globules:
high levels of secondary structure
Properties that molten globules possess:
packing of the hydrophobic core not well defined
little or no enthalpic component to stability
lack of well-defined tertiary structure
poor chemical shift dispersion of NMR spectra
rapid exchange rates of backbone amides with solvent
There are also findings that indicate backbone
interactions are the major stabilizing force in the protein folding process.
Backbone interactions have both electrostatic and solvation components
(Zhang et al., 1997). Upon
analyzing and estimating the backbone entropy of 17 single-domain protein
structures, Zhang et al. found that backbone interactions provided 90%
(backbone-backbone interactions contribute 64% of the total energy, while
backbone-side-chain interactions account for 26%) of the stability, with
side-chain-side-chain interactions mainly providing specificity (Zhang et al.,
1997). Zhang's team reached this conclusion by using a new
algorithm involving atomic contact energies to estimate the desolvation effects
and side-chain entropy changes, which allowed for the determination of backbone
entropy change from experimental data. Further
evidence suggesting the importance of backbone interactions is the behavior of
polyalanine peptides in solution (Chakrabatty & Baldwin, 1995).
Researchers found that these entities tend to spontaneously form
alpha-helices in solution. These
results agree with the findings previously mentioned that polar and charged
residues mainly provide for electrostatic interactions with other
macromolecules, and only contribute marginally to the stability of the folded
protein, although they can easily serve to destabilize the native state.
In addition to these findings, the search for residues that form a stable
backbone is the first step employed by Stephen Mayo in his de novo protein
design technique. Thus it can be
concluded that backbone interactions are indeed very important in determining
the final tertiary structure of proteins. The backbone is surprisingly flexible in the types of
residues it can accommodate, however, and concerted backbone movements
adopted by a protein are able to accommodate potentially disruptive residues (Su
& Mayo, 1997). Unfortunately,
this flexibility complicates the tertiary structure prediction process.
Section II. The Protein Folding Process
The linear amino acid sequence
folds into its unique native structure (the protein), which is thermodynamically
and kinetically stable at physiological temperature.
This process is often completed in a timeframe ranging from milliseconds
to minutes (Dinner, et al. 1996). Immediately
after (or occurring simultaneously with) the formation of secondary structures
is the condensation of the polypeptide into a molten globule.
The molten globule is the partially folded intermediate of a protein.
Though possessing most of the secondary structures of a folded protein,
it lacks the well-packed tertiary structure of the native state (Streyer, 1995).
Some of the differences between molten globules and folded proteins were
outlined in Table 3.
Researchers have actually captured
the act of protein folding and provided time course descriptions of the events
that occur in the process. These experiments have revealed that proteins fold either by
two-state kinetics in which no stable intermediates accumulate on the
pathway or by multistate kinetics through one or more detectable
intermediates (Nölting, 1997). Therefore,
it is possible to detect the presence of a relatively stable intermediate if a
protein folds via the multistate kinetics pathway.
Usually small proteins are more likely to fold in the highly cooperative,
multistate pathway (Scalley & Baker, 1997).
Nölting et al. have characterized the multistate folding
pathway of the protein barstar from
a well-characterized denatured state at the level of individual
residues on a microsecond time scale. Their studies support the
nucleation-condensation mechanism. The
mechanism states that there is first the formation of a diffuse nucleus that
consists of some neighboring residues whose conformations are
stabilized by long-range interactions with residues that are distant
in sequence. An essential component of the mechanism is that the
nucleus and its stabilizing interactions elsewhere in the protein
develop concurrently (Nölting, 1997). The alpha-helix
is unstable in the absence of long-range interactions, and the
rest of the structure is unstable without interactions with the helix,
and thus there is cooperative formation of the nucleus and the surrounding structure
(Nölting, 1997). Specifically, for
barstar, it was found that a peptide corresponding to part of a nucleus is
mainly random under folding conditions. This region in the denatured barstar has
flickering native-like structure, showing that it is stabilized by
long-range interactions, even in the denatured state (Nölting,
1997). In the first transition state, which leads to the
folding intermediate on the microsecond time scale, a nucleus centered on helix 1 is
almost completely formed.
In the second transition state, many of the remaining residues in the
protein make weak interactions in the early formed intermediate,
which are then more highly consolidated because of those interactions.
Figure 3. (Figure 1 of "The
folding pathway of a protein at high
resolution from microseconds to seconds" by Nölting)
Structure of barstar. Barstar
is a relatively small, 89-residue protein that has evolved to be the
specific intracellular inhibitor of the ribonuclease barnase that is
secreted from Bacillus [...] helices. There are three
strands of parallel sheet [...] helix of barstar. Barstar
is on the borderline of being a single- or two-module protein.
Figure 4. (Figure 5 of "The folding
pathway of a protein
at high resolution from microseconds to seconds"
by Nölting). General nucleation-condensation model. CI-2 is
chymotrypsin inhibitor-2, which was also studied by Nölting et al.
Dinner et al., like the Nölting
team, also found that folding of small proteins occurs in two discrete stages.
The first stage involves formation of a core that serves as a nucleus for
folding to a near native (approximately 80%) structure, and the second stage
involves rearrangement around the core to form the native structure. Core
formation is facilitated by both the stabilities of its contacts and the
presence of cooperative secondary structure with effective initiation sites
(Dinner, et al., 1996).
By experimenting with and observing the folding behavior of cytochrome c, Sosnick and
his team of researchers also support the two-stage folding process of small
proteins, although they reveal additional details about the initial step.
They contend that the core formation/condensation nucleation stage is
rate-limiting and energetically uphill. This
step includes the time-consuming conformational search for some relatively
specific transition state. Once the
transition state is "found," the forward folding step proceeds in an energetically
downhill manner (Sosnick, 1997).
The folding of small proteins lasts approximately 1 millisecond or more (Sosnick, 1997).
A similar, more pioneering study
conducted by Martin Gruebele and his team of researchers strongly supports
the importance of local interactions early in the folding process. By rapidly "supercooling" and heating the protein
apomyoglobin in solution, the scientists were able to detect the events that
lead to the final native structure of the protein.
Supercooling denatures the protein, while small increments of energy
introduced by laser beam partially fold it.
Apomyoglobin's final shape includes a trio of helices (designated A, G,
and H) on different parts of the molecule, with the H and G helices spiraling
parallel to one another and roughly perpendicular to the A helix (Service,
1996). The results of the
experiment suggest that local interactions between neighboring residues induce
the protein to very rapidly adopt some secondary structure (the coil in the A
helix) before the more global interactions push it to undertake the 3D structure
that moves the A helix next to the H (Service, 1996).
This counters the views of some theorists, who contend that global
structures take shape either before or simultaneously with local structures
(Service, 1996). The existence of
chameleon sequences, however, suggests that in certain instances global
interactions are required in order to induce the formation of secondary
structures. Therefore it is
probable that both possibilities are correct, depending on the protein in
question.
The research conducted by
Dinner, Sali, and Karplus has provided one of the most in-depth views into the
folding process of larger proteins. They
engineered sequences with high secondary structure content in the native state
to investigate the role of native structure in the folding process.
It was found that folding begins with a rapid collapse followed by a slow
search through the semi-compact globule for a sequence-dependent stable core
that has about 30% native contacts and serves as the transition state for folding to
a near-native structure (Dinner, et al. 1996).
This slow process is indicative of the magnitude of the number of
conformations possible; the semi-compact random globule must locate the one
single correct structure that will allow the formation of a native-like
transition state conformation. The
hydrophobic core formation via searching is dependent on the structural features
of the native structure. Sequences
that fold have largely stable, cooperative structure that is accessible through
short-range initiation sites, such as those in anti-parallel sheets connected by
turns. Contacts are
"cooperative" if formation of any one contact increases the probability of
formation of the others (Dinner, et al., 1996).
Before folding is completed, the system can encounter a second
bottleneck, involving condensation and rearrangement of surface residues.
Overly stable local structure of the surface residues slows this stage of
the folding process.
Figure 5. This is a simplified version of the protein folding
process for medium to large sized proteins.
There is yet another important fact to keep in mind when considering the process of
protein folding: the folding of most proteins in cells is assisted by enzymes (Streyer,
1995). These enzymes are the
chaperonins and isomerases, and they are present in both bacteria and eukaryotes.
The enzyme, protein disulfide isomerase, for example, catalyzes the
formation and breaking of disulfide bonds for many proteins to find the optimal
pairings (Streyer, 1995).
The protein GroEL is a widely studied chaperonin. A member of the Hsp60 class of
proteins, it is
composed of 14 identical subunits arranged as two stacked heptameric
rings, and each ring contains a large cavity into which a substrate protein
binds and subsequently folds (Frieden & Clark, 1997).
At least 40 proteins, many of which are from eukaryotes, form stable
complexes with GroEL (Frieden & Clark, 1997).
This lack of specificity is due to the preferential binding of small
unfolded structures to form the chaperonin-unfolded protein complex (Frieden
& Clark, 1997). Surprisingly,
Frieden and Clark have found that GroEL does not accelerate or catalyze the
folding of its substrate proteins. Rather,
the chaperonin increases the probability of the protein refolding
correctly by decreasing the probability of aggregation that may occur
in the bulk solution (Frieden & Clark, 1997).
The ability of GroEL to fold proteins
correctly without aggregation is a consequence of the stability of
the complex formed between GroEL and the protein
conformation just prior to the final folding
step (Frieden & Clark, 1997). It
is possible that this phenomenon also occurs in higher animals, and that
mutations in chaperonin genes in humans may contribute to diseases involving
Section III. Predicting Protein Folding
There are two major problems in
predicting protein folding that need to be solved.
The first problem is to determine an energy function that can
discriminate, for a protein, between the set of native or native-like
conformations and other conformations (Elofsson et al., 1996). The
conformational space for a protein is enormous because there are at least 3
possible backbone conformations per residue, leading to 3^100 possible
backbone conformations for a 100-residue protein (Elofsson et al., 1996). The
second problem is to develop an algorithm that can find the lowest energy
structure in this conformational space (Elofsson et al., 1996). These two
problems are closely coupled, as the energy function also needs to provide a
guided path to the native structure (Elofsson et al., 1996).
Despite the large number of possible backbone conformations, it
is clear that many if not most of them can be ruled out due to the different
steric and chemical constraints imposed by each unique residue.
Several methods have been applied to search the conformational space of a
protein for low energy conformations. These methods include molecular dynamics,
Monte Carlo simulations, genetic algorithms and diffusive methods (Elofsson et
al., 1996).
In modeling protein folding, it
should also be kept in mind that proteins can either fold sequentially or
combinatorially. If a protein folds
sequentially, it means that nascent structures are not affected by structures
that materialize later in the folding process.
On the other hand, combinatorial folding indicates that secondary
structures formed later in the folding process influence the already-formed
secondary structures. In this model the conformation of the protein must be
recalculated after the formation of a new secondary structure.
For instance, in combinatorial folding, for a protein with n
= 20 degrees of freedom, calculation at every 1°
gives rise to the possibility of 360^n, or greater than 10^50
conformers (Sisser, 1997). In
sequential folding, there are only 360 * 20, or 7,200 conformers (Sisser, 1997). Sequential
folding models are mostly used because of the lack of massive computing power,
although combinatorial folding may be more representative of the true folding
process.
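The size of these search spaces is easy to check with exact integer arithmetic. The short sketch below recomputes the three counts quoted in this section; the function names are illustrative assumptions, and the default of 360 samples per degree of freedom corresponds to sampling every 1°.

```python
def combinatorial_conformers(degrees_of_freedom, samples_per_angle=360):
    """Every angle is re-sampled jointly with all others: samples ** n."""
    return samples_per_angle ** degrees_of_freedom

def sequential_conformers(degrees_of_freedom, samples_per_angle=360):
    """Each angle is sampled once, independently of the rest: samples * n."""
    return samples_per_angle * degrees_of_freedom

# At least 3 backbone conformations per residue in a 100-residue protein.
backbone_states = 3 ** 100

combinatorial = combinatorial_conformers(20)
sequential = sequential_conformers(20)

print("digits in 3^100:", len(str(backbone_states)))
print("digits in 360^20:", len(str(combinatorial)))
print("sequential conformers:", sequential)
```

Python integers have arbitrary precision, so the digit counts are exact rather than floating-point estimates.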
The critical aspect in predicting
protein folding is the evaluation of the feasibility of the possible conformers
that a particular sequence of amino acids may adopt.
All of the factors discussed in section I become the framework for this
difficult task. Hydrophobic packing
is critical in tertiary structure formation, and one relatively simple model for
simulating the protein folding process, the hydrophobic zipper hypothesis,
depends completely on hydrophobic interactions.
The theory is derived from the concept of cooperativity (the probability
that a peptide chain adopts a particular conformation is increased if it
had previously been in certain predecessor conformations) (Toma, 1996). In this model, the amino acids of the polypeptide chain are
designated only as either H (hydrophobic) or P (nonhydrophobic).
Each amino acid occupies a "lattice site", connected to its neighbors
and unable to occupy any site filled by another residue.
At each site, the protein chain can either continue ahead or turn 90°
up, down, left, or right. The
energy of a nascent chain in this model is calculated by summing a
favorable energy contribution of -1 unit (note that a more negative energy
means greater stability) for each pair of nonbonded
hydrophobic residues occupying neighboring nondiagonal lattice
points (Toma, 1996). On the other hand, PP and PH contacts do not make any
energy contribution. Algorithms
based upon the hydrophobic zipper hypothesis lead to a compact chain
conformation that has at least one hydrophobic core.
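The energy rule of this lattice model can be written down in a few lines. The following is a minimal sketch for the 2D square lattice, assuming a conformation is given as a list of (x, y) lattice sites in chain order; the function name `hp_energy` and the four-residue example are illustrative and are not taken from Toma's paper. The sketch checks that no site is occupied twice but does not check that consecutive residues sit on adjacent sites.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain: -1 for each nonbonded H-H pair on adjacent sites."""
    if len(sequence) != len(coords):
        raise ValueError("one lattice site is needed per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("two residues occupy the same lattice site")
    index_at = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighboring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            # Chain neighbors (|i - j| == 1) are bonded and do not count.
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy

# A four-residue chain folded into a square brings its two H ends together.
folded = hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)])
extended = hp_energy("HPPH", [(0, 0), (1, 0), (2, 0), (3, 0)])
print(folded, extended)
```

PP and PH neighbors are simply skipped, which matches the rule that they make no energy contribution.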
A protein folding model, called the
Contact Interaction Method (CI), utilizes the hydrophobic zipper hypothesis
(HZ), though with modifications. In CI, the residues of a sequence differ in their mobility,
that is, in their ability to adopt different conformations.
Residues within a loop created by two HH contacts are less mobile than
residues outside the loop (Toma, 1996). The
algorithm that describes CI is as follows:
(1) Start from an extended, linear amino acid sequence.
(2) Select a random residue to be moved, for example the ith.
(3) Use the criterion of mobility to decide whether it has to be moved, and
move it if Rnd < exp[f(i)/c_k], where c_k is the temperature, Rnd is a random
number between 0 and 1, and f(i) is the mobility of the ith residue, whose value is 0
if the residue has free movement and negative for restricted mobility. If the
residue is not moved, go to 2 (the criterion is always satisfied for
residues not belonging to loops defined by HH contacts).
(4) Choose the movement at random, i.e., choose a random value of theta(i) while
keeping all the other theta coordinates invariant (this corresponds to a pivot
move). Theta specifies the direction the residue will take (e.g. turn right, left,
or go ahead in a 2-D lattice).
(5) Check the validity of the structure resulting from the movement; if it is
not valid, go to 2.
(6) If the structure is valid, the new conformation is accepted and its
energy evaluated; a time step is counted, and the function f(i) assumes the values
deriving from the loops present in the conformation; go to 2.
Figures 6 through
8 make the process easier to understand. The
CI algorithm has proven to be very efficient both in two and three dimensions
and located energy minima not found by other
conformational search algorithms described in the literature (Toma, 1996).
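The six steps can be sketched in code for the 2D square lattice. This is a simplified illustration under stated assumptions, not Toma's implementation: the chain is stored as relative turns (theta = -1 right, 0 ahead, +1 left), the mobility f(i) is assumed to be minus one `penalty` unit for every HH-closed loop that strictly contains residue i, and all function and parameter names are hypothetical.

```python
import math
import random

DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # east, north, west, south

def build(turns):
    """Convert relative turns (-1 right, 0 ahead, +1 left) to lattice sites."""
    coords, heading = [(0, 0), (1, 0)], 0
    for t in turns:
        heading = (heading + t) % 4
        x, y = coords[-1]
        coords.append((x + DIRS[heading][0], y + DIRS[heading][1]))
    return coords

def hh_contacts(seq, coords):
    """Pairs (i, j), i < j, of nonbonded H residues on neighboring sites."""
    index_at = {site: i for i, site in enumerate(coords)}
    pairs = []
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in DIRS:
            j = index_at.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":
                pairs.append((i, j))
    return pairs

def mobility(i, contacts, penalty=1.0):
    """Assumed f(i): 0 if free, more negative per loop enclosing residue i."""
    return -penalty * sum(1 for a, b in contacts if a < i < b)

def contact_interactions(seq, steps=20000, ck=1.0, seed=0):
    """Return (lowest energy seen, its coordinates) for an HP sequence."""
    rng = random.Random(seed)
    turns = [0] * (len(seq) - 2)                  # (1) extended chain
    coords = build(turns)
    contacts = hh_contacts(seq, coords)
    best = (-len(contacts), coords)
    for _ in range(steps):
        k = rng.randrange(len(turns))             # (2) turn k sits at residue k + 1
        if rng.random() >= math.exp(mobility(k + 1, contacts) / ck):
            continue                              # (3) residue too immobile
        trial = turns[:]
        trial[k] = rng.choice([t for t in (-1, 0, 1) if t != turns[k]])  # (4) pivot
        new_coords = build(trial)
        if len(set(new_coords)) != len(new_coords):
            continue                              # (5) chain overlaps itself
        turns, coords = trial, new_coords         # (6) accept, re-evaluate loops
        contacts = hh_contacts(seq, coords)
        if -len(contacts) < best[0]:
            best = (-len(contacts), coords)
    return best

energy, conformation = contact_interactions("HPPHPHP", steps=5000, seed=2)
print(energy, conformation)
```

Because every valid move is accepted, as in step 6, the sketch keeps track of the lowest-energy conformation seen rather than relying on the final state. Raising `ck` makes residues inside loops more mobile, which loosens already-formed contacts.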
Figure 6. (Figure 2 of Contact Interactions Method by Toma). Three possible
conformations of the HPPHPHP sequence on the 2D square lattice.
Conformation A has no HH contact; conformation B has one HH contact;
conformation C has two HH contacts. Black and white boxes represent H
(hydrophobic) and P (polar) monomers, respectively.
Figure 7. (Figure 2 of Contact Interactions
Method by Toma). Minimum energy (native) conformation of a sequence on
the 2D square lattice. Ten HH contacts are present.
Figure 8. (Figure 1 of Contact Interactions Method by Toma). Heterogeneity of
mobility specified by CI. All
the residues in the loop from i to j have less mobility than the residues
outside the loop.
Figure 9. (Modified version of Figure 6 of Contact Interactions Method by Toma).
The different structures, as determined by CI, of two identical sequences
in a 3D lattice. The one on
the right has 2 more HH contacts than the structure on the left.
There exist, however, amphiphilic residues such as
lysine, arginine, and tyrosine (DeGrado, 1997).
Their apolar atoms can cap the hydrophobic core, while their polar groups
engage in electrostatic and hydrogen-bonding interactions (DeGrado, 1997).
Therefore, algorithms based upon the hydrophobic zipper hypothesis, such as
CI, could possibly be improved by incorporating A (amphiphilic)
residues. A scoring function such
as the one used by Dahiyat and Mayo includes a van der Waals potential to account
for steric constraints and an atomic solvation potential to favor the burial and
penalize the exposure of nonpolar surface area (Dahiyat & Mayo, 1997).
This may be an excellent way to evaluate the possible conformers: all but
the highest-scoring conformer are eliminated.
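The idea of ranking conformers with a two-term score can be sketched as follows. This is only an illustration of the general form described above (a van der Waals term plus a solvation term rewarding buried and penalizing exposed nonpolar area); it is not the Dahiyat and Mayo function. The Conformer class, the Lennard-Jones 12-6 form, and every numerical parameter are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Conformer:
    name: str
    pair_distances: List[float]   # nonbonded atom-pair distances, angstroms
    buried_nonpolar_area: float   # square angstroms
    exposed_nonpolar_area: float  # square angstroms


def lennard_jones(r, well_depth=0.1, r_min=4.0):
    """12-6 potential with its minimum, -well_depth, at r = r_min.

    Both parameters are placeholders, not published values.
    """
    ratio = r_min / r
    return well_depth * (ratio ** 12 - 2.0 * ratio ** 6)


def score(conformer, burial_weight=0.05, exposure_weight=0.05):
    """Higher is better: low steric energy, buried nonpolar surface."""
    vdw = sum(lennard_jones(r) for r in conformer.pair_distances)
    solvation = (exposure_weight * conformer.exposed_nonpolar_area
                 - burial_weight * conformer.buried_nonpolar_area)
    return -(vdw + solvation)


def best_conformer(conformers):
    """Keep only the highest-scoring conformer; all others are eliminated."""
    return max(conformers, key=score)


candidates = [
    Conformer("packed", [4.0, 4.0], 100.0, 10.0),
    Conformer("clash", [2.0, 4.0], 100.0, 10.0),    # one pair far too close
    Conformer("exposed", [4.0, 4.0], 10.0, 100.0),  # nonpolar area exposed
]
winner = best_conformer(candidates)
```

With these made-up inputs, the well-packed conformer with buried nonpolar surface wins; the steric clash and the exposed nonpolar area are each penalized by their respective terms.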
During the past few years there have been advances toward solving the
protein folding problem. Detailed
insights into the actual folding process and the factors that determine native
structure provide the foundation for both protein design and structure prediction from
linear sequence. Hydrophobic
interactions between residues and polypeptide backbone stabilization are the
main driving forces that reduce free energy in the intermediate,
partially folded structures on their way to forming the final native structure.
Electrostatic forces do not tend to lower free energy but
rather have a significant impact on protein-ligand specificity.
The folding process differs between small and larger proteins.
Small proteins tend to fold via multistate kinetics that
proceed in two discrete steps. Larger
proteins tend to form molten globules and then pass through a relatively lengthy transition
state in which there is extensive "searching" for the near-native state.
Hydrogen bonding plays a major role in determining secondary
structures, but its role in determining the tertiary structure is unclear.
Brute force statistical analysis of
existing, known protein structures may be able to predict protein folding
accurately in the future, but it is important that we understand how proteins
fold. Comprehending the factors and
the processes involved in the formation of protein native structure will
eventually allow realistic, time-based simulations of folding.
In addition, understanding the fundamental basis behind folding will
enable, for example, the design of proteins with novel functions and the
creation of drugs capable of blocking or facilitating folding.
I believe that further investigation of the atoms and bonds of an amino
acid is necessary if exact native structures are to be consistently modeled from
primary sequence. By knowing
precisely the parameters that define the properties of the atoms of each residue, it may
be possible to vectorize the different factors contributed by each atom.
Looking at folding from each atom's perspective may be the key.
It is important to ask, for example, why particular psi and phi bonds of a residue
rotate in a particular direction by a specific angle. Is it
because there is a large, sterically hindering group nearby?
Or is it because a neighboring residue contains a similarly charged
group? Not only the psi and phi
bonds are important, but also the bonds between the atoms of the side chains.
Only by keeping in mind that every atom and bond can vitally influence
(or be vitally influenced by) the folding process can the perfect algorithm be
derived. Theories and algorithms
such as the hydrophobic zipper hypothesis and CI that approximate and make
assumptions about bond angles are likely dead ends.
The perfect algorithm should make only the most fundamental assumptions.
If developed, such an algorithm could likely be applied not just
to the folding of proteins but also to predicting all sorts of organic reactions.
It would probably be
one of the greatest scientific advances ever.
References and Notes
Chakrabartty, A. & Baldwin, R. L. Stability of alpha-helices. Adv Protein Chem 46, 141-176 (1995).
Chong, L. T., Dempster, S. E., Hendsch, Z. S., Lee, L., Tidor, B.
Computation of electrostatic complements to proteins: A case of charge
stabilized binding, Protein Science 7,
Dahiyat, B. I. & Mayo, S. L. De novo protein design: Fully automated sequence selection. Science 278,
DeGrado, W. F. Enhanced: Proteins from
scratch. Science 278, 80-81
Dinner, A. R., Sali, A, & Karplus, M. The
folding mechanism of larger model proteins: Role of native structure. Proc.
Natl. Acad. Sci. USA 93, 8356-8361
Elofsson, A., Grand S. L., Eisenberg, D. Local
moves: an efficient algorithm for simulation of protein folding. http://rune.biokemi.su.se/~arne/papers/grail/gr_rev.pro.html#Heading2
Frieden, C. & Clark, A. C. Protein
folding: How the mechanism of GroEL action is defined by kinetics. Proc.
Natl. Acad. Sci. USA 94, 5535-5538
King, R. D. & Sternberg, M. J. E. Identification and the application of
the concepts important for accurate and reliable protein secondary structure
prediction, Protein Science 5, 2298-2310 (1996).
Lodish, H., Baltimore, D., Berk A., Zipursky, S. L., Matsudaira P., Darnell,
J. Molecular Cell Biology, Third Edition, 63-68 (1995).
Mehta, P. K., Heringa, J., Argos, P. A simple and fast approach to prediction of protein secondary structure
from multiply aligned sequences with accuracy above 70%, Protein Science 4,
Morrison, R. & Boyd, R. Organic Chemistry, Sixth Edition, 1211-1212
Munson, et al. What makes a protein a
protein, Protein Science 5, 1584-1593 (1996).
Nölting, et al. The folding pathway of a protein
at high resolution from microseconds to seconds. Proc. Natl. Acad. Sci. USA 94,
Pennisi, E. Polymer folds just like a
protein, Science 277, 1764
Perutz, M.F. Mutations make enzyme
polymerize, Nature 385, 773-775
Scalley, M. L. & Baker, D. Protein folding
kinetics exhibit an Arrhenius temperature dependence when corrected for the
temperature dependence of protein stability, Proc. Natl. Acad. Sci.
USA 94, 10636-10640 (1997)
Service, R. F. Folding proteins caught
in the act, Science 273, 29-30
Sisser, Adrian. Protein Structure
Manipulation and Folding through Solid Geometry, World Wide Web, http://www.chem.duke.edu/~ajs7/summer/paper/master.html
Sosnick, T. R., Shtilerman, M. D.,
Mayne, L., & Englander, S. W. Ultrafast
signals in protein folding and the polypeptide contracted state,
Proc. Natl. Acad. Sci. USA 94,
Srinivasan, R. Properties of amino acids, World Wide Web, http://cherubino.med.jhu.edu/~raj/Research/Linus/bkground03.html
Stryer, L. Biochemistry, Fourth Edition, 417-438 (1995).
Su, A., & Mayo, S. L.
Coupling backbone flexibility and amino acid sequence selection in protein
design, Protein Science 6,
Toma, L. & Toma, S. Contact interactions method: A new algorithm for protein folding,
Protein Science 5, 147-153 (1996).
Zhang, C. Consistency in structural
energetics of protein folding and peptide recognition, Protein Science 6,