Introduction to Protein Folding - The Process and Factors Involved


Written in March, 1998 by David Yee+


One of the greatest challenges in science is predicting the three-dimensional structure of proteins from their linear sequence alone. Currently, successful prediction of protein tertiary structure relies mainly on knowledge-based modeling of side chains in proteins whose sequences are homologous to one with a known structure (Mehta, et al., 1995). We are still waiting for a highly accurate algorithm that requires only knowledge of the amino acid sequence and its environment. Countless applications await the accomplishment of such a feat, since the function of a protein is conferred by its tertiary structure. Once the rules for folding are elucidated and refined, it should be possible, for example, to synthesize artificial proteins that accomplish a given task efficiently and accurately. Designer proteins that bind to specific DNA sequences to affect the transcription of target genes would be particularly useful.

Section I. Factors Involved in Protein Folding

The simple answer to the question of why proteins fold is that folding minimizes their energy. Numerous factors must work in concert for a protein to fold and to exist in a minimized energy state. The possible influences on the protein folding process include the hydrophobic effect, hydrogen bonding, steric effects, and electrostatic effects produced by polar and charged residues. These effects can partly be attributed to the properties of the unique side chains of the amino acids. Table 1 summarizes some of the properties of the twenty amino acids. Through the analysis of mutation rates in protein sequences, amino acids have been described as belonging to four exchangeable groups (Srinivasan, 1996). This information is displayed in Table 2. Note that the amino acids containing large aliphatic chains form a group along with methionine and cysteine. Preservation of the same secondary structure may be a key reason why exchanges between amino acids belonging to the same group within a protein are less likely to significantly alter the function of the protein.

Table 2. Amino acids can be classified into four groups. In a particular protein, exchanges between the amino acids of a particular group are less likely to drastically alter protein function.

Group 1: cysteine, methionine, leucine, isoleucine, valine
Group 2: phenylalanine, tyrosine, tryptophan
Group 3: aspartic acid, asparagine, alanine, glutamic acid, glutamine, proline, serine, threonine
Group 4: arginine, histidine, lysine
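The grouping in Table 2 can be expressed as a small lookup. The following is only an illustrative sketch: the names EXCHANGE_GROUPS, exchange_group, and is_conservative are hypothetical, and the group membership is copied from Table 2 as given (a residue that does not appear in the table simply has no group).

```python
# Exchange groups copied from Table 2; the identifiers are illustrative only.
EXCHANGE_GROUPS = {
    1: {"cysteine", "methionine", "leucine", "isoleucine", "valine"},
    2: {"phenylalanine", "tyrosine", "tryptophan"},
    3: {"aspartic acid", "asparagine", "alanine", "glutamic acid",
        "glutamine", "proline", "serine", "threonine"},
    4: {"arginine", "histidine", "lysine"},
}


def exchange_group(residue):
    """Return the Table 2 group number of a residue, or None if it is not listed."""
    name = residue.strip().lower()
    for group, members in EXCHANGE_GROUPS.items():
        if name in members:
            return group
    return None


def is_conservative(residue_a, residue_b):
    """True if both residues are listed in Table 2 and share a group."""
    group_a = exchange_group(residue_a)
    return group_a is not None and group_a == exchange_group(residue_b)


if __name__ == "__main__":
    print(is_conservative("leucine", "valine"))   # same group
    print(is_conservative("leucine", "lysine"))   # different groups
```

Under this reading, a leucine-to-valine substitution stays within Group 1, while a leucine-to-lysine substitution crosses from Group 1 to Group 4.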

Most peptides first fold into secondary structures before the amino acid residues interact further to form tertiary configurations. It is therefore worthwhile to briefly examine the factors involved in the formation of the secondary structures. There are several types of secondary structure: the alpha helix, the beta sheet, and the turns (Lodish et al., 1995). The important factors in predicting secondary structures include the propensity of certain amino acids to form particular secondary structures, the inclination of specific residues to be found at certain positions within secondary structures, and the correlation of certain patterns of hydrophobicity with particular secondary structures (King, 1996).

The limited range of backbone conformations that each amino acid is able to adopt is an important reason why certain amino acids have propensities for particular secondary structures. Usually, each amino acid in a protein has three angles of rotation: phi, psi, and omega. Phi is the degree of rotation about the bond between the nitrogen and the alpha carbon. Psi is the degree of rotation about the bond between the alpha carbon and the carbonyl carbon, and omega is the degree of rotation about the peptide bond. The peptide bond has considerable double-bond character and is, under normal circumstances, fixed. Figure 1 demonstrates the three angles of rotation. Different amino acids have different allowed psi and phi values, however, and these restrict the conformations that can be adopted. For instance, the phi value for proline is fixed at about -65°, and thus a proline residue has a much-reduced range of allowed conformations (Streyer, 1995). At the other end of the spectrum, glycine allows the polypeptide backbone to make turns that would not be possible with another residue.
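As an illustration of how these angles are measured, the sketch below computes a signed dihedral angle from four atomic positions. It assumes the usual atom quadruples: C(i-1), N(i), CA(i), C(i) for phi; N(i), CA(i), C(i), N(i+1) for psi; and CA(i), C(i), N(i+1), CA(i+1) for omega. The function name and the test coordinates are hypothetical and are not taken from any structure discussed here.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    0 is the cis arrangement and +/-180 the trans arrangement; the angle is
    positive when, looking from p1 toward p2, the p2-p3 bond is rotated
    clockwise relative to the p1-p0 bond.
    """
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p1, p0)
    b1 = sub(p2, p1)          # the central bond
    b2 = sub(p3, p2)
    n1 = cross(b0, b1)        # normal of the plane p0-p1-p2
    n2 = cross(b1, b2)        # normal of the plane p1-p2-p3
    b1_len = math.sqrt(dot(b1, b1))
    b1_unit = tuple(x / b1_len for x in b1)
    m1 = cross(b1_unit, n1)   # completes an orthogonal frame with n1 and b1
    return math.degrees(math.atan2(dot(m1, n2), dot(n1, n2)))


if __name__ == "__main__":
    # Hypothetical coordinates chosen so the answer is easy to check by hand.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applied to every residue of a known structure, such a function would yield the phi/psi pairs whose restricted ranges are described above.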

Figure 1. Graphical representation and definition of psi, phi, and omega.

The ability of certain amino acids to form hydrogen bonds is also important for the formation of secondary structures. There is extensive hydrogen bonding in the alpha helix. Alanine, glutamine, and leucine residues favor the alpha helix (Streyer, 1995). Serine, aspartate, and asparagine disrupt alpha helices because their side chains contain a hydrogen-bond donor and acceptor in close proximity to the backbone, where they compete for main-chain NH and CO groups (Streyer, 1995). Valine and isoleucine also disfavor the formation of the alpha helix, but for a different reason: the branching at the beta carbon of their side chains sterically hinders helix formation. Proline disfavors alpha helix formation because of steric hindrance and because it lacks an amide H atom for hydrogen bonding (Streyer, 1995). The hydrogen side group in glycine allows too much flexibility around the alpha carbon, and thus glycine is also not frequently found in alpha helices (Lodish, et al., 1995).

Beta sheets are composed of beta strands. Each beta strand is about 5-8 residues long and the backbone atoms of each residue are able to hydrogen bond (Lodish, et al., 1995). Through association via hydrogen bonding, individual strands are able to form beta sheets, with the side chains of residues protruding perpendicular to the sheets.

Turns are compact, U-shaped structures composed of three or four residues stabilized by a hydrogen bond between their end residues (Lodish, et al., 1995). The turns are located on protein surfaces and form a sharp bend that redirects the polypeptide backbone back toward the hydrophobic interior (Lodish, et al., 1995). Glycine and proline tend to favor the formation of turns. Table 1 shows the relative frequencies of occurrence of particular residues in alpha helices, beta sheets, and beta turns.

It is important to note that particular short peptide sequences can form an alpha helix in one protein but adopt a beta strand conformation in another. These so-called "chameleon" sequences have been discovered in the immunoglobulin-binding domain of protein G, ribonuclease, and erythrocruorin (Perutz, 1997; Streyer, 1995). These findings imply that the context is often critical in determining secondary structure. Tertiary interactions, the interactions between residues that are far apart in the linear sequence, may be decisive in specifying the secondary structure of some (but not all) segments (Streyer, 1995). This is a problem that needs to be addressed in predicting folding.

The structure of most proteins varies significantly when exposed to different pH. The change in structure can drastically alter the behavior of proteins. It is therefore worthwhile to examine the isoelectric points of the twenty common amino acids. The isoelectric point (pI) of an amino acid is the hydrogen ion concentration of the solution in which the amino acid does not migrate under the influence of an electric field (Morrison & Boyd, 1992). Tables 1 and 1b contain the pI, calculated using the web-based ExPASy PI/MW tool, for the twenty amino acids and some short peptides, including eleven tumor antigens, the activation segment of pepsinogen (which exists in the acidic conditions of the stomach), and chymotrypsin 1 precursor (which exists in the alkaline environment of the small intestine).
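To make the definition concrete, the sketch below estimates a pI by finding the pH at which the net charge predicted by the Henderson-Hasselbalch relation is zero. It is not a description of how the ExPASy tool works. The function names are hypothetical, and the glycine pKa values used in the example (about 2.34 for the carboxyl group and 9.60 for the amino group) are approximate textbook values assumed for illustration.

```python
def net_charge(ph, acidic_pkas, basic_pkas):
    """Average net charge at a given pH from the Henderson-Hasselbalch relation.

    acidic_pkas: groups that are neutral when protonated and -1 when not.
    basic_pkas:  groups that are +1 when protonated and neutral when not.
    """
    negative = sum(-1.0 / (1.0 + 10.0 ** (pka - ph)) for pka in acidic_pkas)
    positive = sum(1.0 / (1.0 + 10.0 ** (ph - pka)) for pka in basic_pkas)
    return negative + positive


def isoelectric_point(acidic_pkas, basic_pkas, low=0.0, high=14.0, tol=1e-6):
    """Bisect for the pH of zero net charge (charge falls steadily as pH rises)."""
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(mid, acidic_pkas, basic_pkas) > 0.0:
            low = mid      # still positively charged: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    # Assumed approximate pKa values for free glycine.
    glycine_pi = isoelectric_point(acidic_pkas=[2.34], basic_pkas=[9.60])
    print(round(glycine_pi, 2))
```

For a molecule with one acidic and one basic group the result is just the average of the two pKa values; the bisection becomes useful when ionizable side chains are added to the two lists.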

It is noteworthy that an amino acid usually shows its lowest solubility in a solution at the isoelectric point (Morrison & Boyd, 1992). This can be significant because insoluble proteins are implicated in a variety of diseases, including Alzheimer's (Perutz, 1997). A nonsynonymous point mutation of a single amino acid may alter the isoelectric point of a protein and promote its precipitation, although other consequences of an amino acid substitution may be more significant in protein aggregation.

It is not known in detail how pH affects the structure of proteins, however. Electrostatic interactions do not generally contribute to stability of the tertiary protein structure, but they are vital in proper ligand-recognition for many proteins that bind to other substrates. Calculations for a number of protein complexes show that the net effect of electrostatic interactions is generally to destabilize the docking of two pre-conformed molecules (Chong, et al., 1998). Although the folded state of a protein results in numerous favorable interactions within the protein, the large electrostatic desolvation penalty due to polar and charged groups often still lingers (Chong, et al., 1998). In addition, it has been observed that identically charged residues that are in close proximity to one another serve to destabilize the backbone.

It has long been known that the protein folding process is highly sensitive to temperature. This relationship between folding and temperature appears to stem mainly from the effects of free energy upon hydrophobic interactions (Scalley and Baker, 1997). It has recently been shown that the burial of hydrophobic residues involves a large change in their heat capacity and, thus, the associated free energy changes are strongly dependent on temperature. This phenomenon reflects a large decrease in solvent-accessible surface area in the transition state of proteins relative to their unfolded state (Scalley and Baker, 1997). In general, thermal energy is required for most proteins to fold, but excess heat often disrupts the native protein structure.
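The link between a heat capacity change and a temperature-dependent free energy can be illustrated with the standard Gibbs-Helmholtz expression for a two-state transition with a constant heat capacity change. This is a generic thermodynamic sketch, not the analysis performed by Scalley and Baker; the function name and all parameter values below are hypothetical and chosen only to show the shape of the curve.

```python
import math


def unfolding_free_energy(temp, t_m, dh_m, dcp):
    """Free energy of unfolding at temperature temp (kelvin).

    t_m  : midpoint temperature, where the free energy of unfolding is zero
    dh_m : enthalpy of unfolding at t_m
    dcp  : heat capacity change on unfolding, assumed constant
    Energy units follow dh_m; dcp must be in the same energy unit per kelvin.
    """
    return dh_m * (1.0 - temp / t_m) + dcp * ((temp - t_m) - temp * math.log(temp / t_m))


if __name__ == "__main__":
    # Hypothetical parameters (kJ/mol and kJ/mol/K), for illustration only.
    t_m, dh_m, dcp = 333.0, 300.0, 6.0
    for temp in (230.0, 298.0, 333.0, 343.0):
        print(temp, round(unfolding_free_energy(temp, t_m, dh_m, dcp), 1))
```

With a positive heat capacity change the stability curve is bowed: in this sketch the folded state is favored at intermediate temperatures and disfavored both above the midpoint and at sufficiently low temperature.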

Somewhat surprisingly, the turn and loop connections between elements of secondary structure often do not play a major role in specifying the structure and properties of a protein (Munson, et al., 1996). Instead, the packing of residues in the hydrophobic core, or the desolvation of the nonpolar groups, is vital for the structure, stability, and properties of the final, folded protein (Munson, et al., 1996). The investigation of the protein Rop by Mary Munson et al. demonstrates the importance of hydrophobic packing. Rop is a homodimeric (two 63-residue helix-turn-helix monomers), RNA-binding protein that controls the replication of ColE1 plasmids (Munson, et al., 1996). The hydrophobic core of Rop consists of eight layers of heptad repeats of mainly alanine and leucine. In the study, amino acids were substituted in the hydrophobic core of Rop to generate mutants, whose properties were subsequently analyzed. Mutants whose cores were repacked with only alternating alanine and leucine residues were more stable and retained RNA-binding affinities similar to that of the wild type. "Underpacked" mutants (e.g. only alanines in the core) were predominantly in the unfolded state and lacked the ability to bind RNA. Interestingly, although overpacking with leucine residues created a protein that could not bind RNA, the mutant was extremely stable. This phenomenon was attributed to an increase in the burial of hydrophobic surface area, which compensated for the energetic cost of poor hydrophobic packing (Munson, et al., 1996).

Further support for hydrophobic packing as the main driving force for protein folding comes from studies of other polymers that also fold in solution. The polymer of phenylacetylene, studied by a team led by organic chemist Jeffrey Moore of the University of Illinois, readily shapes into a helix and forms an easily modifiable cavity (Pennisi, 1997). Phenylacetylene has no hydrogen-bonding capability because the compound lacks oxygen and nitrogen, but polymers of the molecule are still able to fold. The phenyl rings in a phenylacetylene polymer must, however, do their best to avoid contact with the polar solvent, and they accomplish this by twisting the polymer so that it resembles a helix (Pennisi, 1997). Although the hydrophobic effect is the driving force of folding for phenylacetylene polymers, it should be kept in mind that some amino acids are charged, and the interactions they have with one another and with the uncharged amino acids can play a significant role in the protein folding pathway, such as destabilizing the backbone as previously mentioned.

Figure 2. Phenylacetylene

Consequences of poor hydrophobic packing include reduced stability, little or no thermal denaturation transition (little or no enthalpic component to stability), low cooperativity (evident from chemical denaturation transitions), and a lack of well-defined tertiary structure (which results in poor chemical shift dispersion of NMR spectra and rapid exchange rates of backbone amides with solvent). These are, as presented in Table 3, characteristics of molten globules. It can thus be inferred that the events immediately following molten globule formation likely involve extensive hydrophobic interactions between the non-polar residues.

Table 3. Similarities and differences between molten globules and the native structure of proteins.

Similar Property Between Folded Proteins and Molten Globules

high levels of secondary structure

Different Properties that Molten Globules Possess

packing of the hydrophobic core not well defined

lower stability

little or no enthalpic component to stability

low cooperativity

lack of well-defined tertiary structure

poor chemical shift dispersion of NMR spectra

rapid exchange rates of backbone amides with solvent

There are also findings that indicate backbone interactions are the major stabilizing force in the protein folding process. Backbone interactions have both electrostatic and solvation components (Zhang et al., 1997). Upon analyzing and estimating the backbone entropy of 17 single-domain protein structures, Zhang et al. found that backbone interactions provided 90% of the stability (backbone-backbone interactions contribute 64% of the total energy, while backbone-side-chain interactions account for 26%), with side-chain-side-chain interactions mainly providing specificity (Zhang, et al., 1997). Zhang's team reached this conclusion by using a new algorithm involving atomic contact energies to estimate the desolvation effects and side-chain entropy changes, which allowed the backbone entropy change to be determined from experimental data. Further evidence for the importance of backbone interactions is the behavior of polyalanine peptides in solution (Chakrabatty & Baldwin, 1995). Researchers found that these peptides tend to form alpha helices spontaneously in solution. These results agree with the findings previously mentioned that polar and charged residues mainly provide for electrostatic interactions with other macromolecules and contribute only marginally to the stability of the folded protein, although they can easily serve to destabilize the native state. In addition, the search for residues that form a stable backbone is the first step employed by Stephen Mayo in his de novo protein design technique. Thus it can be concluded that backbone interactions are indeed very important in determining the final tertiary structure of proteins. The backbone is surprisingly flexible in the types of residues it can accommodate, however, and concerted backbone movements allow a protein to accommodate potentially disruptive residues (Su & Mayo, 1997). Unfortunately, this flexibility complicates the prediction of tertiary structure.

Section II. The Protein Folding Process

The linear amino acid sequence folds into its unique native structure (the protein), which is thermodynamically and kinetically stable at physiological temperature. This process is often completed in a timeframe ranging from milliseconds to minutes (Dinner, et al., 1996). Immediately after (or simultaneously with) the formation of secondary structures, the polypeptide condenses into a molten globule. The molten globule is the partially folded intermediate of a protein. Though it possesses most of the secondary structure of a folded protein, it lacks the well-packed tertiary structure of the native state (Streyer, 1995). Some of the differences between molten globules and folded proteins were outlined in Table 3.

Researchers have actually captured the act of protein folding and provided time course descriptions of the events that occur in the process. These experiments have revealed that proteins fold either by two-state kinetics, in which no stable intermediates accumulate on the pathway, or by multistate kinetics through one or more detectable intermediates (Nölting, 1997). Therefore, it is possible to detect the presence of a relatively stable intermediate if a protein folds via the multistate kinetics pathway. Usually small proteins are more likely to fold in the highly cooperative, multistate pathway (Scalley & Baker, 1997). Nölting et al. have characterized the multistate folding pathway of the protein barstar from a well-characterized denatured state at the level of individual residues on a microsecond time scale. Their studies support the nucleation-condensation mechanism. The mechanism states that there is first the formation of a diffuse nucleus that consists of some neighboring residues whose conformations are stabilized by long-range interactions with residues that are distant in sequence. An essential component of the mechanism is that the nucleus and its stabilizing interactions elsewhere in the protein develop concurrently (Nölting, 1997). The alpha helix is unstable in the absence of long-range interactions, and the rest of the structure is unstable without interactions with the helix, and thus there is cooperative formation of the nucleus and the surrounding structure (Nölting, 1997). Specifically, for barstar, it was found that a peptide corresponding to part of a nucleus is mainly random under folding conditions. This region in the denatured barstar has flickering native-like structure, showing that it is stabilized by long-range interactions, even in the denatured state (Nölting, 1997). In the first transition state, which leads to the folding intermediate on the microsecond time scale, a nucleus centered on helix1 is almost completely formed. In the second transition state, the weak interactions that many of the remaining residues make in the early-formed intermediate are more highly consolidated.

Figure 3. (Figure 1 of The folding pathway of a protein at high resolution from microseconds to seconds by Nölting)

Structure of barstar. Barstar is a relatively small, 89-residue protein that has evolved to be the specific intracellular inhibitor of the ribonuclease barnase, which is secreted from Bacillus amyloliquefaciens. Barstar has four helices and three strands of parallel beta sheet. Barstar is on the borderline of being a single- or two-module protein.

Figure 4. (Figure 5 of The folding pathway of a protein at high resolution from microseconds to seconds by Nölting). General nucleation-condensation model. CI-2 is chymotrypsin inhibitor 2, which was also studied by Nölting et al.

Dinner et al., like the Nölting team, also found that folding of small proteins occurs in two discrete stages. The first stage involves formation of a core that serves as a nucleus for folding to a near-native (approximately 80%) structure, and the second stage involves rearrangement around the core to form the native structure. Core formation is facilitated by both the stabilities of its contacts and the presence of cooperative secondary structure with effective initiation sites (Dinner, et al., 1996).

By observing the folding behavior of cytochrome c, Sosnick and his team of researchers also support the two-stage folding process of small proteins, although they reveal additional details about the initial step. They contend that the core formation, or nucleation-condensation, stage is rate-limiting and energetically uphill. This step includes the time-consuming conformational search for some relatively specific transition state. Once the transition state is "found," the forward folding step proceeds energetically downhill (Sosnick, 1997). Typically, the folding of small proteins lasts approximately 1 millisecond or more (Sosnick, 1997).

A similar, pioneering study conducted by Martin Gruebele and his team of researchers strongly supports the importance of local interactions early in the folding process. By rapidly "supercooling" and heating the protein apomyoglobin in solution, the scientists were able to detect the events that lead to the final native structure of the protein. Supercooling denatures the protein, while small increments of energy introduced by a laser beam partially fold it. Apomyoglobin's final shape includes a trio of helices (designated A, G, and H) on different parts of the molecule, with the H and G helices spiraling parallel to one another and roughly perpendicular to the A helix (Service, 1996). The results of the experiment suggest that local interactions between neighboring residues induce the protein to very rapidly adopt some secondary structure (the coil in the A helix) before the more global interactions push it to adopt the 3D structure that moves the A helix next to the H helix (Service, 1996). This counters the views of some theorists, who contend that global structures take shape either before or simultaneously with local structures (Service, 1996). The existence of chameleon sequences, however, suggests that in certain instances global interactions are required in order to induce the formation of secondary structures. It is therefore probable that both possibilities are correct, depending on the protein in question.

The research conducted by Dinner, Sali, and Karplus has provided one of the most in-depth views into the folding process of larger proteins. They engineered sequences with high secondary structure content in the native state to investigate the role of native structure in the folding process. It was found that folding begins with a rapid collapse followed by a slow search through the semi-compact globule for a sequence-dependent stable core with about 30% native contacts, which serves as the transition state for folding to a near-native structure (Dinner, et al., 1996). The slowness of this process reflects the enormous number of possible conformations; the semi-compact random globule must locate the single correct structure that will allow the formation of a native-like transition state conformation. The formation of the hydrophobic core through this search depends on the structural features of the native structure. Sequences that fold have largely stable, cooperative structure that is accessible through short-range initiation sites, such as those in anti-parallel sheets connected by turns. Contacts are "cooperative" if formation of any one contact increases the probability of formation of the others (Dinner, et al., 1996). Before folding is completed, the system can encounter a second bottleneck, involving condensation and rearrangement of surface residues. Overly stable local structure of the surface residues slows this stage of the folding process.

Figure 5. This is a simplified version of the protein folding process for medium to large sized proteins.

There is yet another important fact to keep in mind when considering the process of protein folding: the folding of most proteins in cells is assisted by enzymes (Streyer, 1995). These enzymes are the chaperonins and isomerases, and they are present in both bacteria and eukaryotes. The enzyme, protein disulfide isomerase, for example, catalyzes the formation and breaking of disulfide bonds for many proteins to find the optimal pairings (Streyer, 1995).

The protein GroEL is a widely studied chaperonin. A member of the Hsp60 class of proteins, it is composed of 14 identical subunits arranged as two stacked heptameric rings, and each ring contains a large cavity into which a substrate protein binds and subsequently folds (Frieden & Clark, 1997). At least 40 proteins, many of which are from eukaryotes, form stable complexes with GroEL (Frieden & Clark, 1997). This lack of specificity is due to the preferential binding of small unfolded structures to form the chaperonin-unfolded protein complex (Frieden & Clark, 1997). Surprisingly, Frieden and Clark have found that GroEL does not increase or catalyze the folding of its substrate proteins. Rather, the chaperonin increases the probability of the protein refolding correctly by decreasing the probability of aggregation that may occur in the bulk solution (Frieden & Clark, 1997). The ability of GroEL to fold proteins correctly without aggregation is a consequence of the stability of the complex formed between GroEL and the protein conformation just prior to the final folding step (Frieden & Clark, 1997). It is possible that this phenomenon also occurs in higher animals, and that mutations in chaperonin genes in humans may contribute to diseases involving protein aggregation.

Section III. Predicting Protein Folding

There are two major problems in predicting protein folding that need to be solved. The first problem is to determine an energy function that can discriminate, for a protein, between the set of native or native-like conformations and other conformations (Elofsson et al., 1996). The conformational space for a protein is enormous because there are at least 3 possible backbone conformations per residue, leading to 3^100 possible backbone conformations for a 100-residue protein (Elofsson et al., 1996). The second problem is to develop an algorithm that can find the lowest energy structure in this conformational space (Elofsson et al., 1996). These two problems are closely coupled, as the energy function also needs to provide a guided path to the native structure (Elofsson et al., 1996). Despite the large number of possible backbone conformations, it is clear that many if not most of them can be ruled out due to the different steric and chemical constraints imposed by each unique residue. Several methods have been applied to search the conformational space of a protein for low energy conformations. These methods include molecular dynamics, Monte Carlo simulations, genetic algorithms and diffusive methods (Elofsson et al., 1996).

In modeling protein folding, it should also be kept in mind that proteins can fold either sequentially or combinatorially. If a protein folds sequentially, nascent structures are not affected by structures that materialize later in the folding process. Combinatorial folding, on the other hand, means that secondary structures formed later in the folding process influence the already-formed secondary structures. In this model the conformation of the protein must be recalculated after the formation of each new secondary structure. For instance, in combinatorial folding, for a protein with n = 20 degrees of freedom, calculation at every 1° gives rise to 360^n, or greater than 10^50, possible conformers (Sisser, 1997). In sequential folding, there are only 360 * 20, or 7,200, conformers (Sisser, 1997). Sequential folding models are mostly used because of the lack of massive computing power, although combinatorial folding may be more representative of the true folding process.
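The sizes of these search spaces are easy to check with exact integer arithmetic. The short sketch below reproduces the three counts discussed above: three backbone conformations per residue for a 100-residue protein, and 20 degrees of freedom sampled in 1° steps under the combinatorial and sequential models. The variable names are illustrative only.

```python
# Backbone estimate: at least 3 conformations per residue, 100 residues.
backbone_conformations = 3 ** 100

# 20 degrees of freedom, each sampled at every 1 degree (360 values).
degrees_of_freedom = 20
values_per_angle = 360

# Combinatorial folding: every combination of angle values is a conformer.
combinatorial_conformers = values_per_angle ** degrees_of_freedom

# Sequential folding: each angle is scanned once, independently of the rest.
sequential_conformers = values_per_angle * degrees_of_freedom

if __name__ == "__main__":
    print(f"3^100    is about {backbone_conformations:.2e}")
    print(f"360^20   is about {combinatorial_conformers:.2e}")
    print(f"360 * 20 = {sequential_conformers}")
```

The gap between the last two numbers, roughly 47 orders of magnitude, is the practical reason the sequential approximation is attractive despite being less realistic.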

The critical aspect in predicting protein folding is the evaluation of the feasibility of the conformers that a particular sequence of amino acids may adopt. All of the factors discussed in Section I become the framework for this difficult task. Hydrophobic packing is critical in tertiary structure formation, and one relatively simple model for simulating the protein folding process, the hydrophobic zipper hypothesis, depends completely on hydrophobic interactions. The theory is derived from the concept of cooperativity (the probability that a peptide chain adopts a particular conformation is increased if it had previously been in certain predecessor conformations) (Toma, 1996). In this model, the amino acids of the polypeptide chain are designated only as either H (hydrophobic) or P (nonhydrophobic). Each amino acid occupies a "lattice site", connected to its neighbors and unable to occupy any site filled by another residue. At each site, the protein chain can either continue ahead or turn 90° up, down, left, or right. The energy of a nascent chain in this model is calculated by summing a favorable contribution of -1 unit (a more negative energy means greater stability) for each pair of nonbonded hydrophobic residues occupying neighboring nondiagonal lattice points (Toma, 1996). PP and PH contacts, on the other hand, make no energy contribution. Algorithms based upon the hydrophobic zipper hypothesis lead to a compact chain conformation that has at least one hydrophobic core.
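The energy rule of this lattice model can be written in a few lines. The sketch below scores a conformation on a 2D square lattice, counting -1 for each nonbonded HH pair on adjacent, non-diagonal sites. The function name is hypothetical, and the example coordinates for the sequence HPPHPHP are my own illustrations; they are not copied from any figure.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain on a 2D square lattice.

    sequence: string of 'H' and 'P', one letter per residue
    coords:   list of (x, y) lattice sites, one per residue, in chain order
    Each nonbonded H-H pair on adjacent (non-diagonal) sites contributes -1.
    """
    if len(sequence) != len(coords):
        raise ValueError("one lattice site is needed per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("two residues occupy the same lattice site")
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        if abs(x0 - x1) + abs(y0 - y1) != 1:
            raise ValueError("consecutive residues must sit on adjacent sites")

    site_to_index = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that every pair is counted exactly once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = site_to_index.get(neighbor)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


if __name__ == "__main__":
    seq = "HPPHPHP"
    extended = [(k, 0) for k in range(len(seq))]
    one_contact = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 2), (0, 3), (0, 4)]
    two_contacts = [(0, 0), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1)]
    for conformation in (extended, one_contact, two_contacts):
        print(hp_energy(seq, conformation))
```

The three example conformations have zero, one, and two HH contacts respectively, so their energies under this rule are 0, -1, and -2.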

A protein folding model called the Contact Interaction Method (CI) utilizes the hydrophobic zipper hypothesis (HZ), though with modifications. In CI, there is heterogeneity in the mobility of a sequence, that is, in the ability of its residues to adopt different conformations. Residues within a loop created by two HH contacts are less mobile than residues outside the loop (Toma, 1996). The algorithm that describes CI is as follows:

(1) Start from an extended, linear amino acid sequence.
(2) Select a random residue to be moved, for example the ith.
(3) Use the criterion of mobility to decide whether it is to be moved: move it if Rnd < exp[f(i)/ck], where ck is the temperature, Rnd is a random number between 0 and 1, and f(i) is the mobility of the ith residue, with a value of 0 if the residue has free movement and a negative value if its mobility is restricted. If the residue is not moved, go to 2. (The criterion is always satisfied for residues not belonging to loops defined by HH contacts.)
(4) Choose the movement at random, i.e., choose a random value of theta(i) while keeping all the other theta coordinates invariant (this corresponds to a pivot move). Theta specifies the direction the residue will take (e.g. turn right, turn left, or go ahead in a 2-D lattice).
(5) Check the validity of the structure resulting from the movement; if it is not valid, go to 2.
(6) If the structure is valid, the new conformation is accepted and its energy evaluated; a time step is counted, and the function f(i) assumes the values deriving from the loops present in the conformation; go to 2.

Figures 6 through 8 make the process easier to understand. The CI algorithm has proven to be very efficient in both two and three dimensions and has located energy minima not found by other conformational search algorithms described in the literature (Toma, 1996).
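The six steps above can be sketched on a 2D square lattice as follows. This is a simplified illustration and not Toma's implementation. In particular, the exact form of f(i) is not given here, so the sketch assumes f(i) is minus the number of HH-contact loops that enclose residue i. It also picks pivot sites only from interior residues and counts attempted rather than accepted moves. All function names are hypothetical.

```python
import math
import random

# Rotations of a lattice vector about the origin: 90 deg left, 180 deg, 90 deg right.
ROTATIONS = (
    lambda dx, dy: (-dy, dx),
    lambda dx, dy: (-dx, -dy),
    lambda dx, dy: (dy, -dx),
)


def hh_contacts(seq, coords):
    """List of nonbonded H-H pairs (a, b), a < b, on adjacent non-diagonal sites."""
    site_to_index = {site: i for i, site in enumerate(coords)}
    pairs = []
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for neighbor in ((x + 1, y), (x, y + 1)):   # each pair seen once
            j = site_to_index.get(neighbor)
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                pairs.append((min(i, j), max(i, j)))
    return pairs


def mobility(i, contacts):
    """Assumed f(i): 0 if residue i is free, else minus the number of enclosing loops."""
    return -sum(1 for a, b in contacts if a < i < b)


def pivot(coords, i, rotate):
    """Rotate every residue after i about residue i (changes only theta(i))."""
    px, py = coords[i]
    tail = []
    for x, y in coords[i + 1:]:
        dx, dy = rotate(x - px, y - py)
        tail.append((px + dx, py + dy))
    return coords[:i + 1] + tail


def ci_search(seq, attempts=2000, ck=1.0, seed=0):
    """Return (lowest energy seen, its coordinates) after a number of attempted moves."""
    rng = random.Random(seed)
    coords = [(k, 0) for k in range(len(seq))]              # step 1: extended chain
    contacts = hh_contacts(seq, coords)
    best_energy, best_coords = -len(contacts), coords
    for _ in range(attempts):
        i = rng.randrange(1, len(seq) - 1)                  # step 2: pick a residue
        if rng.random() >= math.exp(mobility(i, contacts) / ck):
            continue                                        # step 3: not mobile enough
        candidate = pivot(coords, i, rng.choice(ROTATIONS))  # step 4: random pivot
        if len(set(candidate)) != len(candidate):
            continue                                        # step 5: chain overlaps itself
        coords = candidate                                  # step 6: accept, re-evaluate
        contacts = hh_contacts(seq, coords)
        if -len(contacts) < best_energy:
            best_energy, best_coords = -len(contacts), coords
    return best_energy, best_coords


if __name__ == "__main__":
    energy, conformation = ci_search("HPPHPHP", attempts=2000, ck=1.0, seed=1)
    print(energy, conformation)
```

Note that, as in step 6, every valid move is accepted regardless of its energy; the bias toward compact structures comes entirely from the reduced mobility of residues inside HH loops, which makes existing contacts less likely to be broken.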

Figure 6. (Figure 2 of Contact Interactions Method by Toma). Three possible conformations of the HPPHPHP sequence on the 2D square lattice. Conformation A has no HH contact; conformation B has one HH contact; conformation C has two HH contacts. Black and white boxes represent H (hydrophobic) and P (polar) monomers, respectively.

Figure 7. (Figure 2 of Contact Interactions Method by Toma). Minimum energy (native) conformation of a sequence on the 2D square lattice. Ten HH contacts are present.

Figure 8. (Figure 1 of Contact Interactions Method by Toma). Heterogeneity of mobility specified by CI. All the residues in the loop from i to j have less mobility than the residues outside it.

Figure 9. (Modified version of Figure 6 of Contact Interactions Method by Toma). The different structures, as determined by CI, of two identical sequences in a 3D lattice. The one on the right has 2 more HH contacts than the structure on the left.

There exist, however, amphiphilic residues such as lysine, arginine, and tyrosine (DeGrado, 1997). Their apolar atoms can cap the hydrophobic core, while their polar groups engage in electrostatic and hydrogen-bonding interactions (DeGrado, 1997). Algorithms based upon the hydrophobic zipper hypothesis, such as CI, could therefore possibly be improved by incorporating A (amphiphilic) residues. A scoring function such as the one used by Dahiyat and Mayo combines a van der Waals potential, which accounts for steric constraints, with an atomic solvation potential, which favors the burial and penalizes the exposure of nonpolar surface area (Dahiyat & Mayo, 1997). This may be an excellent way to evaluate the possible conformers: all but the highest-scoring conformer are eliminated.
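A scoring function of this general shape can be sketched as follows. This is a toy illustration only, not the function of Dahiyat and Mayo: the 12-6 form of the van der Waals term, every parameter value (r0, depth, burial_benefit, exposure_penalty), and the Conformer record are assumptions introduced here for the example.

```python
from dataclasses import dataclass


@dataclass
class Conformer:
    name: str
    pair_distances: list          # distances between nonbonded atom pairs, in angstroms
    buried_nonpolar_area: float   # nonpolar surface hidden from solvent, square angstroms
    exposed_nonpolar_area: float  # nonpolar surface exposed to solvent, square angstroms


def lennard_jones(r, r0=4.0, depth=0.1):
    """12-6 potential with its minimum of -depth at r = r0 (placeholder values)."""
    x = r0 / r
    return depth * (x ** 12 - 2 * x ** 6)


def score(c, burial_benefit=0.02, exposure_penalty=0.02):
    """Higher is better: reward burial, penalize exposure and steric clashes."""
    vdw = sum(lennard_jones(r) for r in c.pair_distances)
    solvation = (burial_benefit * c.buried_nonpolar_area
                 - exposure_penalty * c.exposed_nonpolar_area)
    return solvation - vdw


def best_conformer(conformers):
    """Keep only the highest-scoring conformer; all others are eliminated."""
    return max(conformers, key=score)
```

In this sketch a conformer with two atoms pushed closer than r0 is penalized steeply by the repulsive term, and a conformer that leaves more nonpolar surface exposed scores lower through the solvation term, so best_conformer() favors well-packed structures with buried nonpolar surface.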


During the past few years there have been advances toward solving the protein folding problem. Detailed insights into the actual folding process and the factors that determine native structure provide the foundation for both protein design and prediction from linear sequence. Hydrophobic interactions between residues and polypeptide backbone stabilization are the main driving forces that reduce free energy in the intermediate, partially folded structures on their way to forming the final native structure. Electrostatic forces do not tend to lower free energy but rather have a significant impact upon protein-ligand specificity. The folding process differs between small and larger proteins. Small proteins tend to fold via multistate kinetics that occur in two discrete steps. Larger proteins tend to form molten globules and then pass through a relatively lengthy transition state in which there is extensive "searching" for the near-native state. Hydrogen bonding plays a major role in determining secondary structures, but its role in determining the tertiary structure is unclear.

Brute force statistical analysis of existing, known protein structures may be able to predict protein folding accurately in the future, but it is important that we understand how proteins fold. Comprehending the factors and the processes involved in the formation of protein native structure will eventually allow realistic, time-based simulations of folding. In addition, understanding the fundamental basis behind folding will enable, for example, the design of proteins with novel functions and the creation of drugs capable of blocking or facilitating folding.

I believe that further investigation of the atoms and bonds of an amino acid is necessary if exact native structures are to be consistently modeled from primary sequence. By knowing precisely the parameters that define the properties of the atoms of residues, it may be possible to vectorize the different factors contributed by each atom. Looking at folding from each atom's perspective may be the key. It is important to ask, for example, why particular psi and phi bonds of a residue want to rotate in a particular direction by a specific number of degrees. Is it because there is a large, sterically hindering group nearby? Or is it because a neighboring residue contains a similarly charged group? Not only the psi and phi bonds are important, but also the bonds between the atoms of the side chains. Only by keeping in mind that every atom and bond can be a vital influence (or be vitally influenced) in the folding process can the perfect algorithm be derived. Theories and algorithms such as the hydrophobic zipper hypothesis and CI that approximate and make assumptions about bond angles are likely dead ends. The perfect algorithm should make only the most fundamental assumptions. If developed, such an algorithm could likely be applied not just to the folding of proteins, but to predicting all sorts of organic reactions as well. It would probably be one of the greatest scientific advances ever.

References and Notes

Chakrabartty, A. & Baldwin, R. L. Stability of alpha-helices. Adv. Protein Chem. 46, 141-176 (1995).

Chong, L. T., Dempster, S. E., Hendsch, Z. S., Lee, L., Tidor, B. Computation of electrostatic complements to proteins: A case of charge stabilized binding, Protein Science 7, 206-210 (1998).

Dahiyat, B. I. & Mayo, S. L. De novo protein design: Fully automated sequence selection. Science 278, 82-86 (1997).

DeGrado, W. F. Enhanced: Proteins from scratch. Science 278, 80-81 (1997).

Dinner, A. R., Sali, A, & Karplus, M. The folding mechanism of larger model proteins: Role of native structure. Proc. Natl. Acad. Sci. USA 93, 8356-8361 (1996).

Elofsson, A., Grand S. L., Eisenberg, D. Local moves: an efficient algorithm for simulation of protein folding. (1996)

Frieden, C. & Clark, A. C. Protein folding: How the mechanism of GroEL action is defined by kinetics. Proc. Natl. Acad. Sci. USA 94, 5535-5538 (1997).

King, R. D. & Sternberg, M. J. E. Identification and the application of the concepts important for accurate and reliable protein secondary structure prediction, Protein Science 5, 2298-2310 (1996).

Lodish, H., Baltimore, D., Berk A., Zipursky, S. L., Matsudaira P., Darnell, J. Molecular Cell Biology, Third Edition, 63-68 (1995).

Mehta, P. K., Heringa, J., Argos, P. A simple and fast approach to prediction of protein secondary structure from multiply aligned sequences with accuracy above 70%, Protein Science 4, 2517-2525 (1995).

Morrison, R. & Boyd, R. Organic Chemistry, Sixth Edition, 1211-1212 (1992).

Munson, et al. What makes a protein a protein, Protein Science 5, 1584-1593 (1996).

Nölting, et al. The folding pathway of a protein at high resolution from microseconds to seconds. Proc. Natl. Acad. Sci. USA 94, 826-830 (1997).

Pennisi, E. Polymer folds just like a protein, Science 277, 1764 (1997).

Perutz, M.F. Mutations make enzyme polymerize, Nature 385, 773-775 (1997).

Scalley, M. L. & Baker, D. Protein folding kinetics exhibit an Arrhenius temperature dependence when corrected for the temperature dependence of protein stability, Proc. Natl. Acad. Sci. USA 94, 10636-10640 (1997)

Service, R. F. Folding proteins caught in the act, Science 273, 29-30 (1996).

Sisser, Adrian. Protein Structure Manipulation and Folding through Solid Geometry, World Wide Web, (1997).

Sosnick, T.R., Shtilerman M. D., Mayne L., & Englander S. W. Ultrafast signals in protein folding and the polypeptide contracted state, Proc. Natl. Acad. Sci. USA 94, 8545-8550 (1997).

Srinivasan, R. Properties of amino acids, World Wide Web, (1996).

Stryer, L. Biochemistry, Fourth Edition, 417-438 (1995).

Su, A. & Mayo, S. L. Coupling backbone flexibility and amino acid sequence selection in protein design, Protein Science 6, 1701-1707 (1997).

Toma, L. & Toma, S. Contact interactions method: A new algorithm for protein folding, Protein Science 5, 147-153 (1996).

Zhang, C. Consistency in structural energetics of protein folding and peptide recognition, Protein Science 6, 1057-1064 (1997)

