Cooperativity of Hydrogen Bonding in Amyloids

Last year, I wrote a post on polyglutamine (polyQ) aggregation. **

** A polyQ tract is a series of consecutive glutamine residues in a protein. Expanded tracts are associated with trinucleotide repeat disorders (a class of genetic disorders), such as Huntington’s disease (HD). In HD, people with more than 36 glutamine (Q) repeats have a mutant form of the huntingtin protein (mHt). We don’t fully understand the nature or behavior of this mutant form, but mHt is known to form protein aggregates rather than folding into functional forms. These aggregates accumulate over time and eventually interfere with cell function and intercellular communication.
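To make the "more than 36 repeats" criterion concrete, here is a minimal sketch that finds the longest run of consecutive glutamines in a sequence. The function name and the toy sequences are my own illustration (the sequences are made up, not real huntingtin), and the threshold is simply the number quoted above.

```python
import re

THRESHOLD = 36  # repeat count quoted in the text; used here only for illustration


def longest_q_run(sequence: str) -> int:
    """Return the length of the longest run of consecutive glutamines (Q)."""
    runs = re.findall(r"Q+", sequence.upper())
    return max((len(run) for run in runs), default=0)


# Hypothetical toy sequences (invented flanking residues, not real proteins).
short_tract = "MSTA" + "Q" * 21 + "PPG"
long_tract = "MSTA" + "Q" * 40 + "PPG"

for name, seq in [("short", short_tract), ("long", long_tract)]:
    n = longest_q_run(seq)
    label = "above threshold" if n > THRESHOLD else "at or below threshold"
    print(name, n, label)
```

A real analysis would count CAG repeats at the DNA level and handle interrupted tracts; this only shows the idea at the protein level.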

Here is the post: What makes polyglutamine aggregates so toxic?

Quick summary: polyglutamine stretches aggregate due to (a) water being a poor solvent for polyQ and (b) the ability of both the backbone and side chains to form hydrogen bonds, thus contributing energetically to the stability of the aggregate.

Writing that post made me think about the relevance of hydrogen bonding in biological systems. So, this post is about the influence of H-bonding on the energetics of protein folding.

Role of Hydrogen Bonds (H-Bonds):
Hydrogen bonding is one of the most crucial inter- and intramolecular interactions in biological systems. Whether it is interaction with solvent or protein folding, the energetics are largely driven by hydrogen bonds. The directional nature of these interactions also gives rise to a multitude of spectroscopic properties. This paper, published in 2000, explored the cooperative nature of hydrogen bonds in peptide systems and suggested that the strength of an H-bond should increase (asymptotically) with the extent of the H-bond network. This does seem a little intuitive, especially if you consider the folding of a helix, where the barrier is usually the formation of the first few turns.
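One way to picture "increases asymptotically" is a toy saturating function for the per-bond strength as the network grows. The functional form and every parameter below are assumptions I made up for illustration; they are not taken from the 2000 paper.

```python
import math


def hbond_strength(n_bonds, e_single=1.0, e_limit=1.5, decay=1.5):
    """Per-bond strength (arbitrary units) in a network of n_bonds H-bonds.

    Toy model: starts at e_single for an isolated bond and saturates
    at e_limit. All parameter values are hypothetical.
    """
    if n_bonds < 1:
        raise ValueError("n_bonds must be at least 1")
    return e_limit - (e_limit - e_single) * math.exp(-(n_bonds - 1) / decay)


strengths = [hbond_strength(n) for n in range(1, 9)]
# Gain in per-bond strength from adding one more bond to the network.
gains = [later - earlier for earlier, later in zip(strengths, strengths[1:])]

for n, s in enumerate(strengths, start=1):
    print(n, round(s, 3))
```

In this sketch the first few bonds give the largest gains and later ones add very little, which is the qualitative behavior described above.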

Role of H-bonds in amyloids:

  • Formation of protein fibrils (stacks of beta sheets) from monomers is characteristic of a large class of diseases known as protein aggregation diseases [more on protein aggregation diseases]. Structures of some amyloid fibers have been identified [1], and the process of amyloid formation has been extensively studied [2-4].
  • One of the more popular theories is that a nucleus (seed) forms first and then drives amyloid fibril formation.

Figure 1: cartoon representation of the aggregation process from [5]


  • Studies in the last few years suggest that the typical nucleus size is 3 to 4 peptides [6], as this is the minimal size that would not dissociate quickly, owing to slower diffusion.
  • One of the more important findings is the role of H-bonds and their cooperativity in this nucleus size of 3-4 peptides.
  • A 2006 study explored this cooperative H-bond effect in a prion protein, using classical electrostatics and quantum DFT calculations. [7]
  • They were able to show that the strength or contribution of H-bonds between peptides increases nonlinearly up to 4 peptides and then levels off, suggesting that H-bonds within the β sheets of a fibril are cooperative.


Figure 2: (a) one layer of two peptides, (b) 3 peptides stacked one below the other, (c) energy per monomer in a fibril, (d) binding energy of a layer to a preexisting fibril. [7]


  • From figure 2d, you can clearly see the leveling off of energy beyond a fiber length of four monomers.
  • This effect has also been validated in a polyQ system, where the cooperative effect is shown to influence the geometry of the aggregate. [8]
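The kind of curve in figure 2d can be sketched from total energies of stacks of increasing size. Below, the binding energy of the n-th peptide is taken as E(n) - E(n-1) - E(1), which is one common definition and an assumption on my part, not necessarily the one used in [7]. The energies are invented numbers in arbitrary units, chosen only to mimic the qualitative shape; they are not data from the paper.

```python
def layer_binding_energies(total_energies):
    """Binding energy of adding one peptide to a stack.

    total_energies[i] is the total energy of a stack of i + 1 peptides.
    Element k of the result is E(n) - E(n-1) - E(1) for n = k + 2.
    """
    e_monomer = total_energies[0]
    return [
        total_energies[i] - total_energies[i - 1] - e_monomer
        for i in range(1, len(total_energies))
    ]


def plateau_size(binding, tol):
    """Smallest stack size after which binding energies change by less than tol."""
    for i in range(len(binding)):
        later_steps = range(i + 1, len(binding))
        if all(abs(binding[j] - binding[j - 1]) < tol for j in later_steps):
            return i + 2  # binding[i] corresponds to a stack of i + 2 peptides
    return None


# Hypothetical total energies for stacks of 1..7 peptides (arbitrary units).
toy_energies = [-10.0, -22.0, -35.0, -48.5, -62.1, -75.7, -89.3]
binding = layer_binding_energies(toy_energies)
print([round(b, 2) for b in binding])
print("levels off at stack size:", plateau_size(binding, tol=0.2))
```

The tolerance is arbitrary, and with real DFT energies you would want to pick it relative to the numerical noise of the calculation.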

To summarize:

  • Hydrogen bonding, a partially covalent interaction, plays a very significant role in determining the energetics of protein folding and aggregation.
  • The directional nature of the interaction makes modeling the hydrogen bonding energy landscape computationally challenging.
  • Empirical molecular mechanics (MM) force fields are much less accurate, and QM electronic structure calculations cannot be adapted to biological systems.
  • With the advent of polarizable MM force fields, there might be hope for more consistency with both electronic structure calculations and experimental data.


[1] Structures for amyloid fibrils.
[2] On the nucleation and growth of amyloid beta-protein fibrils: detec…
[3] Simulations as analytical tools to understand protein aggregation a…
[4] Interpreting the aggregation kinetics of amyloid peptides.
[5] Page on
[6] Page on
[7] Page on
[8] Hydrogen Bonding Cooperativity in polyQ β-Sheets from First Principle Calculations


Why We Need To Study PTMs

I heard Philip Selenko give a great talk a couple of years back [1] and have since been (very slowly) getting acquainted with the literature on post-translational modifications (PTMs) in proteins. Last year I wrote about the role of PTMs in protein folding and how we are now starting to look at disordered regions in proteins differently [Sai Janani Ganesan’s answer to How do post-translational modifications affect protein folding?]. PTMs also occur in highly structured regions of the protein. There are over 400 different types of PTMs [2], each with the potential to drastically change the conformational space of the protein and hence its function. The high diversity of PTMs and their reversible nature make them a crucial part of understanding protein function, signaling pathways, allostery, binding and even protein energy landscapes.

Among the hundreds of different types of PTMs, phosphorylation (on serine, threonine and tyrosine) is one of the most well studied (mass spectrometry (MS)-based proteomics is now a pretty large field [3]). Although a complete list of phosphosites is not yet available, the central question remains how to link the known PTM sites to conformational changes and therefore function. Conservation of a PTM is one way to identify functionally relevant sites (this is not to say all functionally relevant sites are conserved) and hence to understand protein regulation and the role of these sites in protein interaction networks. For example, kinases have preferences for specific residues near the target phosphorylation site, and identifying the conservation of such sequences can be used to predict regulated sites [4].
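As a rough illustration of what a kinase sequence preference looks like in practice, the sketch below scans a sequence for the proline-directed pattern commonly quoted for cyclin-dependent kinases: S/T followed by P (minimal) or S/T-P-x-K/R (fuller consensus). The regular expressions are simplified, the toy sequence is invented, and this is not the prediction method used in [4].

```python
import re

# Lookaheads so that overlapping candidates are all reported.
MINIMAL_MOTIF = re.compile(r"(?=[ST]P)")       # S/T-P
FULL_MOTIF = re.compile(r"(?=[ST]P.[KR])")     # S/T-P-x-K/R


def find_sites(sequence, pattern):
    """Return 1-based positions of candidate phospho-acceptor residues."""
    return [m.start() + 1 for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence, made up for this example.
toy_sequence = "MASPEKTPAASAGSPLKTA"

minimal_hits = find_sites(toy_sequence, MINIMAL_MOTIF)
full_hits = find_sites(toy_sequence, FULL_MOTIF)
print("S/T-P sites:", minimal_hits)
print("S/T-P-x-K/R sites:", full_hits)
```

Running the same scan over aligned orthologous sequences and asking whether the motif survives is one simple way to think about conserved kinase recognition sites.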

A recent article from the Sali lab [5] touches on some aspects of correlating (conserved) PTMs with function. The study (a) uses MS to identify phosphorylation sites in Xenopus laevis, (b) compares the obtained data with information from 13 other species to identify conserved sites, and (c) uses predictive analysis to estimate conserved kinase-protein interactions for a set of cell-cycle kinases across species and correlates the degree of conservation with known kinase-protein regulatory interactions [6]. They also model phosphosites to gain structural insights.
Some of their findings are very cool, and seem almost intuitive as you read along:
  • Only 39.8% of phosphosites were found to be conserved in one or more species. (The data on identified phosphosites are largely incomplete, which must be kept in mind when interpreting any PTM-related data.)
  • The fraction of sites with known function increased with the level of conservation across species, suggesting that conserved sites are more likely to be functional.
  • For example, a phosphosite in the activation loop of GSK3B is one of the more conserved sites across species. Similarly, the conserved site in NDP Kinase A is located near the active site.
Figure 1: Example comparative models with highly conserved phosphorylation sites. The phosphorylation site is highlighted in red. For the NDP kinase A, the structure represents the homo-oligomeric complex. One of the subunits is indicated in blue, with the phosphosite position in red and the substrate in the ball-and-stick representation. [from footnote 6]
  • About 20% of the phosphosites identified appeared to be less solvent exposed, although intuitively, adding a phosphate should make the site more exposed. The authors suggest that conformational flexibility might play a role, as it is well known that PTMs can change function by altering the conformational space of the protein. If that is the case, then we can use structural information to identify PTMs that can regulate protein conformation. However, if conformational flexibility is playing a role, these regions could also be poorly modeled. As an MD person, I really think it is a good idea to integrate these structural results with multiple MD studies to get a more complete understanding.
Figure 2: An explanation for the sites that appear to be less solvent exposed. [from footnote 6]
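The two conservation statistics in the findings above (the fraction of sites conserved in at least one other species, and the fraction with known function at each conservation level) are simple to compute once sites are mapped across species. Here is a minimal sketch on made-up records; the field names, site IDs and values are all hypothetical and have nothing to do with the actual data in [6].

```python
from collections import defaultdict


def conservation_level(site_species, reference):
    """Number of species other than the reference in which the site is seen."""
    return len(set(site_species) - {reference})


def summarize(sites, reference):
    """Return (fraction of conserved sites, {level: fraction with known function})."""
    counts = defaultdict(lambda: [0, 0])  # level -> [n_sites, n_with_function]
    for site in sites:
        level = conservation_level(site["species"], reference)
        counts[level][0] += 1
        counts[level][1] += int(site["known_function"])
    n_conserved = sum(n for level, (n, _) in counts.items() if level >= 1)
    fraction_conserved = n_conserved / len(sites)
    functional = {level: k / n for level, (n, k) in sorted(counts.items())}
    return fraction_conserved, functional


# Hypothetical toy records, invented for illustration only.
toy_sites = [
    {"id": "siteA", "species": {"xenopus"}, "known_function": False},
    {"id": "siteB", "species": {"xenopus"}, "known_function": False},
    {"id": "siteC", "species": {"xenopus", "human"}, "known_function": False},
    {"id": "siteD", "species": {"xenopus", "human"}, "known_function": True},
    {"id": "siteE", "species": {"xenopus", "human", "mouse"}, "known_function": True},
]

fraction_conserved, functional_by_level = summarize(toy_sites, reference="xenopus")
print(fraction_conserved)
print(functional_by_level)
```

With incomplete phosphoproteomes, a site missing from another species may simply not have been detected yet, so numbers like these are best read as lower bounds on conservation.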

We still need a whole lot of experimental data before we draw any major conclusions. Considering the fact that PTMs play such a major role in our understanding of molecular processes, I think more work should be done on correlating identified PTM sites (obtained under distinct conditions) to conformational changes and function.

[1] Welcome to the Selenko Lab

[2] The Universal Protein Resource (UniProt) in 2010

[3] Status of Large-scale Analysis of Post-translational Modifications by Mass Spectrometry

[4] Deciphering a global network of functionally associated post-translational modifications

[5] Andrej Sali Lab

[6] Prediction of Functionally Important Phospho-Regulatory Events in Xenopus laevis Oocytes