Abstract
Intrinsically Disordered Proteins (IDPs) play crucial roles in numerous diseases like Alzheimer's and ALS by forming irreversible amyloid fibrils. The effectiveness of force fields (FFs) developed for globular proteins and their modified versions for IDPs varies depending on the specific protein. This study assesses 13 FFs, including AMBER and CHARMM, by simulating the R2 region of the FUS-LC domain (R2-FUS-LC region), an IDP implicated in ALS. Due to the flexibility of the region, we show that utilizing multiple measures, which evaluate the local and global conformations, and combining them together into a final score are important for a comprehensive evaluation of force fields. The results suggest c36m2021s3p with mTIP3p water model is the most balanced FF, capable of generating various conformations compatible with known ones. In addition, the mTIP3P water model is computationally more efficient than those of top-ranked AMBER FFs with four-site water models. The evaluation also reveals that AMBER FFs tend to generate more compact conformations compared to CHARMM FFs but also more non-native contacts. The top-ranking AMBER and CHARMM FFs can reproduce intra-peptide contacts but underperform for inter-peptide contacts, indicating there is room for improvement.
Similar content being viewed by others
Introduction
Intrinsically disordered proteins (IDPs) are proteins that can form different conformations depending on the environment and their binding partners1. Some IDPs can self-aggregate to form amyloid fibrils which take on the cross-β structure2. The cross-β structure consists of beta-strand proteins/peptides that are stacked along the length of the fiber forming long beta-sheets called protofibril. Finally, complexes of protofibrils form amyloid fibrils3.
Amyloid fibrils are associated with diseases4,5 such as Alzheimer’s, Parkinson’s, type II diabetes, Amyotrophic Lateral Sclerosis (ALS)4,6,7 and others. ALS is a rare neurodegenerative disease8,9 where in 50% of cases, death occurs within three years of the first clinical manifestation10. In ALS patients, amino acid mutations have been found in the Low-Complexity (LC) region of the Fused in Sarcoma (FUS) protein11,12,13,14,15,16. Irreversible amyloid fibril aggregation has been observed in mutated FUS-LC region, whereas reversible fibrils are observed in the wild-type11,16,17.
The human FUS protein (526 residues) is involved in mRNA splicing and transcription. The FUS-LC-core33–96 is involved in amyloid fibril formation11,17,18,19,20,21 and contains four repeat motifs (R1-R2-R3-R4)17 (Fig. S1, violet boxes). Within R1/R2, the tandem [S/G]Y[S/G] motifs have been implicated in the formation of Reversible Amyloid fibril Cores (RAC)17 (Fig. 1 and Fig. S1). The R2 region has been known to be more important for fibril formation than R122,23. The structures of the FUS-LC-core33–9616,24 (Fig. 1) show the R2 region has few long-distance contacts with the rest of the LC-core domain (Fig. S2). Therefore, the R2-FUS-LC50–65 region is a good candidate for studying amyloid fibrillation.
Our current understanding of the mechanism of amyloid fibril aggregation is poor. Furthermore, some of the transient intermediate structures of IDPs have been reported to be toxic26. One approach to study fibril formation and its intermediates is by performing all-atom Molecular Dynamics (MD) simulations. Amyloid β-peptides that form the amyloid fibrils in Alzheimer’s are commonly studied using MD simulations27,28,29.
However, experimental data and simulation results often show discrepencies27,30. One of the main reasons for this is that the all-atom force fields (FFs) such as AMBER, CHARMM31,32, OPLS-AA33,34 and GROMOS35,36 were developed to reproduce the properties of stable globular proteins. In contrast, IDPs adopt multiple unstable conformations where nonpolar residues are often exposed to the solvent. Multiple research groups have tuned FFs to better reproduce experimental data27,37,38,39,40,41,42 [Table S1 and Sup. Info. Section Benchmark of Force Fields (FFs) and Water Models (WMs)]. It is unclear if these modified FFs are generalizable across all IDPs. Note that throughout the text, when we refer to the FFs’, we are referring to the result or properties observed from performing simulations with the respective FF.
In this work, we apply widely used MD force fields for studying IDPs to the R2-FUS-LC (Fig. 1 and Fig. S1) to evaluate whether they can sample conformational ensembles of fibrils that are consistent with experimental data. We select ten FFs and water models (WMs) recently developed for IDPs. For comparison, we included c27s3p, a99sb4pew and a14sb3p FFs which were used to develop some of the IDP FFs. We score these thirteen FFs based on three criteria: the compactness of the fibrils, the intra-peptide contacts in the cross-β state and the secondary structure propensity. Surprisingly, most FFs fail to reproduce the experimental data. Our scoring method suggests that CHARMM36m2021 FF with the mTIP3P water model is the best for studying R2-FUS-LC fibrillation.
Results
We conducted six MD simulations, each lasting 500 ns, totaling 3 μs, with each of thirteen FFs (Table S1). To evaluate the FFs against the reversible amyloid fibril R2-FUS-LC region (trimer of 16 residue peptides), we employed three measures: radius of gyration (Rg), secondary structure propensity (SSP) and intra-peptide contact map. Rg measures the global compactness/extension of the trimer and individual peptides (Fig. S3). SSP and intra-peptide contact map both concern the local contact details of the R2-FUS-LC region. We ranked the thirteen FFs based on a combined score derived from these three measures (Table 1, Final Score).
Based on the final score (Table 1), the thirteen FFs can be separated into three distinct groups: top (“*”), middle (“•”) and bottom (“#”) ranking groups according to the order of scores.
FFs in the “top” group have medium (0.3–0.7) to high (> 0.7) scores for all measures. FFs in the “bottom” group like c27s3p and a03ws, have low scores (< 0.3) in all three measures. However, a14sb3p stands out with relatively good scores for SSP and contact map, but a low Rg score. On the other hand, c36m3pm has the best intra-peptide contact map score but poor SSP score. FFs in “middle” ranking group tend to have low scores for at least one of the three measures but have medium agreement for the remaining. Details of the three measures will be explained in the following sections.
Global compactness of R2-FUS-LC tripeptides
The Rg score measures the ability of the FFs to sample both compact and unfolded states of the R2-FUS-LC region. The reference data for the folded cross-β structure state comprises two distinct conformations of the R2-FUS-LC region: “U-shaped” conformation (PDB: 5W3N16) and “L-shaped” conformation (PDB: 7VQQ24), as shown in Fig. 1. Both conformations form a cross-β structure. Twenty U-shaped models with different loop conformations were solved by NMR with an average Rg of 10.0 Å (trimer of R2-FUS-LC) and the less compact (Rg: 14.4 Å) L-shaped conformation (trimer of R2-FUS-LC) was solved by cryo-EM. The reversibility of the cross-β structure11,17 suggests that the R2-FUS-LC region could adopt the unfolded state. To estimate the Rg of the unfolded state, we employed Flory's random coil polymer model with optimized parameters for IDPs43. We thus have three different measures of Rg: L-shaped, U-shaped, and Unfolded Rg’s.
The distributions of Rg for snapshots from the simulation data are presented in Fig. 2 and Fig. S4. In order to assess how frequently the FFs can generate conformations that closely resemble the reference structures in terms of Rg, we fitted the Rg distributions with two Gaussian distributions. The distance from the i’th mean to the k’th reference Rg was computed as the absolute number of standard deviations (\({\text{Z-score}}_{FF,k,i}\)) of the i’th Gaussian. The lowest \({\text{Z-score}}_{FF,k,i}\) was chosen, inverted and normalized by linearly scaling it from 0.00001 to 1.0 (Table 2). The final Rg score was calculated by multiplying the three normalized scores and rescaling them in the same manner (Table 2).
The thirteen FFs can be separated into three groups. Comparing the ranking of the final score (Table 1) and the final Rg score (Table 2), the top four FFs (c36m2021s3p, a99sb4pew, a19sbopc and c36ms3p) are the same although the order is slightly different.
Analyzing Table 2, we identified two distinct types of FFs among the top four. The first type includes c36m2021s3p and a19sbopc, both of which exhibited consistent performance across the three reference Rg’s. On the other hand, the second type consists of c36ms3p and a99sb4pew, which demonstrated a preference for flexible and compact conformations, respectively. We also observed bias towards the U-shaped Rg in a99sb4pew, a99sbCufix3p, and a14sb3p (Fig. S4), while c36m2021s3pm and c36m3pm (Fig. S4) favored the unfolded Rg. However, it is worth noting that the final Rg rank of these FFs is inversely correlated with the strength of their bias of sampled conformations. This is because our scoring scheme penalizes FFs that fit well to only one specific reference Rg but perform poorly for the others. As a result, the two worst-performing FFs are c36m3pm and a14sb3p (Fig. 2).
Overall, CHARMM FFs tend to generate more extended conformations than AMBER FFs (Figs. S3 and S4), except for a03ws.
Rg is a measure of the global compactness of a conformation, but it is not suitable for evaluating the details of the conformation. To assess the sampled conformations in more detail, we introduced two measures in the next sections: the intra-peptide contact map and the secondary structure propensity (SSP). These measures evaluate the intra-peptide interactions.
Intra-peptide contacts of the R2-FUS-LC region
The U-shaped and L-shaped conformations contain 20 and 15 intra-peptide contacts, respectively (Fig. 3). In the L-shaped conformation, there were no observed contacts between residues j and > j + 5 (medium-distance contacts) within a 5 Å cutoff. However, in the U-shaped conformation, medium-distance contacts are found between Tyr50–Tyr55, Tyr50–Thr64, Tyr50–Gly65, Tyr55–Asn63 and Ser57–Ser61. Therefore, we will only consider the U-shaped conformation for evaluating the FFs.
For each of the thirteen FF’s MD simulations, we calculated the average intra-peptide contacts for the trimers across all snapshots. The contacts from the FFs and the U-shaped conformation were compared using the Matthews Correlation coefficient (MCC)45 score (Table 3). The MCC score ranges from − 1 to 1, with a value of 1 indicating perfect correlation, − 1 indicating perfect anti-correlation, and 0 indicating no correlation. MCC scoring penalizes false predictions, which means that FFs that predicted the most native contacts (True Positives, TP), such as a99sb4pew, c27s3p, and a14sb3p (Table S2), may not necessarily be the top scoring FFs (Table 3). The process of normalization was carried out in a similar manner as described in the previous section.
In the contact map score ranking (Table 3), there are some surprising results. Despite being classified as a bottom performer based on the final score (Table 1), the c36m2021s3pm and c36m3pm FFs were ranked at the top in the contact map score. On the other hand, c36m2021s3p, which had the highest final score, was only ranked 4th. Additionally, a99sbCufix3p was placed at the bottom even though it had middle ranking in the final score.
Upon examining the confusion matrices for the four FFs (Table 4 and Table S2), we have a few observations. Firstly, c36m2021s3pm, c36m3pm, and c36m2021s3p, which have the highest Unfolded Rg scores, tend to favor more extended conformations. This is evident from the fact that < 20% of residues pairs are in contact. In contrast, a99sbCufix3p prefers more compact conformations, with 23.16% of residue pairs in contact.
It is worth noting that while CHARMM FFs with extended conformations have fewer overall contacts (Table 4, Total Pred. True), they also have significantly lower non-native contacts (Table 4, bolded). Conversely, a99sbCufix3p has more contacts overall, but a larger proportion of these contacts are non-native (Table 4). After excluding these four FFs, the contact map score rankings of the remaining FFs (Table 3) were found to be consistent with the final score rankings (Table 1).
The L-shaped conformation primarily consists of β-strands but does not form β-sheets within the peptide. As a result, its conformations cannot be determined from the intra-peptide contact maps. This motivates us to assess the Secondary Structure Propensity (SSP) of these FFs in the following section.
Secondary structure propensity (SSP) of R2-FUS-LC region
In each snapshot, Dictionary of Protein Secondary Structure (DSSP)46,47 is used to assign the secondary structure type (α-helix, β-strand, or coil) to each residue. For each residue, we define its SSP/probability for all secondary structure types by counting the number of occurrences in all snapshots and dividing it with the total number of snapshots. To evaluate the FFs’ SSP for the R2-FUS-LC region, we define the SSP score as the log likelihood of observing the experimental (U-shaped and L-shaped) secondary structures given the SSP probability distributions obtained from the simulation snapshots (details in Methods). Normalization was performed in a similar fashion as outlined previously.
In Table 5, we observe that the SSP score is higher for a14sb3p and a99sbCufix3p, but lower for c36ms3p and c36m2021s3pm compared to their final rank. The FFs with higher SSP scores tend to produce more compact conformations, as evidenced by their Rg scores (Table 2). Conversely, the FFs with lower rankings tend to favor more extended conformations. This suggests that the formation of native secondary structures is necessary for the development of compact fibrils.
The a03ws and c27s3p FFs, which have the lowest SSP scores, exhibit a strong tendency to generate conformations with α-helices around the RAC2 motif (Fig. 4). However, in both the L-shaped24 and U-shaped16 conformations, this region forms β-strands. NMR chemical shift data also confirms the absence of α-helices in the R2-FUS-LC region16,17,18,22,48.
Discussion
Our observation indicates that AMBER FFs, specifically a14sb3p and a99sbCufix3p, generate more compact conformations compared to the CHARMM FFs, c36ms3p and c36m2021s3pm, which tend to produce more extended conformations. FFs that have more extended conformations can better replicate the properties of both U- and L-shaped conformations when compared to FFs generating more compact conformations that fit well only to one of these conformations.
Both AMBER and CHARMM FFs do not properly reproduce the inter-peptide interactions required to form fibrils (Table S3), although AMBER FFs perform relatively better than CHARMM FFs. The preference of CHARMM FFs for extended conformations extends to their inter-peptide interactions: the peptides spend approximately half the time as dimers or monomers in contrast to AMBER FFs (except a03ws) which mainly stay as trimers (Table S4). The inter-peptide interaction scores were not included in the final score as all FFs perform poorly and would just add noise to the final ranking.
In our study, the limited sampling time (only 3 µs) may have contributed to the poor performance observed. Fibril formation typically occurs over a much longer timescale of hours to days16,24. To alleviate but not resolve this issue, we have increased the peptide concentration from 0.16 mM (as used in NMR structure determination16) to 10 mM by decreasing the ratio of the number of peptides to the number of water molecules (add more waters) and initiated simulations from the U-shaped conformation in fibril form. Higher concentration of FUS peptides increases the chance of these peptides interacting with one another and forming fibrils. Additionally, we observed that most FF simulations have nearly converged within 300 ns by monitoring the average Rg values over time (Fig. S6).
It is possible that the chosen fragment is not capable of forming fibrils independently. To investigate this possibility, we analyzed the intra-protein interactions of the full-length LC domain (residues 1–214). The contact maps show that there are limited long-distance contacts between the R2-FUS-LC regions and the rest of the protein in the U/L-shaped conformations (Fig. S2), indicating that the R2-FUS-LC region functions as a distinct domain within the larger LC domain, at least in the fibril.
Our study corroborates previous research conducted by Lao et al. who utilized the a99sbildn4pd with TIP3P water model instead of TIP4PD to simulate a longer region of FUS that included the R2-FUS-LC region23. Our findings show similar contact map patterns as their study, and we also observed ~ 5% propensity of α-helices, which is consistent with their work. However, we noted a discrepancy in the β-strand propensity between our study and theirs, with their study observing much higher β-strand propensity than we did (Fig. S5). This difference may be attributed to the length of the peptide used in each study, with their study using a peptide length of 60 residues compared to our study's length of 16 residues.
Our findings are also consistent with other studies on Amyloid-β proteins, where different FFs produced different conformational ensembles, some of which are compatible with fibril aggregation42,49,50. For example, Pedersen et al. found that a19sbopc produces more compact conformations and forms more β-strand than a99disp49, while Samantray et al. demonstrated c36ms3p gave promising results for sampling and giving conformational ensembles with random coil and β-strand structures50. Finally, our final score ranking is the same as the Amyloid-β peptide (Aβ40) results from Robustelli et al. showing that c36ms3p is the best, except for c22s3p which underperformed a99disp and a99sbildn4pd in our ranking27. Additionally, they also observed that c22s3p and a03ws overestimates the α-helix propensity (Fig. 4).
Additionally, our results are consistent with Piana et al., who tested three and four-site water models with AMBER and CHARMM FFs42. Where using the TIP3P (3-site) water model, CHARMM FFs were found to be more flexible than AMBER FFs, and when using 4-site water models, AMBER FFs became a lot more flexible, except for a99sb4pew (Fig. S4).
In this study, we investigated the effectiveness of force fields developed for globular proteins and their modified versions for intrinsically disordered proteins. We found that these force fields, with their adapted water models, produce distinct conformational ensembles. Our results highlight the importance of utilizing multiple measures for a comprehensive evaluation of force fields due to the intrinsic flexibility of this system which can form β-strand fibrils and adopt random coiled conformations.
Our evaluation of thirteen force fields from CHARMM and AMBER families revealed that c36m2021s3p is the most balanced in terms of the three measures used. This force field can generate various conformations that are compatible with U/L-shaped conformations, and it showed good agreement in terms of the SSP and intra-peptide contact map. Additionally, its mTIP3P water model is computationally less expensive than those of top-ranked AMBER FFs with four-site water models.
Methods
Preparation of initial conformations for molecular dynamics simulations
Multiple 3D-structures of FUS-LC domain have been solved by X-ray crystallography17,51, electron crystallography17,52, cryo-Electron Microscopy (cryo-EM)24, and NMR16. The details of the structures are listed in the Table S5.
From the 20 U-shaped NMR models (PDB ID: 5W3N16), the region of residues 50–65 (R2-FUS-LC region, Fig. 1, red square) of the first three chains (trimer) was used. The 20 structures were clustered into six groups using the GROMACS tool gmx cluster53 with a RMSD cutoff of 1.7 Å (Cα-only). The six representative structures were used to run six independent MD simulations.
Molecular dynamics simulations
Systems
All-atom MD simulations were performed using GROMACS 2020.454,55. The six initial conformations of R2-FUS-LC (663 atoms) were solvated in a cubic box of 80 × 80 × 80 Å3. To replicate the conditions of the NMR experiment22, 137 mM NaCl ions were added (~ 17,000 water molecules). The final systems contain 52,000–68,000 atoms.
Force fields
The 13 IDP force fields (FFs) and their corresponding water models (WMs) (details in Table S1 and Sup. Info. Section Benchmark of Force Fields (FFs) and Water Models (WMs)): a03ws, a14sb3p, a19sbopc, a99disp, a99sb4pew, a99sbCufix3p, a99sbildn4pd, c22s3p, c27s3p, c36m2021s3p, c36m2021s3pm, c36m3pm and c36ms3p.
Molecular dynamics simulations
Non-bonded cutoff is set to 12 Å. Electrostatic interactions were treated with the smooth particle mesh Ewald method56 for long range interactions and Coulomb for short range interactions. For CHARMM FFs, the Lennard–Jones potential is modified by GROMAC’s force-switching function54,55 between 8 and 12 Å.
The length of solute and water covalent bonds involving hydrogen atoms were kept constant using the LINCS57 and SETTLE58 algorithms, respectively, allowing integration of equations of motion with a 2 fs time step.
For each system, energy minimization followed by a constant pressure and temperature (NPT) 1 ns equilibration run were performed at 1 bar and 300 K. Temperature is controlled by the v-rescale thermostat and pressure by the Berendsen barostat59,60 with the time coupling constants τT = 0.1 ps and τP = 0.5 ps, respectively. A second equilibration run was performed with Nose–Hoover thermostat and Parrinello–Rahman barostat61,62,63 with the time coupling constants τT = 0.5 ps and τP = 2.5 ps for 1 ns.
For the production run, each of the six systems were simulated for 500 ns under the same conditions as in the second equilibration run but with a velocity random seed. Snapshots were saved every 20 ps. The first 100 ns of the production run were discarded, and the remaining 400 ns was used for further analysis.
The above protocol was applied to each of the 13 force fields, giving an accumulated trajectory of 3 μs.
Data analysis
Analysis of snapshots were performed with MDAnalysis64,65, MDtraj66, DSSP46,47 and our own scripts. All figures were plotted with Matplotlib44 (Python module).
Normalization of raw scores
All the raw scores were linearly rescaled such that they are between 0.0001 and 1 by:
where \({S}_{norm,\mathrm{FF}}\) and \({S}_{raw,\mathrm{FF}}\) are the normalized and raw scores for a specific force field (FF), respectively.
Radius of gyration
For each snapshot, the radius of gyration (Rg) of the heavy-atoms of the 16-residue trimer and the individual peptides were computed with MDAnalysis.
The Rg distribution of the trimer (Fig. 2 and Fig. S4) was fitted to two Gaussians using the gaussian mixture model.
To evaluate the FFs, the \({\text{Rg-Score}}_{FF,k}\) (raw score) compares the simulation results to the experimental \({Rg}_{exp,k}\) of k (U-shaped [averaged over 20 models]: 10 Å or L-shaped: 14.4 Å). It is computed as:
where \(\langle {Rg}_{FF,i}\rangle \) and \({SD}_{FF,i}\) are the average and standard deviation, respectively, of the i’th Gaussian.
The Rg of the individual peptides was compared to the predicted \(R{g}_{FL}\) (10.8 Å) from Flory’s polymer theory with parameters optimized for IDPs by Bernado and Blackledge43:
where \({R}_{0}=\) 2.54 Å is the persistence length, ν = 0.522 is the exponential scaling factor, N is the number of residues, \(R{g}_{FF,med}\) is the median Rg of the FF and q3 and q1 are the 75th and 25th percentile, respectively.
The raw scores from (2) and (3) were normalized. The final Rg score is the normalized product of the normalized U-shaped, L-shaped, and Unfolded Rg (Table 2).
Intra-peptide contact map analysis
To analyze the heavy atom contacts in each peptide, we developed custom code to generate contact maps. A contact was defined as two atoms being within a 5 Å distance, except for neighboring residues along the protein sequence. We counted intra-peptide contacts across all snapshots and filtered out contact frequencies < 1%. Average contact frequencies were then calculated across all three peptides.
Comparing the average contact map from each snapshot to the representative U-shaped conformation (Fig. 3), contacts were classified into one of four groups [Table 6; True(T)/False(F) Positive(P)/Negative(N)]. A contact (no contact) is considered positive (negative). The contacts from all snapshots of the six replicas were accumulated in the confusion matrix (Table S2).
The Matthews correlation coefficient (MCC)45 was used to measure the agreement of the contact maps:
Note that the MCC score is between − 1 and 1. An MCC score of 1 shows perfect correlation with the reference, 0 with no correlation and − 1 with perfectly negative correlation. The raw MCC score is then normalized to obtain the normalized intra-peptide Contact Map score (Table 3).
Inter-peptide contact map analysis
We classified the peptide's conformation in each snapshot as monomer, dimer, or trimer based on their contacts with other peptides (Table S4). The monomer was defined as having no contact with other peptides, while the dimer had exactly two peptides in contact. The trimer was characterized by each peptide being in contact with another peptide. Two peptides are considered in contact if they have at least one contact. We use the same contact definition as in the intra-peptide analysis.
To identify the middle peptide in trimers, we determined the peptide with the highest inter-peptide contacts in the fibril. We then compared the computed contact map with the experimental contact map of the middle peptide with one of the other two peptides.
For experimental structures, the contact maps of the middle peptide with either of the edge peptides were similar so we chose chain A-B for comparison with the computed contact maps (see Fig. S7). The same contact map was used for dimers.
All contacts between two edge peptides in the ensembles were considered false positives (FPs). We employed the same method as the previous section to classify contacts and compute Matthews Correlation Coefficient (MCC) scores (Table S3).
Secondary structure propensity
To assign secondary structures, we utilized the Dictionary of Protein Secondary Structure (DSSP)46,47. Each residue was classified as either α-helix (H), β-strand (E), or coil (C).
Each residue’s secondary structure propensity/probability (\({p}_{i,ss}\), SSP) is computed from the snapshots, combining the data from all three peptides from the same FF:
where: i is the residue position in peptide j at snapshot t from the simulation performed with the forcefield. ssi,j,t is 1 when assigned the respective secondary structure type ss by DSSP or 0 otherwise. ss represents the secondary structure type: α-helix (H), β-strand (E), or coil (C).
The reference secondary structures were obtained from the U-shaped (20 models) and L-shaped (single structure) structures, as shown in Table S6. The FF’s secondary structure score with respect to the experimental structure k (U/L-shaped) is:
where, i represents the residue position, and \({exptl}_{k,i}\) is the secondary structure of residue i in the experimental structure k.
For each FF, the SSP-Score is the log likelihood of observing the experimental secondary structures. We derived the probabilities from the observed propensities from the simulations, assuming that the positions are independent of one another.
The raw FF’s SSP-Scores for the U-shaped and L-shaped conformations were independently renormalized. We took the product of these two normalized SSP-scores and re-normalized them to get the final SSP score (Table 5).
Final scoring
Using the three normalized Rg score, SSP score and intra-peptide MCC score, we calculated the final score for each FF by multiplying the scores and re-normalizing the product (Table 1).
Data availability
Data available on reasonable request to HK.
References
Zhou, J., Oldfield, C. J., Yan, W., Shen, B. & Dunker, A. K. Intrinsically disordered domains: Sequence → disorder → function relationships. Protein Sci. 28(9), 1652–1663. https://doi.org/10.1002/pro.3680 (2019).
Tompa, P. Intrinsically unstructured proteins. Trends Biochem. Sci. 27(10), 527–533. https://doi.org/10.1016/S0968-0004(02)02169-2 (2002).
Taylor, A. I. P. & Staniforth, R. A. General principles underpinning amyloid structure. Front. Neurosci. 16, 8869 (2022).
Chiti, F. & Dobson, C. M. Protein misfolding, functional amyloid, and human disease. Annu. Rev. Biochem. 75(1), 333–366. https://doi.org/10.1146/annurev.biochem.75.101304.123901 (2006).
Uversky, V. N., Oldfield, C. J. & Dunker, A. K. Intrinsically disordered proteins in human diseases: Introducing the D2 concept. Annu. Rev. Biophys. 37(1), 215–246. https://doi.org/10.1146/annurev.biophys.37.032807.125924 (2008).
Burdick, D. et al. Assembly and aggregation properties of synthetic Alzheimer’s A4/beta amyloid peptide analogs. J. Biol. Chem. 267(1), 546–554. https://doi.org/10.1016/S0021-9258(18)48529-8 (1992).
Uversky, V. N. Intrinsic disorder in proteins associated with neurodegenerative diseases. Front. Biosci. 14, 5188–5238 (2009).
Xu, L. et al. Global variation in prevalence and incidence of amyotrophic lateral sclerosis: A systematic review and meta-analysis. J. Neurol. 267(4), 944–953. https://doi.org/10.1007/s00415-019-09652-y (2020).
Jones, C. M. & Coleman, S. Neurodegenerative diseases. In Palliative Care (eds Emanuel, L. L. & Librach, S. L.) 382–395 (W. B. Saunders, 2007). https://doi.org/10.1016/B978-141602597-9.10026-2.
Mitchell, J. & Borasio, G. Amyotrophic lateral sclerosis. Lancet 369(9578), 2031–2041. https://doi.org/10.1016/S0140-6736(07)60944-1 (2007).
Kato, M. et al. Cell-free formation of RNA granules: Low complexity sequence domains form dynamic fibers within hydrogels. Cell 149(4), 753–767. https://doi.org/10.1016/j.cell.2012.04.017 (2012).
Patel, A. et al. A liquid-to-solid phase transition of the ALS protein FUS accelerated by disease mutation. Cell 162(5), 1066–1077. https://doi.org/10.1016/j.cell.2015.07.047 (2015).
Scekic-Zahirovic, J. et al. Toxic gain of function from mutant FUS protein is crucial to trigger cell autonomous motor neuron loss. EMBO J. 35(10), 1077–1097. https://doi.org/10.15252/embj.201592559 (2016).
Kwiatkowski, T. J. et al. Mutations in the FUS/TLS gene on chromosome 16 cause familial amyotrophic lateral sclerosis. Science 323(5918), 1205–1208. https://doi.org/10.1126/science.1166066 (2009).
Vance, C. et al. Mutations in FUS, an RNA processing protein, cause familial amyotrophic lateral sclerosis type 6. Science 323(5918), 1208–1211. https://doi.org/10.1126/science.1165942 (2009).
Murray, D. T. et al. Structure of FUS protein fibrils and its relevance to self-assembly and phase separation of low-complexity domains. Cell 171(3), 615-627.e16. https://doi.org/10.1016/j.cell.2017.08.048 (2017).
Luo, F. et al. Atomic structures of FUS LC domain segments reveal bases for reversible amyloid fibril formation. Nat. Struct. Mol. Biol. 25(4), 341–346. https://doi.org/10.1038/s41594-018-0050-8 (2018).
Burke, K. A., Janke, A. M., Rhine, C. L. & Fawzi, N. L. Residue-by-residue view of in vitro FUS granules that bind the C-terminal domain of RNA polymerase II. Mol. Cell 60(2), 231–241. https://doi.org/10.1016/j.molcel.2015.09.006 (2015).
Wootton, J. C. & Federhen, S. Statistics of local complexity in amino acid sequences and sequence databases. Comput. Chem. 17(2), 149–163. https://doi.org/10.1016/0097-8485(93)85006-X (1993).
Golding, G. B. Simple sequence is abundant in eukaryotic proteins. Protein Sci. 8(6), 1358–1361. https://doi.org/10.1110/ps.8.6.1358 (1999).
Haerty, W. & Golding, G. B. Low-complexity sequences and single amino acid repeats: Not just “junk” peptide sequences. Genome 53(10), 753–762. https://doi.org/10.1139/g10-063 (2010).
Ding, X. et al. Amyloid-forming segment induces aggregation of FUS-LC domain from phase separation modulated by site-specific phosphorylation. J. Mol. Biol. 432(2), 467–483. https://doi.org/10.1016/j.jmb.2019.11.017 (2020).
Lao, Z. et al. Insights into the atomistic mechanisms of phosphorylation in disrupting liquid-liquid phase separation and aggregation of the FUS low-complexity domain. J. Chem. Inf. Model https://doi.org/10.1021/acs.jcim.2c00414 (2022).
Sun, Y. et al. Molecular structure of an amyloid fibril formed by FUS low-complexity domain. Science 25(1), 103701. https://doi.org/10.1016/j.isci.2021.103701 (2022).
Humphrey, W., Dalke, A. & Schulten, K. VMD: Visual molecular dynamics. J. Mol. Graph. 14(1), 33–38. https://doi.org/10.1016/0263-7855(96)00018-5 (1996).
Chen, X.-Q. & Mobley, W. C. Alzheimer disease pathogenesis: Insights from molecular and cellular biology studies of oligomeric Aβ and tau species. Front. Neurosci. 13, 659 (2019).
Robustelli, P., Piana, S. & Shaw, D. E. Developing a molecular dynamics force field for both folded and disordered protein states. PNAS 2018, 00690. https://doi.org/10.1073/pnas.1800690115 (2018).
Man, V. H. et al. Effects of all-atom molecular mechanics force fields on amyloid peptide assembly: The case of Aβ16–22 dimer. J. Chem. Theory Comput. 15(2), 1440–1452. https://doi.org/10.1021/acs.jctc.8b01107 (2019).
Carballo-Pacheco, M., Ismail, A. E. & Strodel, B. On the applicability of force fields to study the aggregation of amyloidogenic peptides using molecular dynamics simulations. J. Chem. Theory Comput. 14(11), 6063–6075. https://doi.org/10.1021/acs.jctc.8b00579 (2018).
Rahman, M. U., Rehman, A. U., Liu, H. & Chen, H.-F. Comparison and evaluation of force fields for intrinsically disordered proteins. J. Chem. Inf. Model. 60(10), 4912–4923. https://doi.org/10.1021/acs.jcim.0c00762 (2020).
MacKerell, A. D. et al. CHARMM: The energy function and its parameterization. in Encyclopedia of Computational Chemistry (American Cancer Society, 2002). https://doi.org/10.1002/0470845015.cfa007.
Brooks, B. R. et al. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 4(2), 187–217. https://doi.org/10.1002/jcc.540040211 (1983).
Jorgensen, W. L. & Tirado-Rives, J. The OPLS [optimized potentials for liquid simulations] potential functions for proteins, energy minimizations for crystals of cyclic peptides and crambin. J. Am. Chem. Soc. 110(6), 1657–1666. https://doi.org/10.1021/ja00214a001 (1988).
Kaminski, G. A., Friesner, R. A., Tirado-Rives, J. & Jorgensen, W. L. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J. Phys. Chem. B 105(28), 6474–6487. https://doi.org/10.1021/jp003919d (2001).
Gunsteren, W. F. & Berendsen, H. J. C. Biomolecular Simulation: The GROMOS Software (Biomos, 1987).
van Gunsteren, W. F. et al. Biomolecular Simulation: The GROMOS96 Manual and User Guide (Biomos, 1996).
Chan-Yao-Chong, M., Durand, D. & Ha-Duong, T. Molecular dynamics simulations combined with nuclear magnetic resonance and/or small-angle X-ray scattering data for characterizing intrinsically disordered protein conformational ensembles. J. Chem. Inf. Model. 59(5), 1743–1758. https://doi.org/10.1021/acs.jcim.8b00928 (2019).
Mu, J., Liu, H., Zhang, J., Luo, R. & Chen, H.-F. Recent force field strategies for intrinsically disordered proteins. J. Chem. Inf. Model. 61(3), 1037–1047. https://doi.org/10.1021/acs.jcim.0c01175 (2021).
Best, R. B., Zheng, W. & Mittal, J. Balanced protein–water interactions improve properties of disordered proteins and non-specific protein association. J. Chem. Theory Comput. 10(11), 5113–5124. https://doi.org/10.1021/ct500569b (2014).
Huang, J. et al. CHARMM36m: An improved force field for folded and intrinsically disordered proteins. Nat. Methods 14(1), 71–73. https://doi.org/10.1038/nmeth.4067 (2017).
Tian, C. et al. Ff19SB: Amino-acid-specific protein backbone parameters trained against quantum mechanics energy surfaces in solution. J. Chem. Theory Comput. 16(1), 528–552. https://doi.org/10.1021/acs.jctc.9b00591 (2020).
Piana, S., Donchev, A. G., Robustelli, P. & Shaw, D. E. Water dispersion interactions strongly influence simulated structural properties of disordered protein states. J. Phys. Chem. B 119(16), 5113–5123. https://doi.org/10.1021/jp508971m (2015).
Bernadó, P. & Blackledge, M. A self-consistent description of the conformational behavior of chemically denatured proteins from NMR and small angle scattering. Biophys. J. 97(10), 2839–2845. https://doi.org/10.1016/j.bpj.2009.08.044 (2009).
Hunter, J. D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 9(3), 90–95. https://doi.org/10.1109/MCSE.2007.55 (2007).
Matthews, B. W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta Protein Struct. 405(2), 442–451. https://doi.org/10.1016/0005-2795(75)90109-9 (1975).
Kabsch, W. & Sander, C. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22(12), 2577–2637. https://doi.org/10.1002/bip.360221211 (1983).
Touw, W. G. et al. A series of PDB-related databanks for everyday needs. Nucleic Acids Res. 43(D1), D364–D368. https://doi.org/10.1093/nar/gku1028 (2015).
Schwartz, J. C., Wang, X., Podell, E. R. & Cech, T. R. RNA seeds higher-order assembly of FUS protein. Cell Rep. 5(4), 918–925. https://doi.org/10.1016/j.celrep.2013.11.017 (2013).
Pedersen, K. B., Flores-Canales, J. C. & Schiøtt, B. Predicting molecular properties of α-synuclein using force fields for intrinsically disordered proteins. Proteins https://doi.org/10.1002/prot.26409 (2022).
Samantray, S., Yin, F., Kav, B. & Strodel, B. Different force fields give rise to different amyloid aggregation pathways in molecular dynamics simulations. J. Chem. Inf. Model. 60(12), 6462–6475. https://doi.org/10.1021/acs.jcim.0c01063 (2020).
Hughes, M. P. et al. Atomic structures of low-complexity protein segments reveal kinked β sheets that assemble networks. Science 359(6376), 698–701. https://doi.org/10.1126/science.aan6398 (2018).
Zhou, H. et al. Programming conventional electron microscopes for solving ultrahigh-resolution structures of small and macro-molecules. Anal. Chem. 91(17), 10996–11003. https://doi.org/10.1021/acs.analchem.9b01162 (2019).
Daura, X. et al. Peptide folding: When simulation meets experiment. Angew. Chem. Int. Ed. 38(1–2), 236–240. https://doi.org/10.1002/(SICI)1521-3773(19990115)38:1/2%3c236::AID-ANIE236%3e3.0.CO;2-M (1999).
Abraham, M. J. et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1–2, 19–25. https://doi.org/10.1016/j.softx.2015.06.001 (2015).
Lindahl, E., Abraham, M. J., Hess, B. & van der Spoel, D. GROMACS 2020.4 Manual. (2020). https://doi.org/10.5281/zenodo.4054996.
Essmann, U. et al. A smooth particle Mesh Ewald method. J. Chem. Phys. 103(19), 8577–8593. https://doi.org/10.1063/1.470117 (1995).
Hess, B. P-LINCS: A parallel linear constraint solver for molecular simulation. J. Chem. Theory Comput. 4(1), 116–122. https://doi.org/10.1021/ct700200b (2008).
Miyamoto, S. & Kollman, P. A. Settle: An analytical version of the SHAKE and RATTLE algorithm for rigid water models. J. Comput. Chem. 13(8), 952–962. https://doi.org/10.1002/jcc.540130805 (1992).
Berendsen, H. J. C. et al. Interaction Models for Water in Relation to Protein Hydration (Springer, 1981). https://doi.org/10.1007/978-94-015-7658-1_21.
Berendsen, H. J. C., van der Spoel, D. & van Drunen, R. GROMACS: A message-passing parallel molecular dynamics implementation. Comput. Phys. Commun. 91(1), 43–56. https://doi.org/10.1016/0010-4655(95)00042-E (1995).
Nosé, S. A unified formulation of the constant temperature molecular dynamics methods. J. Chem. Phys. 81(1), 511–519. https://doi.org/10.1063/1.447334 (1984).
Hoover, W. G. Canonical dynamics: equilibrium phase-space distributions. Phys. Rev. A 31(3), 1695–1697. https://doi.org/10.1103/PhysRevA.31.1695 (1985).
Parrinello, M. & Rahman, A. Polymorphic transitions in single crystals: A new molecular dynamics method. J. Appl. Phys. 52(12), 7182–7190. https://doi.org/10.1063/1.328693 (1981).
Gowers, R. et al. MDAnalysis: A Python package for the rapid analysis of molecular dynamics simulations. In Proceedings of the 15th Python in Science Conference (eds Benthall, S. & Rostrup, S.) 98–105 (Springer, 2016). https://doi.org/10.25080/Majora-629e541a-00e.
Michaud-Agrawal, N., Denning, E. J., Woolf, T. B. & Beckstein, O. MDAnalysis: A toolkit for the analysis of molecular dynamics simulations. J. Comput. Chem. 32(10), 2319–2327. https://doi.org/10.1002/jcc.21787 (2011).
McGibbon, R. T. et al. MDTraj: A modern open library for the analysis of molecular dynamics trajectories. Biophys. J. 109(8), 1528–1532. https://doi.org/10.1016/j.bpj.2015.08.015 (2015).
Acknowledgements
MD simulations were performed using JAEA resources from QST institute and local resources. This work was supported by the JSPS fellowship 2020 (FY2020 JSPS Postdoctoral Fellowship for Research in Japan (Short-term) to MC). This work was also supported in part by KAKENHI (JP18H05534), MEXT as “Program for Promoting Researches on the Supercomputer Fugaku” (Biomolecular dynamics in a living cell, JPMXP1020200101, hp220177) and Japan Agency for Medical Research and Development (AMED) (23ama121024j0002).
Author information
Authors and Affiliations
Contributions
M.C., J.C. and H.K. planed the research project. M.C. carried out all the simulations. M.C. and J.C. analyzed the data. M.C., J.C. and H.K. wrote and reviewed the manuscript text. The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript. These authors contributed equally.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chan-Yao-Chong, M., Chan, J. & Kono, H. Benchmarking of force fields to characterize the intrinsically disordered R2-FUS-LC region. Sci Rep 13, 14226 (2023). https://doi.org/10.1038/s41598-023-40801-6
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-023-40801-6
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.