3pHLA-score improves structure-based peptide-HLA binding affinity prediction

Conev, Anja; Devaurs, Didier; Rigo, Mauricio Menegatti; Antunes, Dinler Amaral; Kavraki, Lydia E.

doi:10.1038/s41598-022-14526-x

Article
Open access
Published: 24 June 2022

Scientific Reports volume 12, Article number: 10749 (2022)

Abstract

Binding of peptides to Human Leukocyte Antigen (HLA) receptors is a prerequisite for triggering an immune response. Estimating peptide-HLA (pHLA) binding is crucial for peptide vaccine target identification and epitope discovery pipelines. Computational methods for binding affinity prediction can accelerate these pipelines. Currently, most of these methods rely exclusively on sequence-based data, which leads to inherent limitations. Recent studies have shown that structure-based data can address some of these limitations. In this work we propose a novel machine learning (ML) structure-based protocol to predict the binding affinity of peptides to HLA receptors. To that end, we engineer the input features for ML models by decoupling energy contributions at different residue positions in peptides, which leads to our novel per-peptide-position protocol. Using Rosetta’s ref2015 scoring function as a baseline, we use this protocol to develop 3pHLA-score. Our per-peptide-position protocol outperforms the standard training protocol and increases the area under the precision-recall curve from 0.82 to 0.99. 3pHLA-score outperforms widely used scoring functions (AutoDock4, Vina, DOPE, Vinardo, FoldX, GradDock) in a structural virtual screening task. Overall, this work brings structure-based methods one step closer to epitope discovery pipelines and could help advance the development of cancer and viral vaccines.

Introduction

Human Leukocyte Antigen (HLA) class I molecules are an important part of the human cellular immune response^1,2. HLAs are involved in the intracellular antigen presentation pathway; they are responsible for the transport and display of peptide antigens for T-cell scrutiny^3,4. Therefore, the possibility of exploiting the HLA role in this pathway to engineer immune responses has shown great promise⁵, as highlighted by efforts on personalized peptide vaccine development⁶. When designing peptide vaccines, a pool of potential peptide targets is identified from a protein of interest. Targets are then filtered to identify those most likely to induce an immune response. This whole process is referred to as epitope discovery⁷. Discovered immunogenic epitopes are able to bind HLA receptors, create stable peptide-HLA (pHLA) complexes (Fig. S1) and induce an immunological response⁸. Unfortunately, epitope discovery is made challenging by the high diversity of HLA molecules. This diversity reflects the high number of HLA alleles: more than 24,000 HLA-I alleles have been identified to date⁹. Each allele codes for a specific HLA receptor (e.g., HLA-A0201, HLA-B0702) with different peptide binding preferences. Fast and accurate computational evaluation of pHLA binding can speed up the search for epitopes and is an important part of epitope discovery pipelines.

So far, computational pHLA binding affinity prediction efforts have been largely dominated by sequence-based approaches^{10,11,12,13,14,15,58}. While these methods provide good accuracy and are a part of many existing pipelines, they have some inherent drawbacks¹⁶. For instance, they rely on a predefined amino acid alphabet to represent the pHLA. Most existing tools have only canonical amino acids in their alphabet^10,11,12 and are thus unable to process phosphorylated peptides, although these peptides can be displayed by HLAs¹⁷. While recent efforts¹⁸ expand the alphabet to include phosphorylation, the problem of the predefined alphabet persists. The presence of other post-translational modifications or small molecules within the binding site cannot be taken into account by such approaches. In addition, sequence-based predictors are highly dependent on the quality and composition of the training set^19,20. This represents an important limitation because of the aforementioned high diversity of HLA alleles²¹. All these challenges indicate that sequence-based methods alone cannot identify all relevant epitopes, which motivates further exploration and development of complementary approaches²².

Structure-based methods use three-dimensional arrangements (i.e., conformations) of receptors and ligands²³. They are not restricted to a predefined amino acid alphabet and can be used in docking or structural virtual screening tasks²⁴. In the context of these tasks, structure-based scoring functions are used to approximate the free energy of a molecular system. Most scoring functions are generic and can be used to score any complex of interest (including pHLAs), but their performance is often system-dependent²⁵. To tailor scoring functions to a specific protein family, machine learning (ML) efforts are emerging^26,27. As reliable pHLA modeling tools arise^23,28, and more data become available, we see a potential for pHLA ML scoring functions and structure-based methods to enter epitope discovery pipelines and complement existing sequence-based methods.

Under the hood, most scoring functions (such as Rosetta’s ref2015²⁹) approximate independent energy terms for a molecular complex and rely on the assumption that binding affinity can be described as a weighted sum of these terms³⁰. Standard ML training protocols use the same assumption. GradDock³¹, for example, involves ref2015 standard energy terms and redefines their weights to better fit the HLA system while keeping the additive formulation. However, this additive functional form of classical (and ML-derived) scoring functions has been challenged in previous studies^32,33. SIEVE-Score³⁴ recently considered binding site residues and exemplified the benefit of decomposing the energy terms associated with binding site residues for interaction-energy-based learning. The idea of assessing peptide binding affinity via decomposition into peptide residues has also been applied in the context of other computational approaches with mixed results³⁵, such as quantitative structure-activity relationship (QSAR) studies involving amino acid descriptors³⁶.

In our approach, we decompose the energy terms of a pHLA complex into separate contributions for all residues at each position in the peptide; we then use these energy terms as input to train ML models for binding affinity prediction. We call this approach the per-peptide-position training protocol. Our rationale is that structural information that is important for pHLA binding prediction gets lost when standard scoring functions (involving the additive formulation) are applied to the pHLA complex. We use our per-peptide-position protocol in the context of the Rosetta framework³⁷, which leads to our novel 3pHLA-score. The main novelty of our work resides in the combination of two complementary ideas in an innovative fashion: (1) tuning the weights of Rosetta’s scoring function to more accurately assess pHLA binding; (2) keeping the energy terms associated with peptide’s residue positions separate to not lose information through aggregation.

We evaluate the predictive power of our per-peptide-position protocol in the first set of experiments, where we compare 3pHLA-score with the baseline ref2015-score and the standard-pHLA-score trained using the standard additive protocol. Our results show a clear lead of the per-peptide-position protocol over the standard training protocol and the default ref2015 scoring function. We then validate 3pHLA-score on two independent datasets and compare it to six widely used scoring functions: AutoDock4³⁸, Vina³⁹, Vinardo⁴⁰, GradDock³¹, DOPE⁴¹, and FoldX⁴². 3pHLA-score outperforms the other scoring functions in the virtual screening setting and generalizes well to the independent datasets. This work provides a guideline for future development of ML structure-based scoring functions. Furthermore, it brings structure-based methods closer to epitope discovery pipelines, which could help advance the development of peptide vaccines.

Methods

In this work, we train ML models on pHLA energy terms that are decomposed into specific contributions associated with each residue position within a peptide. We call this approach the per-peptide-position protocol and we apply it to Rosetta’s ref2015 energy terms to build our 3pHLA-score. To explain our work, we therefore first describe the ref2015-score. In addition, we describe a score that we call standard-pHLA-score, which uses an intermediate protocol between ref2015 and 3pHLA-score: it is trained for the pHLA system using the original ref2015 energy terms without decomposition.

Baseline ref2015-score

The 3D conformation of a given pHLA complex is stored in a PDB (Protein DataBank⁴³) file containing the coordinates of all the atoms in this molecular complex (Fig. 1a). Rosetta’s ref2015 scoring function feeds this all-atom information into pre-parametrized mathematical and physical models to calculate different energy terms²⁹. These energy terms are based on predefined equations that model different chemical and physical aspects of a molecular system, such as electrostatics, hydrogen bonding, and van der Waals interactions. The ref2015 scoring function contains 19 energy terms listed in Supplementary Table S1. The total energy of the input structure is approximated as a linear weighted sum of these energy terms. The default weights of ref2015 have been optimized on a wide range of scientific benchmarks to bring Rosetta calculations in agreement with small-molecule thermodynamic data and high-resolution structural features²⁹. In this study, we approximate binding energy using ref2015-score with the equation⁴⁴:

$$E_{binding} = E_{complex} - (E_{receptor} + E_{peptide}) \tag{1}$$

where $E_{complex}$ is the ref2015 energy of the whole complex, $E_{receptor}$ is the ref2015 energy of the HLA receptor alone and $E_{peptide}$ is the ref2015 energy of the peptide (Fig. 1a).
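
As a minimal sketch of Eq. (1), the snippet below computes the binding energy from three precomputed totals. The function name and the numeric values are hypothetical and chosen only for illustration; in practice each total would come from scoring the corresponding structure with ref2015.

```python
def binding_energy(e_complex: float, e_receptor: float, e_peptide: float) -> float:
    """Eq. (1): energy of the complex minus the energies of its separated parts."""
    return e_complex - (e_receptor + e_peptide)


# Hypothetical ref2015 totals, invented for illustration only.
example = binding_energy(e_complex=-850.0, e_receptor=-790.0, e_peptide=-25.0)
print(example)  # -35.0
```

Under the usual sign convention, a more negative value indicates a more favorable predicted interaction.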

Standard-pHLA-score

ML models can be used to refine scoring functions and tailor them to a specific system of interest. However, they do not have priors on physical and chemical properties of the molecular system. If all-atom coordinates are used as features, they can introduce noise which slows down the training and makes the learning process more difficult. This is why an initial step of transforming the structural information into compact features is needed. A standard protocol is to use the energy terms provided by traditional scoring functions as features (i.e., inputs to the models) and tune their weights to fit a particular system^31,45. We formulate the standard ref2015 features as a vector containing the 19 ref2015 energy terms. We train non-linear ML models (see Machine learning models subsection) using these standard features to develop the standard-pHLA-score (Fig. 1b).
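
The difference between the additive formulation and the standard feature vector can be illustrated with a short sketch. It uses only three energy-term names to stay compact (the real vector has 19 entries), and the weights and energies are placeholders rather than ref2015's actual values; the helper names are hypothetical.

```python
# Three term names used only to keep the example short.
# Weights and energies are placeholders, not real ref2015 values.
weights = {"fa_atr": 1.0, "fa_rep": 0.5, "hbond_sc": 1.0}
energies = {"fa_atr": -40.0, "fa_rep": 8.0, "hbond_sc": -3.0}


def weighted_total(energies, weights):
    """Additive formulation: one scalar, the weighted sum of energy terms."""
    return sum(weights[name] * energies[name] for name in weights)


def standard_features(energies, term_order):
    """Standard featurization: keep the terms as a vector and let the model weigh them."""
    return [energies[name] for name in term_order]


total = weighted_total(energies, weights)
features = standard_features(energies, ["fa_atr", "fa_rep", "hbond_sc"])
print(total)     # -39.0
print(features)  # [-40.0, 8.0, -3.0]
```

The scalar hides how each term contributed, whereas the feature vector leaves that decision to the trained model.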

3pHLA-score

To develop 3pHLA-score, we go beyond the standard featurization. We decompose ref2015 energy terms into energy contributions associated with each residue position in the peptide, which we call per-peptide-position features. This protocol is inspired by the domain knowledge about the pHLA complex. Experimental findings on peptide anchors suggest that important information about the binding can be retrieved by zooming into the energy of the binding pocket at specific regions surrounding different positions in the peptide⁴⁶. To extract the per-peptide-position features, we first scored the whole pHLA complex with Rosetta’s ref2015 (as explained in the subsection above). Next, we applied PyRosetta’s⁴⁷ residue_total_energies_array function. This function allows us to see how the structural energy of the complex breaks down into per-peptide-position contributions. The output of residue_total_energies_array is an array of energy terms (Table S1) for each peptide residue position, which we stack to form the input vector (see Supplementary Material subsection Per-peptide-position feature vector). This vector is used as input to the non-linear ML models (see Machine learning models subsection) to create 3pHLA-score (Fig. 1c).
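
The stacking step can be sketched as follows, assuming the per-residue energies have already been extracted into a 9 x 19 table (one row per peptide position, one column per energy term). The position-major ordering and the function name are assumptions made for this illustration; the actual layout is defined in the Supplementary Material.

```python
N_POSITIONS = 9  # 9-mer peptides
N_TERMS = 19     # ref2015 energy terms per residue


def per_position_features(per_residue_terms):
    """Stack a 9 x 19 table of per-residue energy terms into one 171-long vector."""
    if len(per_residue_terms) != N_POSITIONS:
        raise ValueError(f"expected {N_POSITIONS} peptide positions")
    vector = []
    for row in per_residue_terms:
        if len(row) != N_TERMS:
            raise ValueError(f"expected {N_TERMS} energy terms per position")
        vector.extend(row)  # position-major order (an assumption)
    return vector


# Placeholder table: entry (p, t) encodes position p and term index t.
table = [[p + t / 100 for t in range(N_TERMS)] for p in range(1, N_POSITIONS + 1)]
vec = per_position_features(table)
print(len(vec))  # 171
```

Compared with the 19 standard features, the model sees 171 inputs and can weigh the same energy term differently at each peptide position.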

Machine learning models

For standard-pHLA-score and 3pHLA-score we used the same dataset and settings to train our ML models; they differ only in the input features extracted from molecular structures.

We trained Random Forest Regression models⁴⁸ on a per-HLA-allele basis. For each featurization, we trained 28 models, one for each HLA allele in the dataset. We built regression trees using the CART algorithm⁴⁹ with the mean absolute error as the split criterion. To create ensembles of regression trees we used bootstrap aggregation. We scaled experimental binding affinities into the [0,1] range^11,12 (Eq. S.3) and used them as prediction targets.
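
Eq. S.3 is not reproduced in this text. The sketch below therefore assumes the logarithmic transform commonly used by sequence-based predictors, 1 - log(affinity) / log(50,000), with affinities in nM clipped to [1, 50,000]. Both the cap and the function name are assumptions, not values confirmed by the source.

```python
import math

MAX_AFFINITY_NM = 50_000.0  # assumed cap; not taken from Eq. S.3


def scale_affinity(affinity_nm: float) -> float:
    """Map a binding affinity in nM to [0, 1]; stronger binders get values nearer 1."""
    clipped = min(max(affinity_nm, 1.0), MAX_AFFINITY_NM)
    return 1.0 - math.log(clipped) / math.log(MAX_AFFINITY_NM)


for affinity in (1.0, 500.0, 50_000.0):
    print(affinity, round(scale_affinity(affinity), 3))
```

With this transform a 500 nM peptide maps to roughly 0.43, so the regression targets are spread more evenly than raw nM values would be.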

We compiled the training set by extracting 90% of binders and 90% of non-binders with equally distributed binding affinities out of Dataset 1 (see below). The rest of the data constitutes the test set, which was left out of the training and cross-validation phase. We stratified the training set into 5 folds (each with equal distribution of binding affinities) for hyperparameter tuning in a 5-fold cross-validation setting. Using randomized search and the 5-fold cross-validation we tuned the following parameters: number of trees, number of features per tree, maximum tree depth, and minimum samples per leaf. After tuning, we evaluated the performance of the final models on the left-out test set.
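
One simple way to obtain folds with a similar distribution of binding affinities is to sort samples by target value and deal them out round-robin. The sketch below shows that idea; it is a deterministic illustration under this assumption, not necessarily the stratification procedure used for the published models.

```python
def stratified_folds(targets, n_folds=5):
    """Assign each sample a fold so that every fold spans the full target range."""
    order = sorted(range(len(targets)), key=lambda i: targets[i])
    folds = [0] * len(targets)
    for rank, index in enumerate(order):
        folds[index] = rank % n_folds  # neighbours in affinity land in different folds
    return folds


# Ten hypothetical scaled affinities.
targets = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.0]
folds = stratified_folds(targets)
print(folds)  # [4, 1, 0, 3, 2, 2, 3, 4, 1, 0]
```

Each fold then serves once as the validation split while the remaining four are used for fitting during hyperparameter search.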

Note that our main experiments describe the use of Random Forest Regression models for training the standard-pHLA-score and 3pHLA-score. However, we assessed other regression techniques: linear regression, support vector machine regression, and partial least squares regression. We provide related results and discussion in the Alternative ML regression techniques subsection of the Supplementary Material.

Dataset 1

This dataset consists of 77,581 pHLA structures modeled by the APE-Gen modeling tool^28,50. It involves 28 HLA alleles (13 HLA-A, 12 HLA-B, and 3 HLA-C alleles). All peptides in this dataset are of length 9 (9-mers). The experimental binding affinity of each pHLA complex was extracted from MHCFlurry¹⁰, which used IEDB⁵¹ as its main source of information. As mentioned above, Dataset 1 was split into non-overlapping training and test portions to separately train and evaluate 3pHLA-score and standard-pHLA-score.

Dataset 2

Dataset 2 is an evaluation dataset containing 100 strong binders experimentally identified and curated in related work¹⁰ along with 2000 additional pHLA decoys extracted from the NetMHC dataset¹¹. Selected pHLA complexes have no overlap with the training set (which is a subset of Dataset 1) and were modeled with APE-Gen using the methodology proposed in the reference study²⁸. Dataset 2 was composed to mimic an epitope discovery setting where a large pool of peptide targets is screened, but only a small portion of the targets are true binders.

Dataset 3

Dataset 3 is an evaluation set containing 11 pHLA complexes for the HLA-A0201 allele with different levels of known experimental binding affinity (strong [0-50] nM, medium [50-500] nM and weak [500-25,000] nM) for which there exist crystal structures in the PDB. Three out of 11 peptides are 10-mers while the others are 9-mers. We collected crystal structures for each of the pHLA complexes (note that there were multiple entries for some complexes, see Supplementary Table S4). Multiple biological assemblies, sometimes with alternative side chain positions, were extracted from each PDB file and treated as separate structures. This led to the inclusion of 77 structures in Dataset 3. Preprocessing of the crystals was done using PyMOL⁵² (to remove water molecules and hydrogen atoms) and pdbfixer⁵³ (to add missing atoms). Since crystal structures of complexes involving non-binder peptides do not exist, five additional structures of experimentally determined non-binding peptides⁵⁰ for the HLA-A0201 allele were modeled with DockTope⁵⁴ and added to Dataset 3. The complete dataset is outlined in Supplementary Table S4; it contains 82 structures of pHLA complexes involving 16 peptides and the HLA-A0201 receptor. These pHLA complexes do not appear in the training set (which is a subset of Dataset 1). Dataset 3 is a good test of the generalizability of 3pHLA-score because it differs strongly from the training dataset: structures are not modeled by APE-Gen and some involve peptides of length 10.

Comparison of scoring functions

Several evaluation metrics were used to compare the performance of scoring functions (see Supplementary Material section Evaluation Metrics). Because we focused on assessing how well the functions could reproduce peptide rankings in terms of HLA-binding affinity, we used Pearson’s correlation coefficient r and Spearman’s correlation coefficient $\rho$ to evaluate the regression performance. To assess classification power, we used the Area Under the Receiver Operating Characteristic curve (AUROC) and the Area Under the Precision-Recall Curve (AUPRC). The performance of 3pHLA-score on Dataset 2 and Dataset 3 was compared to other widely used scoring functions which use different techniques (Table S5). For visualization, scores were scaled to the [0, 1] range using max normalization and inverted so that, for every investigated scoring function, values closer to 1 represent stronger binders and values closer to 0 represent weaker binders.
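
The two regression metrics can be written out in a few lines of plain Python. This is a generic textbook sketch (Spearman's $\rho$ computed as Pearson's r on average ranks), not the evaluation code used for the reported results, and the example values are invented.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


experimental = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 4.0, 9.0, 16.0]  # same ordering, non-linear relationship
print(round(pearson(experimental, predicted), 3))
print(round(spearman(experimental, predicted), 3))
```

In this example the ordering is reproduced perfectly, so Spearman's $\rho$ is 1 while Pearson's r stays slightly below 1 because the relationship is not linear.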

Human and animal rights

No human or animal data samples were used in this study.

Results

We investigate the benefits of our per-peptide-position protocol by assessing the predictive power of 3pHLA-score on the test portion of Dataset 1 (see Methods subsection Dataset 1). We then compare the performance of the 3pHLA-score to six other widely used scoring functions in two different settings using independent datasets: Dataset 2 and Dataset 3.

Per-peptide-position featurization shows superior predictive power

First, we compare the regression and classification power of the following scoring functions on the test portion of Dataset 1: ref2015-score, standard-pHLA-score, and 3pHLA-score. With the regression metrics, we assess how well the ranking of predicted binding affinities matches the ranking of the true binding affinity values for the tested pHLA complexes. With the classification metrics (AUROC, AUPRC), we test how well predicted binding affinities separate the known binders from non-binders. The regression power of the scoring functions is evaluated on the test set using Pearson’s correlation coefficient r (Fig. 2). 3pHLA-score outperformed both ref2015-score and standard-pHLA-score: while 3pHLA-score achieves an average Pearson’s correlation of 0.75 on the test set, ref2015-score and standard-pHLA-score achieve markedly lower correlations of 0.09 and 0.46, respectively (Table 1). Figure S3 shows in detail the correlation between predicted and experimental scores for the best and worst performing 3pHLA-score models. The same pattern is observed for all individual HLA alleles across all investigated metrics (Fig. S2, Table S2). Additionally, we provide the same analysis for standard-pHLA-score and 3pHLA-score trained using alternative ML regression techniques (Supplementary Material subsection Alternative ML regression techniques). 3pHLA-score consistently outperforms standard-pHLA-score across all ML regression techniques we assessed.

Table 1 Results of scoring functions obtained using different training protocols on the test set, averaged across all HLA alleles, for all four evaluated metrics (Pearson’s correlation coefficient |r|, Spearman’s correlation coefficient $|\rho|$, the Area Under the Receiver Operating Characteristic curve AUROC, and the Area Under the Precision-Recall Curve AUPRC). The best-performing (highest) values in each column are bolded.

The predictive power of the per-peptide-position protocol varies depending on the choice of positions

Different residue positions in a peptide (i.e., peptide positions) contribute differently to HLA binding and T-cell recognition. While middle positions are usually more exposed and therefore involved in the recognition by T-cells, the anchor positions are usually buried in the HLA groove and play a more direct role in pHLA binding⁵⁵. For this reason, we conducted an ablation study to investigate the influence of different peptide positions on the performance of 3pHLA-score. 3pHLA-score was trained with three different position sets: all nine positions, anchor positions (1, 2, 3, 8, 9), or middle positions (4, 5, 6, 7). We generate binding affinity predictions for the test set using these different versions of 3pHLA-score and we investigate how well the affinities are ranked compared to the true affinities as well as how well the predictions separate binders from non-binders. We observe that the choice of positions in 3pHLA-score has a substantial influence on its performance on the test set according to Pearson’s correlation (Fig. 3, Table S3) and other metrics (Figure S4). The performance of training with anchor positions only and with all nine positions is comparable, with r values higher than 0.8 for most HLA alleles. The r values drop below 0.7 when only middle positions are used. The only exception is the HLA-B0801 allele, for which closer inspection of the binding motif in IEDB (iedb.org/mhc/252) clearly indicates the importance of position 5 for peptide binding, as reflected in the HLA-B0801 predictor’s performance.
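
The feature selection behind this ablation can be sketched as slicing the per-peptide-position vector by position. The snippet assumes a position-major layout with 19 energy terms per position (an assumption for illustration) and uses the position sets named above; the function name is hypothetical.

```python
N_TERMS = 19
ANCHOR_POSITIONS = (1, 2, 3, 8, 9)
MIDDLE_POSITIONS = (4, 5, 6, 7)


def select_positions(vector, positions, n_terms=N_TERMS):
    """Keep only the energy terms that belong to the given 1-based peptide positions."""
    selected = []
    for position in positions:
        start = (position - 1) * n_terms
        selected.extend(vector[start:start + n_terms])
    return selected


full = list(range(9 * N_TERMS))  # placeholder values: each entry is its own index
anchor = select_positions(full, ANCHOR_POSITIONS)
middle = select_positions(full, MIDDLE_POSITIONS)
print(len(full), len(anchor), len(middle))  # 171 95 76
```

Training one model per position set on otherwise identical data isolates how much predictive signal each group of positions carries.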

3pHLA-score outperforms well-validated structure-based scoring functions in an epitope discovery setting

The goal of structure-based virtual screening for epitope discovery is to distinguish true binder peptides from non-binders, which can be seen as a classification problem. To evaluate 3pHLA-score in an epitope discovery scenario, we compare it to a variety of widely used structural scoring functions (Table S5) on a dataset containing 100 strong binders and 2,000 decoys across 16 HLA alleles (Dataset 2). Our results show that 3pHLA-score clearly outperforms other evaluated scoring functions in this virtual screening setting, with an average AUPRC of 0.71 compared to the second-best scoring function (Vinardo) with AUPRC of 0.35 (Table 2). This is consistent with 3pHLA-score achieving higher values of both AUROC and AUPRC for all investigated HLA alleles individually (Tables S7, S8), and 3pHLA-score separating binders from non-binders more clearly than other scoring functions (Figure S5). It is also important to note that Dataset 2 was not used in the training phase. Therefore, this experiment also demonstrates the capacity of 3pHLA-score to generalize to new datasets.

Table 2 AUROC and AUPRC values aggregated across HLA alleles for the virtual screening experiment. The best-performing (highest) values in each column are bolded.

In the context of epitope discovery, current pipelines use sequence-based scoring functions. Therefore, we evaluate how 3pHLA-score compares to sequence-based methods and present the details of this analysis in the Supplementary Material. Overall, 3pHLA-score has comparable performance to selected sequence-based methods, with an average AUROC of 0.977 compared to the best achieved AUROC of 0.993 with MHCFlurry2.0¹⁰. Note that we do not know whether part of our test dataset was included in the training data of MHCFlurry2.0, which might give it a slight advantage.

3pHLA-score can generalize to an independent dataset

We tested the ability of 3pHLA-score to generalize to other “types” of structural data with the independent Dataset 3. Dataset 1 and Dataset 2 contain structures that were all modeled by APE-Gen²⁸ with peptides of 9 residues in length (9-mers). With an independent dataset, we can investigate possible biases towards this modeling tool and explore how to generalize to peptides of length 10. Dataset 3 contains experimentally resolved three-dimensional pHLA structures of binders, as well as non-binders modeled by DockTope⁵⁴. Importantly, it contains 10-mer peptides. As 3pHLA-score was trained on 9-mers, the size of the input of the model is $9 \times 19$ (i.e., 9 peptide positions times 19 energy terms). To score 10-mers, we excluded the energy terms of the middle position (i.e., position 6) of the peptide. The rationale for this approach lies in the aforementioned experimental findings on peptide anchors⁴⁶.
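
The 10-mer handling can be sketched as dropping one row from the per-residue table before stacking. The helper name and placeholder values are hypothetical; the choice of position 6 follows the text.

```python
def drop_position(per_residue_terms, position=6):
    """Remove one 1-based peptide position from a per-residue energy table."""
    return [
        row
        for index, row in enumerate(per_residue_terms, start=1)
        if index != position
    ]


# Placeholder 10 x 19 table: every entry in row p holds the value p.
ten_mer = [[float(p)] * 19 for p in range(1, 11)]
nine_rows = drop_position(ten_mer)
print(len(nine_rows), len(nine_rows[0]))  # 9 19
print([row[0] for row in nine_rows])      # [1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 8.0, 9.0, 10.0]
```

The terminal rows keep their relative order, so the anchor residues near both ends of the 10-mer still land in the input slots the 9-mer model associates with anchors.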

Since Dataset 3 contains peptides with a wide range of experimental binding affinities (strong, medium, weak binders, and non-binders), two tasks were identified for the scoring functions: a regression and a classification task. For the regression task, scoring functions are expected to predict the correct peptide ranking in terms of binding affinities. In this context, it is also interesting to analyze the range of scores predicted for a given peptide within different structures (i.e., same complex, but different crystallography experiments). The smaller the range, the more consistent a scoring function is for scoring a certain peptide. For the classification task, we label peptides with three different binding affinity thresholds: 50 nM (distinguishing strong binders from others), 500 nM (distinguishing strong and medium binders from others), and 25,000 nM (distinguishing binders from non-binders). The classification power of scoring functions was evaluated using AUROC and AUPRC.
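
The classification task can be sketched as thresholding the experimental affinities and computing AUROC from pairwise comparisons. This is a generic illustration: treating affinities strictly below the threshold as positives is an assumption, and the affinities and scores are invented.

```python
THRESHOLDS_NM = (50.0, 500.0, 25_000.0)


def label_binders(affinities_nm, threshold_nm):
    """Label peptides with affinity below the threshold as positives (1)."""
    return [1 if affinity < threshold_nm else 0 for affinity in affinities_nm]


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical affinities (nM) and scaled scores where higher means stronger binding.
affinities = [5.0, 120.0, 900.0, 30_000.0]
scores = [0.9, 0.7, 0.4, 0.1]
for threshold in THRESHOLDS_NM:
    labels = label_binders(affinities, threshold)
    print(threshold, labels, auroc(labels, scores))
```

Raising the threshold moves peptides from the negative to the positive class, so the same scores are evaluated against three increasingly permissive definitions of a binder.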

The scaled scores aggregated across structures for each peptide are shown in Fig. 4. The scaled score for each structure in Dataset 3 is shown in Figure S6. Pearson’s correlation coefficient between experimental binding affinity and predicted scores is given in Table S6. While the DOPE scoring function consistently outperforms the others, 3pHLA-score shows competitive performance in this challenging setting and is a runner-up in most of the evaluated tasks. In the regression setting, DOPE achieves a correlation of 0.62 with experimental affinity, while 3pHLA-score achieves a correlation of 0.56. However, neither of these correlations is strong. On the other hand, both DOPE and 3pHLA-score produce small variations of the score for different structures of the same peptide, which is a desirable property for an epitope discovery task. With respect to the classification task, DOPE produced the best results according to AUROC and AUPRC for most of the thresholds analyzed (Table 3). 3pHLA-score also performed well, with the best AUPRC value for the 500 nM threshold and the second-best AUPRC values for the 50 nM and 25,000 nM thresholds. When considering AUROC values, Vina and Vinardo are the second best for the 50 nM and 500 nM thresholds; 3pHLA-score was again the second best for the 25,000 nM threshold.

Table 3 Quantified power of scoring functions to discriminate between peptides of different binding strength on Dataset 3.

Discussion

Motivated by experimental findings of peptide anchors, we hypothesize that important information for training ML pHLA scoring functions is lost in standard training protocols. We try to recover this information using our novel per-peptide-position protocol and we apply it to develop the 3pHLA-score.

In the first set of experiments, we show how energy decoupling of the per-peptide-position protocol (as applied to 3pHLA-score) significantly increases the predictive power of models (Figs. 2, S2, Table 1). Furthermore, we show that the predictive power of 3pHLA-score is highly dependent on the choice of peptide positions to be decoupled (Figs. 3, S4).

Next, we provide an extensive comparison of 3pHLA-score against other widely used scoring functions. 3pHLA-score clearly outperforms the other scoring functions when tested in the epitope discovery setting, where we perform structure-based virtual screening to identify true peptide binders of HLA receptors (Table 2, Fig. S5).

Note that the training of 3pHLA-score could not have been done using only experimentally determined crystal structures, due to the limited number of pHLA crystals available (i.e., fewer than 800 in the PDB). Therefore, we chose to use models produced by APE-Gen, which is potentially the only currently available pHLA-specific modeling tool with the capacity to model thousands of complexes (e.g., nearly 80,000 complexes modeled for Dataset 1). The choice of the modeling method, however, can introduce a bias in the training of the scoring function. To test for such a bias, we used an independent dataset (i.e., Dataset 3) containing crystal structures and models produced by a different tool, DockTope. Note that DockTope uses a very different modeling protocol, based on fixed backbone templates. Despite involving different types of structures, our results still show a good overall performance of 3pHLA-score on Dataset 3, where it is competitive with other popular scoring functions. These results suggest that 3pHLA-score can be used with crystal structures and models produced by other tools, without additional training, although a broader survey with other tools for pHLA modeling and peptide docking will be needed to further corroborate this point. Interestingly, in this experiment, the most consistent predictions across different structures of the same complex, and the strongest correlation with experimental data, were observed for DOPE (Table 3, Fig. 4). This surprising result might be directly linked to the nature of this dataset and the intended use of DOPE. The DOPE scoring function is a statistical potential used to assess the global quality of homology models produced by Modeller⁵⁶. This provides two advantages to DOPE in the experiment with Dataset 3. First, this dataset is mostly composed of crystal structures, and DOPE’s global assessment was observed in our experiment to be more resilient to small differences between different conformations of the same complex. Second, DOPE is well suited to distinguish the non-binders, which were modeled with a docking-based approach, from the experimentally determined crystal structures used for all other complexes. Our results show that the 3pHLA-score predictions could be generalized to both DockTope models and crystal structures, while the good performance of DOPE did not generalize to other datasets. For instance, 3pHLA-score outperformed DOPE and other scoring functions on Dataset 2 (Table 2, Fig. S5). 3pHLA-score is therefore the method that provides the most consistent results across the three different datasets.

The potential of per-peptide-position energy terms for the pHLA system opens up many additional opportunities, which we discuss here. To build 3pHLA-score we trained separate models for each HLA allele. This limits the use of 3pHLA-score to the fixed set of HLA alleles found in the training dataset. However, a larger pan-allele dataset could be acquired in the future, and the same method could be applied to train a more general pan-allele model. APE-Gen, the tool used here to model pHLA structures, is currently limited to modeling peptides containing only the 20 standard amino acids. Therefore, modeling phosphorylated peptides (or peptides with other post-translational modifications) and assessing the HLA-binding energies of these peptides with 3pHLA-score is another interesting challenge, which would greatly broaden the impact of our methods on ongoing efforts in epitope discovery⁵⁷.

3pHLA-score was trained here with a single conformation per peptide, to predict HLA binding affinity in the context of structural virtual screening. Future studies could investigate the use and refinement of 3pHLA-score for the geometry prediction task (i.e., ranking different conformations of the same pHLA complex). For that task, we would propose using the same per-peptide-position training protocol on a dataset that contains multiple conformations per peptide mapped to corresponding experimentally determined crystal structures.

The baseline scoring function for extracting the energy terms used here was ref2015. It therefore remains to be determined how the same training protocol would perform when applied to another existing scoring function that provides energy terms for specific regions of the model. This question is left for future work. As discussed above, our per-peptide-position protocol could provide more opportunities than exemplified by 3pHLA-score: it can be applied beyond the ref2015 energy terms as well as beyond the pHLA system. For that reason, we make a distinction between the 3pHLA-score and the per-peptide-position protocol.
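To make the per-allele limitation and the proposed geometry prediction task more concrete, the following minimal Python sketch shows how one model per HLA allele could be looked up and used to rank several conformations of the same complex. It is an illustration under stated assumptions, not the released 3pHLA-score code: the class `PerAlleleScorer`, its methods, the toy model, and the convention that a higher score means stronger predicted binding are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a scorer maps a per-peptide-position feature vector to a score.
Scorer = Callable[[Sequence[float]], float]


class PerAlleleScorer:
    """Dispatches to one model per HLA allele (illustrative only)."""

    def __init__(self, models: Dict[str, Scorer]):
        self.models = dict(models)

    def score(self, allele: str, features: Sequence[float]) -> float:
        if allele not in self.models:
            # Without a trained model for this allele, no prediction can be made.
            raise KeyError(f"no model trained for allele {allele!r}")
        return self.models[allele](features)

    def rank_conformations(
        self, allele: str, conformations: List[Sequence[float]]
    ) -> List[int]:
        """Return conformation indices ordered from highest to lowest score."""
        scores = [self.score(allele, feats) for feats in conformations]
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy stand-in for a trained model (assumption): lower total energy, higher score.
toy = PerAlleleScorer({"HLA-A*02:01": lambda feats: -sum(feats)})

# Three made-up conformations of one complex, two features each.
order = toy.rank_conformations(
    "HLA-A*02:01", [[-1.0, -2.0], [-3.0, -2.5], [0.5, -1.0]]
)
print(order)
```

Scoring an allele that is absent from the dictionary raises a `KeyError`, which mirrors the restriction to alleles present in the training data; a pan-allele model would replace the dictionary lookup with a single model that also takes the allele as input.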

Overall, our results confirm that important structural signal for binding prediction is lost when the standard energy terms are calculated at the all-peptide-atom level. This suggests that the additive nature of the standard all-atom energy terms is not appropriate for the pHLA system. Our work emphasizes how experimental findings can help engineer more powerful features and train ML models with better predictive power. This can serve as a guideline for future attempts at training custom ML scoring functions for different systems of interest. As more structural pHLA data become available, we hope that our findings will inspire future efforts in training structure-based pHLA binding predictors that could enter epitope discovery pipelines and complement sequence-based methods. 3pHLA-score has direct application to epitope discovery projects, which could help advance the development of vaccines against several types of cancer and viral infections.
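The loss of signal under summation can be illustrated with a small Python sketch. The energy-term names are real ref2015 terms, but the values, the three-residue toy peptide, and the function names are made up for illustration and do not reproduce the feature extraction used in this work.

```python
from typing import Dict, List

# Term names come from Rosetta's ref2015; all numeric values below are invented.
TERMS = ["fa_atr", "fa_rep", "fa_sol"]

# One dict of energy terms per peptide position (3-residue toy peptide for brevity).
PerResidue = List[Dict[str, float]]


def all_peptide_features(per_residue: PerResidue) -> List[float]:
    """Sum each energy term over all peptide positions (one value per term)."""
    return [sum(res[t] for res in per_residue) for t in TERMS]


def per_position_features(per_residue: PerResidue) -> List[float]:
    """Keep each term at each position as its own feature (positions x terms)."""
    return [res[t] for res in per_residue for t in TERMS]


peptide_a = [
    {"fa_atr": -4.0, "fa_rep": 1.0, "fa_sol": 2.0},
    {"fa_atr": -1.0, "fa_rep": 0.5, "fa_sol": 0.5},
    {"fa_atr": -1.0, "fa_rep": 0.5, "fa_sol": 0.5},
]
# Same totals, but the favourable contact sits at the last position, not the first.
peptide_b = list(reversed(peptide_a))

same_when_summed = all_peptide_features(peptide_a) == all_peptide_features(peptide_b)
same_per_position = per_position_features(peptide_a) == per_position_features(peptide_b)
print(same_when_summed, same_per_position)
```

In this toy case the summed features of the two peptides are identical, so no model trained on them could tell the peptides apart, whereas the per-position features differ and preserve where along the peptide each contribution arises.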

Data availability

Sequencing data was not generated in this study.

The code and data used for running the experiments and training, along with the scoring function and datasets, are available in the repository: https://github.com/KavrakiLab/3pHLA-score.

References

Neefjes, J., Jongsma, M. L. M., Paul, P. & Bakke, O. Towards a systems understanding of MHC class i and MHC class II antigen presentation. Nat. Rev. Immunol. 11, 823–836. https://doi.org/10.1038/nri3084 (2011).
Rock, K. L., Reits, E. & Neefjes, J. Present yourself! by MHC class i and MHC class II molecules. Trends Immunol. 37, 724–737. https://doi.org/10.1016/j.it.2016.08.010 (2016).
Stevanović, S. Structural basis of immunogenicity. Transpl. Immunol. 10, 133–136. https://doi.org/10.1016/s0966-3274(02)00059-x (2002).
James, K. D., Jenkinson, W. E. & Anderson, G. T-cell egress from the thymus: Should i stay or should i go?. J. Leukoc. Biol. 104, 275–284. https://doi.org/10.1002/jlb.1mr1217-496r (2018).
Grau, M., Walker, P. R. & Derouazi, M. Mechanistic insights into the efficacy of cell penetrating peptide-based cancer vaccines. Cell. Mol. Life Sci. 75, 2887–2896. https://doi.org/10.1007/s00018-018-2785-0 (2018).
Lizée, G. et al. Harnessing the power of the immune system to target cancer. Annu. Rev. Med. 64, 71–90. https://doi.org/10.1146/annurev-med-112311-083918 (2013).
Dudek, N. L., Perlmutter, P., Aguilar, M.-I., Croft, N. P. & Purcell, A. W. Epitope discovery and their use in peptide based vaccines. Curr. Pharm. Des. 16, 3149–3157. https://doi.org/10.2174/138161210793292447 (2010).
Joglekar, A. V. & Li, G. T cell antigen discovery. Nat. Methods 18, 873–880. https://doi.org/10.1038/s41592-020-0867-z (2020).
Robinson, J. et al. The IPD and IMGT/HLA database: Allele variant databases. Nucl. Acids Res. 43, D423–D431. https://doi.org/10.1093/nar/gku1161 (2014).
O’Donnell, T. J., Rubinsteyn, A. & Laserson, U. MHCflurry 2.0: Improved pan-allele prediction of MHC class I-presented peptides by incorporating antigen processing. Cell Syst. 11, 42–48. https://doi.org/10.1016/j.cels.2020.06.010 (2020).
Andreatta, M. & Nielsen, M. Gapped sequence alignment using artificial neural networks: Application to the MHC class I system. Bioinformatics 32, 511–517. https://doi.org/10.1093/bioinformatics/btv639 (2015).
O’Donnell, T. J. et al. MHCflurry: Open-source class I MHC binding affinity prediction. Cell Syst. 7, 129–132. https://doi.org/10.1016/j.cels.2018.05.014 (2018).
Zhang, H., Lund, O. & Nielsen, M. The PickPocket method for predicting binding specificities for receptors based on receptor pocket similarities: Application to MHC-peptide binding. Bioinformatics 25, 1293–1299. https://doi.org/10.1093/bioinformatics/btp137 (2009).
Vielhaben, J., Wenzel, M., Samek, W. & Strodthoff, N. USMPep: Universal sequence models for major histocompatibility complex binding affinity prediction. BMC Bioinform. https://doi.org/10.1186/s12859-020-03631-1 (2020).
Venkatesh, G., Grover, A., Srinivasaraghavan, G. & Rao, S. MHCAttnNet: Predicting MHC-peptide bindings for MHC alleles classes I and II using an attention-based deep neural model. Bioinformatics 36, i399–i406. https://doi.org/10.1093/bioinformatics/btaa479 (2020).
Zhao, W. & Sher, X. Systematically benchmarking peptide-MHC binding predictors: From synthetic to naturally processed epitopes. PLoS Comput. Biol. 14, e1006457. https://doi.org/10.1371/journal.pcbi.1006457 (2018).
Alpízar, A. et al. A molecular basis for the presentation of phosphorylated peptides by HLA-b antigens. Mol. Cell. Proteom. 16, 181–193. https://doi.org/10.1074/mcp.m116.063800 (2017).
Refsgaard, C. T., Barra, C., Peng, X., Ternette, N. & Nielsen, M. NetMHCphosPan - pan-specific prediction of MHC class I antigen presentation of phosphorylated ligands. ImmunoInformatics 1–2, 100005. https://doi.org/10.1016/j.immuno.2021.100005 (2021).
Koch, C. P., Pillong, M., Hiss, J. A. & Schneider, G. Computational resources for MHC ligand identification. Mol. Inf. 32, 326–336. https://doi.org/10.1002/minf.201300042 (2013).
Young, S. S., Yuan, F. & Zhu, M. Chemical descriptors are more important than learning algorithms for modelling. Mol. Inf. 31, 707–710. https://doi.org/10.1002/minf.201200031 (2012).
Liao, W. W. P. & Arthur, J. W. Predicting peptide binding affinities to MHC molecules using a modified semi-empirical scoring function. PLoS ONE 6, e25055. https://doi.org/10.1371/journal.pone.0025055 (2011).
Antunes, D. A., Abella, J. R., Devaurs, D., Rigo, M. M. & Kavraki, L. E. Structure-based methods for binding mode and binding affinity prediction for peptide-MHC complexes. Curr. Top. Med. Chem. 18, 2239–2255. https://doi.org/10.2174/1568026619666181224101744 (2019).
Aranha, M. P. et al. Combining three-dimensional modeling with artificial intelligence to increase specificity and precision in peptide–MHC binding predictions. J. Immunol. 205, 1962–1977. https://doi.org/10.4049/jimmunol.1900918 (2020).
Devaurs, D. et al. Using parallelized incremental meta-docking can solve the conformational sampling issue when docking large ligands to proteins. BMC Mol. Cell Biol. https://doi.org/10.1186/s12860-019-0218-z (2019).
Palacio-Rodríguez, K., Lans, I., Cavasotto, C. N. & Cossio, P. Exponential consensus ranking improves the outcome in docking and receptor ensemble docking. Sci. Rep. https://doi.org/10.1038/s41598-019-41594-3 (2019).
Guedes, I. A. et al. New machine learning and physics-based scoring functions for drug discovery. Sci. Rep. https://doi.org/10.1038/s41598-021-82410-1 (2021).
Ain, Q. U., Aleksandrova, A., Roessler, F. D. & Ballester, P. J. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. Wiley Interdisciplinary Reviews: Computational Molecular Science 5, 405–424. https://doi.org/10.1002/wcms.1225 (2015).
Abella, J., Antunes, D., Clementi, C. & Kavraki, L. APE-gen: A fast method for generating ensembles of bound peptide-MHC conformations. Molecules 24, 881. https://doi.org/10.3390/molecules24050881 (2019).
Alford, R. F. et al. The rosetta all-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput. 13, 3031–3048. https://doi.org/10.1021/acs.jctc.7b00125 (2017).
Schulz-Gasch, T. & Stahl, M. Scoring functions for protein–ligand interactions: A critical perspective. Drug Discov. Today Technol. 1, 231–239. https://doi.org/10.1016/j.ddtec.2004.08.004 (2004).
Kyeong, H. H., Choi, Y. & Kim, H. S. GradDock: Rapid simulation and tailored ranking functions for peptide-MHC class I docking. Bioinformatics 34, 469–476. https://doi.org/10.1093/bioinformatics/btx589 (2017).
Li, H., Leung, K.-S., Wong, M.-H. & Ballester, P. J. Improving AutoDock Vina using random forest: The growing accuracy of binding affinity prediction by the effective exploitation of larger data sets. Mol. Inf. 34, 115–126. https://doi.org/10.1002/minf.201400132 (2015).
Afifi, K. & Al-Sadek, A. F. Improving classical scoring functions using random forest: The non-additivity of free energy terms’ contributions in binding. Chem. Biol. Drug Des. 92, 1429–1434. https://doi.org/10.1111/cbdd.13206 (2018).
Yasuo, N. & Sekijima, M. Improved method of structure-based virtual screening via interaction-energy-based learning. J. Chem. Inf. Model. 59, 1050–1061. https://doi.org/10.1021/acs.jcim.8b00673 (2019).
Zhou, P. et al. Systematic comparison and comprehensive evaluation of 80 amino acid descriptors in peptide QSAR modeling. J. Chem. Inf. Model. 61, 1718–1731. https://doi.org/10.1021/acs.jcim.0c01370 (2021).
Guan, P., Doytchinova, I. A., Walshe, V. A., Borrow, P. & Flower, D. R. Analysis of peptide-protein binding using amino acid descriptors: Prediction and experimental verification for human histocompatibility complex HLA-A*0201. J. Med. Chem. 48, 7418–7425. https://doi.org/10.1021/jm0505258 (2005).
Leaver-Fay, A. et al. Chapter nineteen - Rosetta3: An object-oriented software suite for the simulation and design of macromolecules. In Computer Methods, Part C, vol. 487 of Methods in Enzymology (eds Johnson, M. L. & Brand, L.) 545–574 (Academic Press, Cambridge, 2011). https://doi.org/10.1016/B978-0-12-381270-4.00019-6.
Morris, G. M. et al. AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J. Comput. Chem. 30, 2785–2791. https://doi.org/10.1002/jcc.21256 (2009).
Trott, O. & Olson, A. J. AutoDock vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. https://doi.org/10.1002/jcc.21334 (2009).
Quiroga, R. & Villarreal, M. A. Vinardo: A scoring function based on autodock vina improves scoring, docking, and virtual screening. PLoS One 11, e0155183. https://doi.org/10.1371/journal.pone.0155183 (2016).
Shen, M.-Y. & Sali, A. Statistical potential for assessment and prediction of protein structures. Protein Sci. 15, 2507–2524. https://doi.org/10.1110/ps.062416606 (2006).
Schymkowitz, J. et al. The FoldX web server: An online force field. Nucleic Acids Res. 33, W382–W388. https://doi.org/10.1093/nar/gki387 (2005).
Berman, H. M. The protein data bank. Nucl. Acids Res. 28, 235–242. https://doi.org/10.1093/nar/28.1.235 (2000).
Borrman, T., Pierce, B. G., Vreven, T., Baker, B. M. & Weng, Z. High-throughput modeling and scoring of TCR-pMHC complexes to predict cross-reactive peptides. Bioinformatics 36, 5377–5385. https://doi.org/10.1093/bioinformatics/btaa1050 (2020).
Ye, W.-L. et al. Improving docking-based virtual screening ability by integrating multiple energy auxiliary terms from molecular docking scoring. J. Chem. Inf. Model. 60, 4216–4230. https://doi.org/10.1021/acs.jcim.9b00977 (2020).
Bouvier, M. & Wiley, D. Importance of peptide amino and carboxyl termini to the stability of MHC class I molecules. Science 265, 398–402. https://doi.org/10.1126/science.8023162 (1994).
Chaudhury, S., Lyskov, S. & Gray, J. J. PyRosetta: A script-based interface for implementing molecular modeling algorithms using rosetta. Bioinformatics 26, 689–691. https://doi.org/10.1093/bioinformatics/btq007 (2010).
Breiman, L. Random forests. Mach. Learn. 45, 5–32. https://doi.org/10.1023/a:1010933404324 (2001).
Breiman, L., Friedman, J., Stone, C. J. & Olshen, R. Classification and Regression Trees (Chapman and Hall/CRC, Boca Raton, 1984).
Abella, J. R., Antunes, D. A., Clementi, C. & Kavraki, L. E. Large-scale structure-based prediction of stable peptide binding to class I HLAs using random forests. Front. Immunol. https://doi.org/10.3389/fimmu.2020.01583 (2020).
Vita, R. et al. The immune epitope database (IEDB): 2018 update. Nucl. Acids Res. 47, D339–D343. https://doi.org/10.1093/nar/gky1006 (2018).
Schrödinger, LLC. The PyMOL molecular graphics system, version 1.8 (2015).
Eastman, P. et al. Openmm 4: A reusable, extensible, hardware independent library for high performance molecular simulation. J. Chem. Theory Comput. 9, 461–469. https://doi.org/10.1021/ct300857j (2013).
Rigo, M. M. et al. DockTope: A web-based tool for automated pMHC-i modelling. Sci. Rep. https://doi.org/10.1038/srep18413 (2015).
Achour, A. Major histocompatibility complex: Interaction with peptides. eLS https://doi.org/10.1038/npg.els.0000922 (2001).
Šali, A. & Blundell, T. L. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779–815. https://doi.org/10.1006/jmbi.1993.1626 (1993).
Alpízar, A. et al. A molecular basis for the presentation of phosphorylated peptides by HLA-B antigens. Mol. Cell. Proteomics 16, 181–193 (2017).
Reynisson, B., Alvarez, B., Paul, S., Peters, B. & Nielsen, M. NetMHCpan-4.1 and NetMHCIIpan-4.0: Improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucl. Acids Res. 48, W449–W454. https://doi.org/10.1093/nar/gkaa379 (2020).

Acknowledgements

The authors would like to thank Dr. Jayvee Abella for inspiring this work and making the APE-Gen datasets available, and Romanos Fasoulis for his help with running the sequence-based scoring, as well as other colleagues from Kavraki Lab for many helpful discussions. Work on this project by A.C. and L.E.K. has been supported in part by the National Institutes of Health (NIH) [U01CA258512]. Other support included: University of Edinburgh and Medical Research Council [MC_UU_00009/2 to D.D.]; Computational Cancer Biology Training Program fellowship [RP170593 to M.M.R.]; University of Houston Funds and Rice University Funds.

Author information

Authors and Affiliations

Department of Computer Science, Rice University, Houston, 77005, USA
Anja Conev, Mauricio Menegatti Rigo & Lydia E. Kavraki
MRC Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
Didier Devaurs
Department of Biology and Biochemistry, University of Houston, Houston, 77004, USA
Dinler Amaral Antunes

Contributions

A.C., D.D., M.M.R. and D.A.A. conceived the experiment(s), A.C. conducted the experiment(s) and wrote the manuscript, A.C., D.D., M.M.R., D.A.A. and L.E.K. analyzed the results. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Dinler Amaral Antunes or Lydia E. Kavraki.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Conev, A., Devaurs, D., Rigo, M.M. et al. 3pHLA-score improves structure-based peptide-HLA binding affinity prediction. Sci Rep 12, 10749 (2022). https://doi.org/10.1038/s41598-022-14526-x

Received: 11 February 2022
Accepted: 08 June 2022
Published: 24 June 2022
DOI: https://doi.org/10.1038/s41598-022-14526-x
