Abstract
Most proteins in cells are composed of multiple folding units (or domains) to perform complex functions in a cooperative manner. Relative to the rapid progress in single-domain structure prediction, there are few effective tools available for multi-domain protein structure assembly, mainly due to the complexity of modeling multi-domain proteins, which involves higher degrees of freedom in domain-orientation space and various levels of continuous and discontinuous domain assembly and linker refinement. To meet the challenge and the high demand of the community, we developed I-TASSER-MTD to model the structures and functions of multi-domain proteins through a progressive protocol that combines sequence-based domain parsing, single-domain structure folding, inter-domain structure assembly and structure-based function annotation in a fully automated pipeline. Advanced deep-learning models have been incorporated into each of the steps to enhance both the domain modeling and inter-domain assembly accuracy. The protocol allows for the incorporation of experimental cross-linking data and cryo-electron microscopy density maps to guide the multi-domain structure assembly simulations. I-TASSER-MTD is built on I-TASSER but substantially extends its ability and accuracy in modeling large multi-domain protein structures and provides meaningful functional insights for the targets at both the domain- and full-chain levels from the amino acid sequence alone.
This is a preview of subscription content, access via your institution
Access options
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
$29.99 / 30 days
cancel any time
Subscribe to this journal
Receive 12 print issues and online access
$259.00 per year
only $21.58 per issue
Buy this article
- Purchase on Springer Link
- Instant access to full article PDF
Prices may be subject to local taxes which are calculated during checkout
Similar content being viewed by others
Data availability
The raw data and example files are available at https://zhanggroup.org/I-TASSER-MTD/ or from the corresponding author upon reasonable request.
Code availability
The I-TASSER-MTD standalone package is freely available for academic use at https://zhanggroup.org/I-TASSER-MTD/.
References
Sali, A. & Blundell, T. L. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779–815 (1993).
Simons, K. T., Kooperberg, C., Huang, E. & Baker, D. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 268, 209–225 (1997).
Xu, D. & Zhang, Y. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field. Proteins 80, 1715–1735 (2012).
Yang, J. et al. The I-TASSER Suite: protein structure and function prediction. Nat. Methods 12, 7–8 (2015).
Weigt, M., White, R. A., Szurmant, H., Hoch, J. A. & Hwa, T. Identification of direct residue contacts in protein–protein interaction by message passing. Proc. Natl Acad. Sci. USA 106, 67–72 (2009).
Marks, D. S. et al. Protein 3D structure computed from evolutionary sequence variation. PLoS ONE 6, e28766 (2011).
Mortuza, S. et al. Improving fragment-based ab initio protein structure assembly using low-accuracy contact-map predictions. Nat. Commun. 12, 5011 (2021).
Wang, S., Sun, S., Li, Z., Zhang, R. & Xu, J. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput. Biol. 13, e1005324 (2017).
Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710 (2020).
Li, Y., Hu, J., Zhang, C., Yu, D.-J. & Zhang, Y. ResPRE: high-accuracy protein contact prediction by coupling precision matrix with deep residual neural networks. Bioinformatics 35, 4647–4655 (2019).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K. & Moult, J. Critical assessment of methods of protein structure prediction (CASP)—Round XIV. Proteins 89, 1607–1617 (2021).
Chothia, C., Gough, J., Vogel, C. & Teichmann, S. A. Evolution of the protein repertoire. Science 300, 1701–1703 (2003).
Apic, G., Huber, W. & Teichmann, S. A. Multi-domain protein families and domain pairs: comparison with known structures and a random model of domain recombination. J. Struct. Funct. Genomics 4, 67–78 (2003).
Han, J.-H., Batey, S., Nickson, A. A., Teichmann, S. A. & Clarke, J. J. N. R. M. C. B. The folding and evolution of multidomain proteins. Nat. Rev. Mol. Cell Biol. 8, 319 (2007).
Zhou, X. G., Hu, J., Zhang, C. X., Zhang, G. J. & Zhang, Y. Assembling multidomain protein structures through analogous global structural alignments. Proc. Natl Acad. Sci. USA 116, 15930–15938 (2019).
Xu, D., Jaroszewski, L., Li, Z. & Godzik, A. AIDA: ab initio domain assembly for automated multi-domain protein structure prediction and domain–domain interaction prediction. Bioinformatics 31, 2098–2105 (2015).
Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).
Xue, Z., Xu, D., Wang, Y. & Zhang, Y. ThreaDom: extracting protein domain boundary information from multiple threading alignments. Bioinformatics 29, i247–i256 (2013).
Hong, S. H., Joo, K. & Lee, J. ConDo: protein domain boundary prediction using coevolutionary information. Bioinformatics 35, 2411–2417 (2019).
Zheng, W. et al. FUpred: detecting protein domains through deep-learning based contact map prediction. Bioinformatics 36, 3749–3757 (2020).
Wollacott, A. M., Zanghellini, A., Murphy, P. & Baker, D. Prediction of structures of multidomain proteins from structures of the individual domains. Protein Sci. 16, 165–175 (2007).
Zhang, C., Zheng, W., Freddolino, P. L. & Zhang, Y. MetaGO: predicting Gene Ontology of non-homologous proteins through low-resolution protein structure prediction and protein–protein network mapping. J. Mol. Biol. 430, 2256–2265 (2018).
Yao, S. et al. NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information. Nucleic Acids Res. 49, W469–W475 (2021).
Piovesan, D. & Tosatto, S. C. INGA 2.0: improving protein function prediction for the dark proteome. Nucleic Acids Res. 47, W373–W378 (2019).
Koo, D. C. E. & Bonneau, R. Towards region-specific propagation of protein functions. Bioinformatics 35, 1737–1744 (2019).
Gligorijević, V. et al. Structure-based protein function prediction using graph convolutional networks. Nat. Commun. 12, 1–14 (2021).
Pearce, R. & Zhang, Y. Toward the solution of the protein structure prediction problem. J. Biol. Chem. 297, 100870 (2021).
Zheng, W. et al. Protein structure prediction using deep learning distance and hydrogen‐bonding restraints in CASP14. Proteins 89, 1734–1751 (2021).
Zheng, W. et al. Deep-learning contact-map guided protein structure prediction in CASP13. Proteins 87, 1149–1164 (2019).
Battey, J. N. et al. Automated server predictions in CASP7. Proteins 69 (Suppl.), 68–82 (2007).
Croll, T. I., Sammito, M. D., Kryshtafovych, A. & Read, R. J. Evaluation of template-based modeling in CASP13. Proteins 87, 1113–1127 (2019).
Zhang, Y. Template-based modeling and free modeling by I-TASSER in CASP7. Proteins 69 (Suppl.), 108–117 (2007).
Zhang, Y. I-TASSER: fully automated protein structure prediction in CASP8. Proteins 77 (Suppl.), 100–113 (2009).
Xu, D., Zhang, J., Roy, A. & Zhang, Y. Automated protein structure modeling in CASP9 by I-TASSER pipeline combined with QUARK-based ab initio folding and FG-MD-based structure refinement. Proteins 79 (Suppl.), 147–160 (2011).
Zhang, Y. Interplay of I-TASSER and QUARK for template-based and ab initio protein structure prediction in CASP10. Proteins 82 (Suppl.), 175–187 (2014).
Zhang, W. et al. Integration of QUARK and I-TASSER for ab initio protein structure prediction in CASP11. Proteins 84 (Suppl.), 76–86 (2016).
Zhang, C., Mortuza, S. M., He, B., Wang, Y. & Zhang, Y. Template-based and free modeling of I-TASSER and QUARK pipelines using predicted contact maps in CASP12. Proteins 86 (Suppl.), 136–151 (2018).
Roy, A., Kucukural, A. & Zhang, Y. I-TASSER: a unified platform for automated protein structure and function prediction. Nat. Protoc. 5, 725–738 (2010).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. in Proceedings of the IEEE conference on computer vision and pattern recognition 770–778 (2016).
Zheng, W. et al. LOMETS2: improved meta-threading server for fold-recognition and structure-based function annotation for distant-homology proteins. Nucleic Acids Res. 47, W429–W436 (2019).
Wang, Y. et al. ThreaDomEx: a unified platform for predicting continuous and discontinuous protein domains by multiple-threading and segment assembly. Nucleic Acids Res. 45, W400–W407 (2017).
Li, Y. et al. Protein inter‐residue contact and distance prediction by coupling complementary coevolution features with deep residual networks in CASP14. Proteins 89, 1911–1921 (2021).
Zhang, C., Freddolino, P. L. & Zhang, Y. COFACTOR: improved protein function prediction by combining structure, sequence and protein–protein interaction information. Nucleic Acids Res. 45, W291–W299 (2017).
Sillitoe, I. et al. CATH: increased structural coverage of functional space. Nucleic Acids Res. 49, D266–D273 (2021).
Xu, Y., Xu, D. & Gabow, H. N. Protein domain decomposition using a graph-theoretic approach. Bioinformatics 16, 1091–1104 (2000).
Steinegger, M. & Söding, J. Clustering huge protein sequence sets in linear time. Nat. Commun. 9, 2542 (2018).
Steinegger, M., Mirdita, M. & Söding, J. Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold. Nat. Methods 16, 603–606 (2019).
Mitchell, A. L. et al. MGnify: the microbiome analysis resource in 2020. Nucleic Acids Res. 48, D570–D578 (2020).
Chen, I.-M. A. et al. The IMG/M data management and analysis system v. 6.0: new tools and advanced capabilities. Nucleic Acids Res. 49, D751–D763 (2021).
Mirdita, M. et al. Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Res. 45, D170–D176 (2017).
Suzek, B. E. et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926–932 (2015).
Zhang, C., Zheng, W., Mortuza, S., Li, Y. & Zhang, Y. DeepMSA: constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins. Bioinformatics 36, 2105–2112 (2020).
Yan, R., Xu, D., Yang, J., Walker, S. & Zhang, Y. A comparative assessment and analysis of 20 representative sequence alignment methods for protein structure prediction. Sci. Rep. 3, 1–9 (2013).
Ekeberg, M., Hartonen, T. & Aurell, E. Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences. J. Comput. Phys. 276, 341–356 (2014).
Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. USA 117, 1496–1503 (2020).
Thrun, S. in Advances in Neural Information Processing Systems 640–646 (Morgan Kaufmann Publishers, 1996).
Zheng, W., Zhang, C., Bell, E. W. & Zhang, Y. I-TASSER gateway: a protein structure and function prediction server powered by XSEDE. Future Gener. Comput. Syst. 99, 73–85 (2019).
Zhang, Y. I-TASSER server for protein 3D structure prediction. BMC Bioinforma. 9, 40 (2008).
Zheng, W. et al. Folding non-homologous proteins by coupling deep-learning contact maps with I-TASSER assembly simulations. Cell Rep. Methods 1, 100014 (2021).
Li, Y., Zhang, C., Bell, E. W., Yu, D. J. & Zhang, Y. Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13. Proteins 87, 1082–1091 (2019).
Li, Y. et al. Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks. PLOS Comput. Biol. 17, e1008865 (2021).
He, B., Mortuza, S., Wang, Y., Shen, H.-B. & Zhang, Y. NeBcon: protein contact map prediction using neural network training coupled with naïve Bayes classifiers. Bioinformatics 33, 2296–2306 (2017).
Zhang, Y. & Skolnick, J. SPICKER: a clustering approach to identify near‐native protein folds. J. Comput. Chem. 25, 865–871 (2004).
Huang, X., Pearce, R. & Zhang, Y. FASPR: an open-source tool for fast and accurate protein side-chain packing. Bioinformatics 36, 3758–3765 (2020).
Zhang, J., Liang, Y. & Zhang, Y. Atomic-level protein structure refinement using fragment-guided molecular dynamics conformation sampling. Structure 19, 1784–1795 (2011).
Zhang, Y. & Skolnick, J. Scoring function for automated assessment of protein structure template quality. Proteins 57, 702–710 (2004).
Ramachandran, G. T. & Sasisekharan, V. in Advances in Protein Chemistry, 23 283–437 (Elsevier, 1968).
Roy, A., Yang, J. & Zhang, Y. COFACTOR: an accurate comparative algorithm for structure-based protein function annotation. Nucleic Acids Res. 40, W471–W477 (2012).
Yang, J., Roy, A. & Zhang, Y. BioLiP: a semi-manually curated database for biologically relevant ligand–protein interactions. Nucleic Acids Res. 41, D1096–D1103 (2012).
Zhou, X. G., Peng, C. X., Liu, J., Zhang, Y. & Zhang, G. J. Underestimation-assisted global-local cooperative differential evolution and the application to protein structure prediction. IEEE Trans. Evol. Comput. 24, 536–550 (2020).
Zhou, X. G. & Zhang, G. J. Abstract convex underestimation assisted multistage differential evolution. IEEE Trans. Cybern. 47, 2730–2741 (2017).
Zhou, X. G. & Zhang, G. J. Differential evolution with underestimation-based multimutation strategy. IEEE Trans. Cybern. 49, 1353–1364 (2018).
Yang, J., Wang, Y. & Zhang, Y. ResQ: an approach to unified estimation of B-factor and residue-specific error in protein structure prediction. J. Mol. Biol. 428, 693–701 (2016).
Glaeser, R. M. How good can cryo-EM become? Nat. Methods 13, 28–32 (2016).
Zhou, X. G. et al. Progressive assembly of multi-domain protein structures from cryo-EM density maps. Nat. Comput. Sci. 2, 265–275 (2022).
Mistry, J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
Lu, S. et al. CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Res. 48, D265–D268 (2020).
Eickholt, J., Deng, X. & Cheng, J. DoBo: protein domain boundary prediction by integrating evolutionary signals and machine learning. BMC Bioinforma. 12, 1–8 (2011).
Tai, C. H., Lee, W. J., Vincent, J. J. & Lee, B. Evaluation of domain prediction in CASP6. Proteins 61, 183–192 (2005).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Pearce, R. & Zhang, Y. Deep learning techniques have significantly impacted protein structure prediction and protein design. Curr. Opin. Struct. Biol. 68, 194–207 (2021).
Born, A., Henen, M. A. & Vögeli, B. Activity and affinity of Pin1 variants. Molecules 25, 36 (2020).
Born, A. et al. Reconstruction of coupled intra-and interdomain protein motion from nuclear and electron magnetic resonance. J. Am. Chem. Soc. 143, 16055–16067 (2021).
Chandonia, J.-M., Fox, N. K. & Brenner, S. E. SCOPe: manual curation and artifact removal in the structural classification of proteins—extended database. J. Mol. Biol. 429, 348–355 (2017).
Lam, S. D. et al. Gene3D: expanding the utility of domain assignments. Nucleic Acids Res. 44, D404–D409 (2015).
Yu, L. et al. Grammar of protein domain architectures. Proc. Natl Acad. Sci. USA 116, 3636–3645 (2019).
Chothia, C. & Lesk, A. M. The relation between the divergence of sequence and structure in proteins. EMBO J. 5, 823–826 (1986).
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D. Biol. Crystallogr. 66, 486–501 (2010).
DiMaio, F. et al. Atomic-accuracy models from 4.5-Å cryo-electron microscopy data with density-guided iterative local refinement. Nat. Methods 12, 361–365 (2015).
Zhang, C. et al. Functions of essential genes and a scale-free protein interaction network revealed by structure-based function and interaction prediction for a minimal genome. J. Proteome Res. 20, 1178–1189 (2021).
Zhang, C., Wei, X., Omenn, G. S. & Zhang, Y. Structure and protein interaction-based gene ontology annotations reveal likely functions of uncharacterized proteins on human chromosome 17. J. Proteome Res. 17, 4186–4196 (2018).
Zhang, C., Lane, L., Omenn, G. S. & Zhang, Y. Blinded testing of function annotation for uPE1 proteins by I-TASSER/COFACTOR pipeline using the 2018–2019 additions to neXtProt and the CAFA3 challenge. J. Proteome Res. 18, 4154–4166 (2019).
Iyer, S., Subramanian, V. & Acharya, K. R. C9orf72, a protein associated with amyotrophic lateral sclerosis (ALS) is a guanine nucleotide exchange factor. PeerJ 6, e5815 (2018).
Skotnicová, P. et al. The cyanobacterial protoporphyrinogen oxidase HemJ is a new b-type heme protein functionally coupled with coproporphyrinogen III oxidase. J. Biol. Chem. 293, 12394–12404 (2018).
Hanson, R. M., Prilusky, J., Renjian, Z., Nakane, T. & Sussman, J. L. JSmol and the next‐generation web‐based representation of 3D molecular structure as applied to proteopedia. Isr. J. Chem. 53, 207–216 (2013).
Hiranuma, N. et al. Improved protein structure refinement guided by deep learning based accuracy estimation. Nat. Commun. 12, 1–11 (2021).
Guo, S.-S., Liu, J., Zhou, X. & Zhang, G. DeepUMQA: ultrafast shape recognition-based protein model quality assessment using deep learning. Bioinformatics 38, 1895–1903 (2022).
Xu, J. & Zhang, Y. How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics 26, 889–895 (2010).
Ellson, J., Gansner, E.R., Koutsofios, E., North, S.C. & Woodhull, G. in Graph Drawing Software 127–148 (Springer, 2004).
Towns, J. et al. XSEDE: acceleratingscientific discovery. Comput. Sci. Eng. 16, 62–74 (2014).
Xu, J. Distance-based protein folding powered by deep learning. Proc. Natl Acad. Sci. USA 116, 16856–16865 (2019).
Källberg, M. et al. Template-based protein structure modeling using the RaptorX web server. Nat. Protoc. 7, 1511–1522 (2012).
Du, Z. et al. The trRosetta server for fast and accurate protein structure prediction. Nat. Protoc. 16, 5634–5651 (2021).
Lobley, A., Sadowski, M. I. & Jones, D. T. pGenTHREADER and pDomTHREADER: new methods for improved protein fold recognition and superfamily discrimination. Bioinformatics 25, 1761–1767 (2009).
Zimmermann, L. et al. A completely reimplemented MPI bioinformatics toolkit with a new HHpred server at its core. J. Mol. Biol. 430, 2237–2243 (2018).
Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N. & Sternberg, M. J. The Phyre2 web portal for protein modeling, prediction and analysis. Nat. Protoc. 10, 845–858 (2015).
Acknowledgements
This work is supported in part by the National Institute of General Medical Sciences (GM136422 and S10OD026825 to Y.Z.), the National Institute of Allergy and Infectious Diseases (AI134678 to Y.Z.), the National Science Foundation (IIS1901191 and DBI2030790 to Y.Z.), the National Nature Science Foundation of China (62173304 and 61773346 to G.Z.), the ‘New Generation Artificial Intelligence’ major project of Science and Technology Innovation 2030 of the Ministry of Science and Technology of China (2021ZD0150100 to G.Z.) and the Key Project of Zhejiang Provincial Natural Science Foundation of China (LZ20F030002 to G.Z.). This work used the Extreme Science and Engineering Discovery Environment (XSEDE)102, which is supported by the National Science Foundation (ACI1548562).
Author information
Authors and Affiliations
Contributions
Y.Z. conceived and designed the project. X.Z. developed the pipeline and performed the test. W.Z. developed the method for domain boundaries prediction. Y.L. developed the method for contacts and distances prediction. C.Z. developed the method for protein function prediction. Y.Z., W.Z., Y.L., C.Z. and R.P. developed the method for individual domain modeling. X.Z. developed the method for multi-domain protein structure assembly. X.Z. and E.B. tested the server. G.Z. helped supervise the research. X.Z. and Y.Z. wrote the manuscript, and all authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Protocols thanks Ruben Sánchez-García, Beat R. Vogeli and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Key references using this protocol
Zhou, X. et al. Proc. Natl Acad. Sci. USA 116, 15930–15938 (2019): https://doi.org/10.1073/pnas.1905068116
Zhang, C. et al. Nucleic Acids Res. 45, W291–299 (2017): https://doi.org/10.1093/nar/gkx366
Zheng, W. et al. Cell Rep. Methods 1, 100014 (2021): https://doi.org/10.1016/j.crmeth.2021.100014
Hermes, C. et al. Nat. Commun. 12, 144 (2021): https://doi.org/10.1038/s41467-020-20418-3
Supplementary information
Supplementary Information
Supplementary Figs. 1–19, Tables 1 and 2, Notes 1–6, Equations 1–25 and References.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhou, X., Zheng, W., Li, Y. et al. I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nat Protoc 17, 2326–2353 (2022). https://doi.org/10.1038/s41596-022-00728-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41596-022-00728-0
This article is cited by
-
Bioinformatics approach for structure modeling, vaccine design, and molecular docking of Brucella candidate proteins BvrR, OMP25, and OMP31
Scientific Reports (2024)
-
Identification and prioritisation of potential vaccine candidates using subtractive proteomics and designing of a multi-epitope vaccine against Wuchereria bancrofti
Scientific Reports (2024)
-
SOFB is a comprehensive ensemble deep learning approach for elucidating and characterizing protein-nucleic-acid-binding residues
Communications Biology (2024)
-
SARS-CoV-2 variant with the spike protein mutation F306L in the southern border provinces of Thailand
Scientific Reports (2024)
-
Investigation into in silico and in vitro approaches for inhibitors targeting MCM10 in Leishmania donovani: a comprehensive study
Molecular Diversity (2024)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.